git.vger.kernel.org archive mirror
* [PATCH 00/13] object store: alloc
@ 2018-05-01 21:33 Stefan Beller
  2018-05-01 21:33 ` [PATCH 01/13] repository: introduce object parser field Stefan Beller
                   ` (15 more replies)
  0 siblings, 16 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

This applies on top of sb/oid-object-info and is the logical continuation
of the series that it builds on; it brings the object store into more of
Git's code and removes global state, so that reasoning about the
in-memory representation of the repository is easier.

My original plan was to convert lookup_commit_graft in the next series,
similar to lookup_replace_object in sb/object-store-replace. However, the
grafts and shallow mechanisms are so close to each other that they need
to be converted at the same time, and both depend on the "parsed object
store" that is introduced in this series.

The next series will then convert code in {object/blob/tree/commit/tag}.c
hopefully finishing the lookup_* functions.

I also debated if it is worth converting alloc.c via this patch series
or if it might make more sense to use the new mem-pool by Jameson[1].

I vaguely wonder about the performance impact, as the object allocation
code seemed to be relevant in the past.
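
For readers who have not looked at alloc.c: the reason allocation speed
matters is that nodes are handed out from large blocks rather than one
malloc() per object. The following is only a rough sketch of that idea,
not the code in alloc.c; the BLOCKING value and the field layout are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 1024	/* nodes per block; illustrative value */

/* Hypothetical per-type allocator state. */
struct alloc_state {
	unsigned int count;	/* total nodes handed out so far */
	unsigned int nr;	/* nodes left in the current block */
	char *p;		/* next free node in the current block */
};

/* Hand out one zeroed node, grabbing a fresh block when needed. */
static void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	assert(node_size > 0);
	if (!s->nr) {
		s->p = malloc(BLOCKING * node_size);
		if (!s->p)
			abort();
		s->nr = BLOCKING;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p += node_size;
	memset(ret, 0, node_size);
	return ret;
}
```

In this sketch blocks are never freed, so a per-repository version would
also have to remember each block in order to release it later.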

[1] https://public-inbox.org/git/20180430153122.243976-1-jamill@microsoft.com/

Any comments welcome,
Thanks,
Stefan

Jonathan Nieder (1):
  object: add repository argument to grow_object_hash

Stefan Beller (12):
  repository: introduce object parser field
  object: add repository argument to create_object
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  alloc: allow arbitrary repositories for alloc functions

 alloc.c           | 69 +++++++++++++++++++++++------------
 alloc.h           | 21 +++++++++++
 blame.c           |  3 +-
 blob.c            |  5 ++-
 cache.h           |  9 -----
 commit.c          |  4 +-
 merge-recursive.c |  3 +-
 object.c          | 93 +++++++++++++++++++++++++++++++++--------------
 object.h          | 18 ++++++++-
 repository.c      |  7 ++++
 repository.h      | 11 +++++-
 tag.c             |  4 +-
 tree.c            |  4 +-
 13 files changed, 182 insertions(+), 69 deletions(-)
 create mode 100644 alloc.h

-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 01/13] repository: introduce object parser field
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-02 17:17   ` Duy Nguyen
  2018-05-02 20:30   ` Jonathan Tan
  2018-05-01 21:33 ` [PATCH 02/13] object: add repository argument to create_object Stefan Beller
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller, Jonathan Nieder

Git's object access code can be thought of as containing two layers:
the raw object store provides access to raw object content, while the
higher level obj_hash code parses raw objects and keeps track of
parenthood and other object relationships using 'struct object'.
Keeping these layers separate should make it easier to find relevant
functions and to change the implementation of one without having to
touch the other.

Add a parsed_objects field (of type 'struct object_parser') to
'struct repository' to prepare obj_hash to be handled per repository.
Callers still only use the_repository for now; later patches will adapt
them to handle arbitrary repositories.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
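A rough sketch of where this is heading, not part of the patch: once the
hash lives in 'struct repository', an accessor can take the repository
as an argument and two repositories no longer share state. The function
repo_max_object_index below is hypothetical and calloc stands in for
xmalloc plus memset.

```c
#include <assert.h>
#include <stdlib.h>

struct object;	/* opaque for this sketch */

struct object_parser {
	struct object **obj_hash;
	int nr_objs, obj_hash_size;
};

/* Reduced to the one field this sketch needs. */
struct repository {
	struct object_parser *parsed_objects;
};

static struct object_parser *object_parser_new(void)
{
	struct object_parser *o = calloc(1, sizeof(*o));
	if (!o)
		abort();
	return o;
}

/* Hypothetical per-repository variant of get_max_object_index(). */
static unsigned int repo_max_object_index(const struct repository *r)
{
	assert(r && r->parsed_objects);
	return r->parsed_objects->obj_hash_size;
}
```
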
 object.c     | 63 +++++++++++++++++++++++++++++++++-------------------
 object.h     |  8 +++++++
 repository.c |  7 ++++++
 repository.h | 11 ++++++++-
 4 files changed, 65 insertions(+), 24 deletions(-)

diff --git a/object.c b/object.c
index 5044d08e96c..503c73e2d1f 100644
--- a/object.c
+++ b/object.c
@@ -8,17 +8,14 @@
 #include "object-store.h"
 #include "packfile.h"
 
-static struct object **obj_hash;
-static int nr_objs, obj_hash_size;
-
 unsigned int get_max_object_index(void)
 {
-	return obj_hash_size;
+	return the_repository->parsed_objects->obj_hash_size;
 }
 
 struct object *get_indexed_object(unsigned int idx)
 {
-	return obj_hash[idx];
+	return the_repository->parsed_objects->obj_hash[idx];
 }
 
 static const char *object_type_strings[] = {
@@ -90,15 +87,16 @@ struct object *lookup_object(const unsigned char *sha1)
 	unsigned int i, first;
 	struct object *obj;
 
-	if (!obj_hash)
+	if (!the_repository->parsed_objects->obj_hash)
 		return NULL;
 
-	first = i = hash_obj(sha1, obj_hash_size);
-	while ((obj = obj_hash[i]) != NULL) {
+	first = i = hash_obj(sha1,
+			     the_repository->parsed_objects->obj_hash_size);
+	while ((obj = the_repository->parsed_objects->obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->oid.hash))
 			break;
 		i++;
-		if (i == obj_hash_size)
+		if (i == the_repository->parsed_objects->obj_hash_size)
 			i = 0;
 	}
 	if (obj && i != first) {
@@ -107,7 +105,8 @@ struct object *lookup_object(const unsigned char *sha1)
 		 * that we do not need to walk the hash table the next
 		 * time we look for it.
 		 */
-		SWAP(obj_hash[i], obj_hash[first]);
+		SWAP(the_repository->parsed_objects->obj_hash[i],
+		     the_repository->parsed_objects->obj_hash[first]);
 	}
 	return obj;
 }
@@ -124,19 +123,19 @@ static void grow_object_hash(void)
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
+	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(obj_hash);
-	obj_hash = new_hash;
-	obj_hash_size = new_hash_size;
+	free(the_repository->parsed_objects->obj_hash);
+	the_repository->parsed_objects->obj_hash = new_hash;
+	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object(const unsigned char *sha1, void *o)
@@ -147,11 +146,12 @@ void *create_object(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (obj_hash_size - 1 <= nr_objs * 2)
+	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
 		grow_object_hash();
 
-	insert_obj_hash(obj, obj_hash, obj_hash_size);
-	nr_objs++;
+	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
+			the_repository->parsed_objects->obj_hash_size);
+	the_repository->parsed_objects->nr_objs++;
 	return obj;
 }
 
@@ -431,8 +431,8 @@ void clear_object_flags(unsigned flags)
 {
 	int i;
 
-	for (i=0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i=0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj)
 			obj->flags &= ~flags;
 	}
@@ -442,13 +442,20 @@ void clear_commit_marks_all(unsigned int flags)
 {
 	int i;
 
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj && obj->type == OBJ_COMMIT)
 			obj->flags &= ~flags;
 	}
 }
 
+struct object_parser *object_parser_new(void)
+{
+	struct object_parser *o = xmalloc(sizeof(*o));
+	memset(o, 0, sizeof(*o));
+	return o;
+}
+
 struct raw_object_store *raw_object_store_new(void)
 {
 	struct raw_object_store *o = xmalloc(sizeof(*o));
@@ -488,3 +495,13 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_all_packs(o);
 	o->packed_git = NULL;
 }
+
+void object_parser_clear(struct object_parser *o)
+{
+	/*
+	 * TODO free objects in o->obj_hash.
+	 *
+	 * As objects are allocated in slabs (see alloc.c), we do
+	 * not need to free each object, but each slab instead.
+	 */
+}
diff --git a/object.h b/object.h
index f13f85b2a94..84380b2b4d5 100644
--- a/object.h
+++ b/object.h
@@ -1,6 +1,14 @@
 #ifndef OBJECT_H
 #define OBJECT_H
 
+struct object_parser {
+	struct object **obj_hash;
+	int nr_objs, obj_hash_size;
+};
+
+struct object_parser *object_parser_new(void);
+void object_parser_clear(struct object_parser *o);
+
 struct object_list {
 	struct object *item;
 	struct object_list *next;
diff --git a/repository.c b/repository.c
index a4848c1bd05..208ee10071c 100644
--- a/repository.c
+++ b/repository.c
@@ -2,6 +2,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "config.h"
+#include "object.h"
 #include "submodule-config.h"
 
 /* The main repository */
@@ -14,6 +15,8 @@ void initialize_the_repository(void)
 
 	the_repo.index = &the_index;
 	the_repo.objects = raw_object_store_new();
+	the_repo.parsed_objects = object_parser_new();
+
 	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
 
@@ -143,6 +146,7 @@ static int repo_init(struct repository *repo,
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
+	repo->parsed_objects = object_parser_new();
 
 	if (repo_init_gitdir(repo, gitdir))
 		goto error;
@@ -226,6 +230,9 @@ void repo_clear(struct repository *repo)
 	raw_object_store_clear(repo->objects);
 	FREE_AND_NULL(repo->objects);
 
+	object_parser_clear(repo->parsed_objects);
+	FREE_AND_NULL(repo->parsed_objects);
+
 	if (repo->config) {
 		git_configset_clear(repo->config);
 		FREE_AND_NULL(repo->config);
diff --git a/repository.h b/repository.h
index e6e00f541bd..8d042e0fa11 100644
--- a/repository.h
+++ b/repository.h
@@ -22,10 +22,19 @@ struct repository {
 	char *commondir;
 
 	/*
-	 * Holds any information related to accessing the raw object content.
+	 * Holds any information needed to retrieve the raw content
+	 * of objects. The object_parser uses this to get object
+	 * content which it then parses.
 	 */
 	struct raw_object_store *objects;
 
+	/*
+	 * State for the object parser. This owns all parsed objects
+	 * (struct object) so callers do not have to manage their
+	 * lifetime.
+	 */
+	struct object_parser *parsed_objects;
+
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 02/13] object: add repository argument to create_object
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
  2018-05-01 21:33 ` [PATCH 01/13] repository: introduce object parser field Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-01 21:43   ` Eric Sunshine
  2018-05-01 21:33 ` [PATCH 03/13] object: add repository argument to grow_object_hash Stefan Beller
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller, Jonathan Nieder

Add a repository argument to allow the callers of create_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Add the cocci patch that converted the callers.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
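A minimal illustration of the macro trick, using a made-up function
count_objects rather than anything in this patch: the macro pastes its
first argument onto the function name, so the only spelling that
resolves to a declared function is the literal token the_repository.

```c
#include <assert.h>

/*
 * The "repository argument" is consumed by the preprocessor: the
 * operand of ## is pasted as written, without being evaluated.
 */
#define count_objects(r) count_objects_##r()

/* Only the the_repository flavour exists. */
static int count_objects_the_repository(void)
{
	return 42;	/* placeholder value for the sketch */
}

static int demo(void)
{
	/* Expands to count_objects_the_repository(). */
	return count_objects(the_repository);

	/*
	 * count_objects(other_repo) would expand to
	 * count_objects_other_repo(), which is not declared
	 * anywhere, so the build fails.
	 */
}
```
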
 blob.c   | 4 +++-
 commit.c | 3 ++-
 object.c | 5 +++--
 object.h | 3 ++-
 tag.c    | 3 ++-
 tree.c   | 3 ++-
 6 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/blob.c b/blob.c
index fa2ab4f7a74..85c2143f299 100644
--- a/blob.c
+++ b/blob.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "blob.h"
+#include "repository.h"
 
 const char *blob_type = "blob";
 
@@ -7,7 +8,8 @@ struct blob *lookup_blob(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_blob_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_blob_node());
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/commit.c b/commit.c
index ca474a7c112..9106acf0aad 100644
--- a/commit.c
+++ b/commit.c
@@ -50,7 +50,8 @@ struct commit *lookup_commit(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_commit_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_commit_node());
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/object.c b/object.c
index 503c73e2d1f..933921e35c9 100644
--- a/object.c
+++ b/object.c
@@ -138,7 +138,7 @@ static void grow_object_hash(void)
 	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, void *o)
+void *create_object_the_repository(const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -178,7 +178,8 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		obj = create_object(sha1, alloc_object_node());
+		obj = create_object(the_repository, sha1,
+				    alloc_object_node());
 	return obj;
 }
 
diff --git a/object.h b/object.h
index 84380b2b4d5..d1869dbc502 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,8 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-extern void *create_object(const unsigned char *sha1, void *obj);
+#define create_object(r, s, o) create_object_##r(s, o)
+extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
diff --git a/tag.c b/tag.c
index 3d37c1bd251..7150b759d66 100644
--- a/tag.c
+++ b/tag.c
@@ -93,7 +93,8 @@ struct tag *lookup_tag(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tag_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tag_node());
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
diff --git a/tree.c b/tree.c
index 1c68ea586bd..63730e3fb46 100644
--- a/tree.c
+++ b/tree.c
@@ -196,7 +196,8 @@ struct tree *lookup_tree(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tree_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tree_node());
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 03/13] object: add repository argument to grow_object_hash
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
  2018-05-01 21:33 ` [PATCH 01/13] repository: introduce object parser field Stefan Beller
  2018-05-01 21:33 ` [PATCH 02/13] object: add repository argument to create_object Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-01 21:33 ` [PATCH 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Jonathan Nieder, Stefan Beller

From: Jonathan Nieder <jrnieder@gmail.com>

Add a repository argument to allow the caller of grow_object_hash to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
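For context on the function being touched, here is a simplified sketch
of growing a power-of-2, linear-probing table and rehashing its entries.
It is not the code in object.c: struct entry, struct table and the
integer keys are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

struct entry {
	unsigned int key;
};

struct table {
	struct entry **slots;
	unsigned int nr, size;	/* size is 0 or a power of 2 */
};

/* Linear probing; size must be a power of 2 for the mask to work. */
static void insert_slot(struct entry *e, struct entry **slots,
			unsigned int size)
{
	unsigned int i = e->key & (size - 1);

	while (slots[i]) {
		i++;
		if (i == size)
			i = 0;
	}
	slots[i] = e;
}

/* Double the table (at least 32 slots) and rehash every entry. */
static void grow_table(struct table *t)
{
	unsigned int i;
	unsigned int new_size = t->size < 32 ? 32 : 2 * t->size;
	struct entry **new_slots = calloc(new_size, sizeof(*new_slots));

	if (!new_slots)
		abort();
	assert((new_size & (new_size - 1)) == 0);
	for (i = 0; i < t->size; i++)
		if (t->slots[i])
			insert_slot(t->slots[i], new_slots, new_size);
	free(t->slots);
	t->slots = new_slots;
	t->size = new_size;
}

/* Grow before the table gets more than about half full. */
static void table_add(struct table *t, struct entry *e)
{
	if (t->size == 0 || t->size - 1 <= t->nr * 2)
		grow_table(t);
	insert_slot(e, t->slots, t->size);
	t->nr++;
}
```
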
 object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 933921e35c9..a6202d11292 100644
--- a/object.c
+++ b/object.c
@@ -116,7 +116,8 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-static void grow_object_hash(void)
+#define grow_object_hash(r) grow_object_hash_##r()
+static void grow_object_hash_the_repository(void)
 {
 	int i;
 	/*
@@ -147,7 +148,7 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	hashcpy(obj->oid.hash, sha1);
 
 	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash();
+		grow_object_hash(the_repository);
 
 	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
 			the_repository->parsed_objects->obj_hash_size);
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 04/13] alloc: add repository argument to alloc_blob_node
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (2 preceding siblings ...)
  2018-05-01 21:33 ` [PATCH 03/13] object: add repository argument to grow_object_hash Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-02 20:34   ` Jonathan Tan
  2018-05-01 21:33 ` [PATCH 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 blob.c  | 2 +-
 cache.h | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacdd..6c5c376a25a 100644
--- a/alloc.c
+++ b/alloc.c
@@ -49,7 +49,7 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 }
 
 static struct alloc_state blob_state;
-void *alloc_blob_node(void)
+void *alloc_blob_node_the_repository(void)
 {
 	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
diff --git a/blob.c b/blob.c
index 85c2143f299..9e64f301895 100644
--- a/blob.c
+++ b/blob.c
@@ -9,7 +9,7 @@ struct blob *lookup_blob(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_blob_node());
+				     alloc_blob_node(the_repository));
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/cache.h b/cache.h
index 3a4d80e92bf..2258e611275 100644
--- a/cache.h
+++ b/cache.h
@@ -1764,7 +1764,8 @@ int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
 /* alloc.c */
-extern void *alloc_blob_node(void);
+#define alloc_blob_node(r) alloc_blob_node_##r()
+extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 05/13] alloc: add repository argument to alloc_tree_node
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (3 preceding siblings ...)
  2018-05-01 21:33 ` [PATCH 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-01 21:33 ` [PATCH 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tree.c  | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 6c5c376a25a..2c8d1430758 100644
--- a/alloc.c
+++ b/alloc.c
@@ -57,7 +57,7 @@ void *alloc_blob_node_the_repository(void)
 }
 
 static struct alloc_state tree_state;
-void *alloc_tree_node(void)
+void *alloc_tree_node_the_repository(void)
 {
 	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
diff --git a/cache.h b/cache.h
index 2258e611275..1717d07a2c5 100644
--- a/cache.h
+++ b/cache.h
@@ -1766,7 +1766,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 /* alloc.c */
 #define alloc_blob_node(r) alloc_blob_node_##r()
 extern void *alloc_blob_node_the_repository(void);
-extern void *alloc_tree_node(void);
+#define alloc_tree_node(r) alloc_tree_node_##r()
+extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
diff --git a/tree.c b/tree.c
index 63730e3fb46..58cf19b4fa8 100644
--- a/tree.c
+++ b/tree.c
@@ -197,7 +197,7 @@ struct tree *lookup_tree(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tree_node());
+				     alloc_tree_node(the_repository));
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 06/13] alloc: add repository argument to alloc_commit_node
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (4 preceding siblings ...)
  2018-05-01 21:33 ` [PATCH 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-01 21:33 ` [PATCH 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c           | 2 +-
 blame.c           | 2 +-
 cache.h           | 3 ++-
 commit.c          | 2 +-
 merge-recursive.c | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/alloc.c b/alloc.c
index 2c8d1430758..9e2b897ec1d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -88,7 +88,7 @@ unsigned int alloc_commit_index(void)
 	return count++;
 }
 
-void *alloc_commit_node(void)
+void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
diff --git a/blame.c b/blame.c
index dfa24473dc6..ba9b18e7542 100644
--- a/blame.c
+++ b/blame.c
@@ -161,7 +161,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 
 	read_cache();
 	time(&now);
-	commit = alloc_commit_node();
+	commit = alloc_commit_node(the_repository);
 	commit->object.parsed = 1;
 	commit->date = now;
 	parent_tail = &commit->parents;
diff --git a/cache.h b/cache.h
index 1717d07a2c5..bf6e8c87d83 100644
--- a/cache.h
+++ b/cache.h
@@ -1768,7 +1768,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 extern void *alloc_blob_node_the_repository(void);
 #define alloc_tree_node(r) alloc_tree_node_##r()
 extern void *alloc_tree_node_the_repository(void);
-extern void *alloc_commit_node(void);
+#define alloc_commit_node(r) alloc_commit_node_##r()
+extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
diff --git a/commit.c b/commit.c
index 9106acf0aad..a9a43e79bae 100644
--- a/commit.c
+++ b/commit.c
@@ -51,7 +51,7 @@ struct commit *lookup_commit(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_commit_node());
+				     alloc_commit_node(the_repository));
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624da..6dac8908648 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -98,7 +98,7 @@ static struct tree *shift_tree_object(struct tree *one, struct tree *two,
 
 static struct commit *make_virtual_commit(struct tree *tree, const char *comment)
 {
-	struct commit *commit = alloc_commit_node();
+	struct commit *commit = alloc_commit_node(the_repository);
 
 	set_merge_remote_desc(commit, comment, (struct object *)commit);
 	commit->tree = tree;
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 07/13] alloc: add repository argument to alloc_tag_node
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (5 preceding siblings ...)
  2018-05-01 21:33 ` [PATCH 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-01 21:33 ` [PATCH 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tag.c   | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 9e2b897ec1d..290250e3595 100644
--- a/alloc.c
+++ b/alloc.c
@@ -65,7 +65,7 @@ void *alloc_tree_node_the_repository(void)
 }
 
 static struct alloc_state tag_state;
-void *alloc_tag_node(void)
+void *alloc_tag_node_the_repository(void)
 {
 	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
diff --git a/cache.h b/cache.h
index bf6e8c87d83..32f340cde59 100644
--- a/cache.h
+++ b/cache.h
@@ -1770,7 +1770,8 @@ extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node_the_repository(void);
 #define alloc_commit_node(r) alloc_commit_node_##r()
 extern void *alloc_commit_node_the_repository(void);
-extern void *alloc_tag_node(void);
+#define alloc_tag_node(r) alloc_tag_node_##r()
+extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
diff --git a/tag.c b/tag.c
index 7150b759d66..02ef4eaafc0 100644
--- a/tag.c
+++ b/tag.c
@@ -94,7 +94,7 @@ struct tag *lookup_tag(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tag_node());
+				     alloc_tag_node(the_repository));
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 08/13] alloc: add repository argument to alloc_object_node
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (6 preceding siblings ...)
  2018-05-01 21:33 ` [PATCH 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-01 21:33 ` [PATCH 09/13] alloc: add repository argument to alloc_report Stefan Beller
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c  | 2 +-
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 290250e3595..f031ce422d9 100644
--- a/alloc.c
+++ b/alloc.c
@@ -73,7 +73,7 @@ void *alloc_tag_node_the_repository(void)
 }
 
 static struct alloc_state object_state;
-void *alloc_object_node(void)
+void *alloc_object_node_the_repository(void)
 {
 	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
diff --git a/cache.h b/cache.h
index 32f340cde59..2d60359a964 100644
--- a/cache.h
+++ b/cache.h
@@ -1772,7 +1772,8 @@ extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node_the_repository(void);
 #define alloc_tag_node(r) alloc_tag_node_##r()
 extern void *alloc_tag_node_the_repository(void);
-extern void *alloc_object_node(void);
+#define alloc_object_node(r) alloc_object_node_##r()
+extern void *alloc_object_node_the_repository(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
 
diff --git a/object.c b/object.c
index a6202d11292..7d36323445b 100644
--- a/object.c
+++ b/object.c
@@ -180,7 +180,7 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		obj = create_object(the_repository, sha1,
-				    alloc_object_node());
+				    alloc_object_node(the_repository));
 	return obj;
 }
 
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 09/13] alloc: add repository argument to alloc_report
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (7 preceding siblings ...)
  2018-05-01 21:33 ` [PATCH 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
@ 2018-05-01 21:33 ` Stefan Beller
  2018-05-01 21:34 ` [PATCH 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:33 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index f031ce422d9..28b85b22144 100644
--- a/alloc.c
+++ b/alloc.c
@@ -105,7 +105,7 @@ static void report(const char *name, unsigned int count, size_t size)
 #define REPORT(name, type)	\
     report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
 
-void alloc_report(void)
+void alloc_report_the_repository(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/cache.h b/cache.h
index 2d60359a964..01cc207d218 100644
--- a/cache.h
+++ b/cache.h
@@ -1774,7 +1774,8 @@ extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node_the_repository(void);
 #define alloc_object_node(r) alloc_object_node_##r()
 extern void *alloc_object_node_the_repository(void);
-extern void alloc_report(void);
+#define alloc_report(r) alloc_report_##r()
+extern void alloc_report_the_repository(void);
 extern unsigned int alloc_commit_index(void);
 
 /* pkt-line.c */
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 10/13] alloc: add repository argument to alloc_commit_index
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (8 preceding siblings ...)
  2018-05-01 21:33 ` [PATCH 09/13] alloc: add repository argument to alloc_report Stefan Beller
@ 2018-05-01 21:34 ` Stefan Beller
  2018-05-01 21:34 ` [PATCH 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:34 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c  | 4 ++--
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 28b85b22144..277dadd221b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -82,7 +82,7 @@ void *alloc_object_node_the_repository(void)
 
 static struct alloc_state commit_state;
 
-unsigned int alloc_commit_index(void)
+unsigned int alloc_commit_index_the_repository(void)
 {
 	static unsigned int count;
 	return count++;
@@ -92,7 +92,7 @@ void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index();
+	c->index = alloc_commit_index(the_repository);
 	return c;
 }
 
diff --git a/cache.h b/cache.h
index 01cc207d218..0e6c5dd5639 100644
--- a/cache.h
+++ b/cache.h
@@ -1776,7 +1776,8 @@ extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node_the_repository(void);
 #define alloc_report(r) alloc_report_##r()
 extern void alloc_report_the_repository(void);
-extern unsigned int alloc_commit_index(void);
+#define alloc_commit_index(r) alloc_commit_index_##r()
+extern unsigned int alloc_commit_index_the_repository(void);
 
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
diff --git a/object.c b/object.c
index 7d36323445b..ddf4b7b196e 100644
--- a/object.c
+++ b/object.c
@@ -162,7 +162,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 		return obj;
 	else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
-			((struct commit *)obj)->index = alloc_commit_index();
+			((struct commit *)obj)->index = alloc_commit_index(the_repository);
 		obj->type = type;
 		return obj;
 	}
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 11/13] object: allow grow_object_hash to handle arbitrary repositories
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (9 preceding siblings ...)
  2018-05-01 21:34 ` [PATCH 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
@ 2018-05-01 21:34 ` Stefan Beller
  2018-05-01 21:34 ` [PATCH 12/13] object: allow create_object " Stefan Beller
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:34 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller, Jonathan Nieder

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index ddf4b7b196e..43954fadf93 100644
--- a/object.c
+++ b/object.c
@@ -116,27 +116,27 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-#define grow_object_hash(r) grow_object_hash_##r()
-static void grow_object_hash_the_repository(void)
+static void grow_object_hash(struct repository *r)
 {
 	int i;
 	/*
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
+	int new_hash_size = r->parsed_objects->obj_hash_size < 32 ? 32 : 2 * r->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
-		struct object *obj = the_repository->parsed_objects->obj_hash[i];
+	for (i = 0; i < r->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = r->parsed_objects->obj_hash[i];
+
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(the_repository->parsed_objects->obj_hash);
-	the_repository->parsed_objects->obj_hash = new_hash;
-	the_repository->parsed_objects->obj_hash_size = new_hash_size;
+	free(r->parsed_objects->obj_hash);
+	r->parsed_objects->obj_hash = new_hash;
+	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object_the_repository(const unsigned char *sha1, void *o)
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 12/13] object: allow create_object to handle arbitrary repositories
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (10 preceding siblings ...)
  2018-05-01 21:34 ` [PATCH 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
@ 2018-05-01 21:34 ` Stefan Beller
  2018-05-02 20:36   ` Jonathan Tan
  2018-05-01 21:34 ` [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:34 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller, Jonathan Nieder

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c | 12 ++++++------
 object.h |  3 +--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index 43954fadf93..fd27cf54faa 100644
--- a/object.c
+++ b/object.c
@@ -139,7 +139,7 @@ static void grow_object_hash(struct repository *r)
 	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object_the_repository(const unsigned char *sha1, void *o)
+void *create_object(struct repository *r, const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -147,12 +147,12 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash(the_repository);
+	if (r->parsed_objects->obj_hash_size - 1 <= r->parsed_objects->nr_objs * 2)
+		grow_object_hash(r);
 
-	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
-			the_repository->parsed_objects->obj_hash_size);
-	the_repository->parsed_objects->nr_objs++;
+	insert_obj_hash(obj, r->parsed_objects->obj_hash,
+			r->parsed_objects->obj_hash_size);
+	r->parsed_objects->nr_objs++;
 	return obj;
 }
 
diff --git a/object.h b/object.h
index d1869dbc502..5ef6ce1ea96 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,7 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-#define create_object(r, s, o) create_object_##r(s, o)
-extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
+extern void *create_object(struct repository *r, const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
-- 
2.17.0.441.gb46fe60e1d-goog



* [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (11 preceding siblings ...)
  2018-05-01 21:34 ` [PATCH 12/13] object: allow create_object " Stefan Beller
@ 2018-05-01 21:34 ` Stefan Beller
  2018-05-02 17:44   ` Duy Nguyen
                     ` (2 more replies)
  2018-05-02 17:01 ` [PATCH 00/13] object store: alloc Duy Nguyen
                   ` (2 subsequent siblings)
  15 siblings, 3 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-01 21:34 UTC (permalink / raw)
  To: git; +Cc: jamill, Stefan Beller

We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.

We record all memory allocations in alloc.c and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c           | 69 ++++++++++++++++++++++++++++++-----------------
 alloc.h           | 21 +++++++++++++++
 blame.c           |  1 +
 blob.c            |  1 +
 cache.h           | 16 -----------
 commit.c          |  1 +
 merge-recursive.c |  1 +
 object.c          | 24 ++++++++++++++---
 object.h          | 10 ++++++-
 repository.c      |  4 +--
 tag.c             |  1 +
 tree.c            |  1 +
 12 files changed, 104 insertions(+), 46 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..66a3d07ba2d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,10 +4,11 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it ends
  * up with maximal alignment because it doesn't know what the object alignment
  * for the new allocation is.
+ *
+ * TODO: Combine this with mem-pool?
  */
 #include "cache.h"
 #include "object.h"
@@ -30,8 +31,25 @@ struct alloc_state {
 	int count; /* total number of nodes allocated */
 	int nr;    /* number of nodes left in current allocation */
 	void *p;   /* first free node in current allocation */
+
+	/* bookkeeping of allocations */
+	void **slabs;
+	int slab_nr, slab_alloc;
 };
 
+void *allocate_alloc_state(void)
+{
+	return xcalloc(1, sizeof(struct alloc_state));
+}
+
+void clear_alloc_state(struct alloc_state *s)
+{
+	while (s->slab_nr > 0) {
+		s->slab_nr--;
+		free(s->slabs[s->slab_nr]);
+	}
+}
+
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 {
 	void *ret;
@@ -45,54 +63,56 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	ret = s->p;
 	s->p = (char *)s->p + node_size;
 	memset(ret, 0, node_size);
+
+	ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
+	s->slabs[s->slab_nr++] = ret;
+
 	return ret;
 }
 
-static struct alloc_state blob_state;
-void *alloc_blob_node_the_repository(void)
+struct alloc_state the_repository_blob_state;
+void *alloc_blob_node(struct repository *r)
 {
-	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
 	return b;
 }
 
-static struct alloc_state tree_state;
-void *alloc_tree_node_the_repository(void)
+struct alloc_state the_repository_tree_state;
+void *alloc_tree_node(struct repository *r)
 {
-	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
 	return t;
 }
 
-static struct alloc_state tag_state;
-void *alloc_tag_node_the_repository(void)
+struct alloc_state the_repository_tag_state;
+void *alloc_tag_node(struct repository *r)
 {
-	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
 	return t;
 }
 
-static struct alloc_state object_state;
-void *alloc_object_node_the_repository(void)
+struct alloc_state the_repository_object_state;
+void *alloc_object_node(struct repository *r)
 {
-	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
 	return obj;
 }
 
-static struct alloc_state commit_state;
-
-unsigned int alloc_commit_index_the_repository(void)
+unsigned int alloc_commit_index(struct repository *r)
 {
-	static unsigned int count;
-	return count++;
+	return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node_the_repository(void)
+struct alloc_state the_repository_commit_state;
+void *alloc_commit_node(struct repository *r)
 {
-	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index(the_repository);
+	c->index = alloc_commit_index(r);
 	return c;
 }
 
@@ -103,9 +123,10 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
+    report(#name, r->parsed_objects->name##_state->count, \
+		  r->parsed_objects->name##_state->count * sizeof(type) >> 10)
 
-void alloc_report_the_repository(void)
+void alloc_report(struct repository *r)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/alloc.h b/alloc.h
new file mode 100644
index 00000000000..a62d7a06307
--- /dev/null
+++ b/alloc.h
@@ -0,0 +1,21 @@
+#ifndef ALLOC_H
+#define ALLOC_H
+
+void *alloc_blob_node(struct repository *r);
+void *alloc_tree_node(struct repository *r);
+void *alloc_commit_node(struct repository *r);
+void *alloc_tag_node(struct repository *r);
+void *alloc_object_node(struct repository *r);
+void alloc_report(struct repository *r);
+unsigned int alloc_commit_index(struct repository *r);
+
+void *allocate_alloc_state(void);
+void clear_alloc_state(struct alloc_state *s);
+
+extern struct alloc_state the_repository_blob_state;
+extern struct alloc_state the_repository_tree_state;
+extern struct alloc_state the_repository_commit_state;
+extern struct alloc_state the_repository_tag_state;
+extern struct alloc_state the_repository_object_state;
+
+#endif
diff --git a/blame.c b/blame.c
index ba9b18e7542..3a11f1ce52b 100644
--- a/blame.c
+++ b/blame.c
@@ -6,6 +6,7 @@
 #include "diffcore.h"
 #include "tag.h"
 #include "blame.h"
+#include "alloc.h"
 
 void blame_origin_decref(struct blame_origin *o)
 {
diff --git a/blob.c b/blob.c
index 9e64f301895..458dafa811e 100644
--- a/blob.c
+++ b/blob.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "repository.h"
+#include "alloc.h"
 
 const char *blob_type = "blob";
 
diff --git a/cache.h b/cache.h
index 0e6c5dd5639..c75559b7d38 100644
--- a/cache.h
+++ b/cache.h
@@ -1763,22 +1763,6 @@ extern const char *excludes_file;
 int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
-/* alloc.c */
-#define alloc_blob_node(r) alloc_blob_node_##r()
-extern void *alloc_blob_node_the_repository(void);
-#define alloc_tree_node(r) alloc_tree_node_##r()
-extern void *alloc_tree_node_the_repository(void);
-#define alloc_commit_node(r) alloc_commit_node_##r()
-extern void *alloc_commit_node_the_repository(void);
-#define alloc_tag_node(r) alloc_tag_node_##r()
-extern void *alloc_tag_node_the_repository(void);
-#define alloc_object_node(r) alloc_object_node_##r()
-extern void *alloc_object_node_the_repository(void);
-#define alloc_report(r) alloc_report_##r()
-extern void alloc_report_the_repository(void);
-#define alloc_commit_index(r) alloc_commit_index_##r()
-extern unsigned int alloc_commit_index_the_repository(void);
-
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
 
diff --git a/commit.c b/commit.c
index a9a43e79bae..c3b400d5930 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
 #include "commit-slab.h"
diff --git a/merge-recursive.c b/merge-recursive.c
index 6dac8908648..aa086a85089 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -14,6 +14,7 @@
 #include "tree-walk.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "alloc.h"
 #include "tag.h"
 #include "unpack-trees.h"
 #include "string-list.h"
diff --git a/object.c b/object.c
index fd27cf54faa..003ec55a291 100644
--- a/object.c
+++ b/object.c
@@ -4,6 +4,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "commit.h"
+#include "alloc.h"
 #include "tag.h"
 #include "object-store.h"
 #include "packfile.h"
@@ -451,10 +452,24 @@ void clear_commit_marks_all(unsigned int flags)
 	}
 }
 
-struct object_parser *object_parser_new(void)
+struct object_parser *object_parser_new(int is_the_repo)
 {
 	struct object_parser *o = xmalloc(sizeof(*o));
 	memset(o, 0, sizeof(*o));
+
+	if (is_the_repo) {
+		o->blob_state = &the_repository_blob_state;
+		o->tree_state = &the_repository_tree_state;
+		o->commit_state = &the_repository_commit_state;
+		o->tag_state = &the_repository_tag_state;
+		o->object_state = &the_repository_object_state;
+	} else {
+		o->blob_state = allocate_alloc_state();
+		o->tree_state = allocate_alloc_state();
+		o->commit_state = allocate_alloc_state();
+		o->tag_state = allocate_alloc_state();
+		o->object_state = allocate_alloc_state();
+	}
 	return o;
 }
 
@@ -501,9 +516,12 @@ void raw_object_store_clear(struct raw_object_store *o)
 void object_parser_clear(struct object_parser *o)
 {
 	/*
-	 * TOOD free objects in o->obj_hash.
-	 *
 	 * As objects are allocated in slabs (see alloc.c), we do
 	 * not need to free each object, but each slab instead.
 	 */
+	clear_alloc_state(o->blob_state);
+	clear_alloc_state(o->tree_state);
+	clear_alloc_state(o->commit_state);
+	clear_alloc_state(o->tag_state);
+	clear_alloc_state(o->object_state);
 }
diff --git a/object.h b/object.h
index 5ef6ce1ea96..cf97dc0d472 100644
--- a/object.h
+++ b/object.h
@@ -4,9 +4,17 @@
 struct object_parser {
 	struct object **obj_hash;
 	int nr_objs, obj_hash_size;
+
+	/* TODO: migrate alloc_states to mem-pool? */
+	struct alloc_state *blob_state;
+	struct alloc_state *tree_state;
+	struct alloc_state *commit_state;
+	struct alloc_state *tag_state;
+	struct alloc_state *object_state;
+	unsigned commit_count;
 };
 
-struct object_parser *object_parser_new(void);
+struct object_parser *object_parser_new(int is_the_repo);
 void object_parser_clear(struct object_parser *o);
 
 struct object_list {
diff --git a/repository.c b/repository.c
index 208ee10071c..51eb876617d 100644
--- a/repository.c
+++ b/repository.c
@@ -15,7 +15,7 @@ void initialize_the_repository(void)
 
 	the_repo.index = &the_index;
 	the_repo.objects = raw_object_store_new();
-	the_repo.parsed_objects = object_parser_new();
+	the_repo.parsed_objects = object_parser_new(1);
 
 	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
@@ -146,7 +146,7 @@ static int repo_init(struct repository *repo,
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
-	repo->parsed_objects = object_parser_new();
+	repo->parsed_objects = object_parser_new(0);
 
 	if (repo_init_gitdir(repo, gitdir))
 		goto error;
diff --git a/tag.c b/tag.c
index 02ef4eaafc0..af6a0725b6a 100644
--- a/tag.c
+++ b/tag.c
@@ -3,6 +3,7 @@
 #include "commit.h"
 #include "tree.h"
 #include "blob.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 
 const char *tag_type = "tag";
diff --git a/tree.c b/tree.c
index 58cf19b4fa8..8f8ef3189af 100644
--- a/tree.c
+++ b/tree.c
@@ -5,6 +5,7 @@
 #include "blob.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "tree-walk.h"
 
 const char *tree_type = "tree";
-- 
2.17.0.441.gb46fe60e1d-goog



* Re: [PATCH 02/13] object: add repository argument to create_object
  2018-05-01 21:33 ` [PATCH 02/13] object: add repository argument to create_object Stefan Beller
@ 2018-05-01 21:43   ` Eric Sunshine
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Sunshine @ 2018-05-01 21:43 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git List, Jameson Miller, Jonathan Nieder

On Tue, May 1, 2018 at 5:33 PM, Stefan Beller <sbeller@google.com> wrote:
> Add a repository argument to allow the callers of create_object
> to be more specific about which repository to act on. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
>
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.

This is only patch 2, and patch 1 did not use a macro trick to catch
callers passing the wrong argument.
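
For readers without the rest of the series at hand, the macro trick
being referred to is the one visible in the later alloc patches
(e.g. "#define alloc_commit_index(r) alloc_commit_index_##r()").
Below is a minimal sketch of how it behaves. The counter body mirrors
alloc_commit_index_the_repository() from patch 10; the wrapper
next_commit_index() is a hypothetical caller added only for
illustration.

```c
/* Only the "_the_repository" variant exists during the transition. */
unsigned int alloc_commit_index_the_repository(void)
{
	static unsigned int count;
	return count++;
}

/*
 * The argument is pasted into the function name and never evaluated.
 * alloc_commit_index(the_repository) therefore resolves to the function
 * above, while alloc_commit_index(some_other_repo) would name a function
 * that does not exist and fail to compile or link.
 */
#define alloc_commit_index(r) alloc_commit_index_##r()

/* Hypothetical caller, for illustration only. */
unsigned int next_commit_index(void)
{
	return alloc_commit_index(the_repository);
}
```

Because the token is pasted rather than evaluated, the sketch builds
even without a variable named the_repository in scope; the macro only
checks the spelling of the argument.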

> Add the cocci patch that converted the callers.

No cocci patch anywhere in sight.

> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>


* Re: [PATCH 00/13] object store: alloc
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (12 preceding siblings ...)
  2018-05-01 21:34 ` [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-02 17:01 ` Duy Nguyen
  2018-05-02 18:07   ` Jameson Miller
  2018-05-07 14:05 ` Junio C Hamano
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
  15 siblings, 1 reply; 95+ messages in thread
From: Duy Nguyen @ 2018-05-02 17:01 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, jamill

On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbeller@google.com> wrote:
> I also debated if it is worth converting alloc.c via this patch series
> or if it might make more sense to use the new mem-pool by Jameson[1].
>
> I vaguely wonder about the performance impact, as the object allocation
> code seemed to be relevant in the past.

If I remember correctly, alloc.c was added because malloc() has too
high an overhead per allocation (and we create millions of them). As
long as you keep allocation overhead low, it should be ok. Note that
we allocate a lot more than the mem-pool's main target (cache entries
if I remember correctly). We may have a couple thousand cache
entries, but we already deal with a couple million struct objects.
-- 
Duy


* Re: [PATCH 01/13] repository: introduce object parser field
  2018-05-01 21:33 ` [PATCH 01/13] repository: introduce object parser field Stefan Beller
@ 2018-05-02 17:17   ` Duy Nguyen
  2018-05-02 17:26     ` Stefan Beller
  2018-05-02 20:30   ` Jonathan Tan
  1 sibling, 1 reply; 95+ messages in thread
From: Duy Nguyen @ 2018-05-02 17:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, jamill, Jonathan Nieder

On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbeller@google.com> wrote:
>         /*
> -        * Holds any information related to accessing the raw object content.
> +        * Holds any information needed to retrieve the raw content
> +        * of objects. The object_parser uses this to get object
> +        * content which it then parses.
>          */
>         struct raw_object_store *objects;
>
> +       /*
> +        * State for the object parser. This owns all parsed objects
> +        * (struct object) so callers do not have to manage their
> +        * lifetime.
> +        */
> +       struct object_parser *parsed_objects;

I like this name 'parsed_objects'. Should we rename the struct after
it (e.g. parsed_object_store as opposed to raw_object_store above)?

Another suggestion is object_pool, if we keep 'struct object' instead
of 'struct parsed_object' and also want to keep current allocation
behavior: no individual deallocation. If you free, you free the whole
pool (e.g. you could run rev-list --all --objects in a separate pool,
once you're done, you delete the whole pool).
-- 
Duy


* Re: [PATCH 01/13] repository: introduce object parser field
  2018-05-02 17:17   ` Duy Nguyen
@ 2018-05-02 17:26     ` Stefan Beller
  2018-05-02 17:58       ` Duy Nguyen
  0 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-02 17:26 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jameson Miller, Jonathan Nieder

On Wed, May 2, 2018 at 10:17 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbeller@google.com> wrote:
>>         /*
>> -        * Holds any information related to accessing the raw object content.
>> +        * Holds any information needed to retrieve the raw content
>> +        * of objects. The object_parser uses this to get object
>> +        * content which it then parses.
>>          */
>>         struct raw_object_store *objects;
>>
>> +       /*
>> +        * State for the object parser. This owns all parsed objects
>> +        * (struct object) so callers do not have to manage their
>> +        * lifetime.
>> +        */
>> +       struct object_parser *parsed_objects;
>
> I like this name 'parsed_objects'. Should we rename the struct after
> it (e.g. parsed_object_store as opposed to raw_object_store above)?

I can rename it to parsed_object_store for consistency.

> Another suggestion is object_pool, if we keep 'struct object' instead
> of 'struct parsed_object' and also want to keep current allocation
> behavior: no individual deallocation. If you free, you free the whole
> pool (e.g. you could run rev-list --all --objects in a separate pool,
> once you're done, you delete the whole pool).

That is what the following patches will be about, you can
only free the whole set of parsed objects.

So if you want to do a separate rev walk, you may need to
instantiate a new repository for it (ideally you'd only need a
separate parsed object store).

I'd want to have the ability to have separate pools for submodules,
such that they can be free'd on a per-repo basis.

> --
> Duy


* Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-01 21:34 ` [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-02 17:44   ` Duy Nguyen
  2018-05-03 17:24     ` Stefan Beller
  2018-05-02 20:50   ` Jonathan Tan
  2018-05-03 14:58   ` Duy Nguyen
  2 siblings, 1 reply; 95+ messages in thread
From: Duy Nguyen @ 2018-05-02 17:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, jamill

On Tue, May 1, 2018 at 11:34 PM, Stefan Beller <sbeller@google.com> wrote:
>  #include "cache.h"
>  #include "object.h"
> @@ -30,8 +31,25 @@ struct alloc_state {
>         int count; /* total number of nodes allocated */
>         int nr;    /* number of nodes left in current allocation */
>         void *p;   /* first free node in current allocation */
> +
> +       /* bookkeeping of allocations */
> +       void **slabs;

Another way to manage this is a linked list: you could reserve one
"object" in each slab to store the "next" (or "prev") pointer to
another slab, then you can just walk through all slabs and free them. It's
a bit cheaper than reallocating slabs[], but I guess we reallocate so
few times that readability matters more (whichever way is chosen).
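
A minimal sketch of the linked-list variant described above, under
stated assumptions: the names slab_header and BLOCKING (with the value
1024) are illustrative, plain malloc() stands in for xmalloc(), and
node_size is assumed to be at least pointer-sized and pointer-aligned.
Unlike the slabs[] bookkeeping in the quoted hunk, the link is written
once per slab, at the moment the slab is allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 1024	/* nodes per slab (assumed value) */

/* Lives in the first node-sized slot of every slab. */
struct slab_header {
	struct slab_header *next;
};

struct alloc_state {
	int count;			/* total number of nodes allocated */
	int nr;				/* nodes left in the current slab */
	void *p;			/* first free node in the current slab */
	struct slab_header *slabs;	/* most recently allocated slab first */
};

void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	assert(node_size >= sizeof(struct slab_header));
	if (!s->nr) {
		/* One extra slot at the front is reserved for the link. */
		struct slab_header *slab = malloc((BLOCKING + 1) * node_size);

		if (!slab)
			abort();
		slab->next = s->slabs;
		s->slabs = slab;
		s->nr = BLOCKING;
		s->p = (char *)slab + node_size;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	memset(ret, 0, node_size);
	return ret;
}

/* Frees every slab and returns how many were released. */
int clear_alloc_state(struct alloc_state *s)
{
	int freed = 0;

	while (s->slabs) {
		struct slab_header *next = s->slabs->next;

		free(s->slabs);
		s->slabs = next;
		freed++;
	}
	s->nr = 0;
	s->p = NULL;
	return freed;
}
```

The cost is one wasted node-sized slot per slab; in exchange there is
no separate array to grow, and clearing is a single list walk.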

> +       int slab_nr, slab_alloc;
>  };
>
> +void *allocate_alloc_state(void)
> +{
> +       return xcalloc(1, sizeof(struct alloc_state));
> +}
> +
> +void clear_alloc_state(struct alloc_state *s)
> +{
> +       while (s->slab_nr > 0) {
> +               s->slab_nr--;
> +               free(s->slabs[s->slab_nr]);

I think you're leaking memory here. Commit and tree objects may have
more allocations in them (especially trees, but I think we have
commit_list in struct commit too). Those need to be freed as well.
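
One way to address this, sketched under assumptions: before the slabs
are freed, walk the object hash and release whatever each object owns.
The struct node type and release_owned_memory() below are hypothetical
stand-ins (a real version would have to handle tree buffers, commit
parent lists and so on per object type); this is not code from the
series.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsed object that owns extra memory. */
struct node {
	void *buffer;
};

/*
 * Walk a hash table of nodes (NULL marks an empty bucket) and free the
 * memory each node owns. The nodes themselves are left alone: they live
 * in slabs and are released wholesale afterwards. Returns the number of
 * buffers released.
 */
int release_owned_memory(struct node **hash, int hash_size)
{
	int i, released = 0;

	assert(hash != NULL || hash_size == 0);
	for (i = 0; i < hash_size; i++) {
		struct node *n = hash[i];

		if (!n || !n->buffer)
			continue;
		free(n->buffer);
		n->buffer = NULL;
		released++;
	}
	return released;
}
```

The ordering matters: this pass has to run before the slabs are freed,
since the pointers to the owned memory are stored inside the slabs.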

> +       }
> +}
> +
>  static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>  {
>         void *ret;
-- 
Duy


* Re: [PATCH 01/13] repository: introduce object parser field
  2018-05-02 17:26     ` Stefan Beller
@ 2018-05-02 17:58       ` Duy Nguyen
  0 siblings, 0 replies; 95+ messages in thread
From: Duy Nguyen @ 2018-05-02 17:58 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, Jameson Miller, Jonathan Nieder

On Wed, May 2, 2018 at 7:26 PM, Stefan Beller <sbeller@google.com> wrote:
>> Another suggestion is object_pool, if we keep 'struct object' instead
>> of 'struct parsed_object' and also want to keep current allocation
>> behavior: no individual deallocation. If you free, you free the whole
>> pool (e.g. you could run rev-list --all --objects in a separate pool,
>> once you're done, you delete the whole pool).
>
> That is what the following patches will be about, you can
> only free the whole set of parsed objects.
>
> So if you want to do a separate rev walk, you may need to
> instantiate a new repository for it (ideally you'd only need a
> separate parsed object store).

I'm not sure if it's a good idea to create a separate struct
repository just because you want to free this parsed object store.
What if updates are made in both repositories? All the caches (well,
mostly the delta base cache in sha1_file.c) will double memory usage
as well. Yeah, not ideal. But I guess making rev-list related code use
a separate parsed object store is no small task (and kinda risky as
well, since we migrate from global lookup_* functions to local ones and
need to choose the correct parsed object store to look up from).

For your information, there is already a good use case for this
wholesale memory free: if we can free the rev-list related memory
early in pack-objects (e.g. part of repack operation) then it could
lower memory pressure significantly when running on large repos. This
has been discussed a bit lately.
-- 
Duy


* RE: [PATCH 00/13] object store: alloc
  2018-05-02 17:01 ` [PATCH 00/13] object store: alloc Duy Nguyen
@ 2018-05-02 18:07   ` Jameson Miller
  2018-05-02 18:22     ` Duy Nguyen
  0 siblings, 1 reply; 95+ messages in thread
From: Jameson Miller @ 2018-05-02 18:07 UTC (permalink / raw)
  To: Duy Nguyen, Stefan Beller; +Cc: Git Mailing List



> -----Original Message-----
> From: Duy Nguyen <pclouds@gmail.com>
> Sent: Wednesday, May 2, 2018 1:02 PM
> To: Stefan Beller <sbeller@google.com>
> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> <jamill@microsoft.com>
> Subject: Re: [PATCH 00/13] object store: alloc
> 
> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbeller@google.com> wrote:
> > I also debated if it is worth converting alloc.c via this patch series
> > or if it might make more sense to use the new mem-pool by Jameson[1].
> >
> > I vaguely wonder about the performance impact, as the object
> > allocation code seemed to be relevant in the past.
> 
> If I remember correctly, alloc.c was added because malloc() has too high an
> overhead per allocation (and we create millions of them). As long as you
> keep allocation overhead low, it should be ok. Note that we allocate a lot more
> than the mem-pool's main target (cache entries if I remember correctly). We
> may have a couple thousand cache entries, but we already deal with a couple
> million struct objects.

The work to move cache entry allocation onto a memory pool was motivated by
the fact that we are looking at indexes with millions of entries. If there is a scaling
concern with the current version of mem-pool, we would like to address it there
as well. If there are improvements that can be shared, that would be nice too.

> --
> Duy


* Re: [PATCH 00/13] object store: alloc
  2018-05-02 18:07   ` Jameson Miller
@ 2018-05-02 18:22     ` Duy Nguyen
  2018-05-02 18:44       ` Jameson Miller
  2018-05-03 22:45       ` Stefan Beller
  0 siblings, 2 replies; 95+ messages in thread
From: Duy Nguyen @ 2018-05-02 18:22 UTC (permalink / raw)
  To: Jameson Miller; +Cc: Stefan Beller, Git Mailing List

On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jamill@microsoft.com> wrote:
>
>
>> -----Original Message-----
>> From: Duy Nguyen <pclouds@gmail.com>
>> Sent: Wednesday, May 2, 2018 1:02 PM
>> To: Stefan Beller <sbeller@google.com>
>> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
>> <jamill@microsoft.com>
>> Subject: Re: [PATCH 00/13] object store: alloc
>>
>> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbeller@google.com> wrote:
>> > I also debated if it is worth converting alloc.c via this patch series
>> > or if it might make more sense to use the new mem-pool by Jameson[1].
>> >
>> > I vaguely wonder about the performance impact, as the object
>> > allocation code seemed to be relevant in the past.
>>
>> If I remember correctly, alloc.c was added because malloc() has too high
>> overhead per allocation (and we create like millions of them). As long as you
>> keep allocation overhead low, it should be ok. Note that we allocate a lot more
>> than the mem-pool's main target (cache entries if I remember correctly). We
>> may have a couple thousands cache entries.  We already deal with a couple
>> million of struct object.
>
> The work to move cache entry allocation onto a memory pool was motivated by
> the fact that we are looking at indexes with millions of entries. If there is scaling
> concern with the current version of mem-pool, we would like to address it there
> as well. Or if there is improvements that can be shared, that would be nice as well.

I think the two have quite different characteristics. alloc.c code is
driven by overhead. struct blob is only 24 bytes each and about 1/3
the repo is blobs, and each malloc has 16 bytes overhead or so if I
remember correctly. struct cache_entry is at minimum 88 bytes so
relative overhead is not that big a deal (but sure reducing it is
still very nice).
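
To put numbers on the comparison above, here is a small sketch of the
arithmetic. It takes the sizes quoted in this message (24-byte struct
blob, 88-byte struct cache_entry, roughly 16 bytes of malloc overhead)
at face value; they are recollections, not measurements, and the helper
name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Share of each malloc()ed chunk that is allocator bookkeeping rather
 * than payload, as a whole percentage (rounded down). Sizes in bytes.
 * Hypothetical helper, not part of alloc.c.
 */
unsigned overhead_percent(size_t payload, size_t overhead)
{
	assert(payload + overhead > 0);
	return (unsigned)(100 * overhead / (payload + overhead));
}
```

With those figures, overhead_percent(24, 16) gives 40 and
overhead_percent(88, 16) gives 15, which is why per-allocation overhead
weighs much more heavily on struct blob than on cache entries.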

mem-pool is about allocation speed, but I think that's not a concern
for alloc.c because when we do full rev walk, I think I/O is always
the bottleneck (maybe object lookup as well). I don't see a good way
to have the one memory allocator that satisfies both to be honest. If
you could allocate fixed-size cache entries most of the time (e.g.
larger entries will be allocated using malloc or something, and should
be a small number), then perhaps we can unify. Or if mem-pool can have
an option to allocate fixed-size pieces with no overhead, that would
be great (sorry I still have not read that mem-pool series yet)
-- 
Duy


* RE: [PATCH 00/13] object store: alloc
  2018-05-02 18:22     ` Duy Nguyen
@ 2018-05-02 18:44       ` Jameson Miller
  2018-05-03 22:45       ` Stefan Beller
  1 sibling, 0 replies; 95+ messages in thread
From: Jameson Miller @ 2018-05-02 18:44 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Stefan Beller, Git Mailing List



> -----Original Message-----
> From: Duy Nguyen <pclouds@gmail.com>
> Sent: Wednesday, May 2, 2018 2:23 PM
> To: Jameson Miller <jamill@microsoft.com>
> Cc: Stefan Beller <sbeller@google.com>; Git Mailing List <git@vger.kernel.org>
> Subject: Re: [PATCH 00/13] object store: alloc
> 
> On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jamill@microsoft.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Duy Nguyen <pclouds@gmail.com>
> >> Sent: Wednesday, May 2, 2018 1:02 PM
> >> To: Stefan Beller <sbeller@google.com>
> >> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> >> <jamill@microsoft.com>
> >> Subject: Re: [PATCH 00/13] object store: alloc
> >>
> >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbeller@google.com>
> wrote:
> >> > I also debated if it is worth converting alloc.c via this patch
> >> > series or if it might make more sense to use the new mem-pool by
> Jameson[1].
> >> >
> >> > I vaguely wonder about the performance impact, as the object
> >> > allocation code seemed to be relevant in the past.
> >>
> >> If I remember correctly, alloc.c was added because malloc() has too
> >> high overhead per allocation (and we create like millions of them).
> >> As long as you keep allocation overhead low, it should be ok. Note
> >> that we allocate a lot more than the mem-pool's main target (cache
> >> entries if I remember correctly). We may have a couple thousands
> >> cache entries.  We already deal with a couple million of struct object.
> >
> > The work to move cache entry allocation onto a memory pool was
> > motivated by the fact that we are looking at indexes with millions of
> > entries. If there is scaling concern with the current version of
> > mem-pool, we would like to address it there as well. Or if there is
> improvements that can be shared, that would be nice as well.
> 
> I think the two have quite different characteristics. alloc.c code is driven by
> overhead. struct blob is only 24 bytes each and about 1/3 the repo is blobs, and
> each malloc has 16 bytes overhead or so if I remember correctly. struct
> cache_entry at minimum in 88 bytes so relative overhead is not that a big deal
> (but sure reducing it is still very nice).
> 
> mem-pool is about allocation speed, but I think that's not a concern for alloc.c
> because when we do full rev walk, I think I/O is always the bottleneck (maybe
> object lookup as well). I don't see a good way to have the one memory allocator
> that satisfyies both to be honest. If you could allocate fixed-size cache entries
> most of the time (e.g.
> larger entries will be allocated using malloc or something, and should be a small
> number), then perhaps we can unify. Or if mem-pool can have an option to
> allocated fixed size pieces with no overhead, that would be great (sorry I still
> have not read that mem-pool series yet)
> --
> Duy

Thank you for the extra details - the extra context was helpful -
especially the motivations for each of the areas. I agree with
your general analysis, but with the extra point that the memory
pool does allocate memory (variable sized) without any overhead,
except for possible alignment considerations and differences in
bookkeeping the larger "blocks" of memory from which small
allocations are made - but I don't think this would be
enough to have a meaningful overall impact.

The mem-pool only tracks the pointer to the next available bit of
memory, and the end of the available memory range. It has a
similar constraint in that individual allocations cannot be freed
- you have to free the whole block.
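
The scheme described above (a pointer to the next free byte plus the
end of the range, with no per-allocation header) can be sketched
roughly as follows. This is only an illustration under assumed names;
struct bump_block and its functions are hypothetical and are not the
structures from the mem-pool series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical single-block pool; names are illustrative only. */
struct bump_block {
	char *base;       /* start of the block, kept so it can be freed */
	char *next_free;  /* next available byte */
	char *end;        /* one past the last usable byte */
};

int bump_block_init(struct bump_block *b, size_t size)
{
	b->base = malloc(size);
	if (!b->base)
		return -1;
	b->next_free = b->base;
	b->end = b->base + size;
	return 0;
}

/* Hands out len bytes, or NULL when the block is exhausted. */
void *bump_alloc(struct bump_block *b, size_t len)
{
	const size_t align = sizeof(void *);
	void *ret;

	assert(b->base != NULL);
	if (len > SIZE_MAX - (align - 1))
		return NULL;
	/* round up so the next allocation stays pointer-aligned */
	len = (len + align - 1) & ~(align - 1);
	if ((size_t)(b->end - b->next_free) < len)
		return NULL;
	ret = b->next_free;
	b->next_free += len;
	return ret;
}

/* Individual allocations cannot be freed; only the whole block can. */
void bump_block_release(struct bump_block *b)
{
	free(b->base);
	b->base = b->next_free = b->end = NULL;
}
```

The only space lost per allocation in this sketch is the rounding for
alignment, which is the "alignment considerations" caveat mentioned
above.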

It may be that the requirements are different enough (or the
gains worth it) to have another dedicated pooling allocator, but
I think the current design of the memory pool would satisfy both
consumers, even if the memory considerations are a bigger
motivation for blob structs. I would be interested in your
thoughts if you get the opportunity to read the mem-pool series.


* Re: [PATCH 01/13] repository: introduce object parser field
  2018-05-01 21:33 ` [PATCH 01/13] repository: introduce object parser field Stefan Beller
  2018-05-02 17:17   ` Duy Nguyen
@ 2018-05-02 20:30   ` Jonathan Tan
  1 sibling, 0 replies; 95+ messages in thread
From: Jonathan Tan @ 2018-05-02 20:30 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jamill, Jonathan Nieder

On Tue,  1 May 2018 14:33:51 -0700
Stefan Beller <sbeller@google.com> wrote:

> Git's object access code can be thought of as containing two layers:
> the raw object store provides access to raw object content, while the
> higher level obj_hash code parses raw objects and keeps track of
> parenthood and other object relationships using 'struct object'.
> Keeping these layers separate should make it easier to find relevant
> functions and to change the implementation of one without having to
> touch the other.

I only understood this after reading the code below. I think this
paragraph can be removed; I don't see its relevance - of course we need
to store metadata about how to load objects somewhere, and caching
objects we have already loaded is a good idea: and the metadata and
cache are two separate things before and after this patch anyway.

> Add an object_parser field to 'struct repository' to prepare obj_hash
> to be handled per repository.  Callers still only use the_repository
> for now --- later patches will adapt them to handle arbitrary
> repositories.

I think this is better reworded as:

  Convert the existing global cache for parsed objects (obj_hash) into
  repository-specific parsed object caches. Existing code that uses
  obj_hash is modified to use the parsed object cache of
  the_repository; future patches will use the parsed object caches of
  other repositories.

> +struct object_parser *object_parser_new(void)
> +{
> +	struct object_parser *o = xmalloc(sizeof(*o));
> +	memset(o, 0, sizeof(*o));
> +	return o;
> +}

I'm not sure that I like this style of code (I prefer the strbuf style
with _INIT and release(), which I think is more flexible) but I don't
feel too strongly about it.
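
For readers unfamiliar with the pattern, a rough sketch of what the
_INIT and release() style could look like for this struct follows. The
fields are the ones from the quoted patch; OBJECT_PARSER_INIT and
object_parser_release() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct object;  /* opaque for the purpose of this sketch */

struct object_parser {
	struct object **obj_hash;
	int nr_objs, obj_hash_size;
};

/* Static initializer: lets callers embed the struct or keep it on the stack. */
#define OBJECT_PARSER_INIT { NULL, 0, 0 }

/* Frees what the struct owns and returns it to the _INIT state. */
void object_parser_release(struct object_parser *o)
{
	assert(o != NULL);
	free(o->obj_hash);
	o->obj_hash = NULL;
	o->nr_objs = 0;
	o->obj_hash_size = 0;
}
```

A caller would write "struct object_parser p = OBJECT_PARSER_INIT;" and
later "object_parser_release(&p);", with no separate heap allocation
for the struct itself.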

> +struct object_parser {
> +	struct object **obj_hash;
> +	int nr_objs, obj_hash_size;
> +};

object_parser is probably a bad name. I think Duy had some good
suggestions in [1].

[1] https://public-inbox.org/git/CACsJy8CgX6BME=EhiDBfMRzBOYDBNHE6Ouxv4fZC-GW7Rsi81Q@mail.gmail.com/

>  	/*
> -	 * Holds any information related to accessing the raw object content.
> +	 * Holds any information needed to retrieve the raw content
> +	 * of objects. The object_parser uses this to get object
> +	 * content which it then parses.
>  	 */
>  	struct raw_object_store *objects;

I think the additional sentence ("The object_parser uses this...") is
unnecessary and confusing, especially if its identity is going to be one
of storage (like "parsed_objects" implies).

> +	/*
> +	 * State for the object parser. This owns all parsed objects
> +	 * (struct object) so callers do not have to manage their
> +	 * lifetime.
> +	 */
> +	struct object_parser *parsed_objects;

Even after all the commits in this patch set, this does not store any
state for parsing. Maybe just document as:

  All objects in this repository that have been parsed. This structure
  owns all objects it references, so users of "struct object *"
  generally do not need to free them; instead, when a repository is no
  longer used, call object_parser_clear() on this structure.

(And maybe say that the freeing method on struct repository will
automatically call object_parser_clear().)


* Re: [PATCH 04/13] alloc: add repository argument to alloc_blob_node
  2018-05-01 21:33 ` [PATCH 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
@ 2018-05-02 20:34   ` Jonathan Tan
  0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Tan @ 2018-05-02 20:34 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jamill

On Tue,  1 May 2018 14:33:54 -0700
Stefan Beller <sbeller@google.com> wrote:

> Signed-off-by: Stefan Beller <sbeller@google.com>

Add the same boilerplate explanation to this and subsequent commits. If
editing it for every new function name is cumbersome, maybe use this
shortened version:

  This is a small mechanical change; it doesn't change the
  implementation to handle repositories other than the_repository yet.
  Use a macro to catch callers passing a repository other than
  the_repository at compile time.


* Re: [PATCH 12/13] object: allow create_object to handle arbitrary repositories
  2018-05-01 21:34 ` [PATCH 12/13] object: allow create_object " Stefan Beller
@ 2018-05-02 20:36   ` Jonathan Tan
  0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Tan @ 2018-05-02 20:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jamill, Jonathan Nieder

On Tue,  1 May 2018 14:34:02 -0700
Stefan Beller <sbeller@google.com> wrote:

> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>

Downloading this patch set and viewing the whole function modified in
this patch shows that globals are no longer referenced, so this is good.
Same comment for patch 11.


* Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-01 21:34 ` [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  2018-05-02 17:44   ` Duy Nguyen
@ 2018-05-02 20:50   ` Jonathan Tan
  2018-05-03 17:25     ` Stefan Beller
  2018-05-03 14:58   ` Duy Nguyen
  2 siblings, 1 reply; 95+ messages in thread
From: Jonathan Tan @ 2018-05-02 20:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jamill

On Tue,  1 May 2018 14:34:03 -0700
Stefan Beller <sbeller@google.com> wrote:

> +void *allocate_alloc_state(void)
> +{
> +	return xcalloc(1, sizeof(struct alloc_state));
> +}
> +
> +void clear_alloc_state(struct alloc_state *s)
> +{
> +	while (s->slab_nr > 0) {
> +		s->slab_nr--;
> +		free(s->slabs[s->slab_nr]);
> +	}
> +}

These functions are asymmetrical. I understand why it is this way
(because we use pointers, and we want to use FREE_AND_NULL), but would
still prefer _INIT and _release().

>  static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>  {
>  	void *ret;
> @@ -45,54 +63,56 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>  	ret = s->p;
>  	s->p = (char *)s->p + node_size;
>  	memset(ret, 0, node_size);
> +
> +	ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
> +	s->slabs[s->slab_nr++] = ret;
> +
>  	return ret;
>  }

This unconditionally grows the slabs for each node allocation. Shouldn't
more than one node fit in each slab?
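
One possible shape of a fix, as a sketch only: record a slab in slabs[]
when a new slab is allocated, not on every node. Plain malloc/realloc
stand in for xmalloc and ALLOC_GROW so that the snippet is
self-contained, and the body of the slab-refill branch is an assumption
based on BLOCKING being the number of nodes per slab.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 1024  /* nodes per slab */

struct alloc_state {
	int count;  /* total number of nodes allocated */
	int nr;     /* number of nodes left in current slab */
	void *p;    /* first free node in current slab */

	/* bookkeeping of allocations: one entry per slab */
	void **slabs;
	int slab_nr, slab_alloc;
};

void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	assert(node_size > 0);
	if (!s->nr) {
		void *slab;

		/* grow slabs[] only when a new slab is about to be added */
		if (s->slab_nr == s->slab_alloc) {
			int n = s->slab_alloc ? 2 * s->slab_alloc : 8;
			void **grown = realloc(s->slabs, (size_t)n * sizeof(*grown));
			if (!grown)
				return NULL;
			s->slabs = grown;
			s->slab_alloc = n;
		}
		slab = malloc(BLOCKING * node_size);
		if (!slab)
			return NULL;
		s->slabs[s->slab_nr++] = slab;  /* slab start, not each node */
		s->p = slab;
		s->nr = BLOCKING;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	memset(ret, 0, node_size);
	return ret;
}

void clear_alloc_state(struct alloc_state *s)
{
	while (s->slab_nr > 0)
		free(s->slabs[--s->slab_nr]);
	free(s->slabs);
	memset(s, 0, sizeof(*s));
}
```

Storing only slab starts also means clear_alloc_state() passes free()
nothing but pointers that came from malloc(); recording every node, as
in the quoted hunk, would include pointers into the middle of a slab.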

> +extern struct alloc_state the_repository_blob_state;
> +extern struct alloc_state the_repository_tree_state;
> +extern struct alloc_state the_repository_commit_state;
> +extern struct alloc_state the_repository_tag_state;
> +extern struct alloc_state the_repository_object_state;

(Context: these were in alloc.h)

These seem to be used only in object.c, and only in one function - could
we declare them "static" inside that function instead?

> -struct object_parser *object_parser_new(void)
> +struct object_parser *object_parser_new(int is_the_repo)
>  {
>  	struct object_parser *o = xmalloc(sizeof(*o));
>  	memset(o, 0, sizeof(*o));
> +
> +	if (is_the_repo) {
> +		o->blob_state = &the_repository_blob_state;
> +		o->tree_state = &the_repository_tree_state;
> +		o->commit_state = &the_repository_commit_state;
> +		o->tag_state = &the_repository_tag_state;
> +		o->object_state = &the_repository_object_state;
> +	} else {
> +		o->blob_state = allocate_alloc_state();
> +		o->tree_state = allocate_alloc_state();
> +		o->commit_state = allocate_alloc_state();
> +		o->tag_state = allocate_alloc_state();
> +		o->object_state = allocate_alloc_state();
> +	}
>  	return o;
>  }

I don't think saving 5 allocations is worth the code complexity (of the
additional parameter).


* Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-01 21:34 ` [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  2018-05-02 17:44   ` Duy Nguyen
  2018-05-02 20:50   ` Jonathan Tan
@ 2018-05-03 14:58   ` Duy Nguyen
  2 siblings, 0 replies; 95+ messages in thread
From: Duy Nguyen @ 2018-05-03 14:58 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, Jameson Miller

On Tue, May 1, 2018 at 11:34 PM, Stefan Beller <sbeller@google.com> wrote:
> @@ -501,9 +516,12 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void object_parser_clear(struct object_parser *o)
>  {
>         /*
> -        * TOOD free objects in o->obj_hash.
> -        *

You need to free(o->obj_hash) too. If you just want to reuse existing
obj_hash[] then at least clear it, leave no dangling pointers behind.

>          * As objects are allocated in slabs (see alloc.c), we do
>          * not need to free each object, but each slab instead.
>          */
> +       clear_alloc_state(o->blob_state);
> +       clear_alloc_state(o->tree_state);
> +       clear_alloc_state(o->commit_state);
> +       clear_alloc_state(o->tag_state);
> +       clear_alloc_state(o->object_state);
>  }
-- 
Duy


* Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-02 17:44   ` Duy Nguyen
@ 2018-05-03 17:24     ` Stefan Beller
  2018-05-03 17:35       ` Duy Nguyen
  0 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-03 17:24 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jameson Miller

On Wed, May 2, 2018 at 10:44 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, May 1, 2018 at 11:34 PM, Stefan Beller <sbeller@google.com> wrote:
>>  #include "cache.h"
>>  #include "object.h"
>> @@ -30,8 +31,25 @@ struct alloc_state {
>>         int count; /* total number of nodes allocated */
>>         int nr;    /* number of nodes left in current allocation */
>>         void *p;   /* first free node in current allocation */
>> +
>> +       /* bookkeeping of allocations */
>> +       void **slabs;
>
> Another way to manage this is linked list: you could reserve one
> "object" in each slab to store the "next" (or "prev") pointer to
> another slab, then you can just walk through all slabs and free. It's
> a bit cheaper than reallocating slabs[], but I guess we reallocate so
> few times that readability matters more (whichever way is chosen).

This is a good idea. I'll do so in a resend.
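
A rough sketch of how the linked-list variant could look, assuming the
first node-sized slot of every slab is reserved for a pointer to the
previously allocated slab. The last_slab field and the refill logic are
assumptions made for illustration; malloc stands in for xmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 1024  /* slots per slab, one of which holds the link */

struct alloc_state {
	int count;        /* total number of nodes allocated */
	int nr;           /* number of nodes left in current slab */
	void *p;          /* first free node in current slab */
	void *last_slab;  /* newest slab; its slot 0 points at the older one */
};

void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	/* slot 0 must be able to hold a pointer */
	assert(node_size >= sizeof(void *));
	if (!s->nr) {
		char *slab = malloc(BLOCKING * node_size);
		if (!slab)
			return NULL;
		/* reserve slot 0 for the link to the previous slab */
		memcpy(slab, &s->last_slab, sizeof(void *));
		s->last_slab = slab;
		s->p = slab + node_size;
		s->nr = BLOCKING - 1;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	memset(ret, 0, node_size);
	return ret;
}

void clear_alloc_state(struct alloc_state *s)
{
	void *slab = s->last_slab;

	/* walk the chain from newest to oldest, freeing each slab */
	while (slab) {
		void *prev;
		memcpy(&prev, slab, sizeof(prev));
		free(slab);
		slab = prev;
	}
	memset(s, 0, sizeof(*s));
}
```

The cost in this sketch is one node-sized slot per slab, and no
separate slabs[] array has to be reallocated.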

>> +void clear_alloc_state(struct alloc_state *s)
>> +{
>> +       while (s->slab_nr > 0) {
>> +               s->slab_nr--;
>> +               free(s->slabs[s->slab_nr]);
>
> I think you're leaking memory here. Commit and tree objects may have
> more allocations in them (especially trees, but I think we have
> commit_list in struct commit too). Those need to be freed as well.

I would think that tree and commit memory will be free'd in the
parsed_objects release function? (TODO: I need to add it over there)

Thanks,
Stefan


* Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-02 20:50   ` Jonathan Tan
@ 2018-05-03 17:25     ` Stefan Beller
  0 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-03 17:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Jameson Miller

On Wed, May 2, 2018 at 1:50 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Tue,  1 May 2018 14:34:03 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> +void *allocate_alloc_state(void)
>> +{
>> +     return xcalloc(1, sizeof(struct alloc_state));
>> +}
>> +
>> +void clear_alloc_state(struct alloc_state *s)
>> +{
>> +     while (s->slab_nr > 0) {
>> +             s->slab_nr--;
>> +             free(s->slabs[s->slab_nr]);
>> +     }
>> +}
>
> These functions are asymmetrical. I understand why it is this way
> (because we use pointers, and we want to use FREE_AND_NULL), but would
> still prefer _INIT and _release().
>
>>  static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>>  {
>>       void *ret;
>> @@ -45,54 +63,56 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>>       ret = s->p;
>>       s->p = (char *)s->p + node_size;
>>       memset(ret, 0, node_size);
>> +
>> +     ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
>> +     s->slabs[s->slab_nr++] = ret;
>> +
>>       return ret;
>>  }
>
> This unconditionally grows the slabs for each node allocation. Shouldn't
> more than one node fit in each slab?

Yes. I'll go with Duy's idea and make it a linked list by using the first
object as a pointer to the next slab.

>
>> +extern struct alloc_state the_repository_blob_state;
>> +extern struct alloc_state the_repository_tree_state;
>> +extern struct alloc_state the_repository_commit_state;
>> +extern struct alloc_state the_repository_tag_state;
>> +extern struct alloc_state the_repository_object_state;
>
> (Context: these were in alloc.h)
>
> These seem to be used only in object.c, and only in one function - could
> we declare them "static" inside that function instead?

ok

>
>> -struct object_parser *object_parser_new(void)
>> +struct object_parser *object_parser_new(int is_the_repo)
>>  {
>>       struct object_parser *o = xmalloc(sizeof(*o));
>>       memset(o, 0, sizeof(*o));
>> +
>> +     if (is_the_repo) {
>> +             o->blob_state = &the_repository_blob_state;
>> +             o->tree_state = &the_repository_tree_state;
>> +             o->commit_state = &the_repository_commit_state;
>> +             o->tag_state = &the_repository_tag_state;
>> +             o->object_state = &the_repository_object_state;
>> +     } else {
>> +             o->blob_state = allocate_alloc_state();
>> +             o->tree_state = allocate_alloc_state();
>> +             o->commit_state = allocate_alloc_state();
>> +             o->tag_state = allocate_alloc_state();
>> +             o->object_state = allocate_alloc_state();
>> +     }
>>       return o;
>>  }
>
> I don't think saving 5 allocations is worth the code complexity (of the
> additional parameter).

Ok, I'll remove this overhead.

Thanks,
Stefan


* Re: [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-03 17:24     ` Stefan Beller
@ 2018-05-03 17:35       ` Duy Nguyen
  0 siblings, 0 replies; 95+ messages in thread
From: Duy Nguyen @ 2018-05-03 17:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git Mailing List, Jameson Miller

On Thu, May 3, 2018 at 7:24 PM, Stefan Beller <sbeller@google.com> wrote:
>>> +void clear_alloc_state(struct alloc_state *s)
>>> +{
>>> +       while (s->slab_nr > 0) {
>>> +               s->slab_nr--;
>>> +               free(s->slabs[s->slab_nr]);
>>
>> I think you're leaking memory here. Commit and tree objects may have
>> more allocations in them (especially trees, but I think we have
>> commit_list in struct commit too). Those need to be freed as well.
>
> I would think that tree and commit memory will be free'd in the
> parsed_objects release function? (TODO: I need to add it over there)

What release function? I know tree->buffer is often released since you
don't want to keep much in memory. But the user should not be required
to release all objects before calling object_parser_clear(). For
struct commit, I'm pretty sure we make the commit_list (for commit parents)
and never free it. Now we need to do it, somewhere.

Oh you mean object_parser_clear() as release function? Yes. I've been
wondering if it makes more sense to go through obj_hash[] and release
objects before free(obj_hash) than doing it here. It's probably better
to do it there since obj_hash[] would contain the very last pointers
to these objects and look like the right place to release resources.
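
As a sketch of that idea: walk obj_hash[], release whatever hangs off
each object, then free the table itself. The struct definitions below
are heavily simplified stand-ins for the real ones, and the type enum,
struct parsed_objects and release_object_resources() are hypothetical
names chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the real object types. */
enum obj_type { T_BLOB, T_TREE, T_COMMIT };

struct object { enum obj_type type; };
struct tree { struct object object; void *buffer; };
struct commit_list { struct commit *item; struct commit_list *next; };
struct commit { struct object object; struct commit_list *parents; };

struct parsed_objects {
	struct object **obj_hash;
	int nr_objs, obj_hash_size;
};

/*
 * Frees per-object resources and the hash table. The objects themselves
 * are left alone because they live in slabs that are freed separately.
 * Returns the number of objects visited.
 */
int release_object_resources(struct parsed_objects *o)
{
	int visited = 0;

	assert(o != NULL);
	for (int i = 0; i < o->obj_hash_size; i++) {
		struct object *obj = o->obj_hash[i];

		if (!obj)
			continue;
		if (obj->type == T_TREE) {
			struct tree *t = (struct tree *)obj;
			free(t->buffer);
			t->buffer = NULL;
		} else if (obj->type == T_COMMIT) {
			struct commit *c = (struct commit *)obj;
			while (c->parents) {
				struct commit_list *next = c->parents->next;
				free(c->parents);
				c->parents = next;
			}
		}
		visited++;
	}
	free(o->obj_hash);
	o->obj_hash = NULL;
	o->nr_objs = 0;
	o->obj_hash_size = 0;
	return visited;
}
```

Doing this before the slabs are freed matters, since the object
pointers in obj_hash[] are dereferenced during the walk.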

> Thanks,
> Stefan
-- 
Duy


* Re: [PATCH 00/13] object store: alloc
  2018-05-02 18:22     ` Duy Nguyen
  2018-05-02 18:44       ` Jameson Miller
@ 2018-05-03 22:45       ` Stefan Beller
  1 sibling, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-03 22:45 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jameson Miller, Git Mailing List

On Wed, May 2, 2018 at 11:22 AM, Duy Nguyen <pclouds@gmail.com> wrote:

> I think the two have quite different characteristics. alloc.c code is
> driven by overhead. struct blob is only 24 bytes each and about 1/3
> the repo is blobs, and each malloc has 16 bytes overhead or so if I
> remember correctly. struct cache_entry at minimum in 88 bytes so
> relative overhead is not that a big deal (but sure reducing it is
> still very nice).
>
> mem-pool is about allocation speed,

I don't think so, given that we do a linear search in each block allocation.

> but I think that's not a concern
> for alloc.c because when we do full rev walk, I think I/O is always
> the bottleneck (maybe object lookup as well). I don't see a good way
> to have the one memory allocator that satisfyies both to be honest.

By changing the allocation size of a block to be larger than 1024 entries
in alloc.c, we should lessen the impact of management overhead, and then
the mem pool can be more than feasible.


* Re: [PATCH 00/13] object store: alloc
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (13 preceding siblings ...)
  2018-05-02 17:01 ` [PATCH 00/13] object store: alloc Duy Nguyen
@ 2018-05-07 14:05 ` Junio C Hamano
  2018-05-07 20:53   ` Stefan Beller
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
  15 siblings, 1 reply; 95+ messages in thread
From: Junio C Hamano @ 2018-05-07 14:05 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jamill

Stefan Beller <sbeller@google.com> writes:

> This applies on top of sb/oid-object-info and is the logical continuum of
> the series that it builds on; this brings the object store into more of
> Gits code, removing global state, such that reasoning about the state of
> the in-memory representation of the repository is easier.

I am not sure how well this topic is done, but I've queued the
following patch at the tip of the topic to make it compile after
getting merged to integration branches (curiously, the topic by
itself compiled fine for whatever reason). I think I haven't sent
that fixup patch out, so here it is.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Wed, 2 May 2018 19:09:50 +0900
Subject: [PATCH] alloc.c: include alloc.h

Otherwise the definition in alloc.c would not see the matching decl
in alloc.h, triggering a warning from the compiler.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/alloc.c b/alloc.c
index 66a3d07ba2..f47a67153b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -16,6 +16,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
-- 
2.17.0-391-g1f1cddd558



* Re: [PATCH 00/13] object store: alloc
  2018-05-07 14:05 ` Junio C Hamano
@ 2018-05-07 20:53   ` Stefan Beller
  0 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 20:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jameson Miller

Hi Junio,

On Mon, May 7, 2018 at 7:05 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> This applies on top of sb/oid-object-info and is the logical continuum of
>> the series that it builds on; this brings the object store into more of
>> Gits code, removing global state, such that reasoning about the state of
>> the in-memory representation of the repository is easier.
>
> I am not sure how well this topic is done, but I've queued the
> following patch at the tip of the topic to make it compile after
> getting merged to integration branches (curiously, the topic by
> itself compiled file for whatever reason).

Thanks for the fixup; I will include it with other fixes in a reroll.

The reason it would not compile is not found in alloc.c
but in 1da1580e4c2 (Makefile: detect compiler and enable more
warnings in DEVELOPER=1, 2018-04-14), which enabled
-Werror=missing-prototypes; that requires a prototype, which
is found in alloc.h.

Thanks,
Stefan


* [PATCH v2 00/13] object store: alloc
  2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
                   ` (14 preceding siblings ...)
  2018-05-07 14:05 ` Junio C Hamano
@ 2018-05-07 22:59 ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 01/13] repository: introduce parsed objects field Stefan Beller
                     ` (13 more replies)
  15 siblings, 14 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

v2:
* I decided to stick with alloc.c and not migrate it to the mem-pool for now.
  The reasoning for that is that mem-pool.h would introduce some alignment
  waste, which I really did not want.
* renamed to struct parsed_object_pool;
* free'd the additional memory of trees and commits.
* do not special case the_repository for allocation purposes
* corrected&polished commit messages
* I used the (soon to be renamed?) branch-diff tool to attach a diff below.
  (I still need to get used to that format, I find an interdiff of the
   branches easier to read, but that would not yield the commit messages)
   
   

v1:
This applies on top of sb/oid-object-info and is the logical continuum of
the series that it builds on; this brings the object store into more of
Git's code, removing global state, such that reasoning about the state of
the in-memory representation of the repository is easier.

My original plan was to convert lookup_commit_graft as the next series,
which would be similar to lookup_replace_object, as in sb/object-store-replace.
The grafts and shallow mechanism are very close to each other, such that
they need to be converted at the same time, both depending on the
"parsed object store" that is introduced in this commit.

The next series will then convert code in {object/blob/tree/commit/tag}.c
hopefully finishing the lookup_* functions.

I also debated if it is worth converting alloc.c via this patch series
or if it might make more sense to use the new mem-pool by Jameson[1].

I vaguely wonder about the performance impact, as the object allocation
code seemed to be relevant in the past.

[1] https://public-inbox.org/git/20180430153122.243976-1-jamill@microsoft.com/

Any comments welcome,
Thanks,
Stefan

Jonathan Nieder (1):
  object: add repository argument to grow_object_hash

Stefan Beller (12):
  repository: introduce parsed objects field
  object: add repository argument to create_object
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  alloc: allow arbitrary repositories for alloc functions

 alloc.c           |  63 +++++++++++++++++-----------
 alloc.h           |  15 +++++++
 blame.c           |   3 +-
 blob.c            |   5 ++-
 cache.h           |   9 ----
 commit.c          |   4 +-
 merge-recursive.c |   3 +-
 object.c          | 105 +++++++++++++++++++++++++++++++++-------------
 object.h          |  18 +++++++-
 repository.c      |   7 ++++
 repository.h      |  13 +++++-
 tag.c             |   4 +-
 tree.c            |   4 +-
 13 files changed, 184 insertions(+), 69 deletions(-)
 create mode 100644 alloc.h


1:  94a4aa2a825 ! 1:  c40aae31a1e repository: introduce object parser field
    @@ -1,23 +1,20 @@
     Author: Stefan Beller <sbeller@google.com>
     
    -    repository: introduce object parser field
    +    repository: introduce parsed objects field
     
    -    Git's object access code can be thought of as containing two layers:
    -    the raw object store provides access to raw object content, while the
    -    higher level obj_hash code parses raw objects and keeps track of
    -    parenthood and other object relationships using 'struct object'.
    -    Keeping these layers separate should make it easier to find relevant
    -    functions and to change the implementation of one without having to
    -    touch the other.
    +    Convert the existing global cache for parsed objects (obj_hash) into
    +    repository-specific parsed object caches. Existing code that uses
    +    obj_hash are modified to use the parsed object cache of
    +    the_repository; future patches will use the parsed object caches of
    +    other repositories.
     
    -    Add an object_parser field to 'struct repository' to prepare obj_hash
    -    to be handled per repository.  Callers still only use the_repository
    -    for now --- later patches will adapt them to handle arbitrary
    -    repositories.
    +    Another future use case for a pool of objects is ease of memory management
    +    in revision walking: If we can free the rev-list related memory early in
    +    pack-objects (e.g. part of repack operation) then it could lower memory
    +    pressure significantly when running on large repos. While this has been
    +    discussed on the mailing list lately, this series doesn't implement this.
     
    -    Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/object.c b/object.c
     --- a/object.c
    @@ -139,9 +136,9 @@
      	}
      }
      
    -+struct object_parser *object_parser_new(void)
    ++struct parsed_object_pool *parsed_object_pool_new(void)
     +{
    -+	struct object_parser *o = xmalloc(sizeof(*o));
    ++	struct parsed_object_pool *o = xmalloc(sizeof(*o));
     +	memset(o, 0, sizeof(*o));
     +	return o;
     +}
    @@ -154,7 +151,7 @@
      	o->packed_git = NULL;
      }
     +
    -+void object_parser_clear(struct object_parser *o)
    ++void parsed_object_pool_clear(struct parsed_object_pool *o)
     +{
     +	/*
     +	 * TOOD free objects in o->obj_hash.
    @@ -171,13 +168,13 @@
      #ifndef OBJECT_H
      #define OBJECT_H
      
    -+struct object_parser {
    ++struct parsed_object_pool {
     +	struct object **obj_hash;
     +	int nr_objs, obj_hash_size;
     +};
     +
    -+struct object_parser *object_parser_new(void);
    -+void object_parser_clear(struct object_parser *o);
    ++struct parsed_object_pool *parsed_object_pool_new(void);
    ++void parsed_object_pool_clear(struct parsed_object_pool *o);
     +
      struct object_list {
      	struct object *item;
    @@ -198,7 +195,7 @@
      
      	the_repo.index = &the_index;
      	the_repo.objects = raw_object_store_new();
    -+	the_repo.parsed_objects = object_parser_new();
    ++	the_repo.parsed_objects = parsed_object_pool_new();
     +
      	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
      }
    @@ -207,7 +204,7 @@
      	memset(repo, 0, sizeof(*repo));
      
      	repo->objects = raw_object_store_new();
    -+	repo->parsed_objects = object_parser_new();
    ++	repo->parsed_objects = parsed_object_pool_new();
      
      	if (repo_init_gitdir(repo, gitdir))
      		goto error;
    @@ -215,7 +212,7 @@
      	raw_object_store_clear(repo->objects);
      	FREE_AND_NULL(repo->objects);
      
    -+	object_parser_clear(repo->parsed_objects);
    ++	parsed_object_pool_clear(repo->parsed_objects);
     +	FREE_AND_NULL(repo->parsed_objects);
     +
      	if (repo->config) {
    @@ -237,11 +234,13 @@
      	struct raw_object_store *objects;
      
     +	/*
    -+	 * State for the object parser. This owns all parsed objects
    -+	 * (struct object) so callers do not have to manage their
    -+	 * lifetime.
    ++	 * All objects in this repository that have been parsed. This structure
    ++	 * owns all objects it references, so users of "struct object *"
    ++	 * generally do not need to free them; instead, when a repository is no
    ++	 * longer used, call parsed_object_pool_clear() on this structure, which
    ++	 * is called by the repositories repo_clear on its desconstruction.
     +	 */
    -+	struct object_parser *parsed_objects;
    ++	struct parsed_object_pool *parsed_objects;
     +
      	/* The store in which the refs are held. */
      	struct ref_store *refs;
2:  9ddead5bb7e ! 2:  4ce05bb8b04 object: add repository argument to create_object
    @@ -7,14 +7,8 @@
         mechanical change; it doesn't change the implementation to handle
         repositories other than the_repository yet.
     
    -    As with the previous commits, use a macro to catch callers passing a
    -    repository other than the_repository at compile time.
    -
    -    Add the cocci patch that converted the callers.
    -
         Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/blob.c b/blob.c
     --- a/blob.c
3:  5ec44fe74de ! 3:  27e1d2621c4 object: add repository argument to grow_object_hash
    @@ -7,12 +7,8 @@
         mechanical change; it doesn't change the implementation to handle
         repositories other than the_repository yet.
     
    -    As with the previous commits, use a macro to catch callers passing a
    -    repository other than the_repository at compile time.
    -
         Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/object.c b/object.c
     --- a/object.c
4:  2a1d0220eb9 ! 4:  f8c7fc9b26f alloc: add repository argument to alloc_blob_node
    @@ -2,8 +2,12 @@
     
         alloc: add repository argument to alloc_blob_node
     
    +    This is a small mechanical change; it doesn't change the
    +    implementation to handle repositories other than the_repository yet.
    +    Use a macro to catch callers passing a repository other than
    +    the_repository at compile time.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
5:  286cde18ff2 ! 5:  b682e50bd80 alloc: add repository argument to alloc_tree_node
    @@ -2,8 +2,12 @@
     
         alloc: add repository argument to alloc_tree_node
     
    +    This is a small mechanical change; it doesn't change the
    +    implementation to handle repositories other than the_repository yet.
    +    Use a macro to catch callers passing a repository other than
    +    the_repository at compile time.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
6:  b05f0588f37 ! 6:  c89a3f0ca8b alloc: add repository argument to alloc_commit_node
    @@ -2,8 +2,12 @@
     
         alloc: add repository argument to alloc_commit_node
     
    +    This is a small mechanical change; it doesn't change the
    +    implementation to handle repositories other than the_repository yet.
    +    Use a macro to catch callers passing a repository other than
    +    the_repository at compile time.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
7:  e8e25cf61fc ! 7:  59e74c6ef43 alloc: add repository argument to alloc_tag_node
    @@ -2,8 +2,12 @@
     
         alloc: add repository argument to alloc_tag_node
     
    +    This is a small mechanical change; it doesn't change the
    +    implementation to handle repositories other than the_repository yet.
    +    Use a macro to catch callers passing a repository other than
    +    the_repository at compile time.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
8:  dbd4e184121 ! 8:  6c11cf164b1 alloc: add repository argument to alloc_object_node
    @@ -2,8 +2,12 @@
     
         alloc: add repository argument to alloc_object_node
     
    +    This is a small mechanical change; it doesn't change the
    +    implementation to handle repositories other than the_repository yet.
    +    Use a macro to catch callers passing a repository other than
    +    the_repository at compile time.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
9:  ca4a23489b7 ! 9:  eb824871f17 alloc: add repository argument to alloc_report
    @@ -2,8 +2,12 @@
     
         alloc: add repository argument to alloc_report
     
    +    This is a small mechanical change; it doesn't change the
    +    implementation to handle repositories other than the_repository yet.
    +    Use a macro to catch callers passing a repository other than
    +    the_repository at compile time.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
10:  f0f84627ce8 ! 10:  76e214a2254 alloc: add repository argument to alloc_commit_index
    @@ -2,8 +2,12 @@
     
         alloc: add repository argument to alloc_commit_index
     
    +    This is a small mechanical change; it doesn't change the
    +    implementation to handle repositories other than the_repository yet.
    +    Use a macro to catch callers passing a repository other than
    +    the_repository at compile time.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
11:  60f4e269bb8 ! 11:  3a13500d480 object: allow grow_object_hash to handle arbitrary repositories
    @@ -2,9 +2,9 @@
     
         object: allow grow_object_hash to handle arbitrary repositories
     
    +    Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/object.c b/object.c
     --- a/object.c
12:  5c0b35cfabf ! 12:  8e06fb06d60 object: allow create_object to handle arbitrary repositories
    @@ -2,9 +2,9 @@
     
         object: allow create_object to handle arbitrary repositories
     
    +    Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/object.c b/object.c
     --- a/object.c
13:  10de7382d15 ! 13:  d15807445ad alloc: allow arbitrary repositories for alloc functions
    @@ -12,7 +12,6 @@
         the_repository.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/alloc.c b/alloc.c
     --- a/alloc.c
    @@ -26,11 +25,15 @@
     + * it maintains all the allocation infrastructure, but even more because it ends
       * up with maximal alignment because it doesn't know what the object alignment
       * for the new allocation is.
    -+ *
    -+ * TODO: Combine this with mem-pool?
       */
    - #include "cache.h"
    - #include "object.h"
    +@@
    + #include "tree.h"
    + #include "commit.h"
    + #include "tag.h"
    ++#include "alloc.h"
    + 
    + #define BLOCKING 1024
    + 
     @@
      	int count; /* total number of nodes allocated */
      	int nr;    /* number of nodes left in current allocation */
    @@ -58,19 +61,24 @@
      {
      	void *ret;
     @@
    + 	if (!s->nr) {
    + 		s->nr = BLOCKING;
    + 		s->p = xmalloc(BLOCKING * node_size);
    ++
    ++		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
    ++		s->slabs[s->slab_nr++] = s->p;
    + 	}
    + 	s->nr--;
    + 	s->count++;
      	ret = s->p;
      	s->p = (char *)s->p + node_size;
      	memset(ret, 0, node_size);
    -+
    -+	ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
    -+	s->slabs[s->slab_nr++] = ret;
     +
      	return ret;
      }
      
     -static struct alloc_state blob_state;
     -void *alloc_blob_node_the_repository(void)
    -+struct alloc_state the_repository_blob_state;
     +void *alloc_blob_node(struct repository *r)
      {
     -	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
    @@ -81,7 +89,6 @@
      
     -static struct alloc_state tree_state;
     -void *alloc_tree_node_the_repository(void)
    -+struct alloc_state the_repository_tree_state;
     +void *alloc_tree_node(struct repository *r)
      {
     -	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
    @@ -92,7 +99,6 @@
      
     -static struct alloc_state tag_state;
     -void *alloc_tag_node_the_repository(void)
    -+struct alloc_state the_repository_tag_state;
     +void *alloc_tag_node(struct repository *r)
      {
     -	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
    @@ -103,7 +109,6 @@
      
     -static struct alloc_state object_state;
     -void *alloc_object_node_the_repository(void)
    -+struct alloc_state the_repository_object_state;
     +void *alloc_object_node(struct repository *r)
      {
     -	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
    @@ -123,7 +128,6 @@
      }
      
     -void *alloc_commit_node_the_repository(void)
    -+struct alloc_state the_repository_commit_state;
     +void *alloc_commit_node(struct repository *r)
      {
     -	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
    @@ -167,12 +171,6 @@
     +void *allocate_alloc_state(void);
     +void clear_alloc_state(struct alloc_state *s);
     +
    -+extern struct alloc_state the_repository_blob_state;
    -+extern struct alloc_state the_repository_tree_state;
    -+extern struct alloc_state the_repository_commit_state;
    -+extern struct alloc_state the_repository_tag_state;
    -+extern struct alloc_state the_repository_object_state;
    -+
     +#endif
     
     diff --git a/blame.c b/blame.c
    @@ -262,40 +260,47 @@
      #include "object-store.h"
      #include "packfile.h"
     @@
    - 	}
    - }
    - 
    --struct object_parser *object_parser_new(void)
    -+struct object_parser *object_parser_new(int is_the_repo)
      {
    - 	struct object_parser *o = xmalloc(sizeof(*o));
    + 	struct parsed_object_pool *o = xmalloc(sizeof(*o));
      	memset(o, 0, sizeof(*o));
     +
    -+	if (is_the_repo) {
    -+		o->blob_state = &the_repository_blob_state;
    -+		o->tree_state = &the_repository_tree_state;
    -+		o->commit_state = &the_repository_commit_state;
    -+		o->tag_state = &the_repository_tag_state;
    -+		o->object_state = &the_repository_object_state;
    -+	} else {
    -+		o->blob_state = allocate_alloc_state();
    -+		o->tree_state = allocate_alloc_state();
    -+		o->commit_state = allocate_alloc_state();
    -+		o->tag_state = allocate_alloc_state();
    -+		o->object_state = allocate_alloc_state();
    -+	}
    ++	o->blob_state = allocate_alloc_state();
    ++	o->tree_state = allocate_alloc_state();
    ++	o->commit_state = allocate_alloc_state();
    ++	o->tag_state = allocate_alloc_state();
    ++	o->object_state = allocate_alloc_state();
    ++
      	return o;
      }
      
     @@
    - void object_parser_clear(struct object_parser *o)
    + void parsed_object_pool_clear(struct parsed_object_pool *o)
      {
      	/*
     -	 * TOOD free objects in o->obj_hash.
     -	 *
      	 * As objects are allocated in slabs (see alloc.c), we do
      	 * not need to free each object, but each slab instead.
    ++	 *
    ++	 * Before doing so, we need to free any additional memory
    ++	 * the objects may hold.
      	 */
    ++	unsigned i;
    ++
    ++	for (i = 0; i < o->obj_hash_size; i++) {
    ++		struct object *obj = o->obj_hash[i];
    ++
    ++		if (!obj)
    ++			continue;
    ++
    ++		if (obj->type == OBJ_TREE) {
    ++			free(((struct tree*)obj)->buffer);
    ++		} else if (obj->type == OBJ_COMMIT) {
    ++			free_commit_list(((struct commit*)obj)->parents);
    ++			free(&((struct commit*)obj)->util);
    ++		}
    ++	}
    ++
     +	clear_alloc_state(o->blob_state);
     +	clear_alloc_state(o->tree_state);
     +	clear_alloc_state(o->commit_state);
    @@ -307,7 +312,7 @@
     --- a/object.h
     +++ b/object.h
     @@
    - struct object_parser {
    + struct parsed_object_pool {
      	struct object **obj_hash;
      	int nr_objs, obj_hash_size;
     +
    @@ -320,33 +325,7 @@
     +	unsigned commit_count;
      };
      
    --struct object_parser *object_parser_new(void);
    -+struct object_parser *object_parser_new(int is_the_repo);
    - void object_parser_clear(struct object_parser *o);
    - 
    - struct object_list {
    -
    -diff --git a/repository.c b/repository.c
    ---- a/repository.c
    -+++ b/repository.c
    -@@
    - 
    - 	the_repo.index = &the_index;
    - 	the_repo.objects = raw_object_store_new();
    --	the_repo.parsed_objects = object_parser_new();
    -+	the_repo.parsed_objects = object_parser_new(1);
    - 
    - 	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
    - }
    -@@
    - 	memset(repo, 0, sizeof(*repo));
    - 
    - 	repo->objects = raw_object_store_new();
    --	repo->parsed_objects = object_parser_new();
    -+	repo->parsed_objects = object_parser_new(0);
    - 
    - 	if (repo_init_gitdir(repo, gitdir))
    - 		goto error;
    + struct parsed_object_pool *parsed_object_pool_new(void);
     
     diff --git a/tag.c b/tag.c
     --- a/tag.c
14:  12e8de9e65c < -:  ----------- alloc.c: include alloc.h


-- 
2.17.0.255.g8bfb7c0704



* [PATCH v2 01/13] repository: introduce parsed objects field
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-08 17:23     ` Jonathan Tan
  2018-05-07 22:59   ` [PATCH v2 02/13] object: add repository argument to create_object Stefan Beller
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

Convert the existing global cache for parsed objects (obj_hash) into
repository-specific parsed object caches. Existing code that uses
obj_hash is modified to use the parsed object cache of
the_repository; future patches will use the parsed object caches of
other repositories.
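
As a rough illustration of the conversion described above (not the actual
Git code), the former file-scope globals become fields of a pool that each
repository owns. The struct layout and parsed_object_pool_new() follow this
patch; repo_max_object_index() is a hypothetical helper named only for this
example, and calloc() stands in for Git's xmalloc() plus memset():

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for Git's struct object. */
struct object { int type; };

/* Per-repository replacement for the old globals in object.c. */
struct parsed_object_pool {
	struct object **obj_hash;
	int nr_objs, obj_hash_size;
};

/* Simplified stand-in: the real struct repository has many more fields. */
struct repository {
	struct parsed_object_pool *parsed_objects;
};

struct parsed_object_pool *parsed_object_pool_new(void)
{
	/* calloc() returns zeroed memory, like xmalloc() + memset(0). */
	struct parsed_object_pool *o = calloc(1, sizeof(*o));
	if (!o)
		abort();
	return o;
}

/* Hypothetical accessor: reads the table size of one specific repository. */
unsigned int repo_max_object_index(const struct repository *r)
{
	assert(r && r->parsed_objects);
	return r->parsed_objects->obj_hash_size;
}
```

With this layout, two repositories keep independent tables, so growing one
pool leaves the other untouched.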

Another future use case for a pool of objects is easier memory management
in revision walking: if we can free the rev-list related memory early in
pack-objects (e.g. as part of a repack operation), it could lower memory
pressure significantly on large repositories. This has been discussed on
the mailing list lately, but this series does not implement it.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c     | 63 +++++++++++++++++++++++++++++++++-------------------
 object.h     |  8 +++++++
 repository.c |  7 ++++++
 repository.h | 13 ++++++++++-
 4 files changed, 67 insertions(+), 24 deletions(-)

diff --git a/object.c b/object.c
index 5044d08e96c..f7c624a7ba6 100644
--- a/object.c
+++ b/object.c
@@ -8,17 +8,14 @@
 #include "object-store.h"
 #include "packfile.h"
 
-static struct object **obj_hash;
-static int nr_objs, obj_hash_size;
-
 unsigned int get_max_object_index(void)
 {
-	return obj_hash_size;
+	return the_repository->parsed_objects->obj_hash_size;
 }
 
 struct object *get_indexed_object(unsigned int idx)
 {
-	return obj_hash[idx];
+	return the_repository->parsed_objects->obj_hash[idx];
 }
 
 static const char *object_type_strings[] = {
@@ -90,15 +87,16 @@ struct object *lookup_object(const unsigned char *sha1)
 	unsigned int i, first;
 	struct object *obj;
 
-	if (!obj_hash)
+	if (!the_repository->parsed_objects->obj_hash)
 		return NULL;
 
-	first = i = hash_obj(sha1, obj_hash_size);
-	while ((obj = obj_hash[i]) != NULL) {
+	first = i = hash_obj(sha1,
+			     the_repository->parsed_objects->obj_hash_size);
+	while ((obj = the_repository->parsed_objects->obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->oid.hash))
 			break;
 		i++;
-		if (i == obj_hash_size)
+		if (i == the_repository->parsed_objects->obj_hash_size)
 			i = 0;
 	}
 	if (obj && i != first) {
@@ -107,7 +105,8 @@ struct object *lookup_object(const unsigned char *sha1)
 		 * that we do not need to walk the hash table the next
 		 * time we look for it.
 		 */
-		SWAP(obj_hash[i], obj_hash[first]);
+		SWAP(the_repository->parsed_objects->obj_hash[i],
+		     the_repository->parsed_objects->obj_hash[first]);
 	}
 	return obj;
 }
@@ -124,19 +123,19 @@ static void grow_object_hash(void)
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
+	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(obj_hash);
-	obj_hash = new_hash;
-	obj_hash_size = new_hash_size;
+	free(the_repository->parsed_objects->obj_hash);
+	the_repository->parsed_objects->obj_hash = new_hash;
+	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object(const unsigned char *sha1, void *o)
@@ -147,11 +146,12 @@ void *create_object(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (obj_hash_size - 1 <= nr_objs * 2)
+	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
 		grow_object_hash();
 
-	insert_obj_hash(obj, obj_hash, obj_hash_size);
-	nr_objs++;
+	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
+			the_repository->parsed_objects->obj_hash_size);
+	the_repository->parsed_objects->nr_objs++;
 	return obj;
 }
 
@@ -431,8 +431,8 @@ void clear_object_flags(unsigned flags)
 {
 	int i;
 
-	for (i=0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i=0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj)
 			obj->flags &= ~flags;
 	}
@@ -442,13 +442,20 @@ void clear_commit_marks_all(unsigned int flags)
 {
 	int i;
 
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj && obj->type == OBJ_COMMIT)
 			obj->flags &= ~flags;
 	}
 }
 
+struct parsed_object_pool *parsed_object_pool_new(void)
+{
+	struct parsed_object_pool *o = xmalloc(sizeof(*o));
+	memset(o, 0, sizeof(*o));
+	return o;
+}
+
 struct raw_object_store *raw_object_store_new(void)
 {
 	struct raw_object_store *o = xmalloc(sizeof(*o));
@@ -488,3 +495,13 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_all_packs(o);
 	o->packed_git = NULL;
 }
+
+void parsed_object_pool_clear(struct parsed_object_pool *o)
+{
+	/*
+	 * TOOD free objects in o->obj_hash.
+	 *
+	 * As objects are allocated in slabs (see alloc.c), we do
+	 * not need to free each object, but each slab instead.
+	 */
+}
diff --git a/object.h b/object.h
index f13f85b2a94..cecda7da370 100644
--- a/object.h
+++ b/object.h
@@ -1,6 +1,14 @@
 #ifndef OBJECT_H
 #define OBJECT_H
 
+struct parsed_object_pool {
+	struct object **obj_hash;
+	int nr_objs, obj_hash_size;
+};
+
+struct parsed_object_pool *parsed_object_pool_new(void);
+void parsed_object_pool_clear(struct parsed_object_pool *o);
+
 struct object_list {
 	struct object *item;
 	struct object_list *next;
diff --git a/repository.c b/repository.c
index a4848c1bd05..c23404677eb 100644
--- a/repository.c
+++ b/repository.c
@@ -2,6 +2,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "config.h"
+#include "object.h"
 #include "submodule-config.h"
 
 /* The main repository */
@@ -14,6 +15,8 @@ void initialize_the_repository(void)
 
 	the_repo.index = &the_index;
 	the_repo.objects = raw_object_store_new();
+	the_repo.parsed_objects = parsed_object_pool_new();
+
 	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
 
@@ -143,6 +146,7 @@ static int repo_init(struct repository *repo,
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
+	repo->parsed_objects = parsed_object_pool_new();
 
 	if (repo_init_gitdir(repo, gitdir))
 		goto error;
@@ -226,6 +230,9 @@ void repo_clear(struct repository *repo)
 	raw_object_store_clear(repo->objects);
 	FREE_AND_NULL(repo->objects);
 
+	parsed_object_pool_clear(repo->parsed_objects);
+	FREE_AND_NULL(repo->parsed_objects);
+
 	if (repo->config) {
 		git_configset_clear(repo->config);
 		FREE_AND_NULL(repo->config);
diff --git a/repository.h b/repository.h
index e6e00f541bd..73389e81afd 100644
--- a/repository.h
+++ b/repository.h
@@ -22,10 +22,21 @@ struct repository {
 	char *commondir;
 
 	/*
-	 * Holds any information related to accessing the raw object content.
+	 * Holds any information needed to retrieve the raw content
+	 * of objects. The object_parser uses this to get object
+	 * content which it then parses.
 	 */
 	struct raw_object_store *objects;
 
+	/*
+	 * All objects in this repository that have been parsed. This structure
+	 * owns all objects it references, so users of "struct object *"
+	 * generally do not need to free them; instead, when a repository is no
+	 * longer used, call parsed_object_pool_clear() on this structure, which
+	 * is called by the repository's repo_clear() on its destruction.
+	 */
+	struct parsed_object_pool *parsed_objects;
+
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v2 02/13] object: add repository argument to create_object
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 01/13] repository: introduce parsed objects field Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 03/13] object: add repository argument to grow_object_hash Stefan Beller
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git
  Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller, Jonathan Nieder

Add a repository argument to allow the callers of create_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 blob.c   | 4 +++-
 commit.c | 3 ++-
 object.c | 5 +++--
 object.h | 3 ++-
 tag.c    | 3 ++-
 tree.c   | 3 ++-
 6 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/blob.c b/blob.c
index fa2ab4f7a74..85c2143f299 100644
--- a/blob.c
+++ b/blob.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "blob.h"
+#include "repository.h"
 
 const char *blob_type = "blob";
 
@@ -7,7 +8,8 @@ struct blob *lookup_blob(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_blob_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_blob_node());
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/commit.c b/commit.c
index ca474a7c112..9106acf0aad 100644
--- a/commit.c
+++ b/commit.c
@@ -50,7 +50,8 @@ struct commit *lookup_commit(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_commit_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_commit_node());
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/object.c b/object.c
index f7c624a7ba6..2de029275bc 100644
--- a/object.c
+++ b/object.c
@@ -138,7 +138,7 @@ static void grow_object_hash(void)
 	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, void *o)
+void *create_object_the_repository(const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -178,7 +178,8 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		obj = create_object(sha1, alloc_object_node());
+		obj = create_object(the_repository, sha1,
+				    alloc_object_node());
 	return obj;
 }
 
diff --git a/object.h b/object.h
index cecda7da370..2cb0b241083 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,8 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-extern void *create_object(const unsigned char *sha1, void *obj);
+#define create_object(r, s, o) create_object_##r(s, o)
+extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
diff --git a/tag.c b/tag.c
index 3d37c1bd251..7150b759d66 100644
--- a/tag.c
+++ b/tag.c
@@ -93,7 +93,8 @@ struct tag *lookup_tag(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tag_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tag_node());
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
diff --git a/tree.c b/tree.c
index 1c68ea586bd..63730e3fb46 100644
--- a/tree.c
+++ b/tree.c
@@ -196,7 +196,8 @@ struct tree *lookup_tree(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tree_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tree_node());
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v2 03/13] object: add repository argument to grow_object_hash
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 01/13] repository: introduce parsed objects field Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 02/13] object: add repository argument to create_object Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git
  Cc: pclouds, jonathantanmy, gitster, jamill, Jonathan Nieder, Stefan Beller

From: Jonathan Nieder <jrnieder@gmail.com>

Add a repository argument to allow the caller of grow_object_hash to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
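
For readers unfamiliar with the table being parameterized here, the
following simplified sketch shows its growth policy, assuming integer keys
in place of object IDs. struct pool, pool_insert() and pool_lookup() are
hypothetical names; the sizing rule and the linear probing mirror what
object.c does, but the move-to-front swap in lookup_object() is left out:

```c
#include <assert.h>
#include <stdlib.h>

struct object { unsigned int key; };

struct pool {
	struct object **obj_hash;
	int nr_objs, obj_hash_size;
};

/* Masking is only a valid modulo because the size is a power of two. */
static unsigned int hash_obj(unsigned int key, unsigned int n)
{
	return key & (n - 1);
}

/* Linear probing: take the first free slot at or after the hash position. */
static void insert_obj_hash(struct object *obj, struct object **hash,
			    unsigned int size)
{
	unsigned int j = hash_obj(obj->key, size);

	while (hash[j]) {
		j++;
		if (j >= size)
			j = 0;
	}
	hash[j] = obj;
}

/* Double the table (at least 32 slots) and re-insert every entry. */
static void grow_object_hash(struct pool *p)
{
	int i;
	int new_size = p->obj_hash_size < 32 ? 32 : 2 * p->obj_hash_size;
	struct object **new_hash = calloc(new_size, sizeof(*new_hash));

	if (!new_hash)
		abort();
	for (i = 0; i < p->obj_hash_size; i++)
		if (p->obj_hash[i])
			insert_obj_hash(p->obj_hash[i], new_hash, new_size);
	free(p->obj_hash);
	p->obj_hash = new_hash;
	p->obj_hash_size = new_size;
}

void pool_insert(struct pool *p, struct object *obj)
{
	/* Keep the table at most half full so probing always terminates. */
	if (p->obj_hash_size - 1 <= p->nr_objs * 2)
		grow_object_hash(p);
	insert_obj_hash(obj, p->obj_hash, p->obj_hash_size);
	p->nr_objs++;
}

struct object *pool_lookup(struct pool *p, unsigned int key)
{
	unsigned int i;
	struct object *obj;

	if (!p->obj_hash)
		return NULL;
	i = hash_obj(key, p->obj_hash_size);
	while ((obj = p->obj_hash[i]) != NULL) {
		if (obj->key == key)
			break;
		i++;
		if (i == (unsigned int)p->obj_hash_size)
			i = 0;
	}
	return obj;
}
```

Because all state lives in the struct passed in, nothing in this sketch
depends on a particular repository, which is the property the later patches
in this series aim for.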

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 2de029275bc..91edc30770c 100644
--- a/object.c
+++ b/object.c
@@ -116,7 +116,8 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-static void grow_object_hash(void)
+#define grow_object_hash(r) grow_object_hash_##r()
+static void grow_object_hash_the_repository(void)
 {
 	int i;
 	/*
@@ -147,7 +148,7 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	hashcpy(obj->oid.hash, sha1);
 
 	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash();
+		grow_object_hash(the_repository);
 
 	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
 			the_repository->parsed_objects->obj_hash_size);
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 04/13] alloc: add repository argument to alloc_blob_node
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (2 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 03/13] object: add repository argument to grow_object_hash Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
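
As a rough illustration of the macro trick (not part of this patch; every
demo_* name and the "main" string are made up for the example), the argument
is pasted into the function name instead of being evaluated, so only the
literal token the_repository resolves to a function that exists:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; the real struct repository is much larger. */
struct repository { const char *name; };
static struct repository main_repo = { "main" };
static struct repository *the_repository = &main_repo;

/* The only variant that exists during the transition. */
static const char *demo_repo_name_the_repository(void)
{
	assert(the_repository != NULL);
	return the_repository->name;
}

/* The argument becomes part of the identifier; it is never evaluated. */
#define demo_repo_name(r) demo_repo_name_##r()

static int demo_uses_main_repo(void)
{
	/* Expands to demo_repo_name_the_repository(). */
	return !strcmp(demo_repo_name(the_repository), "main");
}
```

A call such as demo_repo_name(other_repo) would expand to
demo_repo_name_other_repo(), which is not declared, so the build should fail
instead of silently operating on the_repository.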

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 blob.c  | 2 +-
 cache.h | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacdd..6c5c376a25a 100644
--- a/alloc.c
+++ b/alloc.c
@@ -49,7 +49,7 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 }
 
 static struct alloc_state blob_state;
-void *alloc_blob_node(void)
+void *alloc_blob_node_the_repository(void)
 {
 	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
diff --git a/blob.c b/blob.c
index 85c2143f299..9e64f301895 100644
--- a/blob.c
+++ b/blob.c
@@ -9,7 +9,7 @@ struct blob *lookup_blob(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_blob_node());
+				     alloc_blob_node(the_repository));
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/cache.h b/cache.h
index 3a4d80e92bf..2258e611275 100644
--- a/cache.h
+++ b/cache.h
@@ -1764,7 +1764,8 @@ int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
 /* alloc.c */
-extern void *alloc_blob_node(void);
+#define alloc_blob_node(r) alloc_blob_node_##r()
+extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 05/13] alloc: add repository argument to alloc_tree_node
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (3 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tree.c  | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 6c5c376a25a..2c8d1430758 100644
--- a/alloc.c
+++ b/alloc.c
@@ -57,7 +57,7 @@ void *alloc_blob_node_the_repository(void)
 }
 
 static struct alloc_state tree_state;
-void *alloc_tree_node(void)
+void *alloc_tree_node_the_repository(void)
 {
 	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
diff --git a/cache.h b/cache.h
index 2258e611275..1717d07a2c5 100644
--- a/cache.h
+++ b/cache.h
@@ -1766,7 +1766,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 /* alloc.c */
 #define alloc_blob_node(r) alloc_blob_node_##r()
 extern void *alloc_blob_node_the_repository(void);
-extern void *alloc_tree_node(void);
+#define alloc_tree_node(r) alloc_tree_node_##r()
+extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
diff --git a/tree.c b/tree.c
index 63730e3fb46..58cf19b4fa8 100644
--- a/tree.c
+++ b/tree.c
@@ -197,7 +197,7 @@ struct tree *lookup_tree(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tree_node());
+				     alloc_tree_node(the_repository));
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 06/13] alloc: add repository argument to alloc_commit_node
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (4 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c           | 2 +-
 blame.c           | 2 +-
 cache.h           | 3 ++-
 commit.c          | 2 +-
 merge-recursive.c | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/alloc.c b/alloc.c
index 2c8d1430758..9e2b897ec1d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -88,7 +88,7 @@ unsigned int alloc_commit_index(void)
 	return count++;
 }
 
-void *alloc_commit_node(void)
+void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
diff --git a/blame.c b/blame.c
index dfa24473dc6..ba9b18e7542 100644
--- a/blame.c
+++ b/blame.c
@@ -161,7 +161,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 
 	read_cache();
 	time(&now);
-	commit = alloc_commit_node();
+	commit = alloc_commit_node(the_repository);
 	commit->object.parsed = 1;
 	commit->date = now;
 	parent_tail = &commit->parents;
diff --git a/cache.h b/cache.h
index 1717d07a2c5..bf6e8c87d83 100644
--- a/cache.h
+++ b/cache.h
@@ -1768,7 +1768,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 extern void *alloc_blob_node_the_repository(void);
 #define alloc_tree_node(r) alloc_tree_node_##r()
 extern void *alloc_tree_node_the_repository(void);
-extern void *alloc_commit_node(void);
+#define alloc_commit_node(r) alloc_commit_node_##r()
+extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
diff --git a/commit.c b/commit.c
index 9106acf0aad..a9a43e79bae 100644
--- a/commit.c
+++ b/commit.c
@@ -51,7 +51,7 @@ struct commit *lookup_commit(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_commit_node());
+				     alloc_commit_node(the_repository));
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624da..6dac8908648 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -98,7 +98,7 @@ static struct tree *shift_tree_object(struct tree *one, struct tree *two,
 
 static struct commit *make_virtual_commit(struct tree *tree, const char *comment)
 {
-	struct commit *commit = alloc_commit_node();
+	struct commit *commit = alloc_commit_node(the_repository);
 
 	set_merge_remote_desc(commit, comment, (struct object *)commit);
 	commit->tree = tree;
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 07/13] alloc: add repository argument to alloc_tag_node
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (5 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tag.c   | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 9e2b897ec1d..290250e3595 100644
--- a/alloc.c
+++ b/alloc.c
@@ -65,7 +65,7 @@ void *alloc_tree_node_the_repository(void)
 }
 
 static struct alloc_state tag_state;
-void *alloc_tag_node(void)
+void *alloc_tag_node_the_repository(void)
 {
 	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
diff --git a/cache.h b/cache.h
index bf6e8c87d83..32f340cde59 100644
--- a/cache.h
+++ b/cache.h
@@ -1770,7 +1770,8 @@ extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node_the_repository(void);
 #define alloc_commit_node(r) alloc_commit_node_##r()
 extern void *alloc_commit_node_the_repository(void);
-extern void *alloc_tag_node(void);
+#define alloc_tag_node(r) alloc_tag_node_##r()
+extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
diff --git a/tag.c b/tag.c
index 7150b759d66..02ef4eaafc0 100644
--- a/tag.c
+++ b/tag.c
@@ -94,7 +94,7 @@ struct tag *lookup_tag(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tag_node());
+				     alloc_tag_node(the_repository));
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 08/13] alloc: add repository argument to alloc_object_node
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (6 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 09/13] alloc: add repository argument to alloc_report Stefan Beller
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c  | 2 +-
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 290250e3595..f031ce422d9 100644
--- a/alloc.c
+++ b/alloc.c
@@ -73,7 +73,7 @@ void *alloc_tag_node_the_repository(void)
 }
 
 static struct alloc_state object_state;
-void *alloc_object_node(void)
+void *alloc_object_node_the_repository(void)
 {
 	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
diff --git a/cache.h b/cache.h
index 32f340cde59..2d60359a964 100644
--- a/cache.h
+++ b/cache.h
@@ -1772,7 +1772,8 @@ extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node_the_repository(void);
 #define alloc_tag_node(r) alloc_tag_node_##r()
 extern void *alloc_tag_node_the_repository(void);
-extern void *alloc_object_node(void);
+#define alloc_object_node(r) alloc_object_node_##r()
+extern void *alloc_object_node_the_repository(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
 
diff --git a/object.c b/object.c
index 91edc30770c..b8c3f923c51 100644
--- a/object.c
+++ b/object.c
@@ -180,7 +180,7 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		obj = create_object(the_repository, sha1,
-				    alloc_object_node());
+				    alloc_object_node(the_repository));
 	return obj;
 }
 
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 09/13] alloc: add repository argument to alloc_report
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (7 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index f031ce422d9..28b85b22144 100644
--- a/alloc.c
+++ b/alloc.c
@@ -105,7 +105,7 @@ static void report(const char *name, unsigned int count, size_t size)
 #define REPORT(name, type)	\
     report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
 
-void alloc_report(void)
+void alloc_report_the_repository(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/cache.h b/cache.h
index 2d60359a964..01cc207d218 100644
--- a/cache.h
+++ b/cache.h
@@ -1774,7 +1774,8 @@ extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node_the_repository(void);
 #define alloc_object_node(r) alloc_object_node_##r()
 extern void *alloc_object_node_the_repository(void);
-extern void alloc_report(void);
+#define alloc_report(r) alloc_report_##r()
+extern void alloc_report_the_repository(void);
 extern unsigned int alloc_commit_index(void);
 
 /* pkt-line.c */
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 10/13] alloc: add repository argument to alloc_commit_index
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (8 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 09/13] alloc: add repository argument to alloc_report Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c  | 4 ++--
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 28b85b22144..277dadd221b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -82,7 +82,7 @@ void *alloc_object_node_the_repository(void)
 
 static struct alloc_state commit_state;
 
-unsigned int alloc_commit_index(void)
+unsigned int alloc_commit_index_the_repository(void)
 {
 	static unsigned int count;
 	return count++;
@@ -92,7 +92,7 @@ void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index();
+	c->index = alloc_commit_index(the_repository);
 	return c;
 }
 
diff --git a/cache.h b/cache.h
index 01cc207d218..0e6c5dd5639 100644
--- a/cache.h
+++ b/cache.h
@@ -1776,7 +1776,8 @@ extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node_the_repository(void);
 #define alloc_report(r) alloc_report_##r()
 extern void alloc_report_the_repository(void);
-extern unsigned int alloc_commit_index(void);
+#define alloc_commit_index(r) alloc_commit_index_##r()
+extern unsigned int alloc_commit_index_the_repository(void);
 
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
diff --git a/object.c b/object.c
index b8c3f923c51..a365a910859 100644
--- a/object.c
+++ b/object.c
@@ -162,7 +162,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 		return obj;
 	else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
-			((struct commit *)obj)->index = alloc_commit_index();
+			((struct commit *)obj)->index = alloc_commit_index(the_repository);
 		obj->type = type;
 		return obj;
 	}
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 11/13] object: allow grow_object_hash to handle arbitrary repositories
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (9 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 12/13] object: allow create_object " Stefan Beller
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git
  Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller, Jonathan Nieder

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index a365a910859..0fcd6f6df42 100644
--- a/object.c
+++ b/object.c
@@ -116,27 +116,27 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-#define grow_object_hash(r) grow_object_hash_##r()
-static void grow_object_hash_the_repository(void)
+static void grow_object_hash(struct repository *r)
 {
 	int i;
 	/*
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
+	int new_hash_size = r->parsed_objects->obj_hash_size < 32 ? 32 : 2 * r->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
-		struct object *obj = the_repository->parsed_objects->obj_hash[i];
+	for (i = 0; i < r->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = r->parsed_objects->obj_hash[i];
+
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(the_repository->parsed_objects->obj_hash);
-	the_repository->parsed_objects->obj_hash = new_hash;
-	the_repository->parsed_objects->obj_hash_size = new_hash_size;
+	free(r->parsed_objects->obj_hash);
+	r->parsed_objects->obj_hash = new_hash;
+	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object_the_repository(const unsigned char *sha1, void *o)
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 12/13] object: allow create_object to handle arbitrary repositories
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (10 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-07 22:59   ` [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git
  Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller, Jonathan Nieder

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c | 12 ++++++------
 object.h |  3 +--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index 0fcd6f6df42..49b952e9299 100644
--- a/object.c
+++ b/object.c
@@ -139,7 +139,7 @@ static void grow_object_hash(struct repository *r)
 	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object_the_repository(const unsigned char *sha1, void *o)
+void *create_object(struct repository *r, const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -147,12 +147,12 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash(the_repository);
+	if (r->parsed_objects->obj_hash_size - 1 <= r->parsed_objects->nr_objs * 2)
+		grow_object_hash(r);
 
-	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
-			the_repository->parsed_objects->obj_hash_size);
-	the_repository->parsed_objects->nr_objs++;
+	insert_obj_hash(obj, r->parsed_objects->obj_hash,
+			r->parsed_objects->obj_hash_size);
+	r->parsed_objects->nr_objs++;
 	return obj;
 }
 
diff --git a/object.h b/object.h
index 2cb0b241083..b41d7a3accb 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,7 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-#define create_object(r, s, o) create_object_##r(s, o)
-extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
+extern void *create_object(struct repository *r, const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (11 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 12/13] object: allow create_object " Stefan Beller
@ 2018-05-07 22:59   ` Stefan Beller
  2018-05-08 10:10     ` Jeff King
                       ` (2 more replies)
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
  13 siblings, 3 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-07 22:59 UTC (permalink / raw)
  To: git; +Cc: pclouds, jonathantanmy, gitster, jamill, Stefan Beller

We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.
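
To make the "funky macro" concrete, here is a minimal sketch of the two
preprocessor features REPORT relies on (the demo_* and DEMO_* names are
hypothetical, not code from alloc.c). Because the field name is built from a
single pattern, every *_state presumably has to be reachable in the same way,
which is one reading of why the functions are converted together:

```c
#include <assert.h>
#include <string.h>

struct demo_state { int count; };

/* Hypothetical stand-in for the per-repository pool. */
struct demo_pool {
	struct demo_state *blob_state;
	struct demo_state *tree_state;
};

/* #name turns the macro argument into a string literal. */
#define DEMO_LABEL(name) #name
/* name##_state pastes the argument onto the common field suffix. */
#define DEMO_COUNT(pool, name) ((pool)->name##_state->count)

static int demo_total(const struct demo_pool *pool)
{
	assert(pool && pool->blob_state && pool->tree_state);
	return DEMO_COUNT(pool, blob) + DEMO_COUNT(pool, tree);
}
```

If one of the states still lived in a file-scope static while the others had
moved into the pool, a single macro of this shape could not address both.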

We record all memory allocations in alloc.c and free them in
clear_alloc_state, which is called for all repositories except
the_repository.
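
The bookkeeping can be sketched in isolation as follows. This is only an
approximation under stated assumptions: it uses plain malloc/realloc where the
patch uses xmalloc and ALLOC_GROW, a tiny slab size so the behaviour is easy
to follow, and the demo_* names are invented for the example. Unlike the
clear_alloc_state in this patch, the sketch also releases the slab list
itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BLOCKING 4 /* nodes per slab; the patch uses 1024 */

struct demo_alloc_state {
	int count;    /* total number of nodes handed out */
	int nr;       /* nodes left in the current slab */
	void *p;      /* next free node in the current slab */
	void **slabs; /* every slab allocated so far */
	int slab_nr, slab_alloc;
};

static void *demo_alloc_node(struct demo_alloc_state *s, size_t node_size)
{
	void *ret;

	assert(node_size > 0);
	if (!s->nr) {
		/* Make room in the slab list before allocating the slab. */
		if (s->slab_nr == s->slab_alloc) {
			int n = s->slab_alloc ? 2 * s->slab_alloc : 4;
			void **grown = realloc(s->slabs, n * sizeof(*grown));
			if (!grown)
				return NULL;
			s->slabs = grown;
			s->slab_alloc = n;
		}
		s->p = malloc(DEMO_BLOCKING * node_size);
		if (!s->p)
			return NULL;
		s->slabs[s->slab_nr++] = s->p; /* remember it for later */
		s->nr = DEMO_BLOCKING;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	memset(ret, 0, node_size);
	return ret;
}

static void demo_clear_alloc_state(struct demo_alloc_state *s)
{
	/* Free whole slabs; individual nodes are never freed. */
	while (s->slab_nr > 0)
		free(s->slabs[--s->slab_nr]);
	free(s->slabs);
	memset(s, 0, sizeof(*s));
}
```

With a slab size of 4, nine allocations end up in three slabs, and clearing
the state frees those three slabs rather than nine individual nodes.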

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 alloc.c           | 63 +++++++++++++++++++++++++++++------------------
 alloc.h           | 15 +++++++++++
 blame.c           |  1 +
 blob.c            |  1 +
 cache.h           | 16 ------------
 commit.c          |  1 +
 merge-recursive.c |  1 +
 object.c          | 34 +++++++++++++++++++++++--
 object.h          |  8 ++++++
 tag.c             |  1 +
 tree.c            |  1 +
 11 files changed, 100 insertions(+), 42 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..cbdbbc1dd2d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,8 +4,7 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it ends
  * up with maximal alignment because it doesn't know what the object alignment
  * for the new allocation is.
  */
@@ -15,6 +14,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
@@ -30,8 +30,25 @@ struct alloc_state {
 	int count; /* total number of nodes allocated */
 	int nr;    /* number of nodes left in current allocation */
 	void *p;   /* first free node in current allocation */
+
+	/* bookkeeping of allocations */
+	void **slabs;
+	int slab_nr, slab_alloc;
 };
 
+void *allocate_alloc_state(void)
+{
+	return xcalloc(1, sizeof(struct alloc_state));
+}
+
+void clear_alloc_state(struct alloc_state *s)
+{
+	while (s->slab_nr > 0) {
+		s->slab_nr--;
+		free(s->slabs[s->slab_nr]);
+	}
+}
+
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 {
 	void *ret;
@@ -39,60 +56,57 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	if (!s->nr) {
 		s->nr = BLOCKING;
 		s->p = xmalloc(BLOCKING * node_size);
+
+		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
+		s->slabs[s->slab_nr++] = s->p;
 	}
 	s->nr--;
 	s->count++;
 	ret = s->p;
 	s->p = (char *)s->p + node_size;
 	memset(ret, 0, node_size);
+
 	return ret;
 }
 
-static struct alloc_state blob_state;
-void *alloc_blob_node_the_repository(void)
+void *alloc_blob_node(struct repository *r)
 {
-	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
 	return b;
 }
 
-static struct alloc_state tree_state;
-void *alloc_tree_node_the_repository(void)
+void *alloc_tree_node(struct repository *r)
 {
-	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
 	return t;
 }
 
-static struct alloc_state tag_state;
-void *alloc_tag_node_the_repository(void)
+void *alloc_tag_node(struct repository *r)
 {
-	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
 	return t;
 }
 
-static struct alloc_state object_state;
-void *alloc_object_node_the_repository(void)
+void *alloc_object_node(struct repository *r)
 {
-	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
 	return obj;
 }
 
-static struct alloc_state commit_state;
-
-unsigned int alloc_commit_index_the_repository(void)
+unsigned int alloc_commit_index(struct repository *r)
 {
-	static unsigned int count;
-	return count++;
+	return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node_the_repository(void)
+void *alloc_commit_node(struct repository *r)
 {
-	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index(the_repository);
+	c->index = alloc_commit_index(r);
 	return c;
 }
 
@@ -103,9 +117,10 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
+    report(#name, r->parsed_objects->name##_state->count, \
+		  r->parsed_objects->name##_state->count * sizeof(type) >> 10)
 
-void alloc_report_the_repository(void)
+void alloc_report(struct repository *r)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/alloc.h b/alloc.h
new file mode 100644
index 00000000000..26dcf96020e
--- /dev/null
+++ b/alloc.h
@@ -0,0 +1,15 @@
+#ifndef ALLOC_H
+#define ALLOC_H
+
+void *alloc_blob_node(struct repository *r);
+void *alloc_tree_node(struct repository *r);
+void *alloc_commit_node(struct repository *r);
+void *alloc_tag_node(struct repository *r);
+void *alloc_object_node(struct repository *r);
+void alloc_report(struct repository *r);
+unsigned int alloc_commit_index(struct repository *r);
+
+void *allocate_alloc_state(void);
+void clear_alloc_state(struct alloc_state *s);
+
+#endif
diff --git a/blame.c b/blame.c
index ba9b18e7542..3a11f1ce52b 100644
--- a/blame.c
+++ b/blame.c
@@ -6,6 +6,7 @@
 #include "diffcore.h"
 #include "tag.h"
 #include "blame.h"
+#include "alloc.h"
 
 void blame_origin_decref(struct blame_origin *o)
 {
diff --git a/blob.c b/blob.c
index 9e64f301895..458dafa811e 100644
--- a/blob.c
+++ b/blob.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "repository.h"
+#include "alloc.h"
 
 const char *blob_type = "blob";
 
diff --git a/cache.h b/cache.h
index 0e6c5dd5639..c75559b7d38 100644
--- a/cache.h
+++ b/cache.h
@@ -1763,22 +1763,6 @@ extern const char *excludes_file;
 int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
-/* alloc.c */
-#define alloc_blob_node(r) alloc_blob_node_##r()
-extern void *alloc_blob_node_the_repository(void);
-#define alloc_tree_node(r) alloc_tree_node_##r()
-extern void *alloc_tree_node_the_repository(void);
-#define alloc_commit_node(r) alloc_commit_node_##r()
-extern void *alloc_commit_node_the_repository(void);
-#define alloc_tag_node(r) alloc_tag_node_##r()
-extern void *alloc_tag_node_the_repository(void);
-#define alloc_object_node(r) alloc_object_node_##r()
-extern void *alloc_object_node_the_repository(void);
-#define alloc_report(r) alloc_report_##r()
-extern void alloc_report_the_repository(void);
-#define alloc_commit_index(r) alloc_commit_index_##r()
-extern unsigned int alloc_commit_index_the_repository(void);
-
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
 
diff --git a/commit.c b/commit.c
index a9a43e79bae..c3b400d5930 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
 #include "commit-slab.h"
diff --git a/merge-recursive.c b/merge-recursive.c
index 6dac8908648..aa086a85089 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -14,6 +14,7 @@
 #include "tree-walk.h"
 #include "diff.h"
 #include "diffcore.h"
+#include "alloc.h"
 #include "tag.h"
 #include "unpack-trees.h"
 #include "string-list.h"
diff --git a/object.c b/object.c
index 49b952e9299..2ea2dc7a1cd 100644
--- a/object.c
+++ b/object.c
@@ -4,6 +4,7 @@
 #include "blob.h"
 #include "tree.h"
 #include "commit.h"
+#include "alloc.h"
 #include "tag.h"
 #include "object-store.h"
 #include "packfile.h"
@@ -455,6 +456,13 @@ struct parsed_object_pool *parsed_object_pool_new(void)
 {
 	struct parsed_object_pool *o = xmalloc(sizeof(*o));
 	memset(o, 0, sizeof(*o));
+
+	o->blob_state = allocate_alloc_state();
+	o->tree_state = allocate_alloc_state();
+	o->commit_state = allocate_alloc_state();
+	o->tag_state = allocate_alloc_state();
+	o->object_state = allocate_alloc_state();
+
 	return o;
 }
 
@@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
 void parsed_object_pool_clear(struct parsed_object_pool *o)
 {
 	/*
-	 * TOOD free objects in o->obj_hash.
-	 *
 	 * As objects are allocated in slabs (see alloc.c), we do
 	 * not need to free each object, but each slab instead.
+	 *
+	 * Before doing so, we need to free any additional memory
+	 * the objects may hold.
 	 */
+	unsigned i;
+
+	for (i = 0; i < o->obj_hash_size; i++) {
+		struct object *obj = o->obj_hash[i];
+
+		if (!obj)
+			continue;
+
+		if (obj->type == OBJ_TREE) {
+			free(((struct tree*)obj)->buffer);
+		} else if (obj->type == OBJ_COMMIT) {
+			free_commit_list(((struct commit*)obj)->parents);
+			free(&((struct commit*)obj)->util);
+		}
+	}
+
+	clear_alloc_state(o->blob_state);
+	clear_alloc_state(o->tree_state);
+	clear_alloc_state(o->commit_state);
+	clear_alloc_state(o->tag_state);
+	clear_alloc_state(o->object_state);
 }
diff --git a/object.h b/object.h
index b41d7a3accb..7916edb4edf 100644
--- a/object.h
+++ b/object.h
@@ -4,6 +4,14 @@
 struct parsed_object_pool {
 	struct object **obj_hash;
 	int nr_objs, obj_hash_size;
+
+	/* TODO: migrate alloc_states to mem-pool? */
+	struct alloc_state *blob_state;
+	struct alloc_state *tree_state;
+	struct alloc_state *commit_state;
+	struct alloc_state *tag_state;
+	struct alloc_state *object_state;
+	unsigned commit_count;
 };
 
 struct parsed_object_pool *parsed_object_pool_new(void);
diff --git a/tag.c b/tag.c
index 02ef4eaafc0..af6a0725b6a 100644
--- a/tag.c
+++ b/tag.c
@@ -3,6 +3,7 @@
 #include "commit.h"
 #include "tree.h"
 #include "blob.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 
 const char *tag_type = "tag";
diff --git a/tree.c b/tree.c
index 58cf19b4fa8..8f8ef3189af 100644
--- a/tree.c
+++ b/tree.c
@@ -5,6 +5,7 @@
 #include "blob.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "tree-walk.h"
 
 const char *tree_type = "tree";
-- 
2.17.0.255.g8bfb7c0704


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-07 22:59   ` [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-08 10:10     ` Jeff King
  2018-05-08 15:00     ` Duy Nguyen
  2018-05-08 17:45     ` Jonathan Tan
  2 siblings, 0 replies; 95+ messages in thread
From: Jeff King @ 2018-05-08 10:10 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, pclouds, jonathantanmy, gitster, jamill

On Mon, May 07, 2018 at 03:59:16PM -0700, Stefan Beller wrote:

> @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void parsed_object_pool_clear(struct parsed_object_pool *o)
> [...]
> +	for (i = 0; i < o->obj_hash_size; i++) {
> +		struct object *obj = o->obj_hash[i];
> +
> +		if (!obj)
> +			continue;
> +
> +		if (obj->type == OBJ_TREE) {
> +			free(((struct tree*)obj)->buffer);
> +		} else if (obj->type == OBJ_COMMIT) {
> +			free_commit_list(((struct commit*)obj)->parents);
> +			free(&((struct commit*)obj)->util);
> +		}
> +	}

Coverity complains about this final free(). I think the "&" is doing an
incorrect extra level of indirection?
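
To illustrate the point, here is a minimal sketch using a hypothetical
struct commit_like (not git's struct commit). The expression &c->util is
the address of the field itself, which lives inside the object's own
allocation, so passing it to free() hands the allocator a pointer it never
returned. The value c->util is what the caller actually stored.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct commit; only the util field matters. */
struct commit_like {
	void *util;
};

/* Returns 1 if &c->util points inside the struct itself (it always does). */
static int field_address_is_inside_struct(const struct commit_like *c)
{
	const char *base = (const char *)c;
	const char *field = (const char *)&c->util;

	return field >= base && field < base + sizeof(*c);
}

/*
 * Frees what the field points to, not the field's own address.
 * Assumption: util holds malloc'd memory (or NULL), which the caller
 * of this sketch must guarantee.
 */
static void release_util(struct commit_like *c)
{
	free(c->util);
	c->util = NULL;
}
```

Even the corrected form is only valid under the assumption stated in the
comment, namely that util really points at heap memory owned by the object.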

That said, I'm not sure if it is safe to blindly free the util field. We
don't necessarily know what downstream code has pointed it to. It may
not be allocated memory[1], or it may even be a more complicated data
structure that has sub-components that need freeing[2].

In the long run, it may be worth trying to get rid of this util field
completely, in favor of having callers use a commit_slab. That has
better memory-ownership semantics, and it would save 8 bytes in struct
commit.

[1] Grepping for "commit->util =", sequencer.c seems to assign pointers
    into other arrays, as well as the "(void *)1".

[2] Most assignments seem to be flex-structs, but blame.c assigns a
    linked list.
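
A rough sketch of the ownership idea behind replacing util with a side
table: the caller owns a table keyed by a per-commit index and frees it
when done, so the object pool never has to guess what the data is. This is
not the commit-slab.h API; struct commit_like, struct int_table and the
helper names are made up for illustration, and the flat array here may
move on growth, so returned pointers should not be kept across calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct commit; only the index matters here. */
struct commit_like {
	unsigned int index;
};

/* Caller-owned side table: one int per commit index, grown on demand. */
struct int_table {
	int *data;
	size_t alloc;
};

/* Returns the slot for commit c, zero-initializing any newly grown part. */
static int *int_table_at(struct int_table *t, const struct commit_like *c)
{
	if (c->index >= t->alloc) {
		size_t new_alloc = t->alloc ? t->alloc : 16;
		int *p;

		while (new_alloc <= c->index)
			new_alloc *= 2;
		p = realloc(t->data, new_alloc * sizeof(*p));
		if (!p)
			abort();
		memset(p + t->alloc, 0, (new_alloc - t->alloc) * sizeof(*p));
		t->data = p;
		t->alloc = new_alloc;
	}
	return &t->data[c->index];
}

/* The caller that created the table is the one that releases it. */
static void int_table_clear(struct int_table *t)
{
	free(t->data);
	t->data = NULL;
	t->alloc = 0;
}
```

With this shape, clearing the object pool and clearing the per-commit data
are independent steps, and each piece of code frees only what it allocated.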

-Peff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-07 22:59   ` [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  2018-05-08 10:10     ` Jeff King
@ 2018-05-08 15:00     ` Duy Nguyen
  2018-05-08 18:38       ` Stefan Beller
  2018-05-08 17:45     ` Jonathan Tan
  2 siblings, 1 reply; 95+ messages in thread
From: Duy Nguyen @ 2018-05-08 15:00 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git Mailing List, Jonathan Tan, Junio C Hamano, Jameson Miller

On Tue, May 8, 2018 at 12:59 AM, Stefan Beller <sbeller@google.com> wrote:
> @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void parsed_object_pool_clear(struct parsed_object_pool *o)
>  {
>         /*
> -        * TOOD free objects in o->obj_hash.
> -        *
>          * As objects are allocated in slabs (see alloc.c), we do
>          * not need to free each object, but each slab instead.
> +        *
> +        * Before doing so, we need to free any additional memory
> +        * the objects may hold.
>          */
> +       unsigned i;
> +
> +       for (i = 0; i < o->obj_hash_size; i++) {
> +               struct object *obj = o->obj_hash[i];
> +
> +               if (!obj)
> +                       continue;
> +
> +               if (obj->type == OBJ_TREE) {
> +                       free(((struct tree*)obj)->buffer);

It would be nicer to keep this in separate functions, e.g.
release_tree_node() and release_commit_node() to go with
alloc_xxx_node().

> +               } else if (obj->type == OBJ_COMMIT) {
> +                       free_commit_list(((struct commit*)obj)->parents);
> +                       free(&((struct commit*)obj)->util);
> +               }
> +       }

I still don't see who frees obj_hash[] (or at least clears it if not
freed). If I'm going to use this to free memory in pack-objects then
I'd really prefer obj_hash[] freed because it's a big _big_ array.

Just to be clear, what I mean is

FREE_AND_NULL(o->obj_hash);
o->obj_hash_size = 0;
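
Spelled out as a self-contained sketch, assuming a cut-down struct
pool_like in place of struct parsed_object_pool. The macro body mirrors
what FREE_AND_NULL does in git; resetting nr_objs is an extra assumption
that is not part of the suggestion above.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees p and resets it, so a stale pointer cannot be reused or re-freed. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical, reduced version of struct parsed_object_pool. */
struct pool_like {
	void **obj_hash;
	int nr_objs, obj_hash_size;
};

static void pool_like_clear(struct pool_like *o)
{
	FREE_AND_NULL(o->obj_hash);
	o->obj_hash_size = 0;
	o->nr_objs = 0; /* assumption: also reset the object count */
}
```

Because the pointer is NULL and the size is 0 afterwards, clearing the same
pool twice is harmless, and a later lookup sees an empty table rather than
freed memory.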

> +
> +       clear_alloc_state(o->blob_state);
> +       clear_alloc_state(o->tree_state);
> +       clear_alloc_state(o->commit_state);
> +       clear_alloc_state(o->tag_state);
> +       clear_alloc_state(o->object_state);
>  }
-- 
Duy

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v2 01/13] repository: introduce parsed objects field
  2018-05-07 22:59   ` [PATCH v2 01/13] repository: introduce parsed objects field Stefan Beller
@ 2018-05-08 17:23     ` Jonathan Tan
  0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Tan @ 2018-05-08 17:23 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, pclouds, gitster, jamill

On Mon,  7 May 2018 15:59:04 -0700
Stefan Beller <sbeller@google.com> wrote:

>  	/*
> -	 * Holds any information related to accessing the raw object content.
> +	 * Holds any information needed to retrieve the raw content
> +	 * of objects. The object_parser uses this to get object
> +	 * content which it then parses.

Update this comment - there is no more object_parser. (Maybe just delete
the last sentence, or specifically name some of the functions that
access this field.)

>  	 */
>  	struct raw_object_store *objects;

Other than that, this patch looks good.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-07 22:59   ` [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  2018-05-08 10:10     ` Jeff King
  2018-05-08 15:00     ` Duy Nguyen
@ 2018-05-08 17:45     ` Jonathan Tan
  2 siblings, 0 replies; 95+ messages in thread
From: Jonathan Tan @ 2018-05-08 17:45 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, pclouds, gitster, jamill

On Mon,  7 May 2018 15:59:16 -0700
Stefan Beller <sbeller@google.com> wrote:

> +	for (i = 0; i < o->obj_hash_size; i++) {
> +		struct object *obj = o->obj_hash[i];
> +
> +		if (!obj)
> +			continue;
> +
> +		if (obj->type == OBJ_TREE) {
> +			free(((struct tree*)obj)->buffer);
> +		} else if (obj->type == OBJ_COMMIT) {
> +			free_commit_list(((struct commit*)obj)->parents);
> +			free(&((struct commit*)obj)->util);
> +		}

Besides the other comments by Peff and Duy, should the "tag" field of a
tag object be freed too? It is allocated by xmemdupz in tag.c, and is
not assigned to by any other code (verified by renaming it and then
fixing the compile errors one by one).

Other than that, and other than my small comment on patch 1, this patch
set looks good to me.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-08 15:00     ` Duy Nguyen
@ 2018-05-08 18:38       ` Stefan Beller
  0 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 18:38 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Jonathan Tan, Junio C Hamano, Jameson Miller

On Tue, May 8, 2018 at 8:00 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Tue, May 8, 2018 at 12:59 AM, Stefan Beller <sbeller@google.com> wrote:
>> @@ -501,9 +509,31 @@ void raw_object_store_clear(struct raw_object_store *o)
>>  void parsed_object_pool_clear(struct parsed_object_pool *o)
>>  {
>>         /*
>> -        * TOOD free objects in o->obj_hash.
>> -        *
>>          * As objects are allocated in slabs (see alloc.c), we do
>>          * not need to free each object, but each slab instead.
>> +        *
>> +        * Before doing so, we need to free any additional memory
>> +        * the objects may hold.
>>          */
>> +       unsigned i;
>> +
>> +       for (i = 0; i < o->obj_hash_size; i++) {
>> +               struct object *obj = o->obj_hash[i];
>> +
>> +               if (!obj)
>> +                       continue;
>> +
>> +               if (obj->type == OBJ_TREE) {
>> +                       free(((struct tree*)obj)->buffer);
>
> It would be nicer to keep this in separate functions, e.g.
> release_tree_node() and release_commit_node() to go with
> alloc_xxx_node().

ok, I can introduce that, although it seems unnecessarily complicated
for now.

On top of this series I started an experiment (which rewrites alloc
and object.c a whole lot more; for performance reasons), which gets
rid of the multiple alloc_states. There will be only one allocation for
one repository; it can allocate across multiple types without alignment
overhead. It will also halve the memory footprint of obj_hash by
storing indexes instead of pointers in there.
That said, the experiment shall not influence the
direction of this series. Will fix.
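
The "half" figure can be sketched as follows, assuming 8-byte pointers and
32-bit indexes (the index width is an assumption; the experiment is not
shown here). All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct object. */
struct obj_like {
	unsigned int key;
};

/* Pointer-based table: one pointer per slot, like obj_hash today. */
static size_t ptr_table_bytes(size_t slots)
{
	return slots * sizeof(struct obj_like *);
}

/* Index-based table: one 32-bit index per slot. */
static size_t idx_table_bytes(size_t slots)
{
	return slots * sizeof(uint32_t);
}

/*
 * Index 0 is reserved for "empty slot", so index i refers to objs[i - 1].
 * This encoding is an assumption made for the sketch.
 */
static struct obj_like *idx_to_obj(struct obj_like *objs, uint32_t idx)
{
	return idx ? &objs[idx - 1] : NULL;
}
```

On a platform with 8-byte pointers the index table is half the size of the
pointer table; with 4-byte pointers there would be no saving.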

>> +               } else if (obj->type == OBJ_COMMIT) {
>> +                       free_commit_list(((struct commit*)obj)->parents);
>> +                       free(&((struct commit*)obj)->util);
>> +               }
>> +       }
>
> I still don't see who frees obj_hash[] (or at least clears it if not
> freed). If I'm going to use this to free memory in pack-objects then
> I'd really prefer obj_hash[] freed because it's a big _big_ array.

gah!

> Just to be clear, what I mean is
>
> FREE_AND_NULL(o->obj_hash);
> o->obj_hash_size = 0;

ok, I just put it here, just before the calls
to clear_alloc_state()s.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [PATCH v3 00/13] object store: alloc
  2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
                     ` (12 preceding siblings ...)
  2018-05-07 22:59   ` [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-08 19:37   ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 01/13] repository: introduce parsed objects field Stefan Beller
                       ` (13 more replies)
  13 siblings, 14 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

v3:

* I used the (soon to be renamed?) branch-diff tool to attach a diff below
  between v2 and v3 
  
* fixed comment in patch 1
* correctly free objects and their hashmap in the last patch.
* drop free'ing the commit->util pointer as we do not know where
  it points to.

v2:
* I decided to stick with alloc.c and not migrate it to the mem-pool for now.
  The reasoning for that is that mem-pool.h would introduce some alignment
  waste, which I really did not want.
* renamed to struct parsed_object_pool;
* free'd the additional memory of trees and commits.
* do not special case the_repository for allocation purposes
* corrected&polished commit messages
* I used the (soon to be renamed?) branch-diff tool to attach a diff below.
  (I still need to get used to that format, I find an interdiff of the
   branches easier to read, but that would not yield the commit messages)



v1:
This applies on top of sb/oid-object-info and is the logical continuation of
the series that it builds on; this brings the object store into more of
Git's code, removing global state, such that reasoning about the state of
the in-memory representation of the repository is easier.

My original plan was to convert lookup_commit_graft as the next series,
which would be similar to lookup_replace_object, as in sb/object-store-replace.
The grafts and shallow mechanism are very close to each other, such that
they need to be converted at the same time, both depending on the
"parsed object store" that is introduced in this commit.

The next series will then convert code in {object/blob/tree/commit/tag}.c
hopefully finishing the lookup_* functions.

I also debated if it is worth converting alloc.c via this patch series
or if it might make more sense to use the new mem-pool by Jameson[1].

I vaguely wonder about the performance impact, as the object allocation
code seemed to be relevant in the past.

[1] https://public-inbox.org/git/20180430153122.243976-1-jamill@microsoft.com/

Any comments welcome,
Thanks,
Stefan

Jonathan Nieder (1):
  object: add repository argument to grow_object_hash

Stefan Beller (12):
  repository: introduce parsed objects field
  object: add repository argument to create_object
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  alloc: allow arbitrary repositories for alloc functions

 alloc.c           |  79 ++++++++++++++++++++++-----------
 alloc.h           |  23 ++++++++++
 blame.c           |   3 +-
 blob.c            |   5 ++-
 cache.h           |   9 ----
 commit.c          |   4 +-
 merge-recursive.c |   3 +-
 object.c          | 108 ++++++++++++++++++++++++++++++++++------------
 object.h          |  18 +++++++-
 repository.c      |   7 +++
 repository.h      |   9 ++++
 tag.c             |   4 +-
 tree.c            |   4 +-
 13 files changed, 208 insertions(+), 68 deletions(-)
 create mode 100644 alloc.h

-- 
2.17.0.255.g8bfb7c0704

1:  9efc685875b ! 1:  f8e521c7c11 repository: introduce parsed objects field
    @@ -15,7 +15,6 @@
         discussed on the mailing list lately, this series doesn't implement this.
     
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/object.c b/object.c
     --- a/object.c
    @@ -224,13 +223,6 @@
     --- a/repository.h
     +++ b/repository.h
     @@
    - 	char *commondir;
    - 
    - 	/*
    --	 * Holds any information related to accessing the raw object content.
    -+	 * Holds any information needed to retrieve the raw content
    -+	 * of objects. The object_parser uses this to get object
    -+	 * content which it then parses.
      	 */
      	struct raw_object_store *objects;
      
2:  0d41290a9e6 = 2:  55c555b32eb object: add repository argument to create_object
3:  0242ec870f5 = 3:  f1661c9e46a object: add repository argument to grow_object_hash
4:  9a6aeee10db = 4:  f72b25db946 alloc: add repository argument to alloc_blob_node
5:  f7ed8da3909 = 5:  87b7ddda195 alloc: add repository argument to alloc_tree_node
6:  253f1bf5c2a = 6:  4480e916bdf alloc: add repository argument to alloc_commit_node
7:  4f8d3dfd460 = 7:  c3aa2a7c252 alloc: add repository argument to alloc_tag_node
8:  6ce5d5b0f0e = 8:  59d33cfaff2 alloc: add repository argument to alloc_object_node
9:  104f158fc37 = 9:  2ba78c289c1 alloc: add repository argument to alloc_report
10:  38d90052c29 = 10:  10ce6c44d4b alloc: add repository argument to alloc_commit_index
11:  eae3dea5763 = 11:  eae95e75b0b object: allow grow_object_hash to handle arbitrary repositories
12:  d08b382662f = 12:  c6d86c8b5db object: allow create_object to handle arbitrary repositories
13:  f87e600c439 ! 13:  2a56520e7af alloc: allow arbitrary repositories for alloc functions
    @@ -139,6 +139,25 @@
      	return c;
      }
      
    ++void release_tree_node(struct tree *t)
    ++{
    ++	free(t->buffer);
    ++}
    ++
    ++void release_commit_node(struct commit *c)
    ++{
    ++	free_commit_list(c->parents);
    ++	/* TODO: what about commit->util? */
    ++}
    ++
    ++void release_tag_node(struct tag *t)
    ++{
    ++	free(t->tag);
    ++}
    ++
    + static void report(const char *name, unsigned int count, size_t size)
    + {
    + 	fprintf(stderr, "%10s: %8u (%"PRIuMAX" kB)\n",
     @@
      }
      
    @@ -161,6 +180,10 @@
     +#ifndef ALLOC_H
     +#define ALLOC_H
     +
    ++struct tree;
    ++struct commit;
    ++struct tag;
    ++
     +void *alloc_blob_node(struct repository *r);
     +void *alloc_tree_node(struct repository *r);
     +void *alloc_commit_node(struct repository *r);
    @@ -172,6 +195,10 @@
     +void *allocate_alloc_state(void);
     +void clear_alloc_state(struct alloc_state *s);
     +
    ++void release_tree_node(struct tree *t);
    ++void release_commit_node(struct commit *c);
    ++void release_tag_node(struct tag *t);
    ++
     +#endif
     
     diff --git a/blame.c b/blame.c
    @@ -241,25 +268,25 @@
     --- a/merge-recursive.c
     +++ b/merge-recursive.c
     @@
    - #include "tree-walk.h"
      #include "diff.h"
      #include "diffcore.h"
    -+#include "alloc.h"
      #include "tag.h"
    ++#include "alloc.h"
      #include "unpack-trees.h"
      #include "string-list.h"
    + #include "xdiff-interface.h"
     
     diff --git a/object.c b/object.c
     --- a/object.c
     +++ b/object.c
     @@
    - #include "blob.h"
      #include "tree.h"
      #include "commit.h"
    -+#include "alloc.h"
      #include "tag.h"
    ++#include "alloc.h"
      #include "object-store.h"
      #include "packfile.h"
    + 
     @@
      {
      	struct parsed_object_pool *o = xmalloc(sizeof(*o));
    @@ -294,14 +321,17 @@
     +		if (!obj)
     +			continue;
     +
    -+		if (obj->type == OBJ_TREE) {
    -+			free(((struct tree*)obj)->buffer);
    -+		} else if (obj->type == OBJ_COMMIT) {
    -+			free_commit_list(((struct commit*)obj)->parents);
    -+			free(&((struct commit*)obj)->util);
    -+		}
    ++		if (obj->type == OBJ_TREE)
    ++			release_tree_node((struct tree*)obj);
    ++		else if (obj->type == OBJ_COMMIT)
    ++			release_commit_node((struct commit*)obj);
    ++		else if (obj->type == OBJ_TAG)
    ++			release_tag_node((struct tag*)obj);
     +	}
     +
    ++	FREE_AND_NULL(o->obj_hash);
    ++	o->obj_hash_size = 0;
    ++
     +	clear_alloc_state(o->blob_state);
     +	clear_alloc_state(o->tree_state);
     +	clear_alloc_state(o->commit_state);

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [PATCH v3 01/13] repository: introduce parsed objects field
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 02/13] object: add repository argument to create_object Stefan Beller
                       ` (12 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

Convert the existing global cache for parsed objects (obj_hash) into
repository-specific parsed object caches. Existing code that uses
obj_hash is modified to use the parsed object cache of
the_repository; future patches will use the parsed object caches of
other repositories.

Another future use case for a pool of objects is ease of memory management
in revision walking: If we can free the rev-list related memory early in
pack-objects (e.g. part of repack operation) then it could lower memory
pressure significantly when running on large repos. While this has been
discussed on the mailing list lately, this series doesn't implement this.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c     | 63 +++++++++++++++++++++++++++++++++-------------------
 object.h     |  8 +++++++
 repository.c |  7 ++++++
 repository.h |  9 ++++++++
 4 files changed, 64 insertions(+), 23 deletions(-)
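
For readers skimming the diff, a minimal sketch of what a per-repository
cache buys: the table is passed in rather than being a file-scope global,
so two pools do not see each other's objects. The names (pool_like,
obj_like) and the integer keys are simplifications, not the code in this
patch; growth is omitted, so the caller must keep the table from filling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object with an integer key in place of an object ID. */
struct obj_like {
	unsigned int key;
};

struct pool_like {
	struct obj_like **obj_hash;
	int nr_objs, obj_hash_size; /* size must be a power of two */
};

static struct pool_like *pool_like_new(int size)
{
	struct pool_like *p = calloc(1, sizeof(*p));

	if (!p)
		abort();
	p->obj_hash = calloc(size, sizeof(*p->obj_hash));
	if (!p->obj_hash)
		abort();
	p->obj_hash_size = size;
	return p;
}

/* Open addressing with linear probing; assumes a free slot exists. */
static void pool_like_insert(struct pool_like *p, struct obj_like *obj)
{
	unsigned int i = obj->key & (p->obj_hash_size - 1);

	while (p->obj_hash[i])
		i = (i + 1) & (p->obj_hash_size - 1);
	p->obj_hash[i] = obj;
	p->nr_objs++;
}

/* Returns the object with the given key, or NULL if this pool lacks it. */
static struct obj_like *pool_like_lookup(struct pool_like *p, unsigned int key)
{
	unsigned int i = key & (p->obj_hash_size - 1);
	struct obj_like *obj;

	while ((obj = p->obj_hash[i]) != NULL) {
		if (obj->key == key)
			break;
		i = (i + 1) & (p->obj_hash_size - 1);
	}
	return obj;
}
```

An object inserted into one pool is found there and nowhere else, which is
the property later patches rely on when they stop hard-coding
the_repository.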

diff --git a/object.c b/object.c
index 5044d08e96c..f7c624a7ba6 100644
--- a/object.c
+++ b/object.c
@@ -8,17 +8,14 @@
 #include "object-store.h"
 #include "packfile.h"
 
-static struct object **obj_hash;
-static int nr_objs, obj_hash_size;
-
 unsigned int get_max_object_index(void)
 {
-	return obj_hash_size;
+	return the_repository->parsed_objects->obj_hash_size;
 }
 
 struct object *get_indexed_object(unsigned int idx)
 {
-	return obj_hash[idx];
+	return the_repository->parsed_objects->obj_hash[idx];
 }
 
 static const char *object_type_strings[] = {
@@ -90,15 +87,16 @@ struct object *lookup_object(const unsigned char *sha1)
 	unsigned int i, first;
 	struct object *obj;
 
-	if (!obj_hash)
+	if (!the_repository->parsed_objects->obj_hash)
 		return NULL;
 
-	first = i = hash_obj(sha1, obj_hash_size);
-	while ((obj = obj_hash[i]) != NULL) {
+	first = i = hash_obj(sha1,
+			     the_repository->parsed_objects->obj_hash_size);
+	while ((obj = the_repository->parsed_objects->obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->oid.hash))
 			break;
 		i++;
-		if (i == obj_hash_size)
+		if (i == the_repository->parsed_objects->obj_hash_size)
 			i = 0;
 	}
 	if (obj && i != first) {
@@ -107,7 +105,8 @@ struct object *lookup_object(const unsigned char *sha1)
 		 * that we do not need to walk the hash table the next
 		 * time we look for it.
 		 */
-		SWAP(obj_hash[i], obj_hash[first]);
+		SWAP(the_repository->parsed_objects->obj_hash[i],
+		     the_repository->parsed_objects->obj_hash[first]);
 	}
 	return obj;
 }
@@ -124,19 +123,19 @@ static void grow_object_hash(void)
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
+	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(obj_hash);
-	obj_hash = new_hash;
-	obj_hash_size = new_hash_size;
+	free(the_repository->parsed_objects->obj_hash);
+	the_repository->parsed_objects->obj_hash = new_hash;
+	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object(const unsigned char *sha1, void *o)
@@ -147,11 +146,12 @@ void *create_object(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (obj_hash_size - 1 <= nr_objs * 2)
+	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
 		grow_object_hash();
 
-	insert_obj_hash(obj, obj_hash, obj_hash_size);
-	nr_objs++;
+	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
+			the_repository->parsed_objects->obj_hash_size);
+	the_repository->parsed_objects->nr_objs++;
 	return obj;
 }
 
@@ -431,8 +431,8 @@ void clear_object_flags(unsigned flags)
 {
 	int i;
 
-	for (i=0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i=0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj)
 			obj->flags &= ~flags;
 	}
@@ -442,13 +442,20 @@ void clear_commit_marks_all(unsigned int flags)
 {
 	int i;
 
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj && obj->type == OBJ_COMMIT)
 			obj->flags &= ~flags;
 	}
 }
 
+struct parsed_object_pool *parsed_object_pool_new(void)
+{
+	struct parsed_object_pool *o = xmalloc(sizeof(*o));
+	memset(o, 0, sizeof(*o));
+	return o;
+}
+
 struct raw_object_store *raw_object_store_new(void)
 {
 	struct raw_object_store *o = xmalloc(sizeof(*o));
@@ -488,3 +495,13 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_all_packs(o);
 	o->packed_git = NULL;
 }
+
+void parsed_object_pool_clear(struct parsed_object_pool *o)
+{
+	/*
+	 * TOOD free objects in o->obj_hash.
+	 *
+	 * As objects are allocated in slabs (see alloc.c), we do
+	 * not need to free each object, but each slab instead.
+	 */
+}
diff --git a/object.h b/object.h
index f13f85b2a94..cecda7da370 100644
--- a/object.h
+++ b/object.h
@@ -1,6 +1,14 @@
 #ifndef OBJECT_H
 #define OBJECT_H
 
+struct parsed_object_pool {
+	struct object **obj_hash;
+	int nr_objs, obj_hash_size;
+};
+
+struct parsed_object_pool *parsed_object_pool_new(void);
+void parsed_object_pool_clear(struct parsed_object_pool *o);
+
 struct object_list {
 	struct object *item;
 	struct object_list *next;
diff --git a/repository.c b/repository.c
index a4848c1bd05..c23404677eb 100644
--- a/repository.c
+++ b/repository.c
@@ -2,6 +2,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "config.h"
+#include "object.h"
 #include "submodule-config.h"
 
 /* The main repository */
@@ -14,6 +15,8 @@ void initialize_the_repository(void)
 
 	the_repo.index = &the_index;
 	the_repo.objects = raw_object_store_new();
+	the_repo.parsed_objects = parsed_object_pool_new();
+
 	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
 
@@ -143,6 +146,7 @@ static int repo_init(struct repository *repo,
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
+	repo->parsed_objects = parsed_object_pool_new();
 
 	if (repo_init_gitdir(repo, gitdir))
 		goto error;
@@ -226,6 +230,9 @@ void repo_clear(struct repository *repo)
 	raw_object_store_clear(repo->objects);
 	FREE_AND_NULL(repo->objects);
 
+	parsed_object_pool_clear(repo->parsed_objects);
+	FREE_AND_NULL(repo->parsed_objects);
+
 	if (repo->config) {
 		git_configset_clear(repo->config);
 		FREE_AND_NULL(repo->config);
diff --git a/repository.h b/repository.h
index e6e00f541bd..6d199819905 100644
--- a/repository.h
+++ b/repository.h
@@ -26,6 +26,15 @@ struct repository {
 	 */
 	struct raw_object_store *objects;
 
+	/*
+	 * All objects in this repository that have been parsed. This structure
+	 * owns all objects it references, so users of "struct object *"
+	 * generally do not need to free them; instead, when a repository is no
+	 * longer used, call parsed_object_pool_clear() on this structure, which
+	 * is called by the repositories repo_clear on its desconstruction.
+	 */
+	struct parsed_object_pool *parsed_objects;
+
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
-- 
2.17.0.255.g8bfb7c0704


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v3 02/13] object: add repository argument to create_object
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 01/13] repository: introduce parsed objects field Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 03/13] object: add repository argument to grow_object_hash Stefan Beller
                       ` (11 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

Add a repository argument to allow the callers of create_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 blob.c   | 4 +++-
 commit.c | 3 ++-
 object.c | 5 +++--
 object.h | 3 ++-
 tag.c    | 3 ++-
 tree.c   | 3 ++-
 6 files changed, 14 insertions(+), 7 deletions(-)
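
The macro in object.h below works by token pasting: the first argument is
glued onto the function name, so only the literal token the_repository
resolves to an existing function. A small sketch of the same trick, with
hypothetical names (create_thing, calls) and a trivial body:

```c
#include <assert.h>

static int calls;

/* The repository argument is pasted into the function name. */
#define create_thing(r, x) create_thing_##r(x)

static int create_thing_the_repository(int x)
{
	calls++;
	return x + 1;
}

static int demo(void)
{
	/* Expands to create_thing_the_repository(41). */
	return create_thing(the_repository, 41);
}
```

Passing any other token, say create_thing(other_repo, 41), would expand to
create_thing_other_repo(41), which does not exist, so the build fails. That
is how the intermediate patches guarantee that callers only pass
the_repository until the implementation can really handle other
repositories.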

diff --git a/blob.c b/blob.c
index fa2ab4f7a74..85c2143f299 100644
--- a/blob.c
+++ b/blob.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "blob.h"
+#include "repository.h"
 
 const char *blob_type = "blob";
 
@@ -7,7 +8,8 @@ struct blob *lookup_blob(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_blob_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_blob_node());
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/commit.c b/commit.c
index ca474a7c112..9106acf0aad 100644
--- a/commit.c
+++ b/commit.c
@@ -50,7 +50,8 @@ struct commit *lookup_commit(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_commit_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_commit_node());
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/object.c b/object.c
index f7c624a7ba6..2de029275bc 100644
--- a/object.c
+++ b/object.c
@@ -138,7 +138,7 @@ static void grow_object_hash(void)
 	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, void *o)
+void *create_object_the_repository(const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -178,7 +178,8 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		obj = create_object(sha1, alloc_object_node());
+		obj = create_object(the_repository, sha1,
+				    alloc_object_node());
 	return obj;
 }
 
diff --git a/object.h b/object.h
index cecda7da370..2cb0b241083 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,8 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-extern void *create_object(const unsigned char *sha1, void *obj);
+#define create_object(r, s, o) create_object_##r(s, o)
+extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
diff --git a/tag.c b/tag.c
index 3d37c1bd251..7150b759d66 100644
--- a/tag.c
+++ b/tag.c
@@ -93,7 +93,8 @@ struct tag *lookup_tag(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tag_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tag_node());
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
diff --git a/tree.c b/tree.c
index 1c68ea586bd..63730e3fb46 100644
--- a/tree.c
+++ b/tree.c
@@ -196,7 +196,8 @@ struct tree *lookup_tree(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tree_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tree_node());
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 03/13] object: add repository argument to grow_object_hash
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 01/13] repository: introduce parsed objects field Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 02/13] object: add repository argument to create_object Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
                       ` (10 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Add a repository argument to allow the caller of grow_object_hash to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 2de029275bc..91edc30770c 100644
--- a/object.c
+++ b/object.c
@@ -116,7 +116,8 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-static void grow_object_hash(void)
+#define grow_object_hash(r) grow_object_hash_##r()
+static void grow_object_hash_the_repository(void)
 {
 	int i;
 	/*
@@ -147,7 +148,7 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	hashcpy(obj->oid.hash, sha1);
 
 	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash();
+		grow_object_hash(the_repository);
 
 	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
 			the_repository->parsed_objects->obj_hash_size);
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 04/13] alloc: add repository argument to alloc_blob_node
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (2 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 03/13] object: add repository argument to grow_object_hash Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
                       ` (9 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
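
The compile-time check described above can be sketched in isolation. This
is only an illustration under stated assumptions: alloc_thing_node and its
_the_repository variant are hypothetical names chosen for the example and
are not part of this series. The macro pastes its argument onto the
function name, so any argument other than the literal token the_repository
names a function that does not exist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical allocator, named only for this illustration. The macro
 * pastes its argument onto the function name, so the only spelling
 * that resolves to a real function is alloc_thing_node(the_repository).
 */
#define alloc_thing_node(r) alloc_thing_node_##r()

static void *alloc_thing_node_the_repository(void)
{
	return calloc(1, 16);
}

/* Returns 1 if the macro-dispatched allocation succeeded, 0 otherwise. */
int demo_alloc(void)
{
	/* Expands to alloc_thing_node_the_repository(). */
	void *p = alloc_thing_node(the_repository);
	int ok = p != NULL;

	/*
	 * alloc_thing_node(other_repo) would expand to
	 * alloc_thing_node_other_repo(), which does not exist, so the
	 * build would fail rather than quietly use the wrong repository.
	 */
	free(p);
	return ok;
}
```

Note that the_repository is used here purely as a preprocessor token; the
sketch never needs a variable of that name.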

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 blob.c  | 2 +-
 cache.h | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacdd..6c5c376a25a 100644
--- a/alloc.c
+++ b/alloc.c
@@ -49,7 +49,7 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 }
 
 static struct alloc_state blob_state;
-void *alloc_blob_node(void)
+void *alloc_blob_node_the_repository(void)
 {
 	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
diff --git a/blob.c b/blob.c
index 85c2143f299..9e64f301895 100644
--- a/blob.c
+++ b/blob.c
@@ -9,7 +9,7 @@ struct blob *lookup_blob(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_blob_node());
+				     alloc_blob_node(the_repository));
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/cache.h b/cache.h
index 3a4d80e92bf..2258e611275 100644
--- a/cache.h
+++ b/cache.h
@@ -1764,7 +1764,8 @@ int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
 /* alloc.c */
-extern void *alloc_blob_node(void);
+#define alloc_blob_node(r) alloc_blob_node_##r()
+extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 05/13] alloc: add repository argument to alloc_tree_node
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (3 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
                       ` (8 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tree.c  | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 6c5c376a25a..2c8d1430758 100644
--- a/alloc.c
+++ b/alloc.c
@@ -57,7 +57,7 @@ void *alloc_blob_node_the_repository(void)
 }
 
 static struct alloc_state tree_state;
-void *alloc_tree_node(void)
+void *alloc_tree_node_the_repository(void)
 {
 	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
diff --git a/cache.h b/cache.h
index 2258e611275..1717d07a2c5 100644
--- a/cache.h
+++ b/cache.h
@@ -1766,7 +1766,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 /* alloc.c */
 #define alloc_blob_node(r) alloc_blob_node_##r()
 extern void *alloc_blob_node_the_repository(void);
-extern void *alloc_tree_node(void);
+#define alloc_tree_node(r) alloc_tree_node_##r()
+extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
diff --git a/tree.c b/tree.c
index 63730e3fb46..58cf19b4fa8 100644
--- a/tree.c
+++ b/tree.c
@@ -197,7 +197,7 @@ struct tree *lookup_tree(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tree_node());
+				     alloc_tree_node(the_repository));
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 06/13] alloc: add repository argument to alloc_commit_node
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (4 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
                       ` (7 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c           | 2 +-
 blame.c           | 2 +-
 cache.h           | 3 ++-
 commit.c          | 2 +-
 merge-recursive.c | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/alloc.c b/alloc.c
index 2c8d1430758..9e2b897ec1d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -88,7 +88,7 @@ unsigned int alloc_commit_index(void)
 	return count++;
 }
 
-void *alloc_commit_node(void)
+void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
diff --git a/blame.c b/blame.c
index dfa24473dc6..ba9b18e7542 100644
--- a/blame.c
+++ b/blame.c
@@ -161,7 +161,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 
 	read_cache();
 	time(&now);
-	commit = alloc_commit_node();
+	commit = alloc_commit_node(the_repository);
 	commit->object.parsed = 1;
 	commit->date = now;
 	parent_tail = &commit->parents;
diff --git a/cache.h b/cache.h
index 1717d07a2c5..bf6e8c87d83 100644
--- a/cache.h
+++ b/cache.h
@@ -1768,7 +1768,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 extern void *alloc_blob_node_the_repository(void);
 #define alloc_tree_node(r) alloc_tree_node_##r()
 extern void *alloc_tree_node_the_repository(void);
-extern void *alloc_commit_node(void);
+#define alloc_commit_node(r) alloc_commit_node_##r()
+extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
diff --git a/commit.c b/commit.c
index 9106acf0aad..a9a43e79bae 100644
--- a/commit.c
+++ b/commit.c
@@ -51,7 +51,7 @@ struct commit *lookup_commit(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_commit_node());
+				     alloc_commit_node(the_repository));
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624da..6dac8908648 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -98,7 +98,7 @@ static struct tree *shift_tree_object(struct tree *one, struct tree *two,
 
 static struct commit *make_virtual_commit(struct tree *tree, const char *comment)
 {
-	struct commit *commit = alloc_commit_node();
+	struct commit *commit = alloc_commit_node(the_repository);
 
 	set_merge_remote_desc(commit, comment, (struct object *)commit);
 	commit->tree = tree;
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 07/13] alloc: add repository argument to alloc_tag_node
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (5 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
                       ` (6 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tag.c   | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 9e2b897ec1d..290250e3595 100644
--- a/alloc.c
+++ b/alloc.c
@@ -65,7 +65,7 @@ void *alloc_tree_node_the_repository(void)
 }
 
 static struct alloc_state tag_state;
-void *alloc_tag_node(void)
+void *alloc_tag_node_the_repository(void)
 {
 	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
diff --git a/cache.h b/cache.h
index bf6e8c87d83..32f340cde59 100644
--- a/cache.h
+++ b/cache.h
@@ -1770,7 +1770,8 @@ extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node_the_repository(void);
 #define alloc_commit_node(r) alloc_commit_node_##r()
 extern void *alloc_commit_node_the_repository(void);
-extern void *alloc_tag_node(void);
+#define alloc_tag_node(r) alloc_tag_node_##r()
+extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
diff --git a/tag.c b/tag.c
index 7150b759d66..02ef4eaafc0 100644
--- a/tag.c
+++ b/tag.c
@@ -94,7 +94,7 @@ struct tag *lookup_tag(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tag_node());
+				     alloc_tag_node(the_repository));
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 08/13] alloc: add repository argument to alloc_object_node
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (6 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 09/13] alloc: add repository argument to alloc_report Stefan Beller
                       ` (5 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c  | 2 +-
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 290250e3595..f031ce422d9 100644
--- a/alloc.c
+++ b/alloc.c
@@ -73,7 +73,7 @@ void *alloc_tag_node_the_repository(void)
 }
 
 static struct alloc_state object_state;
-void *alloc_object_node(void)
+void *alloc_object_node_the_repository(void)
 {
 	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
diff --git a/cache.h b/cache.h
index 32f340cde59..2d60359a964 100644
--- a/cache.h
+++ b/cache.h
@@ -1772,7 +1772,8 @@ extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node_the_repository(void);
 #define alloc_tag_node(r) alloc_tag_node_##r()
 extern void *alloc_tag_node_the_repository(void);
-extern void *alloc_object_node(void);
+#define alloc_object_node(r) alloc_object_node_##r()
+extern void *alloc_object_node_the_repository(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
 
diff --git a/object.c b/object.c
index 91edc30770c..b8c3f923c51 100644
--- a/object.c
+++ b/object.c
@@ -180,7 +180,7 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		obj = create_object(the_repository, sha1,
-				    alloc_object_node());
+				    alloc_object_node(the_repository));
 	return obj;
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 09/13] alloc: add repository argument to alloc_report
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (7 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
                       ` (4 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index f031ce422d9..28b85b22144 100644
--- a/alloc.c
+++ b/alloc.c
@@ -105,7 +105,7 @@ static void report(const char *name, unsigned int count, size_t size)
 #define REPORT(name, type)	\
     report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
 
-void alloc_report(void)
+void alloc_report_the_repository(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/cache.h b/cache.h
index 2d60359a964..01cc207d218 100644
--- a/cache.h
+++ b/cache.h
@@ -1774,7 +1774,8 @@ extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node_the_repository(void);
 #define alloc_object_node(r) alloc_object_node_##r()
 extern void *alloc_object_node_the_repository(void);
-extern void alloc_report(void);
+#define alloc_report(r) alloc_report_##r()
+extern void alloc_report_the_repository(void);
 extern unsigned int alloc_commit_index(void);
 
 /* pkt-line.c */
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 10/13] alloc: add repository argument to alloc_commit_index
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (8 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 09/13] alloc: add repository argument to alloc_report Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
                       ` (3 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c  | 4 ++--
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 28b85b22144..277dadd221b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -82,7 +82,7 @@ void *alloc_object_node_the_repository(void)
 
 static struct alloc_state commit_state;
 
-unsigned int alloc_commit_index(void)
+unsigned int alloc_commit_index_the_repository(void)
 {
 	static unsigned int count;
 	return count++;
@@ -92,7 +92,7 @@ void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index();
+	c->index = alloc_commit_index(the_repository);
 	return c;
 }
 
diff --git a/cache.h b/cache.h
index 01cc207d218..0e6c5dd5639 100644
--- a/cache.h
+++ b/cache.h
@@ -1776,7 +1776,8 @@ extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node_the_repository(void);
 #define alloc_report(r) alloc_report_##r()
 extern void alloc_report_the_repository(void);
-extern unsigned int alloc_commit_index(void);
+#define alloc_commit_index(r) alloc_commit_index_##r()
+extern unsigned int alloc_commit_index_the_repository(void);
 
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
diff --git a/object.c b/object.c
index b8c3f923c51..a365a910859 100644
--- a/object.c
+++ b/object.c
@@ -162,7 +162,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 		return obj;
 	else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
-			((struct commit *)obj)->index = alloc_commit_index();
+			((struct commit *)obj)->index = alloc_commit_index(the_repository);
 		obj->type = type;
 		return obj;
 	}
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 11/13] object: allow grow_object_hash to handle arbitrary repositories
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (9 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 12/13] object: allow create_object " Stefan Beller
                       ` (2 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index a365a910859..0fcd6f6df42 100644
--- a/object.c
+++ b/object.c
@@ -116,27 +116,27 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-#define grow_object_hash(r) grow_object_hash_##r()
-static void grow_object_hash_the_repository(void)
+static void grow_object_hash(struct repository *r)
 {
 	int i;
 	/*
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
+	int new_hash_size = r->parsed_objects->obj_hash_size < 32 ? 32 : 2 * r->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
-		struct object *obj = the_repository->parsed_objects->obj_hash[i];
+	for (i = 0; i < r->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = r->parsed_objects->obj_hash[i];
+
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(the_repository->parsed_objects->obj_hash);
-	the_repository->parsed_objects->obj_hash = new_hash;
-	the_repository->parsed_objects->obj_hash_size = new_hash_size;
+	free(r->parsed_objects->obj_hash);
+	r->parsed_objects->obj_hash = new_hash;
+	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object_the_repository(const unsigned char *sha1, void *o)
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 12/13] object: allow create_object to handle arbitrary repositories
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (10 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 19:37     ` [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object.c | 12 ++++++------
 object.h |  3 +--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index 0fcd6f6df42..49b952e9299 100644
--- a/object.c
+++ b/object.c
@@ -139,7 +139,7 @@ static void grow_object_hash(struct repository *r)
 	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object_the_repository(const unsigned char *sha1, void *o)
+void *create_object(struct repository *r, const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -147,12 +147,12 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash(the_repository);
+	if (r->parsed_objects->obj_hash_size - 1 <= r->parsed_objects->nr_objs * 2)
+		grow_object_hash(r);
 
-	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
-			the_repository->parsed_objects->obj_hash_size);
-	the_repository->parsed_objects->nr_objs++;
+	insert_obj_hash(obj, r->parsed_objects->obj_hash,
+			r->parsed_objects->obj_hash_size);
+	r->parsed_objects->nr_objs++;
 	return obj;
 }
 
diff --git a/object.h b/object.h
index 2cb0b241083..b41d7a3accb 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,7 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-#define create_object(r, s, o) create_object_##r(s, o)
-extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
+extern void *create_object(struct repository *r, const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (11 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 12/13] object: allow create_object " Stefan Beller
@ 2018-05-08 19:37     ` Stefan Beller
  2018-05-08 20:04       ` Jonathan Tan
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
  13 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 19:37 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

We have to convert all of the alloc functions at once, because alloc_report
uses a macro that builds the name of each allocation state by token pasting.
For the sake of a mechanical conversion it is better to convert multiple
functions at once than to change the structure of the reporting function.
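
Why the reporting macro ties the conversions together can be shown with a
reduced sketch. The struct pool type and the COUNT_OF, KB_OF and
total_nodes names below are assumptions made for this example; only the
name##_state pasting and the ">> 10" kilobyte arithmetic mirror the REPORT
macro in alloc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for this sketch; not the real git structures. */
struct alloc_state {
	int count; /* total number of nodes allocated */
};

struct pool {
	struct alloc_state *blob_state;
	struct alloc_state *tree_state;
};

/*
 * Token pasting builds the member name from the macro argument, so every
 * name passed to COUNT_OF must exist as <name>_state in the same struct.
 * Moving only some of the states into the struct would break the others.
 */
#define COUNT_OF(p, name) ((p)->name##_state->count)
#define KB_OF(p, name, type) ((size_t)COUNT_OF(p, name) * sizeof(type) >> 10)

/* Sums the node counts of both states through the shared macro. */
int total_nodes(const struct pool *p)
{
	return COUNT_OF(p, blob) + COUNT_OF(p, tree);
}
```

Because the macro assumes one storage location and one naming scheme for
every state, converting the states one function at a time would leave it
unable to expand for the states that had not moved yet.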

We record all memory allocations in alloc.c and free them in
clear_alloc_state, which is called for all repositories except
the_repository.
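
The bookkeeping can be sketched as a standalone program fragment. This is
a simplified illustration, not the patch itself: it uses plain malloc and
realloc where alloc.c uses xmalloc and ALLOC_GROW, it shrinks BLOCKING to
4 to keep the example small, and it omits the count field. Unlike the
clear_alloc_state hunk in this patch, the sketch also releases the slabs
array itself and resets the state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 4 /* nodes per slab; kept tiny for the example */

struct alloc_state {
	int nr;        /* nodes left in the current slab */
	char *p;       /* first free node in the current slab */
	void **slabs;  /* every slab handed out so far */
	int slab_nr, slab_alloc;
};

/* Hands out one zeroed node, starting a new slab when the current one is used up. */
void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	if (!s->nr) {
		char *slab = malloc(BLOCKING * node_size);
		if (!slab)
			return NULL;
		if (s->slab_nr == s->slab_alloc) {
			int n = s->slab_alloc ? 2 * s->slab_alloc : 4;
			void **grown = realloc(s->slabs, n * sizeof(*grown));
			if (!grown) {
				free(slab);
				return NULL;
			}
			s->slabs = grown;
			s->slab_alloc = n;
		}
		s->slabs[s->slab_nr++] = slab; /* remember it for later */
		s->p = slab;
		s->nr = BLOCKING;
	}
	s->nr--;
	ret = s->p;
	s->p += node_size;
	memset(ret, 0, node_size);
	return ret;
}

/* Frees every recorded slab and the bookkeeping array, leaving s reusable. */
void clear_alloc_state(struct alloc_state *s)
{
	while (s->slab_nr > 0)
		free(s->slabs[--s->slab_nr]);
	free(s->slabs);
	memset(s, 0, sizeof(*s));
}
```

Recording each slab pointer is what makes a later bulk free possible:
individual nodes are never freed on their own, only whole slabs.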

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c           | 79 +++++++++++++++++++++++++++++++++--------------
 alloc.h           | 23 ++++++++++++++
 blame.c           |  1 +
 blob.c            |  1 +
 cache.h           | 16 ----------
 commit.c          |  1 +
 merge-recursive.c |  1 +
 object.c          | 37 ++++++++++++++++++++--
 object.h          |  8 +++++
 tag.c             |  1 +
 tree.c            |  1 +
 11 files changed, 127 insertions(+), 42 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..4ecf0f160f4 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,8 +4,7 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it ends
  * up with maximal alignment because it doesn't know what the object alignment
  * for the new allocation is.
  */
@@ -15,6 +14,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
@@ -30,8 +30,25 @@ struct alloc_state {
 	int count; /* total number of nodes allocated */
 	int nr;    /* number of nodes left in current allocation */
 	void *p;   /* first free node in current allocation */
+
+	/* bookkeeping of allocations */
+	void **slabs;
+	int slab_nr, slab_alloc;
 };
 
+void *allocate_alloc_state(void)
+{
+	return xcalloc(1, sizeof(struct alloc_state));
+}
+
+void clear_alloc_state(struct alloc_state *s)
+{
+	while (s->slab_nr > 0) {
+		s->slab_nr--;
+		free(s->slabs[s->slab_nr]);
+	}
+}
+
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 {
 	void *ret;
@@ -39,63 +56,76 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	if (!s->nr) {
 		s->nr = BLOCKING;
 		s->p = xmalloc(BLOCKING * node_size);
+
+		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
+		s->slabs[s->slab_nr++] = s->p;
 	}
 	s->nr--;
 	s->count++;
 	ret = s->p;
 	s->p = (char *)s->p + node_size;
 	memset(ret, 0, node_size);
+
 	return ret;
 }
 
-static struct alloc_state blob_state;
-void *alloc_blob_node_the_repository(void)
+void *alloc_blob_node(struct repository *r)
 {
-	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
 	return b;
 }
 
-static struct alloc_state tree_state;
-void *alloc_tree_node_the_repository(void)
+void *alloc_tree_node(struct repository *r)
 {
-	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
 	return t;
 }
 
-static struct alloc_state tag_state;
-void *alloc_tag_node_the_repository(void)
+void *alloc_tag_node(struct repository *r)
 {
-	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
 	return t;
 }
 
-static struct alloc_state object_state;
-void *alloc_object_node_the_repository(void)
+void *alloc_object_node(struct repository *r)
 {
-	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
 	return obj;
 }
 
-static struct alloc_state commit_state;
-
-unsigned int alloc_commit_index_the_repository(void)
+unsigned int alloc_commit_index(struct repository *r)
 {
-	static unsigned int count;
-	return count++;
+	return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node_the_repository(void)
+void *alloc_commit_node(struct repository *r)
 {
-	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index(the_repository);
+	c->index = alloc_commit_index(r);
 	return c;
 }
 
+void release_tree_node(struct tree *t)
+{
+	free(t->buffer);
+}
+
+void release_commit_node(struct commit *c)
+{
+	free_commit_list(c->parents);
+	/* TODO: what about commit->util? */
+}
+
+void release_tag_node(struct tag *t)
+{
+	free(t->tag);
+}
+
 static void report(const char *name, unsigned int count, size_t size)
 {
 	fprintf(stderr, "%10s: %8u (%"PRIuMAX" kB)\n",
@@ -103,9 +133,10 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
+    report(#name, r->parsed_objects->name##_state->count, \
+		  r->parsed_objects->name##_state->count * sizeof(type) >> 10)
 
-void alloc_report_the_repository(void)
+void alloc_report(struct repository *r)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/alloc.h b/alloc.h
new file mode 100644
index 00000000000..941d71960fb
--- /dev/null
+++ b/alloc.h
@@ -0,0 +1,23 @@
+#ifndef ALLOC_H
+#define ALLOC_H
+
+struct tree;
+struct commit;
+struct tag;
+
+void *alloc_blob_node(struct repository *r);
+void *alloc_tree_node(struct repository *r);
+void *alloc_commit_node(struct repository *r);
+void *alloc_tag_node(struct repository *r);
+void *alloc_object_node(struct repository *r);
+void alloc_report(struct repository *r);
+unsigned int alloc_commit_index(struct repository *r);
+
+void *allocate_alloc_state(void);
+void clear_alloc_state(struct alloc_state *s);
+
+void release_tree_node(struct tree *t);
+void release_commit_node(struct commit *c);
+void release_tag_node(struct tag *t);
+
+#endif
diff --git a/blame.c b/blame.c
index ba9b18e7542..3a11f1ce52b 100644
--- a/blame.c
+++ b/blame.c
@@ -6,6 +6,7 @@
 #include "diffcore.h"
 #include "tag.h"
 #include "blame.h"
+#include "alloc.h"
 
 void blame_origin_decref(struct blame_origin *o)
 {
diff --git a/blob.c b/blob.c
index 9e64f301895..458dafa811e 100644
--- a/blob.c
+++ b/blob.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "repository.h"
+#include "alloc.h"
 
 const char *blob_type = "blob";
 
diff --git a/cache.h b/cache.h
index 0e6c5dd5639..c75559b7d38 100644
--- a/cache.h
+++ b/cache.h
@@ -1763,22 +1763,6 @@ extern const char *excludes_file;
 int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
-/* alloc.c */
-#define alloc_blob_node(r) alloc_blob_node_##r()
-extern void *alloc_blob_node_the_repository(void);
-#define alloc_tree_node(r) alloc_tree_node_##r()
-extern void *alloc_tree_node_the_repository(void);
-#define alloc_commit_node(r) alloc_commit_node_##r()
-extern void *alloc_commit_node_the_repository(void);
-#define alloc_tag_node(r) alloc_tag_node_##r()
-extern void *alloc_tag_node_the_repository(void);
-#define alloc_object_node(r) alloc_object_node_##r()
-extern void *alloc_object_node_the_repository(void);
-#define alloc_report(r) alloc_report_##r()
-extern void alloc_report_the_repository(void);
-#define alloc_commit_index(r) alloc_commit_index_##r()
-extern unsigned int alloc_commit_index_the_repository(void);
-
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
 
diff --git a/commit.c b/commit.c
index a9a43e79bae..c3b400d5930 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
 #include "commit-slab.h"
diff --git a/merge-recursive.c b/merge-recursive.c
index 6dac8908648..cbded673c28 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -15,6 +15,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "tag.h"
+#include "alloc.h"
 #include "unpack-trees.h"
 #include "string-list.h"
 #include "xdiff-interface.h"
diff --git a/object.c b/object.c
index 49b952e9299..803d34ae189 100644
--- a/object.c
+++ b/object.c
@@ -5,6 +5,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "object-store.h"
 #include "packfile.h"
 
@@ -455,6 +456,13 @@ struct parsed_object_pool *parsed_object_pool_new(void)
 {
 	struct parsed_object_pool *o = xmalloc(sizeof(*o));
 	memset(o, 0, sizeof(*o));
+
+	o->blob_state = allocate_alloc_state();
+	o->tree_state = allocate_alloc_state();
+	o->commit_state = allocate_alloc_state();
+	o->tag_state = allocate_alloc_state();
+	o->object_state = allocate_alloc_state();
+
 	return o;
 }
 
@@ -501,9 +509,34 @@ void raw_object_store_clear(struct raw_object_store *o)
 void parsed_object_pool_clear(struct parsed_object_pool *o)
 {
 	/*
-	 * TOOD free objects in o->obj_hash.
-	 *
 	 * As objects are allocated in slabs (see alloc.c), we do
 	 * not need to free each object, but each slab instead.
+	 *
+	 * Before doing so, we need to free any additional memory
+	 * the objects may hold.
 	 */
+	unsigned i;
+
+	for (i = 0; i < o->obj_hash_size; i++) {
+		struct object *obj = o->obj_hash[i];
+
+		if (!obj)
+			continue;
+
+		if (obj->type == OBJ_TREE)
+			release_tree_node((struct tree*)obj);
+		else if (obj->type == OBJ_COMMIT)
+			release_commit_node((struct commit*)obj);
+		else if (obj->type == OBJ_TAG)
+			release_tag_node((struct tag*)obj);
+	}
+
+	FREE_AND_NULL(o->obj_hash);
+	o->obj_hash_size = 0;
+
+	clear_alloc_state(o->blob_state);
+	clear_alloc_state(o->tree_state);
+	clear_alloc_state(o->commit_state);
+	clear_alloc_state(o->tag_state);
+	clear_alloc_state(o->object_state);
 }
diff --git a/object.h b/object.h
index b41d7a3accb..7916edb4edf 100644
--- a/object.h
+++ b/object.h
@@ -4,6 +4,14 @@
 struct parsed_object_pool {
 	struct object **obj_hash;
 	int nr_objs, obj_hash_size;
+
+	/* TODO: migrate alloc_states to mem-pool? */
+	struct alloc_state *blob_state;
+	struct alloc_state *tree_state;
+	struct alloc_state *commit_state;
+	struct alloc_state *tag_state;
+	struct alloc_state *object_state;
+	unsigned commit_count;
 };
 
 struct parsed_object_pool *parsed_object_pool_new(void);
diff --git a/tag.c b/tag.c
index 02ef4eaafc0..af6a0725b6a 100644
--- a/tag.c
+++ b/tag.c
@@ -3,6 +3,7 @@
 #include "commit.h"
 #include "tree.h"
 #include "blob.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 
 const char *tag_type = "tag";
diff --git a/tree.c b/tree.c
index 58cf19b4fa8..8f8ef3189af 100644
--- a/tree.c
+++ b/tree.c
@@ -5,6 +5,7 @@
 #include "blob.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "tree-walk.h"
 
 const char *tree_type = "tree";
-- 
2.17.0.255.g8bfb7c0704


* Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-08 19:37     ` [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-08 20:04       ` Jonathan Tan
  2018-05-08 20:37         ` Stefan Beller
  2018-05-09 17:18         ` Duy Nguyen
  0 siblings, 2 replies; 95+ messages in thread
From: Jonathan Tan @ 2018-05-08 20:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, gitster, jamill, pclouds

On Tue,  8 May 2018 12:37:36 -0700
Stefan Beller <sbeller@google.com> wrote:

> +void clear_alloc_state(struct alloc_state *s)
> +{
> +	while (s->slab_nr > 0) {
> +		s->slab_nr--;
> +		free(s->slabs[s->slab_nr]);
> +	}

I should have caught this earlier, but you need to free s->slabs itself
too.
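
A minimal sketch of the point above, using a stand-in struct rather than
the real struct alloc_state from alloc.c (the demo_* names and the growth
policy are assumptions made for illustration). The slabs and the array
that records them are separate allocations, so both have to be freed:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the slab bookkeeping in alloc.c; names are illustrative. */
struct demo_alloc_state {
	void **slabs;    /* array of pointers to the slabs */
	int slab_nr;     /* number of slabs recorded */
	int slab_alloc;  /* capacity of the slabs array */
};

/* Allocate one slab and record it, growing the array when it is full. */
static int demo_add_slab(struct demo_alloc_state *s, size_t slab_size)
{
	void *slab;

	if (s->slab_nr == s->slab_alloc) {
		int new_alloc = s->slab_alloc ? 2 * s->slab_alloc : 4;
		void **grown = realloc(s->slabs, new_alloc * sizeof(*grown));

		if (!grown)
			return -1;
		s->slabs = grown;
		s->slab_alloc = new_alloc;
	}
	slab = malloc(slab_size);
	if (!slab)
		return -1;
	s->slabs[s->slab_nr++] = slab;
	return 0;
}

static void demo_clear_alloc_state(struct demo_alloc_state *s)
{
	assert(s->slab_nr <= s->slab_alloc);

	/* Free every slab that was handed out... */
	while (s->slab_nr > 0) {
		s->slab_nr--;
		free(s->slabs[s->slab_nr]);
	}
	/* ...and then the array that kept track of them. */
	free(s->slabs);
	s->slabs = NULL;
	s->slab_alloc = 0;
}
```

Resetting slabs and slab_alloc leaves the state reusable after a clear,
which is one possible design choice, not a requirement.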

> +void release_tree_node(struct tree *t);
> +void release_commit_node(struct commit *c);
> +void release_tag_node(struct tag *t);

Do these really need to be defined in alloc.c? I would think that it
would be sufficient to define them as static in object.c.

Having said that, opinions differ (e.g. Duy said he thinks that release_
goes with alloc_ [1]) so I'm OK either way.

[1] https://public-inbox.org/git/CACsJy8D-e-bff3S+LQAMfwB-w8OpkjrfFrV9O5S3ku+M5aAjQA@mail.gmail.com/

* Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-08 20:04       ` Jonathan Tan
@ 2018-05-08 20:37         ` Stefan Beller
  2018-05-09 15:54           ` Duy Nguyen
  2018-05-09 17:18         ` Duy Nguyen
  1 sibling, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-08 20:37 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Junio C Hamano, Jameson Miller, Duy Nguyen

On Tue, May 8, 2018 at 1:04 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Tue,  8 May 2018 12:37:36 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> +void clear_alloc_state(struct alloc_state *s)
>> +{
>> +     while (s->slab_nr > 0) {
>> +             s->slab_nr--;
>> +             free(s->slabs[s->slab_nr]);
>> +     }
>
> I should have caught this earlier, but you need to free s->slabs itself
> too.

ok.

>
>> +void release_tree_node(struct tree *t);
>> +void release_commit_node(struct commit *c);
>> +void release_tag_node(struct tag *t);
>
> Do these really need to be defined in alloc.c? I would think that it
> would be sufficient to define them as static in object.c.
>
> Having said that, opinions differ (e.g. Duy said he thinks that release_
> goes with alloc_ [1]) so I'm OK either way.

I would have preferred static as well, but went with Duy's suggestion of
having it in alloc.c.

I can change that.

Thanks,
Stefan

* Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-08 20:37         ` Stefan Beller
@ 2018-05-09 15:54           ` Duy Nguyen
  0 siblings, 0 replies; 95+ messages in thread
From: Duy Nguyen @ 2018-05-09 15:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, git, Junio C Hamano, Jameson Miller

On Tue, May 8, 2018 at 10:37 PM, Stefan Beller <sbeller@google.com> wrote:
>>> +void release_tree_node(struct tree *t);
>>> +void release_commit_node(struct commit *c);
>>> +void release_tag_node(struct tag *t);
>>
>> Do these really need to be defined in alloc.c? I would think that it
>> would be sufficient to define them as static in object.c.
>>
>> Having said that, opinions differ (e.g. Duy said he thinks that release_
>> goes with alloc_ [1]) so I'm OK either way.
>
> I would have preferred static as well, but went with Duy's suggestion of
> having it in alloc.c.
>
> I can change that.

Heh I thought you would make them static ;-) I just wanted to keep
release logic outside that object pool, which is clearer and also
makes it easier to replace it with mem-pool.c later. I'm ok with
making it static. Or if you do export these, please move them close to
the parse_* functions where memory is actually allocated. E.g.
release_commit_node() is moved to commit.c, close to
parse_commit_gently(), release_tree_node() close to
parse_tree_gently().
-- 
Duy

* Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-08 20:04       ` Jonathan Tan
  2018-05-08 20:37         ` Stefan Beller
@ 2018-05-09 17:18         ` Duy Nguyen
  2018-05-09 19:20           ` Stefan Beller
  1 sibling, 1 reply; 95+ messages in thread
From: Duy Nguyen @ 2018-05-09 17:18 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Stefan Beller, Git Mailing List, Junio C Hamano, Jameson Miller

On Tue, May 8, 2018 at 10:04 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Tue,  8 May 2018 12:37:36 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> +void clear_alloc_state(struct alloc_state *s)
>> +{
>> +     while (s->slab_nr > 0) {
>> +             s->slab_nr--;
>> +             free(s->slabs[s->slab_nr]);
>> +     }
>
> I should have caught this earlier, but you need to free s->slabs itself
> too.

And nobody frees 's' either. I'm not saying clear_alloc_state() should,
but somebody should. When I tried repo_clear(the_repository) with
gitster/sb/object-store-alloc I got this:

==13250== 944 (32 direct, 912 indirect) bytes in 1 blocks are
definitely lost in loss record 62 of 88
==13250==    at 0x4C2CF25: calloc (vg_replace_malloc.c:718)
==13250==    by 0x1AB7A8: xcalloc (wrapper.c:160)
==13250==    by 0x1BF666: allocate_alloc_state (alloc.c:41)
==13250==    by 0x140090: parsed_object_pool_new (object.c:462)
==13250==    by 0x16CCF4: initialize_the_repository (repository.c:18)
==13250==    by 0x110BD0: main (common-main.c:37)

If you want to reproduce, this is what I used to test this with.

https://gist.github.com/pclouds/86a2df6c28043f1b6fa3d4e72e7a1276
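
To illustrate the ownership question raised above, here is a rough
sketch with hypothetical demo_* types (not the real alloc.c or object.c
code). The state's clear function releases what the state points to,
and the owning pool frees the state struct itself:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct alloc_state; only the fields needed here. */
struct demo_state {
	void **slabs;
	int slab_nr;
};

/* Stand-in for a pool that owns heap-allocated states. */
struct demo_pool {
	struct demo_state *blob_state;
	struct demo_state *tree_state;
};

static struct demo_state *demo_allocate_state(void)
{
	return calloc(1, sizeof(struct demo_state));
}

/* Frees what the state points to, but not the state itself. */
static void demo_clear_state(struct demo_state *s)
{
	while (s->slab_nr > 0)
		free(s->slabs[--s->slab_nr]);
	free(s->slabs);
	s->slabs = NULL;
}

/* Clears the state, frees the struct, and resets the owner's pointer. */
static void demo_release_state(struct demo_state **sp)
{
	if (!*sp)
		return;
	demo_clear_state(*sp);
	free(*sp);
	*sp = NULL;
}

static void demo_pool_clear(struct demo_pool *p)
{
	assert(p != NULL);
	demo_release_state(&p->blob_state);
	demo_release_state(&p->tree_state);
}

static int demo_pool_init(struct demo_pool *p)
{
	p->blob_state = demo_allocate_state();
	p->tree_state = demo_allocate_state();
	if (!p->blob_state || !p->tree_state) {
		demo_pool_clear(p);
		return -1;
	}
	return 0;
}
```

Because the pointers are reset to NULL, calling demo_pool_clear() twice
is harmless in this sketch.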
-- 
Duy

* Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-09 17:18         ` Duy Nguyen
@ 2018-05-09 19:20           ` Stefan Beller
  2018-05-10 15:43             ` Duy Nguyen
  0 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-09 19:20 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jonathan Tan, Git Mailing List, Junio C Hamano, Jameson Miller

On Wed, May 9, 2018 at 10:18 AM, Duy Nguyen <pclouds@gmail.com> wrote:
>
> If you want to reproduce, this is what I used to test this with.
>
> https://gist.github.com/pclouds/86a2df6c28043f1b6fa3d4e72e7a1276

This only applied cleanly after I created an empty file at
t/helper/test-abc.c, using git-apply. I'll use it to make sure there are
no leaks here.

Thanks!
Stefan

* [PATCH v4 00/13] object store: alloc
  2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
                       ` (12 preceding siblings ...)
  2018-05-08 19:37     ` [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-10  0:40     ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 01/13] repository: introduce parsed objects field Stefan Beller
                         ` (13 more replies)
  13 siblings, 14 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

v4:
* address the memory issues, an interdiff is below.

v3:

* I used the (soon to be renamed?) branch-diff tool to attach a diff below
  between v2 and v3 
  
* fixed comment in patch 1
* correctly free objects and their hashmap in the last patch.
* drop free'ing the commit->util pointer as we do not know where
  it points to.

v2:
* I decided to stick with alloc.c and not migrate it to the mem-pool for now.
  The reasoning for that is that mem-pool.h would introduce some alignment
  waste, which I really wanted to avoid.
* renamed to struct parsed_object_pool;
* free'd the additional memory of trees and commits.
* do not special case the_repository for allocation purposes
* corrected and polished commit messages
* I used the (soon to be renamed?) branch-diff tool to attach a diff below.
  (I still need to get used to that format, I find an interdiff of the
   branches easier to read, but that would not yield the commit messages)
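
As a rough illustration of the alignment waste mentioned in the v2
notes: assume a general-purpose pool that rounds every request up to a
fixed alignment (this is an assumption about how such a pool might
behave, not a statement about mem-pool.h). The helper names and the
sizes in the usage note are made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (a power of two). */
static size_t demo_round_up(size_t size, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Bytes lost to padding when n nodes are each rounded up separately. */
static size_t demo_padding_waste(size_t node_size, size_t align, size_t n)
{
	return (demo_round_up(node_size, align) - node_size) * n;
}
```

For example, demo_padding_waste(36, 8, 1000) gives 4000 bytes of
padding, while a node size that is already a multiple of the alignment
wastes nothing. A slab of fixed-size nodes packed at sizeof(type)
avoids this extra rounding step.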



v1:
This applies on top of sb/oid-object-info and is the logical continuation of
the series that it builds on; this brings the object store into more of
Git's code, removing global state, such that reasoning about the state of
the in-memory representation of the repository is easier.

My original plan was to convert lookup_commit_graft as the next series,
which would be similar to lookup_replace_object, as in sb/object-store-replace.
The grafts and shallow mechanism are very close to each other, such that
they need to be converted at the same time, both depending on the
"parsed object store" that is introduced in this commit.

The next series will then convert code in {object/blob/tree/commit/tag}.c
hopefully finishing the lookup_* functions.

I also debated if it is worth converting alloc.c via this patch series
or if it might make more sense to use the new mem-pool by Jameson[1].

I vaguely wonder about the performance impact, as the object allocation
code seemed to be relevant in the past.

[1] https://public-inbox.org/git/20180430153122.243976-1-jamill@microsoft.com/

Any comments welcome,
Thanks,
Stefan

Jonathan Nieder (1):
  object: add repository argument to grow_object_hash

Stefan Beller (12):
  repository: introduce parsed objects field
  object: add repository argument to create_object
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  alloc: allow arbitrary repositories for alloc functions

 alloc.c           |  65 ++++++++++++++++----------
 alloc.h           |  19 ++++++++
 blame.c           |   3 +-
 blob.c            |   5 +-
 cache.h           |   9 ----
 commit.c          |  11 ++++-
 commit.h          |   6 +++
 merge-recursive.c |   3 +-
 object.c          | 113 ++++++++++++++++++++++++++++++++++------------
 object.h          |  18 +++++++-
 repository.c      |   7 +++
 repository.h      |   9 ++++
 tag.c             |   9 +++-
 tag.h             |   1 +
 tree.c            |   4 +-
 15 files changed, 214 insertions(+), 68 deletions(-)
 create mode 100644 alloc.h

-- 
2.17.0.255.g8bfb7c0704

diff --git c/alloc.c w/alloc.c
index 4ecf0f160f4..714df633169 100644
--- c/alloc.c
+++ w/alloc.c
@@ -47,6 +47,8 @@ void clear_alloc_state(struct alloc_state *s)
 		s->slab_nr--;
 		free(s->slabs[s->slab_nr]);
 	}
+
+	FREE_AND_NULL(s->slabs);
 }
 
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
@@ -110,22 +112,6 @@ void *alloc_commit_node(struct repository *r)
 	return c;
 }
 
-void release_tree_node(struct tree *t)
-{
-	free(t->buffer);
-}
-
-void release_commit_node(struct commit *c)
-{
-	free_commit_list(c->parents);
-	/* TODO: what about commit->util? */
-}
-
-void release_tag_node(struct tag *t)
-{
-	free(t->tag);
-}
-
 static void report(const char *name, unsigned int count, size_t size)
 {
 	fprintf(stderr, "%10s: %8u (%"PRIuMAX" kB)\n",
diff --git c/alloc.h w/alloc.h
index 941d71960fb..3e4e828db48 100644
--- c/alloc.h
+++ w/alloc.h
@@ -16,8 +16,4 @@ unsigned int alloc_commit_index(struct repository *r);
 void *allocate_alloc_state(void);
 void clear_alloc_state(struct alloc_state *s);
 
-void release_tree_node(struct tree *t);
-void release_commit_node(struct commit *c);
-void release_tag_node(struct tag *t);
-
 #endif
diff --git c/commit.c w/commit.c
index c3b400d5930..612ccf7b053 100644
--- c/commit.c
+++ w/commit.c
@@ -297,6 +297,13 @@ void free_commit_buffer(struct commit *commit)
 	}
 }
 
+void release_commit_memory(struct commit *c)
+{
+	free_commit_buffer(c);
+	free_commit_list(c->parents);
+	/* TODO: what about commit->util? */
+}
+
 const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 {
 	struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
diff --git c/commit.h w/commit.h
index 0fb8271665c..2d764ab7d8e 100644
--- c/commit.h
+++ w/commit.h
@@ -99,6 +99,12 @@ void unuse_commit_buffer(const struct commit *, const void *buffer);
  */
 void free_commit_buffer(struct commit *);
 
+/*
+ * Release memory related to a commit, including the parent list and
+ * any cached object buffer.
+ */
+void release_commit_memory(struct commit *c);
+
 /*
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
diff --git c/object.c w/object.c
index 803d34ae189..9d5b10d5a20 100644
--- c/object.c
+++ w/object.c
@@ -524,11 +524,11 @@ void parsed_object_pool_clear(struct parsed_object_pool *o)
 			continue;
 
 		if (obj->type == OBJ_TREE)
-			release_tree_node((struct tree*)obj);
+			free_tree_buffer((struct tree*)obj);
 		else if (obj->type == OBJ_COMMIT)
-			release_commit_node((struct commit*)obj);
+			release_commit_memory((struct commit*)obj);
 		else if (obj->type == OBJ_TAG)
-			release_tag_node((struct tag*)obj);
+			free_tag_buffer((struct tag*)obj);
 	}
 
 	FREE_AND_NULL(o->obj_hash);
@@ -539,4 +539,9 @@ void parsed_object_pool_clear(struct parsed_object_pool *o)
 	clear_alloc_state(o->commit_state);
 	clear_alloc_state(o->tag_state);
 	clear_alloc_state(o->object_state);
+	FREE_AND_NULL(o->blob_state);
+	FREE_AND_NULL(o->tree_state);
+	FREE_AND_NULL(o->commit_state);
+	FREE_AND_NULL(o->tag_state);
+	FREE_AND_NULL(o->object_state);
 }
diff --git c/tag.c w/tag.c
index af6a0725b6a..254352c30c6 100644
--- c/tag.c
+++ w/tag.c
@@ -116,6 +116,11 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
 	return parse_timestamp(dateptr, NULL, 10);
 }
 
+void free_tag_buffer(struct tag *t)
+{
+	free(t->tag);
+}
+
 int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
 {
 	struct object_id oid;
diff --git c/tag.h w/tag.h
index d469534e82a..b241fe67bc5 100644
--- c/tag.h
+++ w/tag.h
@@ -15,6 +15,7 @@ struct tag {
 extern struct tag *lookup_tag(const struct object_id *oid);
 extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
 extern int parse_tag(struct tag *item);
+extern void free_tag_buffer(struct tag *t);
 extern struct object *deref_tag(struct object *, const char *, int);
 extern struct object *deref_tag_noverify(struct object *);
 extern int gpg_verify_tag(const struct object_id *oid,

* [PATCH v4 01/13] repository: introduce parsed objects field
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 02/13] object: add repository argument to create_object Stefan Beller
                         ` (12 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

Convert the existing global cache for parsed objects (obj_hash) into
repository-specific parsed object caches. Existing code that uses
obj_hash is modified to use the parsed object cache of
the_repository; future patches will use the parsed object caches of
other repositories.
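
The shape of that conversion can be sketched as follows. The demo_*
types are hypothetical stand-ins for struct repository and struct
parsed_object_pool, and demo_pool_reserve() exists only to give the
example something to observe. State that used to live in file-scope
variables is reached through the repository instead:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_object;

/* Per-repository cache; mirrors the fields that used to be globals. */
struct demo_parsed_pool {
	struct demo_object **obj_hash;
	int nr_objs, obj_hash_size;
};

struct demo_repository {
	struct demo_parsed_pool *parsed_objects;
};

static struct demo_parsed_pool *demo_pool_new(void)
{
	return calloc(1, sizeof(struct demo_parsed_pool));
}

/* Give an empty pool a zeroed table of 'size' slots. */
static int demo_pool_reserve(struct demo_parsed_pool *p, int size)
{
	assert(!p->obj_hash && size > 0);
	p->obj_hash = calloc(size, sizeof(*p->obj_hash));
	if (!p->obj_hash)
		return -1;
	p->obj_hash_size = size;
	return 0;
}

/* Reads the size through the repository instead of a global. */
static unsigned int demo_max_object_index(const struct demo_repository *r)
{
	return r->parsed_objects->obj_hash_size;
}

static void demo_pool_free(struct demo_parsed_pool *p)
{
	free(p->obj_hash);
	free(p);
}
```

With two demo_repository values, growing the table of one leaves the
other untouched, which is what removing the global state is meant to
achieve.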

Another future use case for a pool of objects is ease of memory management
in revision walking: If we can free the rev-list related memory early in
pack-objects (e.g. part of repack operation) then it could lower memory
pressure significantly when running on large repos. While this has been
discussed on the mailing list lately, this series doesn't implement this.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 object.c     | 63 +++++++++++++++++++++++++++++++++-------------------
 object.h     |  8 +++++++
 repository.c |  7 ++++++
 repository.h |  9 ++++++++
 4 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/object.c b/object.c
index 5044d08e96c..f7c624a7ba6 100644
--- a/object.c
+++ b/object.c
@@ -8,17 +8,14 @@
 #include "object-store.h"
 #include "packfile.h"
 
-static struct object **obj_hash;
-static int nr_objs, obj_hash_size;
-
 unsigned int get_max_object_index(void)
 {
-	return obj_hash_size;
+	return the_repository->parsed_objects->obj_hash_size;
 }
 
 struct object *get_indexed_object(unsigned int idx)
 {
-	return obj_hash[idx];
+	return the_repository->parsed_objects->obj_hash[idx];
 }
 
 static const char *object_type_strings[] = {
@@ -90,15 +87,16 @@ struct object *lookup_object(const unsigned char *sha1)
 	unsigned int i, first;
 	struct object *obj;
 
-	if (!obj_hash)
+	if (!the_repository->parsed_objects->obj_hash)
 		return NULL;
 
-	first = i = hash_obj(sha1, obj_hash_size);
-	while ((obj = obj_hash[i]) != NULL) {
+	first = i = hash_obj(sha1,
+			     the_repository->parsed_objects->obj_hash_size);
+	while ((obj = the_repository->parsed_objects->obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->oid.hash))
 			break;
 		i++;
-		if (i == obj_hash_size)
+		if (i == the_repository->parsed_objects->obj_hash_size)
 			i = 0;
 	}
 	if (obj && i != first) {
@@ -107,7 +105,8 @@ struct object *lookup_object(const unsigned char *sha1)
 		 * that we do not need to walk the hash table the next
 		 * time we look for it.
 		 */
-		SWAP(obj_hash[i], obj_hash[first]);
+		SWAP(the_repository->parsed_objects->obj_hash[i],
+		     the_repository->parsed_objects->obj_hash[first]);
 	}
 	return obj;
 }
@@ -124,19 +123,19 @@ static void grow_object_hash(void)
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
+	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(obj_hash);
-	obj_hash = new_hash;
-	obj_hash_size = new_hash_size;
+	free(the_repository->parsed_objects->obj_hash);
+	the_repository->parsed_objects->obj_hash = new_hash;
+	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object(const unsigned char *sha1, void *o)
@@ -147,11 +146,12 @@ void *create_object(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (obj_hash_size - 1 <= nr_objs * 2)
+	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
 		grow_object_hash();
 
-	insert_obj_hash(obj, obj_hash, obj_hash_size);
-	nr_objs++;
+	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
+			the_repository->parsed_objects->obj_hash_size);
+	the_repository->parsed_objects->nr_objs++;
 	return obj;
 }
 
@@ -431,8 +431,8 @@ void clear_object_flags(unsigned flags)
 {
 	int i;
 
-	for (i=0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i=0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj)
 			obj->flags &= ~flags;
 	}
@@ -442,13 +442,20 @@ void clear_commit_marks_all(unsigned int flags)
 {
 	int i;
 
-	for (i = 0; i < obj_hash_size; i++) {
-		struct object *obj = obj_hash[i];
+	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = the_repository->parsed_objects->obj_hash[i];
 		if (obj && obj->type == OBJ_COMMIT)
 			obj->flags &= ~flags;
 	}
 }
 
+struct parsed_object_pool *parsed_object_pool_new(void)
+{
+	struct parsed_object_pool *o = xmalloc(sizeof(*o));
+	memset(o, 0, sizeof(*o));
+	return o;
+}
+
 struct raw_object_store *raw_object_store_new(void)
 {
 	struct raw_object_store *o = xmalloc(sizeof(*o));
@@ -488,3 +495,13 @@ void raw_object_store_clear(struct raw_object_store *o)
 	close_all_packs(o);
 	o->packed_git = NULL;
 }
+
+void parsed_object_pool_clear(struct parsed_object_pool *o)
+{
+	/*
+	 * TOOD free objects in o->obj_hash.
+	 *
+	 * As objects are allocated in slabs (see alloc.c), we do
+	 * not need to free each object, but each slab instead.
+	 */
+}
diff --git a/object.h b/object.h
index f13f85b2a94..cecda7da370 100644
--- a/object.h
+++ b/object.h
@@ -1,6 +1,14 @@
 #ifndef OBJECT_H
 #define OBJECT_H
 
+struct parsed_object_pool {
+	struct object **obj_hash;
+	int nr_objs, obj_hash_size;
+};
+
+struct parsed_object_pool *parsed_object_pool_new(void);
+void parsed_object_pool_clear(struct parsed_object_pool *o);
+
 struct object_list {
 	struct object *item;
 	struct object_list *next;
diff --git a/repository.c b/repository.c
index a4848c1bd05..c23404677eb 100644
--- a/repository.c
+++ b/repository.c
@@ -2,6 +2,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "config.h"
+#include "object.h"
 #include "submodule-config.h"
 
 /* The main repository */
@@ -14,6 +15,8 @@ void initialize_the_repository(void)
 
 	the_repo.index = &the_index;
 	the_repo.objects = raw_object_store_new();
+	the_repo.parsed_objects = parsed_object_pool_new();
+
 	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
 
@@ -143,6 +146,7 @@ static int repo_init(struct repository *repo,
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
+	repo->parsed_objects = parsed_object_pool_new();
 
 	if (repo_init_gitdir(repo, gitdir))
 		goto error;
@@ -226,6 +230,9 @@ void repo_clear(struct repository *repo)
 	raw_object_store_clear(repo->objects);
 	FREE_AND_NULL(repo->objects);
 
+	parsed_object_pool_clear(repo->parsed_objects);
+	FREE_AND_NULL(repo->parsed_objects);
+
 	if (repo->config) {
 		git_configset_clear(repo->config);
 		FREE_AND_NULL(repo->config);
diff --git a/repository.h b/repository.h
index e6e00f541bd..6d199819905 100644
--- a/repository.h
+++ b/repository.h
@@ -26,6 +26,15 @@ struct repository {
 	 */
 	struct raw_object_store *objects;
 
+	/*
+	 * All objects in this repository that have been parsed. This structure
+	 * owns all objects it references, so users of "struct object *"
+	 * generally do not need to free them; instead, when a repository is no
+	 * longer used, call parsed_object_pool_clear() on this structure, which
+	 * is called by the repositories repo_clear on its desconstruction.
+	 */
+	struct parsed_object_pool *parsed_objects;
+
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
-- 
2.17.0.255.g8bfb7c0704


* [PATCH v4 02/13] object: add repository argument to create_object
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 01/13] repository: introduce parsed objects field Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 03/13] object: add repository argument to grow_object_hash Stefan Beller
                         ` (11 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

Add a repository argument to allow the callers of create_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 blob.c   | 4 +++-
 commit.c | 3 ++-
 object.c | 5 +++--
 object.h | 3 ++-
 tag.c    | 3 ++-
 tree.c   | 3 ++-
 6 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/blob.c b/blob.c
index fa2ab4f7a74..85c2143f299 100644
--- a/blob.c
+++ b/blob.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "blob.h"
+#include "repository.h"
 
 const char *blob_type = "blob";
 
@@ -7,7 +8,8 @@ struct blob *lookup_blob(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_blob_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_blob_node());
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/commit.c b/commit.c
index ca474a7c112..9106acf0aad 100644
--- a/commit.c
+++ b/commit.c
@@ -50,7 +50,8 @@ struct commit *lookup_commit(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_commit_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_commit_node());
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/object.c b/object.c
index f7c624a7ba6..2de029275bc 100644
--- a/object.c
+++ b/object.c
@@ -138,7 +138,7 @@ static void grow_object_hash(void)
 	the_repository->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, void *o)
+void *create_object_the_repository(const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -178,7 +178,8 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 {
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
-		obj = create_object(sha1, alloc_object_node());
+		obj = create_object(the_repository, sha1,
+				    alloc_object_node());
 	return obj;
 }
 
diff --git a/object.h b/object.h
index cecda7da370..2cb0b241083 100644
--- a/object.h
+++ b/object.h
@@ -93,7 +93,8 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-extern void *create_object(const unsigned char *sha1, void *obj);
+#define create_object(r, s, o) create_object_##r(s, o)
+extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
diff --git a/tag.c b/tag.c
index 3d37c1bd251..7150b759d66 100644
--- a/tag.c
+++ b/tag.c
@@ -93,7 +93,8 @@ struct tag *lookup_tag(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tag_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tag_node());
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
diff --git a/tree.c b/tree.c
index 1c68ea586bd..63730e3fb46 100644
--- a/tree.c
+++ b/tree.c
@@ -196,7 +196,8 @@ struct tree *lookup_tree(const struct object_id *oid)
 {
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
-		return create_object(oid->hash, alloc_tree_node());
+		return create_object(the_repository, oid->hash,
+				     alloc_tree_node());
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 03/13] object: add repository argument to grow_object_hash
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 01/13] repository: introduce parsed objects field Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 02/13] object: add repository argument to create_object Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
                         ` (10 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Add a repository argument to allow the caller of grow_object_hash to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 2de029275bc..91edc30770c 100644
--- a/object.c
+++ b/object.c
@@ -116,7 +116,8 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-static void grow_object_hash(void)
+#define grow_object_hash(r) grow_object_hash_##r()
+static void grow_object_hash_the_repository(void)
 {
 	int i;
 	/*
@@ -147,7 +148,7 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	hashcpy(obj->oid.hash, sha1);
 
 	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash();
+		grow_object_hash(the_repository);
 
 	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
 			the_repository->parsed_objects->obj_hash_size);
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 04/13] alloc: add repository argument to alloc_blob_node
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (2 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 03/13] object: add repository argument to grow_object_hash Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
                         ` (9 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 blob.c  | 2 +-
 cache.h | 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacdd..6c5c376a25a 100644
--- a/alloc.c
+++ b/alloc.c
@@ -49,7 +49,7 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 }
 
 static struct alloc_state blob_state;
-void *alloc_blob_node(void)
+void *alloc_blob_node_the_repository(void)
 {
 	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
diff --git a/blob.c b/blob.c
index 85c2143f299..9e64f301895 100644
--- a/blob.c
+++ b/blob.c
@@ -9,7 +9,7 @@ struct blob *lookup_blob(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_blob_node());
+				     alloc_blob_node(the_repository));
 	return object_as_type(obj, OBJ_BLOB, 0);
 }
 
diff --git a/cache.h b/cache.h
index 3a4d80e92bf..2258e611275 100644
--- a/cache.h
+++ b/cache.h
@@ -1764,7 +1764,8 @@ int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
 /* alloc.c */
-extern void *alloc_blob_node(void);
+#define alloc_blob_node(r) alloc_blob_node_##r()
+extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 05/13] alloc: add repository argument to alloc_tree_node
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (3 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
                         ` (8 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tree.c  | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 6c5c376a25a..2c8d1430758 100644
--- a/alloc.c
+++ b/alloc.c
@@ -57,7 +57,7 @@ void *alloc_blob_node_the_repository(void)
 }
 
 static struct alloc_state tree_state;
-void *alloc_tree_node(void)
+void *alloc_tree_node_the_repository(void)
 {
 	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
diff --git a/cache.h b/cache.h
index 2258e611275..1717d07a2c5 100644
--- a/cache.h
+++ b/cache.h
@@ -1766,7 +1766,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 /* alloc.c */
 #define alloc_blob_node(r) alloc_blob_node_##r()
 extern void *alloc_blob_node_the_repository(void);
-extern void *alloc_tree_node(void);
+#define alloc_tree_node(r) alloc_tree_node_##r()
+extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
diff --git a/tree.c b/tree.c
index 63730e3fb46..58cf19b4fa8 100644
--- a/tree.c
+++ b/tree.c
@@ -197,7 +197,7 @@ struct tree *lookup_tree(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tree_node());
+				     alloc_tree_node(the_repository));
 	return object_as_type(obj, OBJ_TREE, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 06/13] alloc: add repository argument to alloc_commit_node
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (4 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
                         ` (7 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c           | 2 +-
 blame.c           | 2 +-
 cache.h           | 3 ++-
 commit.c          | 2 +-
 merge-recursive.c | 2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/alloc.c b/alloc.c
index 2c8d1430758..9e2b897ec1d 100644
--- a/alloc.c
+++ b/alloc.c
@@ -88,7 +88,7 @@ unsigned int alloc_commit_index(void)
 	return count++;
 }
 
-void *alloc_commit_node(void)
+void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
diff --git a/blame.c b/blame.c
index dfa24473dc6..ba9b18e7542 100644
--- a/blame.c
+++ b/blame.c
@@ -161,7 +161,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 
 	read_cache();
 	time(&now);
-	commit = alloc_commit_node();
+	commit = alloc_commit_node(the_repository);
 	commit->object.parsed = 1;
 	commit->date = now;
 	parent_tail = &commit->parents;
diff --git a/cache.h b/cache.h
index 1717d07a2c5..bf6e8c87d83 100644
--- a/cache.h
+++ b/cache.h
@@ -1768,7 +1768,8 @@ void encode_85(char *buf, const unsigned char *data, int bytes);
 extern void *alloc_blob_node_the_repository(void);
 #define alloc_tree_node(r) alloc_tree_node_##r()
 extern void *alloc_tree_node_the_repository(void);
-extern void *alloc_commit_node(void);
+#define alloc_commit_node(r) alloc_commit_node_##r()
+extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
diff --git a/commit.c b/commit.c
index 9106acf0aad..a9a43e79bae 100644
--- a/commit.c
+++ b/commit.c
@@ -51,7 +51,7 @@ struct commit *lookup_commit(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_commit_node());
+				     alloc_commit_node(the_repository));
 	return object_as_type(obj, OBJ_COMMIT, 0);
 }
 
diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624da..6dac8908648 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -98,7 +98,7 @@ static struct tree *shift_tree_object(struct tree *one, struct tree *two,
 
 static struct commit *make_virtual_commit(struct tree *tree, const char *comment)
 {
-	struct commit *commit = alloc_commit_node();
+	struct commit *commit = alloc_commit_node(the_repository);
 
 	set_merge_remote_desc(commit, comment, (struct object *)commit);
 	commit->tree = tree;
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 07/13] alloc: add repository argument to alloc_tag_node
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (5 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
                         ` (6 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 tag.c   | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 9e2b897ec1d..290250e3595 100644
--- a/alloc.c
+++ b/alloc.c
@@ -65,7 +65,7 @@ void *alloc_tree_node_the_repository(void)
 }
 
 static struct alloc_state tag_state;
-void *alloc_tag_node(void)
+void *alloc_tag_node_the_repository(void)
 {
 	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
diff --git a/cache.h b/cache.h
index bf6e8c87d83..32f340cde59 100644
--- a/cache.h
+++ b/cache.h
@@ -1770,7 +1770,8 @@ extern void *alloc_blob_node_the_repository(void);
 extern void *alloc_tree_node_the_repository(void);
 #define alloc_commit_node(r) alloc_commit_node_##r()
 extern void *alloc_commit_node_the_repository(void);
-extern void *alloc_tag_node(void);
+#define alloc_tag_node(r) alloc_tag_node_##r()
+extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
diff --git a/tag.c b/tag.c
index 7150b759d66..02ef4eaafc0 100644
--- a/tag.c
+++ b/tag.c
@@ -94,7 +94,7 @@ struct tag *lookup_tag(const struct object_id *oid)
 	struct object *obj = lookup_object(oid->hash);
 	if (!obj)
 		return create_object(the_repository, oid->hash,
-				     alloc_tag_node());
+				     alloc_tag_node(the_repository));
 	return object_as_type(obj, OBJ_TAG, 0);
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 08/13] alloc: add repository argument to alloc_object_node
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (6 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 09/13] alloc: add repository argument to alloc_report Stefan Beller
                         ` (5 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c  | 2 +-
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/alloc.c b/alloc.c
index 290250e3595..f031ce422d9 100644
--- a/alloc.c
+++ b/alloc.c
@@ -73,7 +73,7 @@ void *alloc_tag_node_the_repository(void)
 }
 
 static struct alloc_state object_state;
-void *alloc_object_node(void)
+void *alloc_object_node_the_repository(void)
 {
 	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
diff --git a/cache.h b/cache.h
index 32f340cde59..2d60359a964 100644
--- a/cache.h
+++ b/cache.h
@@ -1772,7 +1772,8 @@ extern void *alloc_tree_node_the_repository(void);
 extern void *alloc_commit_node_the_repository(void);
 #define alloc_tag_node(r) alloc_tag_node_##r()
 extern void *alloc_tag_node_the_repository(void);
-extern void *alloc_object_node(void);
+#define alloc_object_node(r) alloc_object_node_##r()
+extern void *alloc_object_node_the_repository(void);
 extern void alloc_report(void);
 extern unsigned int alloc_commit_index(void);
 
diff --git a/object.c b/object.c
index 91edc30770c..b8c3f923c51 100644
--- a/object.c
+++ b/object.c
@@ -180,7 +180,7 @@ struct object *lookup_unknown_object(const unsigned char *sha1)
 	struct object *obj = lookup_object(sha1);
 	if (!obj)
 		obj = create_object(the_repository, sha1,
-				    alloc_object_node());
+				    alloc_object_node(the_repository));
 	return obj;
 }
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 09/13] alloc: add repository argument to alloc_report
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (7 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
                         ` (4 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c | 2 +-
 cache.h | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index f031ce422d9..28b85b22144 100644
--- a/alloc.c
+++ b/alloc.c
@@ -105,7 +105,7 @@ static void report(const char *name, unsigned int count, size_t size)
 #define REPORT(name, type)	\
     report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
 
-void alloc_report(void)
+void alloc_report_the_repository(void)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/cache.h b/cache.h
index 2d60359a964..01cc207d218 100644
--- a/cache.h
+++ b/cache.h
@@ -1774,7 +1774,8 @@ extern void *alloc_commit_node_the_repository(void);
 extern void *alloc_tag_node_the_repository(void);
 #define alloc_object_node(r) alloc_object_node_##r()
 extern void *alloc_object_node_the_repository(void);
-extern void alloc_report(void);
+#define alloc_report(r) alloc_report_##r()
+extern void alloc_report_the_repository(void);
 extern unsigned int alloc_commit_index(void);
 
 /* pkt-line.c */
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 10/13] alloc: add repository argument to alloc_commit_index
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (8 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 09/13] alloc: add repository argument to alloc_report Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
                         ` (3 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 alloc.c  | 4 ++--
 cache.h  | 3 ++-
 object.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/alloc.c b/alloc.c
index 28b85b22144..277dadd221b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -82,7 +82,7 @@ void *alloc_object_node_the_repository(void)
 
 static struct alloc_state commit_state;
 
-unsigned int alloc_commit_index(void)
+unsigned int alloc_commit_index_the_repository(void)
 {
 	static unsigned int count;
 	return count++;
@@ -92,7 +92,7 @@ void *alloc_commit_node_the_repository(void)
 {
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index();
+	c->index = alloc_commit_index(the_repository);
 	return c;
 }
 
diff --git a/cache.h b/cache.h
index 01cc207d218..0e6c5dd5639 100644
--- a/cache.h
+++ b/cache.h
@@ -1776,7 +1776,8 @@ extern void *alloc_tag_node_the_repository(void);
 extern void *alloc_object_node_the_repository(void);
 #define alloc_report(r) alloc_report_##r()
 extern void alloc_report_the_repository(void);
-extern unsigned int alloc_commit_index(void);
+#define alloc_commit_index(r) alloc_commit_index_##r()
+extern unsigned int alloc_commit_index_the_repository(void);
 
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
diff --git a/object.c b/object.c
index b8c3f923c51..a365a910859 100644
--- a/object.c
+++ b/object.c
@@ -162,7 +162,7 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 		return obj;
 	else if (obj->type == OBJ_NONE) {
 		if (type == OBJ_COMMIT)
-			((struct commit *)obj)->index = alloc_commit_index();
+			((struct commit *)obj)->index = alloc_commit_index(the_repository);
 		obj->type = type;
 		return obj;
 	}
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 11/13] object: allow grow_object_hash to handle arbitrary repositories
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (9 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 12/13] object: allow create_object " Stefan Beller
                         ` (2 subsequent siblings)
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
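Note for readers (not part of the commit): a rough sketch of what
"handle arbitrary repositories" means for the object hash. Each
repository owns its own table, and growing or inserting goes through
the pointer that was passed in rather than through a global. The
field names obj_hash, obj_hash_size, nr_objs and parsed_objects
follow the diff; the integer key, the linear probing in
insert_obj_hash, and the helpers add_object and find_object are
assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* git keys objects by object ID; a plain integer stands in here. */
struct object {
	unsigned int key;
};

struct parsed_object_pool {
	struct object **obj_hash;
	int nr_objs, obj_hash_size;
};

struct repository {
	struct parsed_object_pool *parsed_objects;
};

/* Linear probing; size must be a power of 2 so the mask works. */
static void insert_obj_hash(struct object *obj, struct object **hash, int size)
{
	unsigned int j = obj->key & (size - 1);

	while (hash[j])
		j = (j + 1) & (size - 1);
	hash[j] = obj;
}

/* Double the table (at least 32 slots) and re-insert every entry. */
static void grow_object_hash(struct repository *r)
{
	struct parsed_object_pool *o = r->parsed_objects;
	int i, new_size = o->obj_hash_size < 32 ? 32 : 2 * o->obj_hash_size;
	struct object **new_hash = calloc(new_size, sizeof(*new_hash));

	assert(new_hash); /* git's xcalloc dies on failure instead */
	for (i = 0; i < o->obj_hash_size; i++)
		if (o->obj_hash[i])
			insert_obj_hash(o->obj_hash[i], new_hash, new_size);
	free(o->obj_hash);
	o->obj_hash = new_hash;
	o->obj_hash_size = new_size;
}

/* Hypothetical helper: grow when about half full, then insert. */
void add_object(struct repository *r, struct object *obj)
{
	struct parsed_object_pool *o = r->parsed_objects;

	if (o->obj_hash_size - 1 <= o->nr_objs * 2)
		grow_object_hash(r);
	insert_obj_hash(obj, o->obj_hash, o->obj_hash_size);
	o->nr_objs++;
}

/* Hypothetical helper: look a key up in one repository's table only. */
struct object *find_object(struct repository *r, unsigned int key)
{
	struct parsed_object_pool *o = r->parsed_objects;
	unsigned int j;

	if (!o->obj_hash)
		return NULL;
	j = key & (o->obj_hash_size - 1);
	while (o->obj_hash[j]) {
		if (o->obj_hash[j]->key == key)
			return o->obj_hash[j];
		j = (j + 1) & (o->obj_hash_size - 1);
	}
	return NULL;
}
```

With two struct repository values, objects added to one are not
visible through the other, and each table grows on its own schedule.
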
 object.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index a365a910859..0fcd6f6df42 100644
--- a/object.c
+++ b/object.c
@@ -116,27 +116,27 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-#define grow_object_hash(r) grow_object_hash_##r()
-static void grow_object_hash_the_repository(void)
+static void grow_object_hash(struct repository *r)
 {
 	int i;
 	/*
 	 * Note that this size must always be power-of-2 to match hash_obj
 	 * above.
 	 */
-	int new_hash_size = the_repository->parsed_objects->obj_hash_size < 32 ? 32 : 2 * the_repository->parsed_objects->obj_hash_size;
+	int new_hash_size = r->parsed_objects->obj_hash_size < 32 ? 32 : 2 * r->parsed_objects->obj_hash_size;
 	struct object **new_hash;
 
 	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-	for (i = 0; i < the_repository->parsed_objects->obj_hash_size; i++) {
-		struct object *obj = the_repository->parsed_objects->obj_hash[i];
+	for (i = 0; i < r->parsed_objects->obj_hash_size; i++) {
+		struct object *obj = r->parsed_objects->obj_hash[i];
+
 		if (!obj)
 			continue;
 		insert_obj_hash(obj, new_hash, new_hash_size);
 	}
-	free(the_repository->parsed_objects->obj_hash);
-	the_repository->parsed_objects->obj_hash = new_hash;
-	the_repository->parsed_objects->obj_hash_size = new_hash_size;
+	free(r->parsed_objects->obj_hash);
+	r->parsed_objects->obj_hash = new_hash;
+	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
 void *create_object_the_repository(const unsigned char *sha1, void *o)
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 12/13] object: allow create_object to handle arbitrary repositories
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (10 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10  0:40       ` [PATCH v4 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  2018-05-10 17:16       ` [PATCH v4 00/13] object store: alloc Jonathan Tan
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds, Jonathan Nieder

Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object.c | 12 ++++++------
 object.h |  3 +--
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/object.c b/object.c
index 0fcd6f6df42..49b952e9299 100644
--- a/object.c
+++ b/object.c
@@ -139,7 +139,7 @@ static void grow_object_hash(struct repository *r)
 	r->parsed_objects->obj_hash_size = new_hash_size;
 }
 
-void *create_object_the_repository(const unsigned char *sha1, void *o)
+void *create_object(struct repository *r, const unsigned char *sha1, void *o)
 {
 	struct object *obj = o;
 
@@ -147,12 +147,12 @@ void *create_object_the_repository(const unsigned char *sha1, void *o)
 	obj->flags = 0;
 	hashcpy(obj->oid.hash, sha1);
 
-	if (the_repository->parsed_objects->obj_hash_size - 1 <= the_repository->parsed_objects->nr_objs * 2)
-		grow_object_hash(the_repository);
+	if (r->parsed_objects->obj_hash_size - 1 <= r->parsed_objects->nr_objs * 2)
+		grow_object_hash(r);
 
-	insert_obj_hash(obj, the_repository->parsed_objects->obj_hash,
-			the_repository->parsed_objects->obj_hash_size);
-	the_repository->parsed_objects->nr_objs++;
+	insert_obj_hash(obj, r->parsed_objects->obj_hash,
+			r->parsed_objects->obj_hash_size);
+	r->parsed_objects->nr_objs++;
 	return obj;
 }
 
diff --git a/object.h b/object.h
index 2cb0b241083..b41d7a3accb 100644
--- a/object.h
+++ b/object.h
@@ -93,8 +93,7 @@ extern struct object *get_indexed_object(unsigned int);
  */
 struct object *lookup_object(const unsigned char *sha1);
 
-#define create_object(r, s, o) create_object_##r(s, o)
-extern void *create_object_the_repository(const unsigned char *sha1, void *obj);
+extern void *create_object(struct repository *r, const unsigned char *sha1, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
-- 
2.17.0.255.g8bfb7c0704



* [PATCH v4 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (11 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 12/13] object: allow create_object " Stefan Beller
@ 2018-05-10  0:40       ` Stefan Beller
  2018-05-10 17:16       ` [PATCH v4 00/13] object store: alloc Jonathan Tan
  13 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10  0:40 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, jamill, jonathantanmy, pclouds

We have to convert all of the alloc functions at once, because alloc_report
uses the REPORT macro, which pastes each type name onto "_state" to locate
the matching allocator state. For the sake of a mechanical conversion, it
is better to convert multiple functions at once than to restructure the
reporting function.

We record all memory allocations in alloc.c and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
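Note for readers (not part of the commit): a simplified sketch of the
two ideas in the message above. First, every slab handed out by the
allocator is recorded so that it can be freed later. Second, a macro
that pastes a type name onto "_state" only works if all per-type
states live in the same place, which is why they move together. The
alloc_state fields follow the diff; struct node_pools, NODE_COUNT,
the plain realloc growth (git uses ALLOC_GROW) and the assert-based
error handling (git uses xmalloc) are assumptions of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 1024

struct alloc_state {
	int count;   /* total number of nodes allocated */
	int nr;      /* number of nodes left in current slab */
	void *p;     /* first free node in current slab */

	void **slabs; /* every slab allocated so far */
	int slab_nr, slab_alloc;
};

void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	if (!s->nr) {
		if (s->slab_nr == s->slab_alloc) {
			int n = s->slab_alloc ? 2 * s->slab_alloc : 4;
			void **grown = realloc(s->slabs, n * sizeof(*grown));

			assert(grown);
			s->slabs = grown;
			s->slab_alloc = n;
		}
		s->p = malloc(BLOCKING * node_size);
		assert(s->p);
		s->slabs[s->slab_nr++] = s->p; /* remember it for later */
		s->nr = BLOCKING;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	memset(ret, 0, node_size);
	return ret;
}

/* Free every recorded slab; this sketch also resets the cursor. */
void clear_alloc_state(struct alloc_state *s)
{
	while (s->slab_nr > 0)
		free(s->slabs[--s->slab_nr]);
	free(s->slabs);
	s->slabs = NULL;
	s->slab_alloc = 0;
	s->nr = 0;
	s->p = NULL;
}

/* Hypothetical container: per-type states owned by one repository. */
struct node_pools {
	struct alloc_state *blob_state;
	struct alloc_state *tree_state;
};

/* Pastes the type name onto "_state", as REPORT does in alloc.c. */
#define NODE_COUNT(r, name) ((r)->name##_state->count)
```

Allocating BLOCKING + 1 nodes from one state records two slabs, and
clear_alloc_state releases both without touching the other states.
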
 alloc.c           | 65 ++++++++++++++++++++++++++++++-----------------
 alloc.h           | 19 ++++++++++++++
 blame.c           |  1 +
 blob.c            |  1 +
 cache.h           | 16 ------------
 commit.c          |  8 ++++++
 commit.h          |  6 +++++
 merge-recursive.c |  1 +
 object.c          | 42 ++++++++++++++++++++++++++++--
 object.h          |  8 ++++++
 tag.c             |  6 +++++
 tag.h             |  1 +
 tree.c            |  1 +
 13 files changed, 133 insertions(+), 42 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..714df633169 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,8 +4,7 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it ends
  * up with maximal alignment because it doesn't know what the object alignment
  * for the new allocation is.
  */
@@ -15,6 +14,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
@@ -30,8 +30,27 @@ struct alloc_state {
 	int count; /* total number of nodes allocated */
 	int nr;    /* number of nodes left in current allocation */
 	void *p;   /* first free node in current allocation */
+
+	/* bookkeeping of allocations */
+	void **slabs;
+	int slab_nr, slab_alloc;
 };
 
+void *allocate_alloc_state(void)
+{
+	return xcalloc(1, sizeof(struct alloc_state));
+}
+
+void clear_alloc_state(struct alloc_state *s)
+{
+	while (s->slab_nr > 0) {
+		s->slab_nr--;
+		free(s->slabs[s->slab_nr]);
+	}
+
+	FREE_AND_NULL(s->slabs);
+}
+
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 {
 	void *ret;
@@ -39,60 +58,57 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	if (!s->nr) {
 		s->nr = BLOCKING;
 		s->p = xmalloc(BLOCKING * node_size);
+
+		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
+		s->slabs[s->slab_nr++] = s->p;
 	}
 	s->nr--;
 	s->count++;
 	ret = s->p;
 	s->p = (char *)s->p + node_size;
 	memset(ret, 0, node_size);
+
 	return ret;
 }
 
-static struct alloc_state blob_state;
-void *alloc_blob_node_the_repository(void)
+void *alloc_blob_node(struct repository *r)
 {
-	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
 	return b;
 }
 
-static struct alloc_state tree_state;
-void *alloc_tree_node_the_repository(void)
+void *alloc_tree_node(struct repository *r)
 {
-	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
 	return t;
 }
 
-static struct alloc_state tag_state;
-void *alloc_tag_node_the_repository(void)
+void *alloc_tag_node(struct repository *r)
 {
-	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
 	return t;
 }
 
-static struct alloc_state object_state;
-void *alloc_object_node_the_repository(void)
+void *alloc_object_node(struct repository *r)
 {
-	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
 	return obj;
 }
 
-static struct alloc_state commit_state;
-
-unsigned int alloc_commit_index_the_repository(void)
+unsigned int alloc_commit_index(struct repository *r)
 {
-	static unsigned int count;
-	return count++;
+	return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node_the_repository(void)
+void *alloc_commit_node(struct repository *r)
 {
-	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index(the_repository);
+	c->index = alloc_commit_index(r);
 	return c;
 }
 
@@ -103,9 +119,10 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
+    report(#name, r->parsed_objects->name##_state->count, \
+		  r->parsed_objects->name##_state->count * sizeof(type) >> 10)
 
-void alloc_report_the_repository(void)
+void alloc_report(struct repository *r)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/alloc.h b/alloc.h
new file mode 100644
index 00000000000..3e4e828db48
--- /dev/null
+++ b/alloc.h
@@ -0,0 +1,19 @@
+#ifndef ALLOC_H
+#define ALLOC_H
+
+struct tree;
+struct commit;
+struct tag;
+
+void *alloc_blob_node(struct repository *r);
+void *alloc_tree_node(struct repository *r);
+void *alloc_commit_node(struct repository *r);
+void *alloc_tag_node(struct repository *r);
+void *alloc_object_node(struct repository *r);
+void alloc_report(struct repository *r);
+unsigned int alloc_commit_index(struct repository *r);
+
+void *allocate_alloc_state(void);
+void clear_alloc_state(struct alloc_state *s);
+
+#endif
diff --git a/blame.c b/blame.c
index ba9b18e7542..3a11f1ce52b 100644
--- a/blame.c
+++ b/blame.c
@@ -6,6 +6,7 @@
 #include "diffcore.h"
 #include "tag.h"
 #include "blame.h"
+#include "alloc.h"
 
 void blame_origin_decref(struct blame_origin *o)
 {
diff --git a/blob.c b/blob.c
index 9e64f301895..458dafa811e 100644
--- a/blob.c
+++ b/blob.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "repository.h"
+#include "alloc.h"
 
 const char *blob_type = "blob";
 
diff --git a/cache.h b/cache.h
index 0e6c5dd5639..c75559b7d38 100644
--- a/cache.h
+++ b/cache.h
@@ -1763,22 +1763,6 @@ extern const char *excludes_file;
 int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
-/* alloc.c */
-#define alloc_blob_node(r) alloc_blob_node_##r()
-extern void *alloc_blob_node_the_repository(void);
-#define alloc_tree_node(r) alloc_tree_node_##r()
-extern void *alloc_tree_node_the_repository(void);
-#define alloc_commit_node(r) alloc_commit_node_##r()
-extern void *alloc_commit_node_the_repository(void);
-#define alloc_tag_node(r) alloc_tag_node_##r()
-extern void *alloc_tag_node_the_repository(void);
-#define alloc_object_node(r) alloc_object_node_##r()
-extern void *alloc_object_node_the_repository(void);
-#define alloc_report(r) alloc_report_##r()
-extern void alloc_report_the_repository(void);
-#define alloc_commit_index(r) alloc_commit_index_##r()
-extern unsigned int alloc_commit_index_the_repository(void);
-
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
 
diff --git a/commit.c b/commit.c
index a9a43e79bae..612ccf7b053 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
 #include "commit-slab.h"
@@ -296,6 +297,13 @@ void free_commit_buffer(struct commit *commit)
 	}
 }
 
+void release_commit_memory(struct commit *c)
+{
+	free_commit_buffer(c);
+	free_commit_list(c->parents);
+	/* TODO: what about commit->util? */
+}
+
 const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 {
 	struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
diff --git a/commit.h b/commit.h
index 0fb8271665c..2d764ab7d8e 100644
--- a/commit.h
+++ b/commit.h
@@ -99,6 +99,12 @@ void unuse_commit_buffer(const struct commit *, const void *buffer);
  */
 void free_commit_buffer(struct commit *);
 
+/*
+ * Release memory related to a commit, including the parent list and
+ * any cached object buffer.
+ */
+void release_commit_memory(struct commit *c);
+
 /*
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
diff --git a/merge-recursive.c b/merge-recursive.c
index 6dac8908648..cbded673c28 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -15,6 +15,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "tag.h"
+#include "alloc.h"
 #include "unpack-trees.h"
 #include "string-list.h"
 #include "xdiff-interface.h"
diff --git a/object.c b/object.c
index 49b952e9299..9d5b10d5a20 100644
--- a/object.c
+++ b/object.c
@@ -5,6 +5,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "object-store.h"
 #include "packfile.h"
 
@@ -455,6 +456,13 @@ struct parsed_object_pool *parsed_object_pool_new(void)
 {
 	struct parsed_object_pool *o = xmalloc(sizeof(*o));
 	memset(o, 0, sizeof(*o));
+
+	o->blob_state = allocate_alloc_state();
+	o->tree_state = allocate_alloc_state();
+	o->commit_state = allocate_alloc_state();
+	o->tag_state = allocate_alloc_state();
+	o->object_state = allocate_alloc_state();
+
 	return o;
 }
 
@@ -501,9 +509,39 @@ void raw_object_store_clear(struct raw_object_store *o)
 void parsed_object_pool_clear(struct parsed_object_pool *o)
 {
 	/*
-	 * TOOD free objects in o->obj_hash.
-	 *
 	 * As objects are allocated in slabs (see alloc.c), we do
 	 * not need to free each object, but each slab instead.
+	 *
+	 * Before doing so, we need to free any additional memory
+	 * the objects may hold.
 	 */
+	unsigned i;
+
+	for (i = 0; i < o->obj_hash_size; i++) {
+		struct object *obj = o->obj_hash[i];
+
+		if (!obj)
+			continue;
+
+		if (obj->type == OBJ_TREE)
+			free_tree_buffer((struct tree*)obj);
+		else if (obj->type == OBJ_COMMIT)
+			release_commit_memory((struct commit*)obj);
+		else if (obj->type == OBJ_TAG)
+			free_tag_buffer((struct tag*)obj);
+	}
+
+	FREE_AND_NULL(o->obj_hash);
+	o->obj_hash_size = 0;
+
+	clear_alloc_state(o->blob_state);
+	clear_alloc_state(o->tree_state);
+	clear_alloc_state(o->commit_state);
+	clear_alloc_state(o->tag_state);
+	clear_alloc_state(o->object_state);
+	FREE_AND_NULL(o->blob_state);
+	FREE_AND_NULL(o->tree_state);
+	FREE_AND_NULL(o->commit_state);
+	FREE_AND_NULL(o->tag_state);
+	FREE_AND_NULL(o->object_state);
 }
diff --git a/object.h b/object.h
index b41d7a3accb..7916edb4edf 100644
--- a/object.h
+++ b/object.h
@@ -4,6 +4,14 @@
 struct parsed_object_pool {
 	struct object **obj_hash;
 	int nr_objs, obj_hash_size;
+
+	/* TODO: migrate alloc_states to mem-pool? */
+	struct alloc_state *blob_state;
+	struct alloc_state *tree_state;
+	struct alloc_state *commit_state;
+	struct alloc_state *tag_state;
+	struct alloc_state *object_state;
+	unsigned commit_count;
 };
 
 struct parsed_object_pool *parsed_object_pool_new(void);
diff --git a/tag.c b/tag.c
index 02ef4eaafc0..254352c30c6 100644
--- a/tag.c
+++ b/tag.c
@@ -3,6 +3,7 @@
 #include "commit.h"
 #include "tree.h"
 #include "blob.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 
 const char *tag_type = "tag";
@@ -115,6 +116,11 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
 	return parse_timestamp(dateptr, NULL, 10);
 }
 
+void free_tag_buffer(struct tag *t)
+{
+	free(t->tag);
+}
+
 int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
 {
 	struct object_id oid;
diff --git a/tag.h b/tag.h
index d469534e82a..b241fe67bc5 100644
--- a/tag.h
+++ b/tag.h
@@ -15,6 +15,7 @@ struct tag {
 extern struct tag *lookup_tag(const struct object_id *oid);
 extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
 extern int parse_tag(struct tag *item);
+extern void free_tag_buffer(struct tag *t);
 extern struct object *deref_tag(struct object *, const char *, int);
 extern struct object *deref_tag_noverify(struct object *);
 extern int gpg_verify_tag(const struct object_id *oid,
diff --git a/tree.c b/tree.c
index 58cf19b4fa8..8f8ef3189af 100644
--- a/tree.c
+++ b/tree.c
@@ -5,6 +5,7 @@
 #include "blob.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "tree-walk.h"
 
 const char *tree_type = "tree";
-- 
2.17.0.255.g8bfb7c0704



* Re: [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions
  2018-05-09 19:20           ` Stefan Beller
@ 2018-05-10 15:43             ` Duy Nguyen
  0 siblings, 0 replies; 95+ messages in thread
From: Duy Nguyen @ 2018-05-10 15:43 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jonathan Tan, Git Mailing List, Junio C Hamano, Jameson Miller

On Wed, May 9, 2018 at 9:20 PM, Stefan Beller <sbeller@google.com> wrote:
> On Wed, May 9, 2018 at 10:18 AM, Duy Nguyen <pclouds@gmail.com> wrote:
>>
>> If you want to reproduce, this is what I used to test this with.
>>
>> https://gist.github.com/pclouds/86a2df6c28043f1b6fa3d4e72e7a1276
>
> This only applied cleanly after I created an empty file at
> t/helper/test-abc.c, using git-apply.

Right. I created the patch with "git add -N". I know exactly what the
bug is but I'll need to be careful with renaming a bit before trying
to fix this. Thanks for reminding me.
-- 
Duy


* Re: [PATCH v4 00/13] object store: alloc
  2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
                         ` (12 preceding siblings ...)
  2018-05-10  0:40       ` [PATCH v4 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-10 17:16       ` Jonathan Tan
  2018-05-10 17:32         ` Stefan Beller
  2018-05-11 19:17         ` [PATCH] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  13 siblings, 2 replies; 95+ messages in thread
From: Jonathan Tan @ 2018-05-10 17:16 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, gitster, jamill, pclouds

On Wed,  9 May 2018 17:40:11 -0700
Stefan Beller <sbeller@google.com> wrote:

>  		if (obj->type == OBJ_TREE)
> -			release_tree_node((struct tree*)obj);
> +			free_tree_buffer((struct tree*)obj);
>  		else if (obj->type == OBJ_COMMIT)
> -			release_commit_node((struct commit*)obj);
> +			release_commit_memory((struct commit*)obj);
>  		else if (obj->type == OBJ_TAG)
> -			release_tag_node((struct tag*)obj);
> +			free_tag_buffer((struct tag*)obj);

This might seem a bit bikesheddy, but I wouldn't call it
free_tag_buffer(), since what's being freed is not the buffer of the
object itself, but just a string. If you want such a function, I would
just call it release_tag_memory() to match release_commit_memory().

Other than that, all the patches look fine to me.

Some optional comments (this is almost certainly bikeshedding):

 - I would call them release_commit() and release_tag(), to match
   strbuf_release().
 - It might be better to just inline the handling of releasing commit
   and tag memory. This code already knows that, for a tree, it needs to
   free its buffer and only its buffer, so it is not much of a stretch
   to think that it similarly knows the details of commit and tag
   objects too.


* Re: [PATCH v4 00/13] object store: alloc
  2018-05-10 17:16       ` [PATCH v4 00/13] object store: alloc Jonathan Tan
@ 2018-05-10 17:32         ` Stefan Beller
  2018-05-10 20:56           ` Jonathan Tan
  2018-05-11 19:17         ` [PATCH] alloc: allow arbitrary repositories for alloc functions Stefan Beller
  1 sibling, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-10 17:32 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Junio C Hamano, Jameson Miller, Duy Nguyen

On Thu, May 10, 2018 at 10:16 AM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Wed,  9 May 2018 17:40:11 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>>               if (obj->type == OBJ_TREE)
>> -                     release_tree_node((struct tree*)obj);
>> +                     free_tree_buffer((struct tree*)obj);
>>               else if (obj->type == OBJ_COMMIT)
>> -                     release_commit_node((struct commit*)obj);
>> +                     release_commit_memory((struct commit*)obj);
>>               else if (obj->type == OBJ_TAG)
>> -                     release_tag_node((struct tag*)obj);
>> +                     free_tag_buffer((struct tag*)obj);
>
> This might seem a bit bikesheddy, but I wouldn't call it
> free_tag_buffer(), since what's being freed is not the buffer of the
> object itself, but just a string. If you want such a function, I would
> just call it release_tag_memory() to match release_commit_memory().
>
> Other than that, all the patches look fine to me.
>
> Some optional comments (this is almost certainly bikeshedding):

Who doesn't love some bikeshedding in late spring?

>
>  - I would call them release_commit() and release_tag(), to match
>    strbuf_release().

Why not commit_release and tag_release to also have the same order
of words as in strbuf_release ?

>  - It might be better to just inline the handling of releasing commit
>    and tag memory. This code already knows that, for a tree, it needs to
>    free its buffer and only its buffer, so it is not much of a stretch
>    to think that it similarly knows the details of commit and tag
>    objects too.

We still call out to free_tree_buffer? Not sure I understand.


* Re: [PATCH v4 00/13] object store: alloc
  2018-05-10 17:32         ` Stefan Beller
@ 2018-05-10 20:56           ` Jonathan Tan
  2018-05-10 22:36             ` Stefan Beller
  0 siblings, 1 reply; 95+ messages in thread
From: Jonathan Tan @ 2018-05-10 20:56 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Junio C Hamano, Jameson Miller, Duy Nguyen

On Thu, 10 May 2018 10:32:09 -0700
Stefan Beller <sbeller@google.com> wrote:

> >  - I would call them release_commit() and release_tag(), to match
> >    strbuf_release().
> 
> Why not commit_release and tag_release to also have the same order
> of words as in strbuf_release ?

At this point in the discussion, either is fine.

> >  - It might be better to just inline the handling of releasing commit
> >    and tag memory. This code already knows that, for a tree, it needs to
> >    free its buffer and only its buffer, so it is not much of a stretch
> >    to think that it similarly knows the details of commit and tag
> >    objects too.
> 
> We still call out to free_tree_buffer? Not sure I understand.

I meant that since we call out to free_tree_buffer (as you said), this
shows that the code knows the internal details of a tree object (in that
it has a buffer, and that needs to be freed, and that is the only thing
that needs to be freed), so maybe the code should operate on the
internal details of commits and tags as well. But again, this is a minor
point.
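
To make the two options concrete, here is a rough standalone sketch, not
the code in object.c. The structs are cut-down, hypothetical stand-ins
for git's real `struct object`, `struct tree` and `struct tag`, and
`release_via_helpers()` / `release_inlined()` are names made up for
illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; the real definitions carry more fields. */
enum obj_type { OBJ_NONE, OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct object { unsigned parsed : 1; enum obj_type type; };
struct tree { struct object object; void *buffer; unsigned long size; };
struct tag { struct object object; char *tag; };

/* Per-type helpers, in the spirit of free_tree_buffer() and
 * release_tag_memory() from the patch. */
static void free_tree_buffer(struct tree *t)
{
	free(t->buffer);
	t->buffer = NULL;
	t->size = 0;
	t->object.parsed = 0;
}

static void release_tag_memory(struct tag *t)
{
	free(t->tag);
	t->tag = NULL;
	t->object.parsed = 0;
}

/* Variant A: the caller dispatches to one helper per object type. */
static void release_via_helpers(struct object *obj)
{
	assert(obj);
	if (obj->type == OBJ_TREE)
		free_tree_buffer((struct tree *)obj);
	else if (obj->type == OBJ_TAG)
		release_tag_memory((struct tag *)obj);
}

/* Variant B: the caller knows the fields and frees them itself. */
static void release_inlined(struct object *obj)
{
	assert(obj);
	if (obj->type == OBJ_TREE) {
		struct tree *t = (struct tree *)obj;
		free(t->buffer);
		t->buffer = NULL;
		t->size = 0;
	} else if (obj->type == OBJ_TAG) {
		struct tag *t = (struct tag *)obj;
		free(t->tag);
		t->tag = NULL;
	}
	obj->parsed = 0;
}
```

Both variants leave the object in the same state; the difference is only
whether knowledge of each type's fields lives next to the type or in the
pool-clearing loop.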


* Re: [PATCH v4 00/13] object store: alloc
  2018-05-10 20:56           ` Jonathan Tan
@ 2018-05-10 22:36             ` Stefan Beller
  0 siblings, 0 replies; 95+ messages in thread
From: Stefan Beller @ 2018-05-10 22:36 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Junio C Hamano, Jameson Miller, Duy Nguyen

On Thu, May 10, 2018 at 1:56 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Thu, 10 May 2018 10:32:09 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> >  - I would call them release_commit() and release_tag(), to match
>> >    strbuf_release().
>>
>> Why not commit_release and tag_release to also have the same order
>> of words as in strbuf_release ?
>
> At this point in the discussion, either is fine.

OK, please express your opinion in the form of a patch. ;)

I do not plan on resending this series unless someone raises a point
that they themselves do not consider bikesheddy.

Stefan


* [PATCH] alloc: allow arbitrary repositories for alloc functions
  2018-05-10 17:16       ` [PATCH v4 00/13] object store: alloc Jonathan Tan
  2018-05-10 17:32         ` Stefan Beller
@ 2018-05-11 19:17         ` Stefan Beller
  2018-05-11 19:38           ` Eric Sunshine
  1 sibling, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-11 19:17 UTC (permalink / raw)
  To: jonathantanmy; +Cc: git, gitster, jamill, pclouds, sbeller

We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.

We record all memory allocations in alloc.c and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

> This might seem a bit bikesheddy, but I wouldn't call it
> free_tag_buffer(), since what's being freed is not the buffer of the
> object itself, but just a string. If you want such a function, I would
> just call it release_tag_memory() to match release_commit_memory().

So you would replace the last commit with a patch like this?

Thanks,
Stefan
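
For anyone skimming the thread, the slab bookkeeping described in the
commit message can be sketched as a small standalone file. This is an
illustration under assumptions, not the code from alloc.c: `must()` is a
hypothetical stand-in for xmalloc()-style "die on failure", plain
realloc() doubling stands in for ALLOC_GROW(), BLOCKING is shrunk from
1024 to 4, and clear_alloc_state() here also resets the whole state,
which goes beyond what the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 4 /* tiny for illustration; alloc.c uses 1024 */

struct alloc_state {
	int count; /* total number of nodes allocated */
	int nr;    /* number of nodes left in current slab */
	void *p;   /* first free node in current slab */

	/* bookkeeping of allocations */
	void **slabs;
	int slab_nr, slab_alloc;
};

/* Hypothetical helper: abort instead of returning NULL. */
static void *must(void *ptr)
{
	if (!ptr)
		abort();
	return ptr;
}

static void *alloc_node(struct alloc_state *s, size_t node_size)
{
	void *ret;

	assert(node_size > 0);
	if (!s->nr) {
		/* Current slab exhausted: get a new one and remember it. */
		s->nr = BLOCKING;
		s->p = must(malloc(BLOCKING * node_size));

		if (s->slab_nr + 1 > s->slab_alloc) {
			s->slab_alloc = s->slab_alloc ? 2 * s->slab_alloc : 4;
			s->slabs = must(realloc(s->slabs,
						s->slab_alloc * sizeof(*s->slabs)));
		}
		s->slabs[s->slab_nr++] = s->p;
	}
	s->nr--;
	s->count++;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	memset(ret, 0, node_size);
	return ret;
}

static void clear_alloc_state(struct alloc_state *s)
{
	/* Nodes are never freed one by one; only whole slabs are. */
	while (s->slab_nr > 0)
		free(s->slabs[--s->slab_nr]);
	free(s->slabs);

	/* Assumption beyond the patch: leave the state reusable. */
	*s = (struct alloc_state){0};
}
```

With BLOCKING at 4, allocating nine nodes from a zeroed state should
consume three slabs and leave three free nodes in the last one; a single
clear_alloc_state() call then returns all of that memory.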

Notes:
    diff to what is currently queued:
    diff --git c/commit.c w/commit.c
    index 612ccf7b053..f3a5872c393 100644
    --- c/commit.c
    +++ w/commit.c
    @@ -297,11 +297,15 @@ void free_commit_buffer(struct commit *commit)
     	}
     }
    
    -void release_commit_memory(struct commit *c)
    +void relase_commit_memory(struct commit *c)
     {
    +	c->tree = NULL;
    +	c->index = 0;
     	free_commit_buffer(c);
     	free_commit_list(c->parents);
     	/* TODO: what about commit->util? */
    +
    +	c->object.parsed = 0;
     }
    
     const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
    diff --git c/commit.h w/commit.h
    index 2d764ab7d8e..366c151e0cb 100644
    --- c/commit.h
    +++ w/commit.h
    @@ -103,7 +103,7 @@ void free_commit_buffer(struct commit *);
      * Release memory related to a commit, including the parent list and
      * any cached object buffer.
      */
    -void release_commit_memory(struct commit *c);
    +void relase_commit_memory(struct commit *c);
    
     /*
      * Disassociate any cached object buffer from the commit, but do not free it.
    diff --git c/object.c w/object.c
    index 9d5b10d5a20..a7d1fd4a20b 100644
    --- c/object.c
    +++ w/object.c
    @@ -526,9 +526,9 @@ void parsed_object_pool_clear(struct parsed_object_pool *o)
     		if (obj->type == OBJ_TREE)
     			free_tree_buffer((struct tree*)obj);
     		else if (obj->type == OBJ_COMMIT)
    -			release_commit_memory((struct commit*)obj);
    +			relase_commit_memory((struct commit*)obj);
     		else if (obj->type == OBJ_TAG)
    -			free_tag_buffer((struct tag*)obj);
    +			release_tag_memory((struct tag*)obj);
     	}
    
     	FREE_AND_NULL(o->obj_hash);
    diff --git c/tag.c w/tag.c
    index 254352c30c6..7c12426b4ea 100644
    --- c/tag.c
    +++ w/tag.c
    @@ -116,9 +116,12 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
     	return parse_timestamp(dateptr, NULL, 10);
     }
    
    -void free_tag_buffer(struct tag *t)
    +void release_tag_memory(struct tag *t)
     {
     	free(t->tag);
    +	t->tagged = NULL;
    +	t->object.parsed = 0;
    +	t->date = 0;
     }
    
     int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
    diff --git c/tag.h w/tag.h
    index b241fe67bc5..9057d76a506 100644
    --- c/tag.h
    +++ w/tag.h
    @@ -15,7 +15,7 @@ struct tag {
     extern struct tag *lookup_tag(const struct object_id *oid);
     extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
     extern int parse_tag(struct tag *item);
    -extern void free_tag_buffer(struct tag *t);
    +extern void release_tag_memory(struct tag *t);
     extern struct object *deref_tag(struct object *, const char *, int);
     extern struct object *deref_tag_noverify(struct object *);
     extern int gpg_verify_tag(const struct object_id *oid,

 alloc.c           | 65 ++++++++++++++++++++++++++++++-----------------
 alloc.h           | 19 ++++++++++++++
 blame.c           |  1 +
 blob.c            |  1 +
 cache.h           | 16 ------------
 commit.c          | 12 +++++++++
 commit.h          |  6 +++++
 merge-recursive.c |  1 +
 object.c          | 42 ++++++++++++++++++++++++++++--
 object.h          |  8 ++++++
 tag.c             |  9 +++++++
 tag.h             |  1 +
 tree.c            |  1 +
 13 files changed, 140 insertions(+), 42 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..714df633169 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,8 +4,7 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it ends
  * up with maximal alignment because it doesn't know what the object alignment
  * for the new allocation is.
  */
@@ -15,6 +14,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
@@ -30,8 +30,27 @@ struct alloc_state {
 	int count; /* total number of nodes allocated */
 	int nr;    /* number of nodes left in current allocation */
 	void *p;   /* first free node in current allocation */
+
+	/* bookkeeping of allocations */
+	void **slabs;
+	int slab_nr, slab_alloc;
 };
 
+void *allocate_alloc_state(void)
+{
+	return xcalloc(1, sizeof(struct alloc_state));
+}
+
+void clear_alloc_state(struct alloc_state *s)
+{
+	while (s->slab_nr > 0) {
+		s->slab_nr--;
+		free(s->slabs[s->slab_nr]);
+	}
+
+	FREE_AND_NULL(s->slabs);
+}
+
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 {
 	void *ret;
@@ -39,60 +58,57 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	if (!s->nr) {
 		s->nr = BLOCKING;
 		s->p = xmalloc(BLOCKING * node_size);
+
+		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
+		s->slabs[s->slab_nr++] = s->p;
 	}
 	s->nr--;
 	s->count++;
 	ret = s->p;
 	s->p = (char *)s->p + node_size;
 	memset(ret, 0, node_size);
+
 	return ret;
 }
 
-static struct alloc_state blob_state;
-void *alloc_blob_node_the_repository(void)
+void *alloc_blob_node(struct repository *r)
 {
-	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
 	return b;
 }
 
-static struct alloc_state tree_state;
-void *alloc_tree_node_the_repository(void)
+void *alloc_tree_node(struct repository *r)
 {
-	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
 	return t;
 }
 
-static struct alloc_state tag_state;
-void *alloc_tag_node_the_repository(void)
+void *alloc_tag_node(struct repository *r)
 {
-	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
 	return t;
 }
 
-static struct alloc_state object_state;
-void *alloc_object_node_the_repository(void)
+void *alloc_object_node(struct repository *r)
 {
-	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
 	return obj;
 }
 
-static struct alloc_state commit_state;
-
-unsigned int alloc_commit_index_the_repository(void)
+unsigned int alloc_commit_index(struct repository *r)
 {
-	static unsigned int count;
-	return count++;
+	return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node_the_repository(void)
+void *alloc_commit_node(struct repository *r)
 {
-	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index(the_repository);
+	c->index = alloc_commit_index(r);
 	return c;
 }
 
@@ -103,9 +119,10 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
+    report(#name, r->parsed_objects->name##_state->count, \
+		  r->parsed_objects->name##_state->count * sizeof(type) >> 10)
 
-void alloc_report_the_repository(void)
+void alloc_report(struct repository *r)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/alloc.h b/alloc.h
new file mode 100644
index 00000000000..3e4e828db48
--- /dev/null
+++ b/alloc.h
@@ -0,0 +1,19 @@
+#ifndef ALLOC_H
+#define ALLOC_H
+
+struct tree;
+struct commit;
+struct tag;
+
+void *alloc_blob_node(struct repository *r);
+void *alloc_tree_node(struct repository *r);
+void *alloc_commit_node(struct repository *r);
+void *alloc_tag_node(struct repository *r);
+void *alloc_object_node(struct repository *r);
+void alloc_report(struct repository *r);
+unsigned int alloc_commit_index(struct repository *r);
+
+void *allocate_alloc_state(void);
+void clear_alloc_state(struct alloc_state *s);
+
+#endif
diff --git a/blame.c b/blame.c
index ba9b18e7542..3a11f1ce52b 100644
--- a/blame.c
+++ b/blame.c
@@ -6,6 +6,7 @@
 #include "diffcore.h"
 #include "tag.h"
 #include "blame.h"
+#include "alloc.h"
 
 void blame_origin_decref(struct blame_origin *o)
 {
diff --git a/blob.c b/blob.c
index 9e64f301895..458dafa811e 100644
--- a/blob.c
+++ b/blob.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "repository.h"
+#include "alloc.h"
 
 const char *blob_type = "blob";
 
diff --git a/cache.h b/cache.h
index 0e6c5dd5639..c75559b7d38 100644
--- a/cache.h
+++ b/cache.h
@@ -1763,22 +1763,6 @@ extern const char *excludes_file;
 int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
-/* alloc.c */
-#define alloc_blob_node(r) alloc_blob_node_##r()
-extern void *alloc_blob_node_the_repository(void);
-#define alloc_tree_node(r) alloc_tree_node_##r()
-extern void *alloc_tree_node_the_repository(void);
-#define alloc_commit_node(r) alloc_commit_node_##r()
-extern void *alloc_commit_node_the_repository(void);
-#define alloc_tag_node(r) alloc_tag_node_##r()
-extern void *alloc_tag_node_the_repository(void);
-#define alloc_object_node(r) alloc_object_node_##r()
-extern void *alloc_object_node_the_repository(void);
-#define alloc_report(r) alloc_report_##r()
-extern void alloc_report_the_repository(void);
-#define alloc_commit_index(r) alloc_commit_index_##r()
-extern unsigned int alloc_commit_index_the_repository(void);
-
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
 
diff --git a/commit.c b/commit.c
index a9a43e79bae..f3a5872c393 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
 #include "commit-slab.h"
@@ -296,6 +297,17 @@ void free_commit_buffer(struct commit *commit)
 	}
 }
 
+void relase_commit_memory(struct commit *c)
+{
+	c->tree = NULL;
+	c->index = 0;
+	free_commit_buffer(c);
+	free_commit_list(c->parents);
+	/* TODO: what about commit->util? */
+
+	c->object.parsed = 0;
+}
+
 const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 {
 	struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
diff --git a/commit.h b/commit.h
index 0fb8271665c..366c151e0cb 100644
--- a/commit.h
+++ b/commit.h
@@ -99,6 +99,12 @@ void unuse_commit_buffer(const struct commit *, const void *buffer);
  */
 void free_commit_buffer(struct commit *);
 
+/*
+ * Release memory related to a commit, including the parent list and
+ * any cached object buffer.
+ */
+void relase_commit_memory(struct commit *c);
+
 /*
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
diff --git a/merge-recursive.c b/merge-recursive.c
index 6dac8908648..cbded673c28 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -15,6 +15,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "tag.h"
+#include "alloc.h"
 #include "unpack-trees.h"
 #include "string-list.h"
 #include "xdiff-interface.h"
diff --git a/object.c b/object.c
index 49b952e9299..a7d1fd4a20b 100644
--- a/object.c
+++ b/object.c
@@ -5,6 +5,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "object-store.h"
 #include "packfile.h"
 
@@ -455,6 +456,13 @@ struct parsed_object_pool *parsed_object_pool_new(void)
 {
 	struct parsed_object_pool *o = xmalloc(sizeof(*o));
 	memset(o, 0, sizeof(*o));
+
+	o->blob_state = allocate_alloc_state();
+	o->tree_state = allocate_alloc_state();
+	o->commit_state = allocate_alloc_state();
+	o->tag_state = allocate_alloc_state();
+	o->object_state = allocate_alloc_state();
+
 	return o;
 }
 
@@ -501,9 +509,39 @@ void raw_object_store_clear(struct raw_object_store *o)
 void parsed_object_pool_clear(struct parsed_object_pool *o)
 {
 	/*
-	 * TOOD free objects in o->obj_hash.
-	 *
 	 * As objects are allocated in slabs (see alloc.c), we do
 	 * not need to free each object, but each slab instead.
+	 *
+	 * Before doing so, we need to free any additional memory
+	 * the objects may hold.
 	 */
+	unsigned i;
+
+	for (i = 0; i < o->obj_hash_size; i++) {
+		struct object *obj = o->obj_hash[i];
+
+		if (!obj)
+			continue;
+
+		if (obj->type == OBJ_TREE)
+			free_tree_buffer((struct tree*)obj);
+		else if (obj->type == OBJ_COMMIT)
+			relase_commit_memory((struct commit*)obj);
+		else if (obj->type == OBJ_TAG)
+			release_tag_memory((struct tag*)obj);
+	}
+
+	FREE_AND_NULL(o->obj_hash);
+	o->obj_hash_size = 0;
+
+	clear_alloc_state(o->blob_state);
+	clear_alloc_state(o->tree_state);
+	clear_alloc_state(o->commit_state);
+	clear_alloc_state(o->tag_state);
+	clear_alloc_state(o->object_state);
+	FREE_AND_NULL(o->blob_state);
+	FREE_AND_NULL(o->tree_state);
+	FREE_AND_NULL(o->commit_state);
+	FREE_AND_NULL(o->tag_state);
+	FREE_AND_NULL(o->object_state);
 }
diff --git a/object.h b/object.h
index b41d7a3accb..7916edb4edf 100644
--- a/object.h
+++ b/object.h
@@ -4,6 +4,14 @@
 struct parsed_object_pool {
 	struct object **obj_hash;
 	int nr_objs, obj_hash_size;
+
+	/* TODO: migrate alloc_states to mem-pool? */
+	struct alloc_state *blob_state;
+	struct alloc_state *tree_state;
+	struct alloc_state *commit_state;
+	struct alloc_state *tag_state;
+	struct alloc_state *object_state;
+	unsigned commit_count;
 };
 
 struct parsed_object_pool *parsed_object_pool_new(void);
diff --git a/tag.c b/tag.c
index 02ef4eaafc0..7c12426b4ea 100644
--- a/tag.c
+++ b/tag.c
@@ -3,6 +3,7 @@
 #include "commit.h"
 #include "tree.h"
 #include "blob.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 
 const char *tag_type = "tag";
@@ -115,6 +116,14 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
 	return parse_timestamp(dateptr, NULL, 10);
 }
 
+void release_tag_memory(struct tag *t)
+{
+	free(t->tag);
+	t->tagged = NULL;
+	t->object.parsed = 0;
+	t->date = 0;
+}
+
 int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
 {
 	struct object_id oid;
diff --git a/tag.h b/tag.h
index d469534e82a..9057d76a506 100644
--- a/tag.h
+++ b/tag.h
@@ -15,6 +15,7 @@ struct tag {
 extern struct tag *lookup_tag(const struct object_id *oid);
 extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
 extern int parse_tag(struct tag *item);
+extern void release_tag_memory(struct tag *t);
 extern struct object *deref_tag(struct object *, const char *, int);
 extern struct object *deref_tag_noverify(struct object *);
 extern int gpg_verify_tag(const struct object_id *oid,
diff --git a/tree.c b/tree.c
index 58cf19b4fa8..8f8ef3189af 100644
--- a/tree.c
+++ b/tree.c
@@ -5,6 +5,7 @@
 #include "blob.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "tree-walk.h"
 
 const char *tree_type = "tree";
-- 
2.17.0.255.g8bfb7c0704


* Re: [PATCH] alloc: allow arbitrary repositories for alloc functions
  2018-05-11 19:17         ` [PATCH] alloc: allow arbitrary repositories for alloc functions Stefan Beller
@ 2018-05-11 19:38           ` Eric Sunshine
  2018-05-15 21:48             ` Stefan Beller
  0 siblings, 1 reply; 95+ messages in thread
From: Eric Sunshine @ 2018-05-11 19:38 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jonathan Tan, Git List, Junio C Hamano, Jameson Miller,
	Nguyễn Thái Ngọc Duy

On Fri, May 11, 2018 at 3:17 PM Stefan Beller <sbeller@google.com> wrote:
> diff --git a/commit.c b/commit.c
> @@ -296,6 +297,17 @@ void free_commit_buffer(struct commit *commit)
> +void relase_commit_memory(struct commit *c)

s/relase/release/

* [PATCH] alloc: allow arbitrary repositories for alloc functions
  2018-05-11 19:38           ` Eric Sunshine
@ 2018-05-15 21:48             ` Stefan Beller
  2018-05-16  2:27               ` Junio C Hamano
  0 siblings, 1 reply; 95+ messages in thread
From: Stefan Beller @ 2018-05-15 21:48 UTC (permalink / raw)
  To: sunshine; +Cc: git, gitster, jamill, jonathantanmy, pclouds, sbeller

We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.
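
To illustrate why the macro ties the conversions together: REPORT pastes
its first argument onto "_state", so every kind passed to it must resolve
to a member of the same shape. A minimal standalone sketch, assuming
simplified stand-in types (struct state, struct pool, COUNT and
total_count are hypothetical names, not git's definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for git's alloc_state and parsed_object_pool. */
struct state { unsigned int count; };
struct pool {
	struct state *blob_state;
	struct state *tree_state;
};

/*
 * COUNT(blob) expands to (p)->blob_state->count. The ## operator builds
 * the member name, so each kind needs a matching <kind>_state member.
 */
#define COUNT(name) ((p)->name##_state->count)

/* Sum the per-kind counters through the pasted member names. */
static unsigned int total_count(const struct pool *p)
{
	assert(p != NULL);
	return COUNT(blob) + COUNT(tree);
}
```

If only some of the states were moved into the pool, the pasted name
would stop resolving for the others, which is why a partial conversion
would not compile without restructuring the macro.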

We record all memory allocations in alloc.c and free them in
clear_alloc_state, which is called for all repositories except
the_repository.
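
The bookkeeping can be sketched outside of git as follows. This is a
simplified illustration, not the code in alloc.c: it uses plain
malloc/realloc instead of xmalloc and ALLOC_GROW, aborts on allocation
failure, uses a small slab size, and the names pool_state, pool_alloc and
pool_clear are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCKING 4 /* nodes per slab in this sketch; alloc.c uses 1024 */

struct pool_state {
	int nr;        /* nodes left in the current slab */
	char *p;       /* first free node in the current slab */
	void **slabs;  /* every slab handed out so far */
	int slab_nr, slab_alloc;
};

/* Hand out one zeroed node, starting a new slab when the current one is used up. */
static void *pool_alloc(struct pool_state *s, size_t node_size)
{
	void *ret;

	assert(node_size > 0);
	if (!s->nr) {
		if (s->slab_nr == s->slab_alloc) {
			/* Grow the array that records the slabs. */
			int n = s->slab_alloc ? 2 * s->slab_alloc : 8;
			void **grown = realloc(s->slabs, n * sizeof(*grown));
			if (!grown)
				abort();
			s->slabs = grown;
			s->slab_alloc = n;
		}
		s->p = malloc(BLOCKING * node_size);
		if (!s->p)
			abort();
		s->slabs[s->slab_nr++] = s->p; /* remember it for pool_clear() */
		s->nr = BLOCKING;
	}
	s->nr--;
	ret = s->p;
	s->p += node_size;
	memset(ret, 0, node_size);
	return ret;
}

/* Free every recorded slab; individual nodes are never freed on their own. */
static void pool_clear(struct pool_state *s)
{
	while (s->slab_nr > 0)
		free(s->slabs[--s->slab_nr]);
	free(s->slabs);
	memset(s, 0, sizeof(*s));
}
```

The only per-slab cost added by the bookkeeping is one pointer in the
slabs array, so the allocation fast path stays the same as before.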

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

Notes:

    Eric, I have fixed s/relase/release/


Jonathan,

> This might seem a bit bikesheddy, but I wouldn't call it
> free_tag_buffer(), since what's being freed is not the buffer of the
> object itself, but just a string. If you want such a function, I would
> just call it release_tag_memory() to match release_commit_memory().

So you would replace the last commit with a patch like this?

Thanks,
Stefan

    diff to what is currently queued:

    diff --git c/commit.c w/commit.c
    index 612ccf7b053..5eb4d2f08f8 100644
    --- c/commit.c
    +++ w/commit.c
    @@ -299,9 +299,13 @@ void free_commit_buffer(struct commit *commit)
    
     void release_commit_memory(struct commit *c)
     {
    +	c->tree = NULL;
    +	c->index = 0;
     	free_commit_buffer(c);
     	free_commit_list(c->parents);
     	/* TODO: what about commit->util? */
    +
    +	c->object.parsed = 0;
     }
    
     const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
    diff --git c/object.c w/object.c
    index 9d5b10d5a20..8e29f63bf23 100644
    --- c/object.c
    +++ w/object.c
    @@ -528,7 +528,7 @@ void parsed_object_pool_clear(struct parsed_object_pool *o)
     		else if (obj->type == OBJ_COMMIT)
     			release_commit_memory((struct commit*)obj);
     		else if (obj->type == OBJ_TAG)
    -			free_tag_buffer((struct tag*)obj);
    +			release_tag_memory((struct tag*)obj);
     	}
    
     	FREE_AND_NULL(o->obj_hash);
    diff --git c/tag.c w/tag.c
    index 254352c30c6..7c12426b4ea 100644
    --- c/tag.c
    +++ w/tag.c
    @@ -116,9 +116,12 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
     	return parse_timestamp(dateptr, NULL, 10);
     }
    
    -void free_tag_buffer(struct tag *t)
    +void release_tag_memory(struct tag *t)
     {
     	free(t->tag);
    +	t->tagged = NULL;
    +	t->object.parsed = 0;
    +	t->date = 0;
     }
    
     int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
    diff --git c/tag.h w/tag.h
    index b241fe67bc5..9057d76a506 100644
    --- c/tag.h
    +++ w/tag.h
    @@ -15,7 +15,7 @@ struct tag {
     extern struct tag *lookup_tag(const struct object_id *oid);
     extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
     extern int parse_tag(struct tag *item);
    -extern void free_tag_buffer(struct tag *t);
    +extern void release_tag_memory(struct tag *t);
     extern struct object *deref_tag(struct object *, const char *, int);
     extern struct object *deref_tag_noverify(struct object *);
     extern int gpg_verify_tag(const struct object_id *oid,

 alloc.c           | 65 ++++++++++++++++++++++++++++++-----------------
 alloc.h           | 19 ++++++++++++++
 blame.c           |  1 +
 blob.c            |  1 +
 cache.h           | 16 ------------
 commit.c          | 12 +++++++++
 commit.h          |  6 +++++
 merge-recursive.c |  1 +
 object.c          | 42 ++++++++++++++++++++++++++++--
 object.h          |  8 ++++++
 tag.c             |  9 +++++++
 tag.h             |  1 +
 tree.c            |  1 +
 13 files changed, 140 insertions(+), 42 deletions(-)
 create mode 100644 alloc.h

diff --git a/alloc.c b/alloc.c
index 277dadd221b..714df633169 100644
--- a/alloc.c
+++ b/alloc.c
@@ -4,8 +4,7 @@
  * Copyright (C) 2006 Linus Torvalds
  *
  * The standard malloc/free wastes too much space for objects, partly because
- * it maintains all the allocation infrastructure (which isn't needed, since
- * we never free an object descriptor anyway), but even more because it ends
+ * it maintains all the allocation infrastructure, but even more because it ends
  * up with maximal alignment because it doesn't know what the object alignment
  * for the new allocation is.
  */
@@ -15,6 +14,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
@@ -30,8 +30,27 @@ struct alloc_state {
 	int count; /* total number of nodes allocated */
 	int nr;    /* number of nodes left in current allocation */
 	void *p;   /* first free node in current allocation */
+
+	/* bookkeeping of allocations */
+	void **slabs;
+	int slab_nr, slab_alloc;
 };
 
+void *allocate_alloc_state(void)
+{
+	return xcalloc(1, sizeof(struct alloc_state));
+}
+
+void clear_alloc_state(struct alloc_state *s)
+{
+	while (s->slab_nr > 0) {
+		s->slab_nr--;
+		free(s->slabs[s->slab_nr]);
+	}
+
+	FREE_AND_NULL(s->slabs);
+}
+
 static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 {
 	void *ret;
@@ -39,60 +58,57 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
 	if (!s->nr) {
 		s->nr = BLOCKING;
 		s->p = xmalloc(BLOCKING * node_size);
+
+		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
+		s->slabs[s->slab_nr++] = s->p;
 	}
 	s->nr--;
 	s->count++;
 	ret = s->p;
 	s->p = (char *)s->p + node_size;
 	memset(ret, 0, node_size);
+
 	return ret;
 }
 
-static struct alloc_state blob_state;
-void *alloc_blob_node_the_repository(void)
+void *alloc_blob_node(struct repository *r)
 {
-	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
+	struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
 	b->object.type = OBJ_BLOB;
 	return b;
 }
 
-static struct alloc_state tree_state;
-void *alloc_tree_node_the_repository(void)
+void *alloc_tree_node(struct repository *r)
 {
-	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
+	struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
 	t->object.type = OBJ_TREE;
 	return t;
 }
 
-static struct alloc_state tag_state;
-void *alloc_tag_node_the_repository(void)
+void *alloc_tag_node(struct repository *r)
 {
-	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
+	struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
 	t->object.type = OBJ_TAG;
 	return t;
 }
 
-static struct alloc_state object_state;
-void *alloc_object_node_the_repository(void)
+void *alloc_object_node(struct repository *r)
 {
-	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
+	struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
 	obj->type = OBJ_NONE;
 	return obj;
 }
 
-static struct alloc_state commit_state;
-
-unsigned int alloc_commit_index_the_repository(void)
+unsigned int alloc_commit_index(struct repository *r)
 {
-	static unsigned int count;
-	return count++;
+	return r->parsed_objects->commit_count++;
 }
 
-void *alloc_commit_node_the_repository(void)
+void *alloc_commit_node(struct repository *r)
 {
-	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
+	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
-	c->index = alloc_commit_index(the_repository);
+	c->index = alloc_commit_index(r);
 	return c;
 }
 
@@ -103,9 +119,10 @@ static void report(const char *name, unsigned int count, size_t size)
 }
 
 #define REPORT(name, type)	\
-    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
+    report(#name, r->parsed_objects->name##_state->count, \
+		  r->parsed_objects->name##_state->count * sizeof(type) >> 10)
 
-void alloc_report_the_repository(void)
+void alloc_report(struct repository *r)
 {
 	REPORT(blob, struct blob);
 	REPORT(tree, struct tree);
diff --git a/alloc.h b/alloc.h
new file mode 100644
index 00000000000..3e4e828db48
--- /dev/null
+++ b/alloc.h
@@ -0,0 +1,19 @@
+#ifndef ALLOC_H
+#define ALLOC_H
+
+struct tree;
+struct commit;
+struct tag;
+
+void *alloc_blob_node(struct repository *r);
+void *alloc_tree_node(struct repository *r);
+void *alloc_commit_node(struct repository *r);
+void *alloc_tag_node(struct repository *r);
+void *alloc_object_node(struct repository *r);
+void alloc_report(struct repository *r);
+unsigned int alloc_commit_index(struct repository *r);
+
+void *allocate_alloc_state(void);
+void clear_alloc_state(struct alloc_state *s);
+
+#endif
diff --git a/blame.c b/blame.c
index ba9b18e7542..3a11f1ce52b 100644
--- a/blame.c
+++ b/blame.c
@@ -6,6 +6,7 @@
 #include "diffcore.h"
 #include "tag.h"
 #include "blame.h"
+#include "alloc.h"
 
 void blame_origin_decref(struct blame_origin *o)
 {
diff --git a/blob.c b/blob.c
index 9e64f301895..458dafa811e 100644
--- a/blob.c
+++ b/blob.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "repository.h"
+#include "alloc.h"
 
 const char *blob_type = "blob";
 
diff --git a/cache.h b/cache.h
index 0e6c5dd5639..c75559b7d38 100644
--- a/cache.h
+++ b/cache.h
@@ -1763,22 +1763,6 @@ extern const char *excludes_file;
 int decode_85(char *dst, const char *line, int linelen);
 void encode_85(char *buf, const unsigned char *data, int bytes);
 
-/* alloc.c */
-#define alloc_blob_node(r) alloc_blob_node_##r()
-extern void *alloc_blob_node_the_repository(void);
-#define alloc_tree_node(r) alloc_tree_node_##r()
-extern void *alloc_tree_node_the_repository(void);
-#define alloc_commit_node(r) alloc_commit_node_##r()
-extern void *alloc_commit_node_the_repository(void);
-#define alloc_tag_node(r) alloc_tag_node_##r()
-extern void *alloc_tag_node_the_repository(void);
-#define alloc_object_node(r) alloc_object_node_##r()
-extern void *alloc_object_node_the_repository(void);
-#define alloc_report(r) alloc_report_##r()
-extern void alloc_report_the_repository(void);
-#define alloc_commit_index(r) alloc_commit_index_##r()
-extern unsigned int alloc_commit_index_the_repository(void);
-
 /* pkt-line.c */
 void packet_trace_identity(const char *prog);
 
diff --git a/commit.c b/commit.c
index a9a43e79bae..5eb4d2f08f8 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
 #include "commit-slab.h"
@@ -296,6 +297,17 @@ void free_commit_buffer(struct commit *commit)
 	}
 }
 
+void release_commit_memory(struct commit *c)
+{
+	c->tree = NULL;
+	c->index = 0;
+	free_commit_buffer(c);
+	free_commit_list(c->parents);
+	/* TODO: what about commit->util? */
+
+	c->object.parsed = 0;
+}
+
 const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 {
 	struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
diff --git a/commit.h b/commit.h
index 0fb8271665c..2d764ab7d8e 100644
--- a/commit.h
+++ b/commit.h
@@ -99,6 +99,12 @@ void unuse_commit_buffer(const struct commit *, const void *buffer);
  */
 void free_commit_buffer(struct commit *);
 
+/*
+ * Release memory related to a commit, including the parent list and
+ * any cached object buffer.
+ */
+void release_commit_memory(struct commit *c);
+
 /*
  * Disassociate any cached object buffer from the commit, but do not free it.
  * The buffer (or NULL, if none) is returned.
diff --git a/merge-recursive.c b/merge-recursive.c
index 6dac8908648..cbded673c28 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -15,6 +15,7 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "tag.h"
+#include "alloc.h"
 #include "unpack-trees.h"
 #include "string-list.h"
 #include "xdiff-interface.h"
diff --git a/object.c b/object.c
index 49b952e9299..8e29f63bf23 100644
--- a/object.c
+++ b/object.c
@@ -5,6 +5,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "object-store.h"
 #include "packfile.h"
 
@@ -455,6 +456,13 @@ struct parsed_object_pool *parsed_object_pool_new(void)
 {
 	struct parsed_object_pool *o = xmalloc(sizeof(*o));
 	memset(o, 0, sizeof(*o));
+
+	o->blob_state = allocate_alloc_state();
+	o->tree_state = allocate_alloc_state();
+	o->commit_state = allocate_alloc_state();
+	o->tag_state = allocate_alloc_state();
+	o->object_state = allocate_alloc_state();
+
 	return o;
 }
 
@@ -501,9 +509,39 @@ void raw_object_store_clear(struct raw_object_store *o)
 void parsed_object_pool_clear(struct parsed_object_pool *o)
 {
 	/*
-	 * TOOD free objects in o->obj_hash.
-	 *
 	 * As objects are allocated in slabs (see alloc.c), we do
 	 * not need to free each object, but each slab instead.
+	 *
+	 * Before doing so, we need to free any additional memory
+	 * the objects may hold.
 	 */
+	unsigned i;
+
+	for (i = 0; i < o->obj_hash_size; i++) {
+		struct object *obj = o->obj_hash[i];
+
+		if (!obj)
+			continue;
+
+		if (obj->type == OBJ_TREE)
+			free_tree_buffer((struct tree*)obj);
+		else if (obj->type == OBJ_COMMIT)
+			release_commit_memory((struct commit*)obj);
+		else if (obj->type == OBJ_TAG)
+			release_tag_memory((struct tag*)obj);
+	}
+
+	FREE_AND_NULL(o->obj_hash);
+	o->obj_hash_size = 0;
+
+	clear_alloc_state(o->blob_state);
+	clear_alloc_state(o->tree_state);
+	clear_alloc_state(o->commit_state);
+	clear_alloc_state(o->tag_state);
+	clear_alloc_state(o->object_state);
+	FREE_AND_NULL(o->blob_state);
+	FREE_AND_NULL(o->tree_state);
+	FREE_AND_NULL(o->commit_state);
+	FREE_AND_NULL(o->tag_state);
+	FREE_AND_NULL(o->object_state);
 }
diff --git a/object.h b/object.h
index b41d7a3accb..7916edb4edf 100644
--- a/object.h
+++ b/object.h
@@ -4,6 +4,14 @@
 struct parsed_object_pool {
 	struct object **obj_hash;
 	int nr_objs, obj_hash_size;
+
+	/* TODO: migrate alloc_states to mem-pool? */
+	struct alloc_state *blob_state;
+	struct alloc_state *tree_state;
+	struct alloc_state *commit_state;
+	struct alloc_state *tag_state;
+	struct alloc_state *object_state;
+	unsigned commit_count;
 };
 
 struct parsed_object_pool *parsed_object_pool_new(void);
diff --git a/tag.c b/tag.c
index 02ef4eaafc0..7c12426b4ea 100644
--- a/tag.c
+++ b/tag.c
@@ -3,6 +3,7 @@
 #include "commit.h"
 #include "tree.h"
 #include "blob.h"
+#include "alloc.h"
 #include "gpg-interface.h"
 
 const char *tag_type = "tag";
@@ -115,6 +116,14 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
 	return parse_timestamp(dateptr, NULL, 10);
 }
 
+void release_tag_memory(struct tag *t)
+{
+	free(t->tag);
+	t->tagged = NULL;
+	t->object.parsed = 0;
+	t->date = 0;
+}
+
 int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
 {
 	struct object_id oid;
diff --git a/tag.h b/tag.h
index d469534e82a..9057d76a506 100644
--- a/tag.h
+++ b/tag.h
@@ -15,6 +15,7 @@ struct tag {
 extern struct tag *lookup_tag(const struct object_id *oid);
 extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
 extern int parse_tag(struct tag *item);
+extern void release_tag_memory(struct tag *t);
 extern struct object *deref_tag(struct object *, const char *, int);
 extern struct object *deref_tag_noverify(struct object *);
 extern int gpg_verify_tag(const struct object_id *oid,
diff --git a/tree.c b/tree.c
index 58cf19b4fa8..8f8ef3189af 100644
--- a/tree.c
+++ b/tree.c
@@ -5,6 +5,7 @@
 #include "blob.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 #include "tree-walk.h"
 
 const char *tree_type = "tree";
-- 
2.17.0.582.gccdcbd54c44.dirty


* Re: [PATCH] alloc: allow arbitrary repositories for alloc functions
  2018-05-15 21:48             ` Stefan Beller
@ 2018-05-16  2:27               ` Junio C Hamano
  0 siblings, 0 replies; 95+ messages in thread
From: Junio C Hamano @ 2018-05-16  2:27 UTC (permalink / raw)
  To: Stefan Beller; +Cc: sunshine, git, jamill, jonathantanmy, pclouds

Stefan Beller <sbeller@google.com> writes:

> We have to convert all of the alloc functions at once, because alloc_report
> uses a funky macro for reporting. It is better for the sake of mechanical
> conversion to convert multiple functions at once rather than changing the
> structure of the reporting function.
>
> We record all memory allocations in alloc.c and free them in
> clear_alloc_state, which is called for all repositories except
> the_repository.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
> Notes:
>
>     Eric, I have fixed s/relase/release/
>
>
> Jonathan,
>
>> This might seem a bit bikesheddy, but I wouldn't call it
>> free_tag_buffer(), since what's being freed is not the buffer of the
>> object itself, but just a string. If you want such a function, I would
>> just call it release_tag_memory() to match release_commit_memory().
>
> So you would replace the last commit with a patch like this?

Quite honestly, I do not see much difference either way between free
and release.  Resetting the .parsed bit, setting .index to 0, and
setting tagged to NULL are not about releasing or freeing resources.
These operations are about de-initializing the object, which may well
involve releasing resources directly associated with the object
itself, but also more.  The release_commit_memory() function may
want to do something to various commit slabs that refer back to the
commit, for example.

But as I said, I do not deeply care either way ;-)

Thanks, the patch looks sensible.
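
The distinction drawn above between releasing resources and
de-initializing an object can be shown with a small standalone sketch.
It assumes a made-up struct item with one owned string; item_parse and
item_release are hypothetical names and do not correspond to functions
in git.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with one owned string and some parsed state. */
struct item {
	unsigned parsed : 1;
	char *label;         /* owned, heap-allocated */
	struct item *target; /* borrowed, owned by someone else */
};

/* Fill in the item as a parser would; returns 0 on success. */
static int item_parse(struct item *it, const char *label, struct item *target)
{
	size_t len = strlen(label) + 1;

	assert(!it->parsed);
	it->label = malloc(len);
	if (!it->label)
		return -1;
	memcpy(it->label, label, len);
	it->target = target;
	it->parsed = 1;
	return 0;
}

/*
 * Only free(it->label) releases a resource. The remaining assignments
 * de-initialize the item so it looks unparsed again and keeps no
 * stale pointers.
 */
static void item_release(struct item *it)
{
	free(it->label);
	it->label = NULL;
	it->target = NULL;
	it->parsed = 0;
}
```

Clearing the owned pointer after free() is a choice made in this sketch
so that the item can be parsed again safely.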

>
> Thanks,
> Stefan
>
>     diff to what is currently queued:
>
>     diff --git c/commit.c w/commit.c
>     index 612ccf7b053..5eb4d2f08f8 100644
>     --- c/commit.c
>     +++ w/commit.c
>     @@ -299,9 +299,13 @@ void free_commit_buffer(struct commit *commit)
>     
>      void release_commit_memory(struct commit *c)
>      {
>     +	c->tree = NULL;
>     +	c->index = 0;
>      	free_commit_buffer(c);
>      	free_commit_list(c->parents);
>      	/* TODO: what about commit->util? */
>     +
>     +	c->object.parsed = 0;
>      }
>     
>      const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
>     diff --git c/object.c w/object.c
>     index 9d5b10d5a20..8e29f63bf23 100644
>     --- c/object.c
>     +++ w/object.c
>     @@ -528,7 +528,7 @@ void parsed_object_pool_clear(struct parsed_object_pool *o)
>      		else if (obj->type == OBJ_COMMIT)
>      			release_commit_memory((struct commit*)obj);
>      		else if (obj->type == OBJ_TAG)
>     -			free_tag_buffer((struct tag*)obj);
>     +			release_tag_memory((struct tag*)obj);
>      	}
>     
>      	FREE_AND_NULL(o->obj_hash);
>     diff --git c/tag.c w/tag.c
>     index 254352c30c6..7c12426b4ea 100644
>     --- c/tag.c
>     +++ w/tag.c
>     @@ -116,9 +116,12 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
>      	return parse_timestamp(dateptr, NULL, 10);
>      }
>     
>     -void free_tag_buffer(struct tag *t)
>     +void release_tag_memory(struct tag *t)
>      {
>      	free(t->tag);
>     +	t->tagged = NULL;
>     +	t->object.parsed = 0;
>     +	t->date = 0;
>      }
>     
>      int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
>     diff --git c/tag.h w/tag.h
>     index b241fe67bc5..9057d76a506 100644
>     --- c/tag.h
>     +++ w/tag.h
>     @@ -15,7 +15,7 @@ struct tag {
>      extern struct tag *lookup_tag(const struct object_id *oid);
>      extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
>      extern int parse_tag(struct tag *item);
>     -extern void free_tag_buffer(struct tag *t);
>     +extern void release_tag_memory(struct tag *t);
>      extern struct object *deref_tag(struct object *, const char *, int);
>      extern struct object *deref_tag_noverify(struct object *);
>      extern int gpg_verify_tag(const struct object_id *oid,
>
>  alloc.c           | 65 ++++++++++++++++++++++++++++++-----------------
>  alloc.h           | 19 ++++++++++++++
>  blame.c           |  1 +
>  blob.c            |  1 +
>  cache.h           | 16 ------------
>  commit.c          | 12 +++++++++
>  commit.h          |  6 +++++
>  merge-recursive.c |  1 +
>  object.c          | 42 ++++++++++++++++++++++++++++--
>  object.h          |  8 ++++++
>  tag.c             |  9 +++++++
>  tag.h             |  1 +
>  tree.c            |  1 +
>  13 files changed, 140 insertions(+), 42 deletions(-)
>  create mode 100644 alloc.h
>
> diff --git a/alloc.c b/alloc.c
> index 277dadd221b..714df633169 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -4,8 +4,7 @@
>   * Copyright (C) 2006 Linus Torvalds
>   *
>   * The standard malloc/free wastes too much space for objects, partly because
> - * it maintains all the allocation infrastructure (which isn't needed, since
> - * we never free an object descriptor anyway), but even more because it ends
> + * it maintains all the allocation infrastructure, but even more because it ends
>   * up with maximal alignment because it doesn't know what the object alignment
>   * for the new allocation is.
>   */
> @@ -15,6 +14,7 @@
>  #include "tree.h"
>  #include "commit.h"
>  #include "tag.h"
> +#include "alloc.h"
>  
>  #define BLOCKING 1024
>  
> @@ -30,8 +30,27 @@ struct alloc_state {
>  	int count; /* total number of nodes allocated */
>  	int nr;    /* number of nodes left in current allocation */
>  	void *p;   /* first free node in current allocation */
> +
> +	/* bookkeeping of allocations */
> +	void **slabs;
> +	int slab_nr, slab_alloc;
>  };
>  
> +void *allocate_alloc_state(void)
> +{
> +	return xcalloc(1, sizeof(struct alloc_state));
> +}
> +
> +void clear_alloc_state(struct alloc_state *s)
> +{
> +	while (s->slab_nr > 0) {
> +		s->slab_nr--;
> +		free(s->slabs[s->slab_nr]);
> +	}
> +
> +	FREE_AND_NULL(s->slabs);
> +}
> +
>  static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>  {
>  	void *ret;
> @@ -39,60 +58,57 @@ static inline void *alloc_node(struct alloc_state *s, size_t node_size)
>  	if (!s->nr) {
>  		s->nr = BLOCKING;
>  		s->p = xmalloc(BLOCKING * node_size);
> +
> +		ALLOC_GROW(s->slabs, s->slab_nr + 1, s->slab_alloc);
> +		s->slabs[s->slab_nr++] = s->p;
>  	}
>  	s->nr--;
>  	s->count++;
>  	ret = s->p;
>  	s->p = (char *)s->p + node_size;
>  	memset(ret, 0, node_size);
> +
>  	return ret;
>  }
>  
> -static struct alloc_state blob_state;
> -void *alloc_blob_node_the_repository(void)
> +void *alloc_blob_node(struct repository *r)
>  {
> -	struct blob *b = alloc_node(&blob_state, sizeof(struct blob));
> +	struct blob *b = alloc_node(r->parsed_objects->blob_state, sizeof(struct blob));
>  	b->object.type = OBJ_BLOB;
>  	return b;
>  }
>  
> -static struct alloc_state tree_state;
> -void *alloc_tree_node_the_repository(void)
> +void *alloc_tree_node(struct repository *r)
>  {
> -	struct tree *t = alloc_node(&tree_state, sizeof(struct tree));
> +	struct tree *t = alloc_node(r->parsed_objects->tree_state, sizeof(struct tree));
>  	t->object.type = OBJ_TREE;
>  	return t;
>  }
>  
> -static struct alloc_state tag_state;
> -void *alloc_tag_node_the_repository(void)
> +void *alloc_tag_node(struct repository *r)
>  {
> -	struct tag *t = alloc_node(&tag_state, sizeof(struct tag));
> +	struct tag *t = alloc_node(r->parsed_objects->tag_state, sizeof(struct tag));
>  	t->object.type = OBJ_TAG;
>  	return t;
>  }
>  
> -static struct alloc_state object_state;
> -void *alloc_object_node_the_repository(void)
> +void *alloc_object_node(struct repository *r)
>  {
> -	struct object *obj = alloc_node(&object_state, sizeof(union any_object));
> +	struct object *obj = alloc_node(r->parsed_objects->object_state, sizeof(union any_object));
>  	obj->type = OBJ_NONE;
>  	return obj;
>  }
>  
> -static struct alloc_state commit_state;
> -
> -unsigned int alloc_commit_index_the_repository(void)
> +unsigned int alloc_commit_index(struct repository *r)
>  {
> -	static unsigned int count;
> -	return count++;
> +	return r->parsed_objects->commit_count++;
>  }
>  
> -void *alloc_commit_node_the_repository(void)
> +void *alloc_commit_node(struct repository *r)
>  {
> -	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
> +	struct commit *c = alloc_node(r->parsed_objects->commit_state, sizeof(struct commit));
>  	c->object.type = OBJ_COMMIT;
> -	c->index = alloc_commit_index(the_repository);
> +	c->index = alloc_commit_index(r);
>  	return c;
>  }
>  
> @@ -103,9 +119,10 @@ static void report(const char *name, unsigned int count, size_t size)
>  }
>  
>  #define REPORT(name, type)	\
> -    report(#name, name##_state.count, name##_state.count * sizeof(type) >> 10)
> +    report(#name, r->parsed_objects->name##_state->count, \
> +		  r->parsed_objects->name##_state->count * sizeof(type) >> 10)
>  
> -void alloc_report_the_repository(void)
> +void alloc_report(struct repository *r)
>  {
>  	REPORT(blob, struct blob);
>  	REPORT(tree, struct tree);
> diff --git a/alloc.h b/alloc.h
> new file mode 100644
> index 00000000000..3e4e828db48
> --- /dev/null
> +++ b/alloc.h
> @@ -0,0 +1,19 @@
> +#ifndef ALLOC_H
> +#define ALLOC_H
> +
> +struct tree;
> +struct commit;
> +struct tag;
> +
> +void *alloc_blob_node(struct repository *r);
> +void *alloc_tree_node(struct repository *r);
> +void *alloc_commit_node(struct repository *r);
> +void *alloc_tag_node(struct repository *r);
> +void *alloc_object_node(struct repository *r);
> +void alloc_report(struct repository *r);
> +unsigned int alloc_commit_index(struct repository *r);
> +
> +void *allocate_alloc_state(void);
> +void clear_alloc_state(struct alloc_state *s);
> +
> +#endif
> diff --git a/blame.c b/blame.c
> index ba9b18e7542..3a11f1ce52b 100644
> --- a/blame.c
> +++ b/blame.c
> @@ -6,6 +6,7 @@
>  #include "diffcore.h"
>  #include "tag.h"
>  #include "blame.h"
> +#include "alloc.h"
>  
>  void blame_origin_decref(struct blame_origin *o)
>  {
> diff --git a/blob.c b/blob.c
> index 9e64f301895..458dafa811e 100644
> --- a/blob.c
> +++ b/blob.c
> @@ -1,6 +1,7 @@
>  #include "cache.h"
>  #include "blob.h"
>  #include "repository.h"
> +#include "alloc.h"
>  
>  const char *blob_type = "blob";
>  
> diff --git a/cache.h b/cache.h
> index 0e6c5dd5639..c75559b7d38 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1763,22 +1763,6 @@ extern const char *excludes_file;
>  int decode_85(char *dst, const char *line, int linelen);
>  void encode_85(char *buf, const unsigned char *data, int bytes);
>  
> -/* alloc.c */
> -#define alloc_blob_node(r) alloc_blob_node_##r()
> -extern void *alloc_blob_node_the_repository(void);
> -#define alloc_tree_node(r) alloc_tree_node_##r()
> -extern void *alloc_tree_node_the_repository(void);
> -#define alloc_commit_node(r) alloc_commit_node_##r()
> -extern void *alloc_commit_node_the_repository(void);
> -#define alloc_tag_node(r) alloc_tag_node_##r()
> -extern void *alloc_tag_node_the_repository(void);
> -#define alloc_object_node(r) alloc_object_node_##r()
> -extern void *alloc_object_node_the_repository(void);
> -#define alloc_report(r) alloc_report_##r()
> -extern void alloc_report_the_repository(void);
> -#define alloc_commit_index(r) alloc_commit_index_##r()
> -extern unsigned int alloc_commit_index_the_repository(void);
> -
>  /* pkt-line.c */
>  void packet_trace_identity(const char *prog);
>  
> diff --git a/commit.c b/commit.c
> index a9a43e79bae..5eb4d2f08f8 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -6,6 +6,7 @@
>  #include "diff.h"
>  #include "revision.h"
>  #include "notes.h"
> +#include "alloc.h"
>  #include "gpg-interface.h"
>  #include "mergesort.h"
>  #include "commit-slab.h"
> @@ -296,6 +297,17 @@ void free_commit_buffer(struct commit *commit)
>  	}
>  }
>  
> +void release_commit_memory(struct commit *c)
> +{
> +	c->tree = NULL;
> +	c->index = 0;
> +	free_commit_buffer(c);
> +	free_commit_list(c->parents);
> +	/* TODO: what about commit->util? */
> +
> +	c->object.parsed = 0;
> +}
> +
>  const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
>  {
>  	struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
> diff --git a/commit.h b/commit.h
> index 0fb8271665c..2d764ab7d8e 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -99,6 +99,12 @@ void unuse_commit_buffer(const struct commit *, const void *buffer);
>   */
>  void free_commit_buffer(struct commit *);
>  
> +/*
> + * Release memory related to a commit, including the parent list and
> + * any cached object buffer.
> + */
> +void release_commit_memory(struct commit *c);
> +
>  /*
>   * Disassociate any cached object buffer from the commit, but do not free it.
>   * The buffer (or NULL, if none) is returned.
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 6dac8908648..cbded673c28 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -15,6 +15,7 @@
>  #include "diff.h"
>  #include "diffcore.h"
>  #include "tag.h"
> +#include "alloc.h"
>  #include "unpack-trees.h"
>  #include "string-list.h"
>  #include "xdiff-interface.h"
> diff --git a/object.c b/object.c
> index 49b952e9299..8e29f63bf23 100644
> --- a/object.c
> +++ b/object.c
> @@ -5,6 +5,7 @@
>  #include "tree.h"
>  #include "commit.h"
>  #include "tag.h"
> +#include "alloc.h"
>  #include "object-store.h"
>  #include "packfile.h"
>  
> @@ -455,6 +456,13 @@ struct parsed_object_pool *parsed_object_pool_new(void)
>  {
>  	struct parsed_object_pool *o = xmalloc(sizeof(*o));
>  	memset(o, 0, sizeof(*o));
> +
> +	o->blob_state = allocate_alloc_state();
> +	o->tree_state = allocate_alloc_state();
> +	o->commit_state = allocate_alloc_state();
> +	o->tag_state = allocate_alloc_state();
> +	o->object_state = allocate_alloc_state();
> +
>  	return o;
>  }
>  
> @@ -501,9 +509,39 @@ void raw_object_store_clear(struct raw_object_store *o)
>  void parsed_object_pool_clear(struct parsed_object_pool *o)
>  {
>  	/*
> -	 * TOOD free objects in o->obj_hash.
> -	 *
>  	 * As objects are allocated in slabs (see alloc.c), we do
>  	 * not need to free each object, but each slab instead.
> +	 *
> +	 * Before doing so, we need to free any additional memory
> +	 * the objects may hold.
>  	 */
> +	unsigned i;
> +
> +	for (i = 0; i < o->obj_hash_size; i++) {
> +		struct object *obj = o->obj_hash[i];
> +
> +		if (!obj)
> +			continue;
> +
> +		if (obj->type == OBJ_TREE)
> +			free_tree_buffer((struct tree*)obj);
> +		else if (obj->type == OBJ_COMMIT)
> +			release_commit_memory((struct commit*)obj);
> +		else if (obj->type == OBJ_TAG)
> +			release_tag_memory((struct tag*)obj);
> +	}
> +
> +	FREE_AND_NULL(o->obj_hash);
> +	o->obj_hash_size = 0;
> +
> +	clear_alloc_state(o->blob_state);
> +	clear_alloc_state(o->tree_state);
> +	clear_alloc_state(o->commit_state);
> +	clear_alloc_state(o->tag_state);
> +	clear_alloc_state(o->object_state);
> +	FREE_AND_NULL(o->blob_state);
> +	FREE_AND_NULL(o->tree_state);
> +	FREE_AND_NULL(o->commit_state);
> +	FREE_AND_NULL(o->tag_state);
> +	FREE_AND_NULL(o->object_state);
>  }
> diff --git a/object.h b/object.h
> index b41d7a3accb..7916edb4edf 100644
> --- a/object.h
> +++ b/object.h
> @@ -4,6 +4,14 @@
>  struct parsed_object_pool {
>  	struct object **obj_hash;
>  	int nr_objs, obj_hash_size;
> +
> +	/* TODO: migrate alloc_states to mem-pool? */
> +	struct alloc_state *blob_state;
> +	struct alloc_state *tree_state;
> +	struct alloc_state *commit_state;
> +	struct alloc_state *tag_state;
> +	struct alloc_state *object_state;
> +	unsigned commit_count;
>  };
>  
>  struct parsed_object_pool *parsed_object_pool_new(void);
> diff --git a/tag.c b/tag.c
> index 02ef4eaafc0..7c12426b4ea 100644
> --- a/tag.c
> +++ b/tag.c
> @@ -3,6 +3,7 @@
>  #include "commit.h"
>  #include "tree.h"
>  #include "blob.h"
> +#include "alloc.h"
>  #include "gpg-interface.h"
>  
>  const char *tag_type = "tag";
> @@ -115,6 +116,14 @@ static timestamp_t parse_tag_date(const char *buf, const char *tail)
>  	return parse_timestamp(dateptr, NULL, 10);
>  }
>  
> +void release_tag_memory(struct tag *t)
> +{
> +	free(t->tag);
> +	t->tagged = NULL;
> +	t->object.parsed = 0;
> +	t->date = 0;
> +}
> +
>  int parse_tag_buffer(struct tag *item, const void *data, unsigned long size)
>  {
>  	struct object_id oid;
> diff --git a/tag.h b/tag.h
> index d469534e82a..9057d76a506 100644
> --- a/tag.h
> +++ b/tag.h
> @@ -15,6 +15,7 @@ struct tag {
>  extern struct tag *lookup_tag(const struct object_id *oid);
>  extern int parse_tag_buffer(struct tag *item, const void *data, unsigned long size);
>  extern int parse_tag(struct tag *item);
> +extern void release_tag_memory(struct tag *t);
>  extern struct object *deref_tag(struct object *, const char *, int);
>  extern struct object *deref_tag_noverify(struct object *);
>  extern int gpg_verify_tag(const struct object_id *oid,
> diff --git a/tree.c b/tree.c
> index 58cf19b4fa8..8f8ef3189af 100644
> --- a/tree.c
> +++ b/tree.c
> @@ -5,6 +5,7 @@
>  #include "blob.h"
>  #include "commit.h"
>  #include "tag.h"
> +#include "alloc.h"
>  #include "tree-walk.h"
>  
>  const char *tree_type = "tree";
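
The teardown order in the quoted parsed_object_pool_clear() (first release what each object owns, then free the slabs) can be sketched in isolation. This is only a toy model, not the code in alloc.c: the toy_* names, the slab size and the per-node "buffer" field are all made up for illustration, and the sketch walks the slabs directly instead of walking obj_hash as the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SLAB_NODES 4	/* hypothetical slab size, kept small on purpose */

struct toy_node {
	char *buffer;		/* extra heap memory owned by the node */
};

struct toy_alloc_state {
	struct toy_node **slabs;	/* every slab allocated so far */
	size_t slab_nr;			/* number of slabs */
	size_t used_in_last;		/* nodes handed out from the last slab */
};

/* Hand out one zeroed node, allocating a new slab when the last one is full. */
static struct toy_node *toy_alloc_node(struct toy_alloc_state *s)
{
	assert(s != NULL);
	if (!s->slab_nr || s->used_in_last == TOY_SLAB_NODES) {
		struct toy_node **grown;
		struct toy_node *slab;

		grown = realloc(s->slabs, (s->slab_nr + 1) * sizeof(*grown));
		if (!grown)
			return NULL;
		s->slabs = grown;
		slab = calloc(TOY_SLAB_NODES, sizeof(*slab));
		if (!slab)
			return NULL;
		s->slabs[s->slab_nr++] = slab;
		s->used_in_last = 0;
	}
	return &s->slabs[s->slab_nr - 1][s->used_in_last++];
}

/*
 * Step 1: free the memory each node holds (unused nodes are zeroed,
 * and free(NULL) is a no-op).
 * Step 2: free the slabs themselves; nodes are never freed one by one.
 * Returns the number of slabs freed.
 */
static size_t toy_clear(struct toy_alloc_state *s)
{
	size_t i, j, freed = 0;

	for (i = 0; i < s->slab_nr; i++)
		for (j = 0; j < TOY_SLAB_NODES; j++)
			free(s->slabs[i][j].buffer);

	for (i = 0; i < s->slab_nr; i++) {
		free(s->slabs[i]);
		freed++;
	}
	free(s->slabs);
	memset(s, 0, sizeof(*s));
	return freed;
}
```

Doing step 2 first would leave the per-node buffers unreachable, which is why the quoted hunk releases tree, commit and tag memory before calling clear_alloc_state().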


end of thread, other threads:[~2018-05-16  2:27 UTC | newest]

Thread overview: 95+ messages
2018-05-01 21:33 [PATCH 00/13] object store: alloc Stefan Beller
2018-05-01 21:33 ` [PATCH 01/13] repository: introduce object parser field Stefan Beller
2018-05-02 17:17   ` Duy Nguyen
2018-05-02 17:26     ` Stefan Beller
2018-05-02 17:58       ` Duy Nguyen
2018-05-02 20:30   ` Jonathan Tan
2018-05-01 21:33 ` [PATCH 02/13] object: add repository argument to create_object Stefan Beller
2018-05-01 21:43   ` Eric Sunshine
2018-05-01 21:33 ` [PATCH 03/13] object: add repository argument to grow_object_hash Stefan Beller
2018-05-01 21:33 ` [PATCH 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
2018-05-02 20:34   ` Jonathan Tan
2018-05-01 21:33 ` [PATCH 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
2018-05-01 21:33 ` [PATCH 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
2018-05-01 21:33 ` [PATCH 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
2018-05-01 21:33 ` [PATCH 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
2018-05-01 21:33 ` [PATCH 09/13] alloc: add repository argument to alloc_report Stefan Beller
2018-05-01 21:34 ` [PATCH 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
2018-05-01 21:34 ` [PATCH 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
2018-05-01 21:34 ` [PATCH 12/13] object: allow create_object " Stefan Beller
2018-05-02 20:36   ` Jonathan Tan
2018-05-01 21:34 ` [PATCH 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
2018-05-02 17:44   ` Duy Nguyen
2018-05-03 17:24     ` Stefan Beller
2018-05-03 17:35       ` Duy Nguyen
2018-05-02 20:50   ` Jonathan Tan
2018-05-03 17:25     ` Stefan Beller
2018-05-03 14:58   ` Duy Nguyen
2018-05-02 17:01 ` [PATCH 00/13] object store: alloc Duy Nguyen
2018-05-02 18:07   ` Jameson Miller
2018-05-02 18:22     ` Duy Nguyen
2018-05-02 18:44       ` Jameson Miller
2018-05-03 22:45       ` Stefan Beller
2018-05-07 14:05 ` Junio C Hamano
2018-05-07 20:53   ` Stefan Beller
2018-05-07 22:59 ` [PATCH v2 " Stefan Beller
2018-05-07 22:59   ` [PATCH v2 01/13] repository: introduce parsed objects field Stefan Beller
2018-05-08 17:23     ` Jonathan Tan
2018-05-07 22:59   ` [PATCH v2 02/13] object: add repository argument to create_object Stefan Beller
2018-05-07 22:59   ` [PATCH v2 03/13] object: add repository argument to grow_object_hash Stefan Beller
2018-05-07 22:59   ` [PATCH v2 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
2018-05-07 22:59   ` [PATCH v2 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
2018-05-07 22:59   ` [PATCH v2 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
2018-05-07 22:59   ` [PATCH v2 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
2018-05-07 22:59   ` [PATCH v2 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
2018-05-07 22:59   ` [PATCH v2 09/13] alloc: add repository argument to alloc_report Stefan Beller
2018-05-07 22:59   ` [PATCH v2 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
2018-05-07 22:59   ` [PATCH v2 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
2018-05-07 22:59   ` [PATCH v2 12/13] object: allow create_object " Stefan Beller
2018-05-07 22:59   ` [PATCH v2 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
2018-05-08 10:10     ` Jeff King
2018-05-08 15:00     ` Duy Nguyen
2018-05-08 18:38       ` Stefan Beller
2018-05-08 17:45     ` Jonathan Tan
2018-05-08 19:37   ` [PATCH v3 00/13] object store: alloc Stefan Beller
2018-05-08 19:37     ` [PATCH v3 01/13] repository: introduce parsed objects field Stefan Beller
2018-05-08 19:37     ` [PATCH v3 02/13] object: add repository argument to create_object Stefan Beller
2018-05-08 19:37     ` [PATCH v3 03/13] object: add repository argument to grow_object_hash Stefan Beller
2018-05-08 19:37     ` [PATCH v3 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
2018-05-08 19:37     ` [PATCH v3 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
2018-05-08 19:37     ` [PATCH v3 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
2018-05-08 19:37     ` [PATCH v3 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
2018-05-08 19:37     ` [PATCH v3 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
2018-05-08 19:37     ` [PATCH v3 09/13] alloc: add repository argument to alloc_report Stefan Beller
2018-05-08 19:37     ` [PATCH v3 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
2018-05-08 19:37     ` [PATCH v3 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
2018-05-08 19:37     ` [PATCH v3 12/13] object: allow create_object " Stefan Beller
2018-05-08 19:37     ` [PATCH v3 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
2018-05-08 20:04       ` Jonathan Tan
2018-05-08 20:37         ` Stefan Beller
2018-05-09 15:54           ` Duy Nguyen
2018-05-09 17:18         ` Duy Nguyen
2018-05-09 19:20           ` Stefan Beller
2018-05-10 15:43             ` Duy Nguyen
2018-05-10  0:40     ` [PATCH v4 00/13] object store: alloc Stefan Beller
2018-05-10  0:40       ` [PATCH v4 01/13] repository: introduce parsed objects field Stefan Beller
2018-05-10  0:40       ` [PATCH v4 02/13] object: add repository argument to create_object Stefan Beller
2018-05-10  0:40       ` [PATCH v4 03/13] object: add repository argument to grow_object_hash Stefan Beller
2018-05-10  0:40       ` [PATCH v4 04/13] alloc: add repository argument to alloc_blob_node Stefan Beller
2018-05-10  0:40       ` [PATCH v4 05/13] alloc: add repository argument to alloc_tree_node Stefan Beller
2018-05-10  0:40       ` [PATCH v4 06/13] alloc: add repository argument to alloc_commit_node Stefan Beller
2018-05-10  0:40       ` [PATCH v4 07/13] alloc: add repository argument to alloc_tag_node Stefan Beller
2018-05-10  0:40       ` [PATCH v4 08/13] alloc: add repository argument to alloc_object_node Stefan Beller
2018-05-10  0:40       ` [PATCH v4 09/13] alloc: add repository argument to alloc_report Stefan Beller
2018-05-10  0:40       ` [PATCH v4 10/13] alloc: add repository argument to alloc_commit_index Stefan Beller
2018-05-10  0:40       ` [PATCH v4 11/13] object: allow grow_object_hash to handle arbitrary repositories Stefan Beller
2018-05-10  0:40       ` [PATCH v4 12/13] object: allow create_object " Stefan Beller
2018-05-10  0:40       ` [PATCH v4 13/13] alloc: allow arbitrary repositories for alloc functions Stefan Beller
2018-05-10 17:16       ` [PATCH v4 00/13] object store: alloc Jonathan Tan
2018-05-10 17:32         ` Stefan Beller
2018-05-10 20:56           ` Jonathan Tan
2018-05-10 22:36             ` Stefan Beller
2018-05-11 19:17         ` [PATCH] alloc: allow arbitrary repositories for alloc functions Stefan Beller
2018-05-11 19:38           ` Eric Sunshine
2018-05-15 21:48             ` Stefan Beller
2018-05-16  2:27               ` Junio C Hamano
