All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] diff-lib: clear `pending` object-array in `index_differs_from()`
@ 2017-09-20 19:47 Martin Ågren
  2017-09-20 20:02 ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-20 19:47 UTC (permalink / raw)
  To: git; +Cc: Stephan Beyer

Instead of conditionally freeing `rev.pending.objects`, just call
`object_array_clear()` on `rev.pending`. This means we don't poke as
much into the implementation, which is already a good thing, but also
that we free the individual entries as well, thereby fixing a
memory-leak.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 diff-lib.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 2a52b07..4e0980c 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -549,7 +549,6 @@ int index_differs_from(const char *def, int diff_flags,
 	rev.diffopt.flags |= diff_flags;
 	rev.diffopt.ita_invisible_in_index = ita_invisible_in_index;
 	run_diff_index(&rev, 1);
-	if (rev.pending.alloc)
-		free(rev.pending.objects);
+	object_array_clear(&rev.pending);
 	return (DIFF_OPT_TST(&rev.diffopt, HAS_CHANGES) != 0);
 }
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] diff-lib: clear `pending` object-array in `index_differs_from()`
  2017-09-20 19:47 [PATCH] diff-lib: clear `pending` object-array in `index_differs_from()` Martin Ågren
@ 2017-09-20 20:02 ` Jeff King
  2017-09-21  3:56   ` Martin Ågren
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2017-09-20 20:02 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Stephan Beyer

On Wed, Sep 20, 2017 at 09:47:24PM +0200, Martin Ågren wrote:

> Instead of conditionally freeing `rev.pending.objects`, just call
> `object_array_clear()` on `rev.pending`. This means we don't poke as
> much into the implementation, which is already a good thing, but also
> that we free the individual entries as well, thereby fixing a
> memory-leak.
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  diff-lib.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/diff-lib.c b/diff-lib.c
> index 2a52b07..4e0980c 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -549,7 +549,6 @@ int index_differs_from(const char *def, int diff_flags,
>  	rev.diffopt.flags |= diff_flags;
>  	rev.diffopt.ita_invisible_in_index = ita_invisible_in_index;
>  	run_diff_index(&rev, 1);
> -	if (rev.pending.alloc)
> -		free(rev.pending.objects);
> +	object_array_clear(&rev.pending);

Looks good. A similar bug was the exact reason for adding the function
in 46be82312. I did a grep for 'free.*\.objects' to see if there were
other cases.

There are some hits. E.g., the one in orphaned_commit_warning(). It does
something funny with setting revs.leak_pending. But I _think_ after the
whole thing is done and we call free(refs.objects), that probably ought
to be an object_array_clear().

As I suspect you're working your way through leak-checker results, I'm
OK if you want to punt on digging into more cases for now and just fix
ones that the tool has identified as real leaks.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] diff-lib: clear `pending` object-array in `index_differs_from()`
  2017-09-20 20:02 ` Jeff King
@ 2017-09-21  3:56   ` Martin Ågren
  2017-09-21  4:52     ` Jeff King
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
  1 sibling, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-21  3:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Stephan Beyer

On 20 September 2017 at 22:02, Jeff King <peff@peff.net> wrote:
> On Wed, Sep 20, 2017 at 09:47:24PM +0200, Martin Ågren wrote:
>
>> Instead of conditionally freeing `rev.pending.objects`, just call
>> `object_array_clear()` on `rev.pending`. This means we don't poke as
>> much into the implementation, which is already a good thing, but also
>> that we free the individual entries as well, thereby fixing a
>> memory-leak.

> Looks good. A similar bug was the exact reason for adding the function
> in 46be82312. I did a grep for 'free.*\.objects' to see if there were
> other cases.

Ah. I grepped for "pending.objects", but didn't go more general than that.

> There are some hits. E.g., the one in orphaned_commit_warning(). It does
> something funny with setting revs.leak_pending. But I _think_ after the
> whole thing is done and we call free(refs.objects), that probably ought
> to be an object_array_clear().

I'll look into that one in the evening.

> As I suspect you're working your way through leak-checker results, I'm
> OK if you want to punt on digging into more cases for now and just fix
> ones that the tool has identified as real leaks.

I am indeed going through ASan results. Your UNLEAK helps immensely!
(I'm collecting UNLEAKs on the side. I see now that UNLEAK is in master,
so I should probably submit those where I believe things are
"UNLEAK-complete".)

Thanks for looking at this.

Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] diff-lib: clear `pending` object-array in `index_differs_from()`
  2017-09-21  3:56   ` Martin Ågren
@ 2017-09-21  4:52     ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2017-09-21  4:52 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Stephan Beyer

On Thu, Sep 21, 2017 at 05:56:21AM +0200, Martin Ågren wrote:

> > Looks good. A similar bug was the exact reason for adding the function
> > in 46be82312. I did a grep for 'free.*\.objects' to see if there were
> > other cases.
> 
> Ah. I grepped for "pending.objects", but didn't go more general than that.

I did at first, too. My optimism lessened when I ran the more general
grep. :)

> > As I suspect you're working your way through leak-checker results, I'm
> > OK if you want to punt on digging into more cases for now and just fix
> > ones that the tool has identified as real leaks.
> 
> I am indeed going through ASan results. Your UNLEAK helps immensely!
> (I'm collecting UNLEAKs on the side. I see now that UNLEAK is in master,
> so I should probably submit those where I believe things are
> "UNLEAK-complete".)

Thanks! That's exactly what I'd hoped might happen when I started on
UNLEAK.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-20 20:02 ` Jeff King
  2017-09-21  3:56   ` Martin Ågren
@ 2017-09-22 23:34   ` Martin Ågren
  2017-09-22 23:34     ` [PATCH v2 1/6] builtin/commit: fix memory leak in `prepare_index()` Martin Ågren
                       ` (7 more replies)
  1 sibling, 8 replies; 27+ messages in thread
From: Martin Ågren @ 2017-09-22 23:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King

On 20 September 2017 at 22:02, Jeff King <peff@peff.net> wrote:
> On Wed, Sep 20, 2017 at 09:47:24PM +0200, Martin Ågren wrote:
>
>> Instead of conditionally freeing `rev.pending.objects`, just call
>> `object_array_clear()` on `rev.pending`. This means we don't poke as
>> much into the implementation, which is already a good thing, but also
>> that we free the individual entries as well, thereby fixing a
>> memory-leak.
>
> Looks good. A similar bug was the exact reason for adding the function
> in 46be82312. I did a grep for 'free.*\.objects' to see if there were
> other cases.
>
> There are some hits. E.g., the one in orphaned_commit_warning(). It does
> something funny with setting revs.leak_pending. But I _think_ after the
> whole thing is done and we call free(refs.objects), that probably ought
> to be an object_array_clear().

Looking into this, I found various classes of leaks and oddities. I
ended up replacing this patch with four patches.

Since Junio collected my "independent" patches into ma/leakplugs, this
is a re-roll of that whole topic. I got the first two patches from
Junio's tree. The only difference there is "builtin/" at the very start
of the first commit message.

The third patch handles the places where we use `leak_pending`. The
fourth addresses the rest of what your grep found and includes the
original patch from this thread. While working on those, I felt that
some callers fiddle a bit too much with the internals of `object_array`.
One pattern leaks, another only looks like it. That resulted in the last
two patches.

Since ma/plugleaks is branched off maint, I did the same with this
series. It applies on master and next and has a minor conflict on pu
(`handle_commit()` got a new argument).

Martin

Martin Ågren (6):
  builtin/commit: fix memory leak in `prepare_index()`
  commit: fix memory leak in `reduce_heads()`
  leak_pending: use `object_array_clear()`, not `free()`
  object_array: use `object_array_clear()`, not `free()`
  object_array: add and use `object_array_pop()`
  pack-bitmap[-write]: use `object_array_clear()`, don't leak

 bisect.c              |  3 ++-
 builtin/checkout.c    |  9 ++++++++-
 builtin/commit.c      | 15 ++++++++++-----
 builtin/fast-export.c |  3 +--
 builtin/fsck.c        |  7 +------
 builtin/reflog.c      |  6 +++---
 bundle.c              |  9 ++++++++-
 commit.c              |  1 +
 diff-lib.c            |  3 +--
 object.c              | 13 +++++++++++++
 object.h              |  7 +++++++
 pack-bitmap-write.c   |  4 +---
 pack-bitmap.c         | 10 +++-------
 revision.h            | 11 +++++++++++
 shallow.c             |  2 +-
 submodule.c           |  4 ++--
 upload-pack.c         |  2 +-
 17 files changed, 74 insertions(+), 35 deletions(-)

-- 
2.14.1.727.g9ddaf86


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 1/6] builtin/commit: fix memory leak in `prepare_index()`
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
@ 2017-09-22 23:34     ` Martin Ågren
  2017-09-22 23:34     ` [PATCH v2 2/6] commit: fix memory leak in `reduce_heads()` Martin Ågren
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Martin Ågren @ 2017-09-22 23:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Release `pathspec` and the string list `partial`.

When we clear the string list, make sure we do not free the `util`
pointers. That would result in double-freeing, since we set them up as
`item->util = item` in `list_paths()`.

Initialize the string list early, so that we can always release it. That
introduces some unnecessary overhead in various code paths, but means
there is one and only one way out of the function. If we ever accumulate
more things we need to free, it should be straightforward to do so.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/commit.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 1a0da71a4..ce76b6f96 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -336,7 +336,7 @@ static void refresh_cache_or_die(int refresh_flags)
 static const char *prepare_index(int argc, const char **argv, const char *prefix,
 				 const struct commit *current_head, int is_status)
 {
-	struct string_list partial;
+	struct string_list partial = STRING_LIST_INIT_DUP;
 	struct pathspec pathspec;
 	int refresh_flags = REFRESH_QUIET;
 	const char *ret;
@@ -381,7 +381,8 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 			warning(_("Failed to update main cache tree"));
 
 		commit_style = COMMIT_NORMAL;
-		return get_lock_file_path(&index_lock);
+		ret = get_lock_file_path(&index_lock);
+		goto out;
 	}
 
 	/*
@@ -404,7 +405,8 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 		if (write_locked_index(&the_index, &index_lock, CLOSE_LOCK))
 			die(_("unable to write new_index file"));
 		commit_style = COMMIT_NORMAL;
-		return get_lock_file_path(&index_lock);
+		ret = get_lock_file_path(&index_lock);
+		goto out;
 	}
 
 	/*
@@ -430,7 +432,8 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 			rollback_lock_file(&index_lock);
 		}
 		commit_style = COMMIT_AS_IS;
-		return get_index_file();
+		ret = get_index_file();
+		goto out;
 	}
 
 	/*
@@ -461,7 +464,6 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 			die(_("cannot do a partial commit during a cherry-pick."));
 	}
 
-	string_list_init(&partial, 1);
 	if (list_paths(&partial, !current_head ? NULL : "HEAD", prefix, &pathspec))
 		exit(1);
 
@@ -491,6 +493,9 @@ static const char *prepare_index(int argc, const char **argv, const char *prefix
 	discard_cache();
 	ret = get_lock_file_path(&false_lock);
 	read_cache_from(ret);
+out:
+	string_list_clear(&partial, 0);
+	clear_pathspec(&pathspec);
 	return ret;
 }
 
-- 
2.14.1.727.g9ddaf86


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 2/6] commit: fix memory leak in `reduce_heads()`
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
  2017-09-22 23:34     ` [PATCH v2 1/6] builtin/commit: fix memory leak in `prepare_index()` Martin Ågren
@ 2017-09-22 23:34     ` Martin Ågren
  2017-09-22 23:34     ` [PATCH v2 3/6] leak_pending: use `object_array_clear()`, not `free()` Martin Ågren
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Martin Ågren @ 2017-09-22 23:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King

We don't free the temporary scratch space we use with
`remove_redundant()`. Free it similar to how we do it in
`get_merge_bases_many_0()`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/commit.c b/commit.c
index d3150d627..f73976bcc 100644
--- a/commit.c
+++ b/commit.c
@@ -1080,6 +1080,7 @@ struct commit_list *reduce_heads(struct commit_list *heads)
 	num_head = remove_redundant(array, num_head);
 	for (i = 0; i < num_head; i++)
 		tail = &commit_list_insert(array[i], tail)->next;
+	free(array);
 	return result;
 }
 
-- 
2.14.1.727.g9ddaf86


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 3/6] leak_pending: use `object_array_clear()`, not `free()`
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
  2017-09-22 23:34     ` [PATCH v2 1/6] builtin/commit: fix memory leak in `prepare_index()` Martin Ågren
  2017-09-22 23:34     ` [PATCH v2 2/6] commit: fix memory leak in `reduce_heads()` Martin Ågren
@ 2017-09-22 23:34     ` Martin Ågren
  2017-09-23  3:47       ` Jeff King
  2017-09-22 23:34     ` [PATCH v2 4/6] object_array: " Martin Ågren
                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-22 23:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Setting `leak_pending = 1` tells `prepare_revision_walk()` not to
release the `pending` array, and makes that the caller's responsibility.
See 4a43d374f (revision: add leak_pending flag, 2011-10-01) and
353f5657a (bisect: use leak_pending flag, 2011-10-01).

Commit 1da1e07c8 (clean up name allocation in prepare_revision_walk,
2014-10-15) fixed a memory leak in `prepare_revision_walk()` by
switching from `free()` to `object_array_clear()`. However, where we use
the `leak_pending`-mechanism, we're still only calling `free()`.

Use `object_array_clear()` instead. Copy some helpful comments from
353f5657a to the other callers that we update to clarify the memory
responsibilities, and to highlight that the commits are not affected
when we clear the array -- it is indeed correct to both tidy up the
commit flags and clear the object array.

Document `leak_pending` in revision.h to help future users get this
right.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 bisect.c           |  3 ++-
 builtin/checkout.c |  9 ++++++++-
 bundle.c           |  9 ++++++++-
 revision.h         | 11 +++++++++++
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/bisect.c b/bisect.c
index a9fd9fbc6..fc797f6ae 100644
--- a/bisect.c
+++ b/bisect.c
@@ -826,7 +826,8 @@ static int check_ancestors(const char *prefix)
 
 	/* Clean up objects used, as they will be reused. */
 	clear_commit_marks_for_object_array(&pending_copy, ALL_REV_FLAGS);
-	free(pending_copy.objects);
+
+	object_array_clear(&pending_copy);
 
 	return res;
 }
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2d75ac66c..52f1b6770 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -796,9 +796,14 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
 	for_each_ref(add_pending_uninteresting_ref, &revs);
 	add_pending_oid(&revs, "HEAD", &new->object.oid, UNINTERESTING);
 
+	/* Save pending objects, so they can be cleaned up later. */
 	refs = revs.pending;
 	revs.leak_pending = 1;
 
+	/*
+	 * prepare_revision_walk (together with .leak_pending = 1) makes us
+	 * the sole owner of the list of pending objects.
+	 */
 	if (prepare_revision_walk(&revs))
 		die(_("internal error in revision walk"));
 	if (!(old->object.flags & UNINTERESTING))
@@ -806,8 +811,10 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
 	else
 		describe_detached_head(_("Previous HEAD position was"), old);
 
+	/* Clean up objects used, as they will be reused. */
 	clear_commit_marks_for_object_array(&refs, ALL_REV_FLAGS);
-	free(refs.objects);
+
+	object_array_clear(&refs);
 }
 
 static int switch_branches(const struct checkout_opts *opts,
diff --git a/bundle.c b/bundle.c
index d15db03c8..c092d5d68 100644
--- a/bundle.c
+++ b/bundle.c
@@ -157,9 +157,14 @@ int verify_bundle(struct bundle_header *header, int verbose)
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
+	/* Save pending objects, so they can be cleaned up later. */
 	refs = revs.pending;
 	revs.leak_pending = 1;
 
+	/*
+	 * prepare_revision_walk (together with .leak_pending = 1) makes us
+	 * the sole owner of the list of pending objects.
+	 */
 	if (prepare_revision_walk(&revs))
 		die(_("revision walk setup failed"));
 
@@ -176,8 +181,10 @@ int verify_bundle(struct bundle_header *header, int verbose)
 				refs.objects[i].name);
 		}
 
+	/* Clean up objects used, as they will be reused. */
 	clear_commit_marks_for_object_array(&refs, ALL_REV_FLAGS);
-	free(refs.objects);
+
+	object_array_clear(&refs);
 
 	if (verbose) {
 		struct ref_list *r;
diff --git a/revision.h b/revision.h
index bc18487d6..3162cc78e 100644
--- a/revision.h
+++ b/revision.h
@@ -149,6 +149,17 @@ struct rev_info {
 			date_mode_explicit:1,
 			preserve_subject:1;
 	unsigned int	disable_stdin:1;
+	/*
+	 * Set `leak_pending` to prevent `prepare_revision_walk()` from clearing
+	 * the array of pending objects (`pending`). It will still forget about
+	 * the array and its entries, so they really are leaked. This can be
+	 * useful if the `struct object_array` `pending` is copied before
+	 * calling `prepare_revision_walk()`. By setting `leak_pending`, you
+	 * effectively claim ownership of the old array, so you should most
+	 * likely call `object_array_clear(&pending_copy)` once you are done.
+	 * Observe that this is about ownership of the array and its entries,
+	 * not the commits referenced by those entries.
+	 */
 	unsigned int	leak_pending:1;
 	/* --show-linear-break */
 	unsigned int	track_linear:1,
-- 
2.14.1.727.g9ddaf86


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 4/6] object_array: use `object_array_clear()`, not `free()`
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
                       ` (2 preceding siblings ...)
  2017-09-22 23:34     ` [PATCH v2 3/6] leak_pending: use `object_array_clear()`, not `free()` Martin Ågren
@ 2017-09-22 23:34     ` Martin Ågren
  2017-09-23  4:04       ` Jeff King
  2017-09-22 23:34     ` [PATCH v2 5/6] object_array: add and use `object_array_pop()` Martin Ågren
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-22 23:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Instead of freeing `foo.objects` for an object array `foo` (sometimes
conditionally), call `object_array_clear(&foo)`. This means we don't
poke as much into the implementation, which is already a good thing, but
also that we release the individual entries as well, thereby fixing at
least one memory-leak (in diff-lib.c).

If someone is holding on to a pointer to an element's `name` or `path`,
that is now a dangling pointer, i.e., we'd be turning an unpleasant
situation into an outright bug. To the best of my understanding no such
long-term pointers are being taken.

The way we handle `study` in builting/reflog.c still looks like it might
leak. That will be addressed in the next commit.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 builtin/reflog.c | 4 ++--
 diff-lib.c       | 3 +--
 submodule.c      | 4 ++--
 upload-pack.c    | 2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/builtin/reflog.c b/builtin/reflog.c
index e237d927a..6b34f23e7 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -182,8 +182,8 @@ static int commit_is_complete(struct commit *commit)
 			found.objects[i].item->flags |= SEEN;
 	}
 	/* free object arrays */
-	free(study.objects);
-	free(found.objects);
+	object_array_clear(&study);
+	object_array_clear(&found);
 	return !is_incomplete;
 }
 
diff --git a/diff-lib.c b/diff-lib.c
index 2a52b0795..4e0980caa 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -549,7 +549,6 @@ int index_differs_from(const char *def, int diff_flags,
 	rev.diffopt.flags |= diff_flags;
 	rev.diffopt.ita_invisible_in_index = ita_invisible_in_index;
 	run_diff_index(&rev, 1);
-	if (rev.pending.alloc)
-		free(rev.pending.objects);
+	object_array_clear(&rev.pending);
 	return (DIFF_OPT_TST(&rev.diffopt, HAS_CHANGES) != 0);
 }
diff --git a/submodule.c b/submodule.c
index 36f45f5a5..79fd01f7b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1728,7 +1728,7 @@ static int find_first_merges(struct object_array *result, const char *path,
 			add_object_array(merges.objects[i].item, NULL, result);
 	}
 
-	free(merges.objects);
+	object_array_clear(&merges);
 	return result->nr;
 }
 
@@ -1833,7 +1833,7 @@ int merge_submodule(struct object_id *result, const char *path,
 			print_commit((struct commit *) merges.objects[i].item);
 	}
 
-	free(merges.objects);
+	object_array_clear(&merges);
 	return 0;
 }
 
diff --git a/upload-pack.c b/upload-pack.c
index 7efff2fbf..ec0eee8fc 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -888,7 +888,7 @@ static void receive_needs(void)
 		}
 
 	shallow_nr += shallows.nr;
-	free(shallows.objects);
+	object_array_clear(&shallows);
 }
 
 /* return non-zero if the ref is hidden, otherwise 0 */
-- 
2.14.1.727.g9ddaf86


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 5/6] object_array: add and use `object_array_pop()`
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
                       ` (3 preceding siblings ...)
  2017-09-22 23:34     ` [PATCH v2 4/6] object_array: " Martin Ågren
@ 2017-09-22 23:34     ` Martin Ågren
  2017-09-23  4:27       ` Jeff King
  2017-09-22 23:34     ` [PATCH v2 6/6] pack-bitmap[-write]: use `object_array_clear()`, don't leak Martin Ågren
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-22 23:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King

In a couple of places, we pop objects off an object array `foo` by
decreasing `foo.nr`. We access `foo.nr` in many places, but most if not
all other times we do so read-only, e.g., as we iterate over the array.
But when we change `foo.nr` behind the array's back, it feels a bit
nasty and looks like it might leak memory.

Leaks happen if the popped element has an allocated `name` or `path`.
At the moment, that is not the case. Still, 1) the object array might
gain more fields that want to be freed, 2) a code path where we pop
might start using names or paths, 3) one of these code paths might be
copied to somewhere where we do, and 4) using a dedicated function for
popping is conceptually cleaner.

Introduce and use `object_array_pop()` instead. Release memory in the
new function. Document that popping an object leaves the associated
elements in limbo.

The converted places were identified by grepping for "\.nr\>" and
looking for "--".

Make the new function return NULL on an empty array. This is consistent
with `pop_commit()` and allows the following:

	while ((o = object_array_pop(&foo)) != NULL) {
		// do something
	}

But as noted above, we don't need to go out of our way to avoid reading
`foo.nr`. This is probably more readable:

	while (foo.nr) {
		... o = object_array_pop(&foo);
		// do something
	}

The name of `object_array_pop()` does not quite align with
`add_object_array()`. That is unfortunate. On the other hand, it matches
`object_array_clear()`. Arguably it's `add_...` that is the odd one out,
since it reads like it's used to "add" an "object array". For that
reason, side with `object_array_clear()`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 builtin/fast-export.c |  3 +--
 builtin/fsck.c        |  7 +------
 builtin/reflog.c      |  2 +-
 object.c              | 13 +++++++++++++
 object.h              |  7 +++++++
 shallow.c             |  2 +-
 6 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index d412c0a8f..cff8d0fc5 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -634,11 +634,10 @@ static void handle_tail(struct object_array *commits, struct rev_info *revs)
 {
 	struct commit *commit;
 	while (commits->nr) {
-		commit = (struct commit *)commits->objects[commits->nr - 1].item;
+		commit = (struct commit *)object_array_pop(commits);
 		if (has_unshown_parent(commit))
 			return;
 		handle_commit(commit, revs);
-		commits->nr--;
 	}
 }
 
diff --git a/builtin/fsck.c b/builtin/fsck.c
index d18244ab5..7d4ad0273 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -181,12 +181,7 @@ static int traverse_reachable(void)
 	if (show_progress)
 		progress = start_progress_delay(_("Checking connectivity"), 0, 0, 2);
 	while (pending.nr) {
-		struct object_array_entry *entry;
-		struct object *obj;
-
-		entry = pending.objects + --pending.nr;
-		obj = entry->item;
-		result |= traverse_one_object(obj);
+		result |= traverse_one_object(object_array_pop(&pending));
 		display_progress(progress, ++nr);
 	}
 	stop_progress(&progress);
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 6b34f23e7..2067cca5b 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -126,7 +126,7 @@ static int commit_is_complete(struct commit *commit)
 		struct commit *c;
 		struct commit_list *parent;
 
-		c = (struct commit *)study.objects[--study.nr].item;
+		c = (struct commit *)object_array_pop(&study);
 		if (!c->object.parsed && !parse_object(&c->object.oid))
 			c->object.flags |= INCOMPLETE;
 
diff --git a/object.c b/object.c
index 321d7e920..b9a4a0e50 100644
--- a/object.c
+++ b/object.c
@@ -353,6 +353,19 @@ static void object_array_release_entry(struct object_array_entry *ent)
 	free(ent->path);
 }
 
+struct object *object_array_pop(struct object_array *array)
+{
+	struct object *ret;
+
+	if (!array->nr)
+		return NULL;
+
+	ret = array->objects[array->nr - 1].item;
+	object_array_release_entry(&array->objects[array->nr - 1]);
+	array->nr--;
+	return ret;
+}
+
 void object_array_filter(struct object_array *array,
 			 object_array_each_func_t want, void *cb_data)
 {
diff --git a/object.h b/object.h
index 0a419ba8d..b7629fe92 100644
--- a/object.h
+++ b/object.h
@@ -115,6 +115,13 @@ int object_list_contains(struct object_list *list, struct object *obj);
 /* Object array handling .. */
 void add_object_array(struct object *obj, const char *name, struct object_array *array);
 void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
+/*
+ * Returns NULL if the array is empty. Otherwise, returns the last object
+ * after removing its entry from the array. Other resources associated
+ * with that object are left in an unspecified state and should not be
+ * examined.
+ */
+struct object *object_array_pop(struct object_array *array);
 
 typedef int (*object_array_each_func_t)(struct object_array_entry *, void *);
 
diff --git a/shallow.c b/shallow.c
index 54359d549..901ac9791 100644
--- a/shallow.c
+++ b/shallow.c
@@ -99,7 +99,7 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
 				cur_depth = 0;
 			} else {
 				commit = (struct commit *)
-					stack.objects[--stack.nr].item;
+					object_array_pop(&stack);
 				cur_depth = *(int *)commit->util;
 			}
 		}
-- 
2.14.1.727.g9ddaf86


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 6/6] pack-bitmap[-write]: use `object_array_clear()`, don't leak
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
                       ` (4 preceding siblings ...)
  2017-09-22 23:34     ` [PATCH v2 5/6] object_array: add and use `object_array_pop()` Martin Ågren
@ 2017-09-22 23:34     ` Martin Ågren
  2017-09-23  4:35       ` Jeff King
  2017-09-23  4:37     ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Jeff King
  2017-09-24  7:01     ` Junio C Hamano
  7 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-22 23:34 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Instead of setting the fields of rev->pending to 0/NULL, thereby leaking
memory, call `object_array_clear(&rev->pending)`.

In pack-bitmap.c, we make copies of those fields as `pending_nr` and
`pending_e`. We never update the aliases and the original fields never
change, so the aliases are not really needed and just make it harder
than necessary to understand the code. While we're here, remove the
aliases to make the code easier to follow.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 pack-bitmap-write.c |  4 +---
 pack-bitmap.c       | 10 +++-------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 8e47a96b3..a8df5ce2a 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -297,9 +297,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
 			traverse_commit_list(&revs, show_commit, show_object, base);
 
-			revs.pending.nr = 0;
-			revs.pending.alloc = 0;
-			revs.pending.objects = NULL;
+			object_array_clear(&revs.pending);
 
 			stored->bitmap = bitmap_to_ewah(base);
 			need_reset = 0;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 327634cd7..0a49c1595 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -653,8 +653,6 @@ static int in_bitmapped_pack(struct object_list *roots)
 int prepare_bitmap_walk(struct rev_info *revs)
 {
 	unsigned int i;
-	unsigned int pending_nr = revs->pending.nr;
-	struct object_array_entry *pending_e = revs->pending.objects;
 
 	struct object_list *wants = NULL;
 	struct object_list *haves = NULL;
@@ -669,8 +667,8 @@ int prepare_bitmap_walk(struct rev_info *revs)
 			return -1;
 	}
 
-	for (i = 0; i < pending_nr; ++i) {
-		struct object *object = pending_e[i].item;
+	for (i = 0; i < revs->pending.nr; ++i) {
+		struct object *object = revs->pending.objects[i].item;
 
 		if (object->type == OBJ_NONE)
 			parse_object_or_die(&object->oid, NULL);
@@ -714,9 +712,7 @@ int prepare_bitmap_walk(struct rev_info *revs)
 	if (!bitmap_git.loaded && load_pack_bitmap() < 0)
 		return -1;
 
-	revs->pending.nr = 0;
-	revs->pending.alloc = 0;
-	revs->pending.objects = NULL;
+	object_array_clear(&revs->pending);
 
 	if (haves) {
 		revs->ignore_missing_links = 1;
-- 
2.14.1.727.g9ddaf86


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 3/6] leak_pending: use `object_array_clear()`, not `free()`
  2017-09-22 23:34     ` [PATCH v2 3/6] leak_pending: use `object_array_clear()`, not `free()` Martin Ågren
@ 2017-09-23  3:47       ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2017-09-23  3:47 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git

On Sat, Sep 23, 2017 at 01:34:51AM +0200, Martin Ågren wrote:

> Setting `leak_pending = 1` tells `prepare_revision_walk()` not to
> release the `pending` array, and makes that the caller's responsibility.
> See 4a43d374f (revision: add leak_pending flag, 2011-10-01) and
> 353f5657a (bisect: use leak_pending flag, 2011-10-01).
> 
> Commit 1da1e07c8 (clean up name allocation in prepare_revision_walk,
> 2014-10-15) fixed a memory leak in `prepare_revision_walk()` by
> switching from `free()` to `object_array_clear()`. However, where we use
> the `leak_pending`-mechanism, we're still only calling `free()`.

Thanks for digging up the history here. Reviewing those it seems clear
that doing a full object_array_clear() is the right thing.

> Use `object_array_clear()` instead. Copy some helpful comments from
> 353f5657a to the other callers that we update to clarify the memory
> responsibilities, and to highlight that the commits are not affected
> when we clear the array -- it is indeed correct to both tidy up the
> commit flags and clear the object array.
> 
> Document `leak_pending` in revision.h to help future users get this
> right.

This all looks good to me. Since the main use of "leak_pending" seems to
be to clear commit marks, I have a feeling that a better API would have
been something like:

  /* normal setup and use */
  init_revisions(&rev);
  setup_revisions(&rev, ...);
  prepare_revision_walk(&rev);
  while (get_revision(&rev))
	...;

  /* optional, but sensible if there may be other callers */
  clear_commit_marks(&rev);

  /* actually free any held memory */
  clear_revisions(&rev);

which would imply rev_info holding onto the "old" pending list
automatically until we decommission the rev_info in the final step.

But that's obviously a much bigger change. What you have here looks like
a nice clean fix to the leak problems.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/6] object_array: use `object_array_clear()`, not `free()`
  2017-09-22 23:34     ` [PATCH v2 4/6] object_array: " Martin Ågren
@ 2017-09-23  4:04       ` Jeff King
  2017-09-23  9:41         ` Martin Ågren
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-09-23  4:04 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git

On Sat, Sep 23, 2017 at 01:34:52AM +0200, Martin Ågren wrote:

> Instead of freeing `foo.objects` for an object array `foo` (sometimes
> conditionally), call `object_array_clear(&foo)`. This means we don't
> poke as much into the implementation, which is already a good thing, but
> also that we release the individual entries as well, thereby fixing at
> least one memory-leak (in diff-lib.c).
> 
> If someone is holding on to a pointer to an element's `name` or `path`,
> that is now a dangling pointer, i.e., we'd be turning an unpleasant
> situation into an outright bug. To the best of my understanding no such
> long-term pointers are being taken.

It would be nice if we had a good way to verify this automatically, but
I couldn't think of one. It passes the test suite with ASan, but of
course our coverage is not 100%.

We do know a few things:

  1. Any caller which keeps a pointer to the object-entries themselves
     is already broken, since we free that already. We only have to care
     about the name and path strings.

  2. Any caller which uses add_object_array() has a NULL path, so we
     don't have to worry about leaving that dangling.

  3. Most of the callers outside of revision.c use a NULL name argument,
     as well.

So let's see:

> diff --git a/builtin/reflog.c b/builtin/reflog.c
> index e237d927a..6b34f23e7 100644
> --- a/builtin/reflog.c
> +++ b/builtin/reflog.c
> @@ -182,8 +182,8 @@ static int commit_is_complete(struct commit *commit)
>  			found.objects[i].item->flags |= SEEN;
>  	}
>  	/* free object arrays */
> -	free(study.objects);
> -	free(found.objects);
> +	object_array_clear(&study);
> +	object_array_clear(&found);
>  	return !is_incomplete;
>  }

These ones always have NULL names and paths, so are good.

(An orthogonal low-hanging fruit: these probably ought to use
OBJECT_ARRAY_INIT instead of memset. Maybe a good #leftoverbits for some
of the Outreachy applicants).

> diff --git a/diff-lib.c b/diff-lib.c
> index 2a52b0795..4e0980caa 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -549,7 +549,6 @@ int index_differs_from(const char *def, int diff_flags,
>  	rev.diffopt.flags |= diff_flags;
>  	rev.diffopt.ita_invisible_in_index = ita_invisible_in_index;
>  	run_diff_index(&rev, 1);
> -	if (rev.pending.alloc)
> -		free(rev.pending.objects);
> +	object_array_clear(&rev.pending);
>  	return (DIFF_OPT_TST(&rev.diffopt, HAS_CHANGES) != 0);
>  }

This one gets the entries from setup_revisions(), so they may have
actual names. It calls diff_flush() before the clear, though, so I'm
pretty sure we would have dropped any queued entries (and I'm also
pretty sure that the queued entries make their own copies of any names
anyway). So this one isn't trivial, but I think it's fine.

> diff --git a/submodule.c b/submodule.c
> index 36f45f5a5..79fd01f7b 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1728,7 +1728,7 @@ static int find_first_merges(struct object_array *result, const char *path,
>  			add_object_array(merges.objects[i].item, NULL, result);
>  	}
>  
> -	free(merges.objects);
> +	object_array_clear(&merges);
>  	return result->nr;
>  }

Another always-NULL case.

> @@ -1833,7 +1833,7 @@ int merge_submodule(struct object_id *result, const char *path,
>  			print_commit((struct commit *) merges.objects[i].item);
>  	}
>  
> -	free(merges.objects);
> +	object_array_clear(&merges);
>  	return 0;
>  }

This one is fed by the "result" array of find_first_merges(), which is
also always-NULL.

> diff --git a/upload-pack.c b/upload-pack.c
> index 7efff2fbf..ec0eee8fc 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -888,7 +888,7 @@ static void receive_needs(void)
>  		}
>  
>  	shallow_nr += shallows.nr;
> -	free(shallows.objects);
> +	object_array_clear(&shallows);
>  }

Also always NULL.

So I think all of these cases are good (and weren't actually leaking in
the first place, since the only thing to get rid of was the entries
themselves).

> The way we handle `study` in builting/reflog.c still looks like it might
> leak. That will be addressed in the next commit.

This confused me for a minute, since the leak is not visible in the
context. ;) But reading the next commit, it makes sense.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/6] object_array: add and use `object_array_pop()`
  2017-09-22 23:34     ` [PATCH v2 5/6] object_array: add and use `object_array_pop()` Martin Ågren
@ 2017-09-23  4:27       ` Jeff King
  2017-09-23  9:49         ` Martin Ågren
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-09-23  4:27 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git

On Sat, Sep 23, 2017 at 01:34:53AM +0200, Martin Ågren wrote:

> In a couple of places, we pop objects off an object array `foo` by
> decreasing `foo.nr`. We access `foo.nr` in many places, but most if not
> all other times we do so read-only, e.g., as we iterate over the array.
> But when we change `foo.nr` behind the array's back, it feels a bit
> nasty and looks like it might leak memory.
> 
> Leaks happen if the popped element has an allocated `name` or `path`.
> At the moment, that is not the case. Still, 1) the object array might
> gain more fields that want to be freed, 2) a code path where we pop
> might start using names or paths, 3) one of these code paths might be
> copied to somewhere where we do, and 4) using a dedicated function for
> popping is conceptually cleaner.

All good reasons, I think.

> Introduce and use `object_array_pop()` instead. Release memory in the
> new function. Document that popping an object leaves the associated
> elements in limbo.

The interface looks appropriate for all of the current cases. Though I
do suspect there's a bit of catch-22 here. If a caller _did_ care about
the "name" and "path" fields, then this pop function would be
inappropriate, because it returns only the object field.

So in the most general form, you'd want:

  while (foo.nr) {
	  struct object_array_entry *e = object_array_pop(&foo);

	  ... do stuff with e->name, etc ...

	  object_array_release_entry(e);
  }

But that is certainly more cumbersome for these callers. I think we can
punt on that until it becomes necessary (which likely is never).

> Make the new function return NULL on an empty array. This is consistent
> with `pop_commit()` and allows the following:
> 
> 	while ((o = object_array_pop(&foo)) != NULL) {
> 		// do something
> 	}
> 
> But as noted above, we don't need to go out of our way to avoid reading
> `foo.nr`. This is probably more readable:
> 
> 	while (foo.nr) {
> 		... o = object_array_pop(&foo);
> 		// do something
> 	}

Agreed that the latter is more readable (though I am also happy that the
pop function does something sensible for an empty array).

> The name of `object_array_pop()` does not quite align with
> `add_object_array()`. That is unfortunate. On the other hand, it matches
> `object_array_clear()`. Arguably it's `add_...` that is the odd one out,
> since it reads like it's used to "add" an "object array". For that
> reason, side with `object_array_clear()`.

Yes, we're dreadfully inconsistent here. I tend to prefer noun_verb()
when "noun" is a struct we're operating on. But we have quite a bit of
verb_noun(). I find that noun_verb() is a bit more discoverable (e.g.,
tab completion does something sensible), but I'm not sure if it's worth
trying to do a mass-conversion.

Perhaps it's something that should be mentioned in CodingGuidelines, if
it isn't already.

> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  builtin/fast-export.c |  3 +--
>  builtin/fsck.c        |  7 +------
>  builtin/reflog.c      |  2 +-
>  object.c              | 13 +++++++++++++
>  object.h              |  7 +++++++
>  shallow.c             |  2 +-
>  6 files changed, 24 insertions(+), 10 deletions(-)

The patch itself looks good, with one tiny nit:

> diff --git a/object.h b/object.h
> index 0a419ba8d..b7629fe92 100644
> --- a/object.h
> +++ b/object.h
> @@ -115,6 +115,13 @@ int object_list_contains(struct object_list *list, struct object *obj);
>  /* Object array handling .. */
>  void add_object_array(struct object *obj, const char *name, struct object_array *array);
>  void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
> +/*
> + * Returns NULL if the array is empty. Otherwise, returns the last object
> + * after removing its entry from the array. Other resources associated
> + * with that object are left in an unspecified state and should not be
> + * examined.
> + */
> +struct object *object_array_pop(struct object_array *array);

I'm very happy to see a comment over the declaration here. But I think
it's a bit more readable if we put a blank line between the prior
function and the start of that comment.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/6] pack-bitmap[-write]: use `object_array_clear()`, don't leak
  2017-09-22 23:34     ` [PATCH v2 6/6] pack-bitmap[-write]: use `object_array_clear()`, don't leak Martin Ågren
@ 2017-09-23  4:35       ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2017-09-23  4:35 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git

On Sat, Sep 23, 2017 at 01:34:54AM +0200, Martin Ågren wrote:

> Instead of setting the fields of rev->pending to 0/NULL, thereby leaking
> memory, call `object_array_clear(&rev->pending)`.

I wondered if these cases might sometimes work on an uninitialized
struct (so setting the fields unconditionally without looking at them
would be the right thing). But nope, they are always initialized by the
revision machinery. So freeing is the right thing.

Like the earlier patch, we'd want to be sure there are no dangling
pointers here to the name/path fields. But I looked over the code and I
feel confident there are not (famous last words...).

> In pack-bitmap.c, we make copies of those fields as `pending_nr` and
> `pending_e`. We never update the aliases and the original fields never
> change, so the aliases are not really needed and just make it harder
> than necessary to understand the code. While we're here, remove the
> aliases to make the code easier to follow.

Agreed. Sometimes aliasing like this helps readability when the original
variable is wordy to access. But I don't think that is the case here.

It _can_ also allow the compiler to optimize harder (e.g., by putting
"pending_nr" into a register, because it's not 100% sure that
revs->pending.nr is unchanged by function calls within the loop). But
unless we know something is a tight loop, I don't think that level of
micro-optimization is worth the readability hit.

> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  pack-bitmap-write.c |  4 +---
>  pack-bitmap.c       | 10 +++-------
>  2 files changed, 4 insertions(+), 10 deletions(-)

Patch itself is as advertised. Looks good.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
                       ` (5 preceding siblings ...)
  2017-09-22 23:34     ` [PATCH v2 6/6] pack-bitmap[-write]: use `object_array_clear()`, don't leak Martin Ågren
@ 2017-09-23  4:37     ` Jeff King
  2017-09-23  9:54       ` Martin Ågren
  2017-09-24  7:01     ` Junio C Hamano
  7 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-09-23  4:37 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git

On Sat, Sep 23, 2017 at 01:34:48AM +0200, Martin Ågren wrote:

> Martin Ågren (6):
>   builtin/commit: fix memory leak in `prepare_index()`
>   commit: fix memory leak in `reduce_heads()`
>   leak_pending: use `object_array_clear()`, not `free()`
>   object_array: use `object_array_clear()`, not `free()`
>   object_array: add and use `object_array_pop()`
>   pack-bitmap[-write]: use `object_array_clear()`, don't leak

All six look good to me. Thanks again for poking into this. I'm afraid
to ask how far we have left to go on running the test suite all the way
through with leak-checking turned on. :)

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/6] object_array: use `object_array_clear()`, not `free()`
  2017-09-23  4:04       ` Jeff King
@ 2017-09-23  9:41         ` Martin Ågren
  0 siblings, 0 replies; 27+ messages in thread
From: Martin Ågren @ 2017-09-23  9:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 23 September 2017 at 06:04, Jeff King <peff@peff.net> wrote:
> On Sat, Sep 23, 2017 at 01:34:52AM +0200, Martin Ågren wrote:
>
>> The way we handle `study` in builting/reflog.c still looks like it might
>> leak. That will be addressed in the next commit.
>
> This confused me for a minute, since the leak is not visible in the
> context. ;) But reading the next commit, it makes sense.

Hmm, I added this so that a careful reader would /not/ get confused and
go "wait, doesn't this look a bit fishy?". But I also didn't want to get
into too many details, so of course it ended up confusing anyway. :-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/6] object_array: add and use `object_array_pop()`
  2017-09-23  4:27       ` Jeff King
@ 2017-09-23  9:49         ` Martin Ågren
  2017-09-23 15:47           ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-23  9:49 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 23 September 2017 at 06:27, Jeff King <peff@peff.net> wrote:
> On Sat, Sep 23, 2017 at 01:34:53AM +0200, Martin Ågren wrote:
>
>> Introduce and use `object_array_pop()` instead. Release memory in the
>> new function. Document that popping an object leaves the associated
>> elements in limbo.
>
> The interface looks appropriate for all of the current cases. Though I
> do suspect there's a bit of catch-22 here. If a caller _did_ care about
> the "name" and "path" fields, then this pop function would be
> inappropriate, because it returns only the object field.
>
> So in the most general form, you'd want:
>
>   while (foo.nr) {
>           struct object_array_entry *e = object_array_pop(&foo);
>
>           ... do stuff with e->name, etc ...
>
>           object_array_release_entry(e);
>   }
>
> But that is certainly more cumbersome for these callers. I think we can
> punt on that until it becomes necessary (which likely is never).

I considered something like this. I felt that making the function
`object_array_release_entr()` available to the general public would also
carry some risk. Right now, it's only object.c that needs to get things
right (never release twice, never release without removing, never be
clever, ..)

>> The name of `object_array_pop()` does not quite align with
>> `add_object_array()`. That is unfortunate. On the other hand, it matches
>> `object_array_clear()`. Arguably it's `add_...` that is the odd one out,
>> since it reads like it's used to "add" an "object array". For that
>> reason, side with `object_array_clear()`.
>
> Yes, we're dreadfully inconsistent here. I tend to prefer noun_verb()
> when "noun" is a struct we're operating on. But we have quite a bit of
> verb_noun(). I find that noun_verb() is a bit more discoverable (e.g.,
> tab completion does something sensible), but I'm not sure if it's worth
> trying to do a mass-conversion.
>
> Perhaps it's something that should be mentioned in CodingGuidelines, if
> it isn't already.

When writing the above paragaph (so yes, after I had already chosen the
name), I tried to find something, but couldn't. Admittedly, I just had a
cursory look.

> The patch itself looks good, with one tiny nit:
>
>> diff --git a/object.h b/object.h
>> index 0a419ba8d..b7629fe92 100644
>> --- a/object.h
>> +++ b/object.h
>> @@ -115,6 +115,13 @@ int object_list_contains(struct object_list *list, struct object *obj);
>>  /* Object array handling .. */
>>  void add_object_array(struct object *obj, const char *name, struct object_array *array);
>>  void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
>> +/*
>> + * Returns NULL if the array is empty. Otherwise, returns the last object
>> + * after removing its entry from the array. Other resources associated
>> + * with that object are left in an unspecified state and should not be
>> + * examined.
>> + */
>> +struct object *object_array_pop(struct object_array *array);
>
> I'm very happy to see a comment over the declaration here. But I think
> it's a bit more readable if we put a blank line between the prior
> function and the start of that comment.

Yes, that looks strange. :( I could re-roll and/or ask Junio to fiddle
with it. On closer look, this file is pretty close to documenting all
functions and there are some other comment-formatting issues. So here's
a promise that I'll get back to clean this up more generally in the not
too distant future. Would that be an acceptable punt? :-?

Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-23  4:37     ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Jeff King
@ 2017-09-23  9:54       ` Martin Ågren
  2017-09-23 16:13         ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-23  9:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 23 September 2017 at 06:37, Jeff King <peff@peff.net> wrote:
> On Sat, Sep 23, 2017 at 01:34:48AM +0200, Martin Ågren wrote:
>
>> Martin Ågren (6):
>>   builtin/commit: fix memory leak in `prepare_index()`
>>   commit: fix memory leak in `reduce_heads()`
>>   leak_pending: use `object_array_clear()`, not `free()`
>>   object_array: use `object_array_clear()`, not `free()`
>>   object_array: add and use `object_array_pop()`
>>   pack-bitmap[-write]: use `object_array_clear()`, don't leak
>
> All six look good to me. Thanks again for poking into this. I'm afraid
> to ask how far we have left to go on running the test suite all the way
> through with leak-checking turned on. :)

Thanks for reviewing.

Unfortunately, I have not figured out how to get LSan to simply report
the leaks and continue. Its default behavior is to abort if there are
leaks. That's useful for finding the first leaking test, but not much
else. (Later tests might depend on that test doing everything it should,
so now those later tests will start failing and/or executing different
code paths.)

I can tell LeakSanitizer to exit with an exit code instead, but then all
leaking git-processes exit with the same exit code. That also interferes
with the tests.

What I would like is for the git-process to exit with the same exit
status it would have had without the leak-checking. That would make it
possible to run the whole test suite, collect all leaks, identify
duplicates, sort them, categorize them, prioritize them, track how we
are progressing...

I did spend some time looking into this before I gave up. I'd love to be
told I've missed something obvious.

Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/6] object_array: add and use `object_array_pop()`
  2017-09-23  9:49         ` Martin Ågren
@ 2017-09-23 15:47           ` Jeff King
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff King @ 2017-09-23 15:47 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On Sat, Sep 23, 2017 at 11:49:16AM +0200, Martin Ågren wrote:

> >>  void add_object_array_with_path(struct object *obj, const char *name, struct object_array *array, unsigned mode, const char *path);
> >> +/*
> >> + * Returns NULL if the array is empty. Otherwise, returns the last object
> >> + * after removing its entry from the array. Other resources associated
> >> + * with that object are left in an unspecified state and should not be
> >> + * examined.
> >> + */
> >> +struct object *object_array_pop(struct object_array *array);
> >
> > I'm very happy to see a comment over the declaration here. But I think
> > it's a bit more readable if we put a blank line between the prior
> > function and the start of that comment.
> 
> Yes, that looks strange. :( I could re-roll and/or ask Junio to fiddle
> with it. On closer look, this file is pretty close to documenting all
> functions and there are some other comment-formatting issues. So here's
> a promise that I'll get back to clean this up more generally in the not
> too distant future. Would that be an acceptable punt? :-?

Yeah, I don't think it is a show-stopper (and I think there is a
reasonable chance Junio will just mark it up as he applies).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-23  9:54       ` Martin Ågren
@ 2017-09-23 16:13         ` Jeff King
  2017-09-23 16:38           ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-09-23 16:13 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On Sat, Sep 23, 2017 at 11:54:31AM +0200, Martin Ågren wrote:

> Unfortunately, I have not figured out how to get LSan to simply report
> the leaks and continue. Its default behavior is to abort if there are
> leaks. That's useful for finding the first leaking test, but not much
> else. (Later tests might depend on that test doing everything it should,
> so now those later tests will start failing and/or executing different
> code paths.)
> 
> I can tell LeakSanitizer to exit with an exit code instead, but then all
> leaking git-processes exit with the same exit code. That also interferes
> with the tests.
> 
> What I would like is for the git-process to exit with the same exit
> status it would have had without the leak-checking. That would make it
> possible to run the whole test suite, collect all leaks, identify
> duplicates, sort them, categorize them, prioritize them, track how we
> are progressing...

Doesn't:

  LSAN_OPTIONS=abort_on_error=0:exitcode=0

do what you want? I get:

  $ export LSAN_OPTIONS=abort_on_error=0:exitcode=0

  [no leaks, no error]
  $ ./git rev-parse HEAD; echo $?
  0

  [no leaks, error]
  $ ./git rev-parse broken; echo $?
  ...
  128

  [leaks, no error]
  $ ./git --no-pager show; echo $?
  ...
  SUMMARY: LeakSanitizer: 40 byte(s) leaked in 4 allocation(s).
  0

  [leaks, error]
  $ ./git --no-pager show broken; echo $?
  ...
  SUMMARY: LeakSanitizer: 35 byte(s) leaked in 3 allocation(s).
  128

In theory you should be able to just add "log_path=/tmp/lsan/output" to
that, which should put all the logs in a convenient place (and stop the
extra output from confusing any tests which capture stderr). But I can't
seem to get log_path to do anything, contrary to the documentation.

Doing log_to_syslog=1 does work for me. I'm not sure if it's a bug or
I'm holding it wrong. But it does seem like it should be possible to do
what you want.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-23 16:13         ` Jeff King
@ 2017-09-23 16:38           ` Jeff King
  2017-09-24 19:59             ` Martin Ågren
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-09-23 16:38 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On Sat, Sep 23, 2017 at 12:13:16PM -0400, Jeff King wrote:

> In theory you should be able to just add "log_path=/tmp/lsan/output" to
> that, which should put all the logs in a convenient place (and stop the
> extra output from confusing any tests which capture stderr). But I can't
> seem to get log_path to do anything, contrary to the documentation.
> 
> Doing log_to_syslog=1 does work for me. I'm not sure if it's a bug or
> I'm holding it wrong. But it does seem like it should be possible to do
> what you want.

Hrm. log_path doesn't seem to work for me directly from LSAN_OPTIONS.
But if you compile with ASan, it _does_ work from there. That seems like
a bug from my reading of the documentation, but maybe I'm missing
something.

Anyway, doing:

  ASAN_OPTIONS=detect_leaks=1:abort_on_error=0:exitcode=0:log_path=/tmp/lsan/output \
  make SANITIZE=address,leak test

should pass the whole suite and give you a host of files to analyze.

I'm not sure of the best way to count things. Surely many of these are
the same leak over and over. One way to visualize is to look at the
unique entries at each level of the call stack:

  show() {
    level=$1
    perl -lne 'print $1 if /#'$level' .*? in (.*)/' * |
    sort | uniq -c | sort -rn
  }

Level 0 isn't that interesting, it's just malloc:

  $ show 0
    27217 realloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd9f20)
    23297 __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x74490)
    19370 malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd9b70)
    12105 __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd9d48)

And level 1 is just our allocators:

  $ show 1
    28321 xrealloc /home/peff/compile/git/wrapper.c:138
    25199 xstrdup /home/peff/compile/git/wrapper.c:44
    18856 do_xmalloc /home/peff/compile/git/wrapper.c:60
    12664 xcalloc /home/peff/compile/git/wrapper.c:160
      660 __getdelim (/lib/x86_64-linux-gnu/libc.so.6+0x682b7)

But level 2 starts to get interesting (and is too big to copy here).
Some of them are direct sources of leaks. But others like strbuf_grow()
are yet another level of indirection, and you'd want to jump to level 3
or 4.

If you want the "real" source of the leak, probably you'd have to mark
certain functions as uninteresting and keep walking each stack trace
until you find an interesting function to report.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
                       ` (6 preceding siblings ...)
  2017-09-23  4:37     ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Jeff King
@ 2017-09-24  7:01     ` Junio C Hamano
  2017-09-24 20:00       ` Martin Ågren
  7 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2017-09-24  7:01 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Jeff King

Martin Ågren <martin.agren@gmail.com> writes:

> Since Junio collected my "independent" patches into ma/leakplugs, this
> is a re-roll of that whole topic. I got the first two patches from
> Junio's tree. The only difference there is "builtin/" at the very start
> of the first commit message.

Thanks, all 6 look sensible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-23 16:38           ` Jeff King
@ 2017-09-24 19:59             ` Martin Ågren
  2017-09-25 16:08               ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Martin Ågren @ 2017-09-24 19:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 23 September 2017 at 18:38, Jeff King <peff@peff.net> wrote:
> On Sat, Sep 23, 2017 at 12:13:16PM -0400, Jeff King wrote:
>
>> In theory you should be able to just add "log_path=/tmp/lsan/output" to
>> that, which should put all the logs in a convenient place (and stop the
>> extra output from confusing any tests which capture stderr). But I can't
>> seem to get log_path to do anything, contrary to the documentation.
>>
>> Doing log_to_syslog=1 does work for me. I'm not sure if it's a bug or
>> I'm holding it wrong. But it does seem like it should be possible to do
>> what you want.
>
> Hrm. log_path doesn't seem to work for me directly from LSAN_OPTIONS.
> But if you compile with ASan, it _does_ work from there. That seems like
> a bug from my reading of the documentation, but maybe I'm missing
> something.
>
> Anyway, doing:
>
>   ASAN_OPTIONS=detect_leaks=1:abort_on_error=0:exitcode=0:log_path=/tmp/lsan/output \
>   make SANITIZE=address,leak test
>
> should pass the whole suite and give you a host of files to analyze.

Thanks. My reading of the documentation was off. Turns out exitcode=0
does not set the exit code to 0, but rather turns it off. Duh. It also
didn't help that I tested with (my) gcc, which seems to have some
issues. With clang, things work fine. Thanks for looking into this.

> I'm not sure of the best way to count things.

Right. It's a tricky problem. And in the end, all we find out is where
we allocate and how we got there. Exactly where we lose/leak that piece
of allocated memory is another question...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-24  7:01     ` Junio C Hamano
@ 2017-09-24 20:00       ` Martin Ågren
  0 siblings, 0 replies; 27+ messages in thread
From: Martin Ågren @ 2017-09-24 20:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Jeff King

On 24 September 2017 at 09:01, Junio C Hamano <gitster@pobox.com> wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
>
>> Since Junio collected my "independent" patches into ma/leakplugs, this
>> is a re-roll of that whole topic. I got the first two patches from
>> Junio's tree. The only difference there is "builtin/" at the very start
>> of the first commit message.
>
> Thanks, all 6 look sensible.

Thanks for queuing and thanks for adding the empty line in patch 5.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-24 19:59             ` Martin Ågren
@ 2017-09-25 16:08               ` Jeff King
  2017-10-01 15:04                 ` Martin Ågren
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2017-09-25 16:08 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List

On Sun, Sep 24, 2017 at 09:59:28PM +0200, Martin Ågren wrote:

> > Anyway, doing:
> >
> >   ASAN_OPTIONS=detect_leaks=1:abort_on_error=0:exitcode=0:log_path=/tmp/lsan/output \
> >   make SANITIZE=address,leak test
> >
> > should pass the whole suite and give you a host of files to analyze.
> 
> Thanks. My reading of the documentation was off. Turns out exitcode=0
> does not set the exit code to 0, but rather turns it off. Duh.

Actually, the docs are quite confusing. The LSan "exitcode" option
defaults to 23, and that is the exit code you get. So I think it
probably is interpreted as the code to exit with, or 0 for "use the
original exit code".

> > I'm not sure of the best way to count things.
> 
> Right. It's a tricky problem. And in the end, all we find out is where
> we allocate and how we got there. Exactly where we lose/leak that piece
> of allocated memory is another question...

For hunting a particular trace, I think you have to walk up the list of
called functions and see where the pointers go out of scope. I'm not
sure how to make that easier (in theory a compiler-instrumentation like
LSan could do it by performing a leak-check when pointers go out of
scope. But it would also need to know about copies you've made of
pointers, so I imagine it would be extremely slow to run).

But at least on the topic of "how many unique leaks are there", I wrote
the script below to try to give some basic answers. It just finds the
first non-boring entry in each stack trace and reports that. Where
"boring" is really "this function is not expected to free, but hands off
memory ownership to somebody else".

You can use it to do:

  perl leaks.pl /tmp/lsan/output.* | sort | uniq -c | sort -rn | head

to see places that leak a lot. These are either boring calls that need
to be annotated, or are high-value targets for de-leaking.

I notice ref-filter.c has quite a few high entries on the list. I'm not
sure yet which case it falls into. :)

The other interesting thing is seeing how many "unique" leaks there are:

  perl leaks.pl /tmp/lsan/output.* | sort -u | wc -l

I get a bit over 800 with a run of the test suite. Which is a lot, but
fewer than I expected. And I'm sure quite a few of them are really
"duplicates" that can be eliminated in chunks.

So I don't know how useful any of that will be, but it at least should
give _some_ metric that should be diminishing as we fix leaks.

-- >8 --
#!/usr/bin/perl

my $boring = join('|',
  # These are allocation functions that get called from a lot of places.
  qw(
        __interceptor_strdup
        __interceptor_calloc
        realloc
        malloc
        xstrdup
        xcalloc
        strbuf_
        xmemdupz
        xstrvfmt
        xstrfmt
        xstrndup
  ),
  # These are really just the revision machinery not getting cleaned up;
  # for many we'd probably want to just UNLEAK() at the apex caller
  qw(
        add_rev_cmdline
        add_object_array_with_path
        add_pending_object_with_path
        add_pending_object_with_mode
        add_pending_object
        handle_revision_arg
        setup_revisions
  ),
  # More allocators that drop memory ownership
  qw(
        alloc_ref_with_prefix
        alloc_ref
        copy_ref
        commit_list_insert
        copy_pathspec
  ),
);
my $boring_re = qr/^$boring/;
my $skipping;

while (<>) {
  if (/^\s*#[0-9]+ 0x[0-9a-f]+ in (.*)/) {
    next if $skipping; # we already reported this trace
    next if $1 =~ $boring_re;

    print $1, "\n";
    $skipping = 1;
  } else {
    $skipping = 0;
  }
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes
  2017-09-25 16:08               ` Jeff King
@ 2017-10-01 15:04                 ` Martin Ågren
  0 siblings, 0 replies; 27+ messages in thread
From: Martin Ågren @ 2017-10-01 15:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On 25 September 2017 at 18:08, Jeff King <peff@peff.net> wrote:
> On Sun, Sep 24, 2017 at 09:59:28PM +0200, Martin Ågren wrote:
>
>> > I'm not sure of the best way to count things.
>>
>
> But at least on the topic of "how many unique leaks are there", I wrote
> the script below to try to give some basic answers. It just finds the
> first non-boring entry in each stack trace and reports that. Where
> "boring" is really "this function is not expected to free, but hands off
> memory ownership to somebody else".

Thanks. I combined your script with this:

-- >8 --
#!/usr/bin/perl -w

# Extract the stacktraces and identify them
# by their SHA hashes (these identifiers are
# not guaranteed to be stable across
# re-compilations of the Git binaries).

use Digest::SHA qw(sha1 sha1_hex);

my $ctx = Digest::SHA->new("SHA-1");
my $stage = 0;

while (<>) {
  my $collect = 0;
  if ($stage == 0 && /irect leak of \d+ byte.*allocated from:$/) {
    $stage++;
    $collect = 1;
  } elsif($stage == 1 && /^\s*\#\d+\s+/) {
    $collect = 1;
  } elsif ($stage == 1 && /^\s*$/) {
    $digest = $ctx->hexdigest;
    printf "Stacktrace-hash: %s\n", $digest;
    $ctx = Digest::SHA->new("SHA-1");
    $stage = 0;
  } elsif ($stage == 1) {
    print "warning: unidentified string '$_'\n";
  }
  if ($collect) {
    $ctx->add_bits($_);
    print;
  }
}
-- >8 --

Then I report various ad-hoc metrics:

-- >8 --
#!/bin/bash

for d in "$@"
do
  echo $d

  echo -n "  direct leaks: "
  grep "Direct leak" "$d"/* | wc -l

  echo -n "  indirect leaks: "
  grep "Indirect leak" "$d"/* | wc -l

  echo -n "  allocating places: "
  perl leaks.pl "$d"/* | sort -u | wc -l

  echo -n "  most common allocating place: "
  perl leaks.pl "$d"/* | sort \
       | uniq -c | sort -nr | head -1 | awk '{print $1;}'

  echo -n "  size of leak-reports: "
  cat "$d"/* | wc -l

  echo -n "  unique leaking stacktraces: "
  perl extract-traces.pl "$d"/* | grep "Stacktrace-hash" | sort -u | wc -l

  echo -n "  most common stacktrace: "
  perl extract-traces.pl "$d"/* | grep "Stacktrace-hash" | sort \
       | uniq -c | sort -nr | head -1 | awk '{print $1;}'
done
-- >8 --

If PIDs of leaking processes collide, reports are lost. Something like
this as root helps: `echo 4194303 > /proc/sys/kernel/pid_max`

Still, the numbers vary for back-to-back runs. Here are two runs on
master and two runs on master plus the lockfile-patches I just sent.
(I don't run all tests.)

lsan_ea220ee4
  direct leaks: 127165
  indirect leaks: 83897
  allocating places: 504
  most common allocating place: 10212
  size of leak-reports: 3662204
  unique leaking stacktraces: 83265
  most common stacktrace: 55
lsan_ea220ee4-rerun
  direct leaks: 127172
  indirect leaks: 83903
  allocating places: 504
  most common allocating place: 10212
  size of leak-reports: 3662334
  unique leaking stacktraces: 83644
  most common stacktrace: 57
lsan_ea220ee4+lockfile_fixes
  direct leaks: 118678
  indirect leaks: 83908
  allocating places: 493
  most common allocating place: 10212
  size of leak-reports: 3545563
  unique leaking stacktraces: 99834
  most common stacktrace: 32
lsan_ea220ee4+lockfile_fixes-rerun
  direct leaks: 118678
  indirect leaks: 83902
  allocating places: 491
  most common allocating place: 10212
  size of leak-reports: 3545463
  unique leaking stacktraces: 82171
  most common stacktrace: 40

> So I don't know how useful any of that will be, but it at least should
> give _some_ metric that should be diminishing as we fix leaks.

Indeed.

Martin

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-10-01 15:04 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20 19:47 [PATCH] diff-lib: clear `pending` object-array in `index_differs_from()` Martin Ågren
2017-09-20 20:02 ` Jeff King
2017-09-21  3:56   ` Martin Ågren
2017-09-21  4:52     ` Jeff King
2017-09-22 23:34   ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Martin Ågren
2017-09-22 23:34     ` [PATCH v2 1/6] builtin/commit: fix memory leak in `prepare_index()` Martin Ågren
2017-09-22 23:34     ` [PATCH v2 2/6] commit: fix memory leak in `reduce_heads()` Martin Ågren
2017-09-22 23:34     ` [PATCH v2 3/6] leak_pending: use `object_array_clear()`, not `free()` Martin Ågren
2017-09-23  3:47       ` Jeff King
2017-09-22 23:34     ` [PATCH v2 4/6] object_array: " Martin Ågren
2017-09-23  4:04       ` Jeff King
2017-09-23  9:41         ` Martin Ågren
2017-09-22 23:34     ` [PATCH v2 5/6] object_array: add and use `object_array_pop()` Martin Ågren
2017-09-23  4:27       ` Jeff King
2017-09-23  9:49         ` Martin Ågren
2017-09-23 15:47           ` Jeff King
2017-09-22 23:34     ` [PATCH v2 6/6] pack-bitmap[-write]: use `object_array_clear()`, don't leak Martin Ågren
2017-09-23  4:35       ` Jeff King
2017-09-23  4:37     ` [PATCH v2 0/6] reroll ma/plugleaks; more `object_array`-fixes Jeff King
2017-09-23  9:54       ` Martin Ågren
2017-09-23 16:13         ` Jeff King
2017-09-23 16:38           ` Jeff King
2017-09-24 19:59             ` Martin Ågren
2017-09-25 16:08               ` Jeff King
2017-10-01 15:04                 ` Martin Ågren
2017-09-24  7:01     ` Junio C Hamano
2017-09-24 20:00       ` Martin Ågren

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.