* [PATCH 0/2] Clean up leaks in commit-graph.c
@ 2018-10-02 14:58 Derrick Stolee via GitGitGadget
  2018-10-02 14:58 ` [PATCH 1/2] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-10-02 14:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

While looking at the commit-graph code, I noticed some memory leaks. These
can be found by running

valgrind --leak-check=full ./git commit-graph write --reachable

The impact of these leaks is small, as we never call
write_commit_graph_reachable() in a loop, but it is best to be diligent here.

While looking at memory consumption within write_commit_graph(), I noticed
that we initialize our oid list with "object count / 4", which seems to be a
huge over-count. I reduce this by a factor of eight.
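
The sizing change can be illustrated with a minimal sketch. The names
initial_oid_alloc() and oid_list_push() are hypothetical, the elements are
plain ints, and the growth rule is simple doubling rather than whatever
commit-graph.c actually uses; only the divisor of 32 and the floor of 1024
come from patch 2/2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the oid list; elements are plain ints here. */
struct oid_list {
	int *list;
	size_t nr;
	size_t alloc;
};

/* Initial capacity: 1/32 of the estimated object count, but at least 1024. */
size_t initial_oid_alloc(size_t approx_objects)
{
	size_t alloc = approx_objects / 32;
	return alloc < 1024 ? 1024 : alloc;
}

/* Append one element, doubling the capacity when the list is full. */
int oid_list_push(struct oid_list *l, int value)
{
	if (l->nr == l->alloc) {
		size_t new_alloc = l->alloc ? l->alloc * 2 : 1024;
		int *tmp = realloc(l->list, new_alloc * sizeof(*tmp));
		if (!tmp)
			return -1; /* old buffer is still valid and owned by l */
		l->list = tmp;
		l->alloc = new_alloc;
	}
	l->list[l->nr++] = value;
	return 0;
}
```

Because the list grows on demand, a smaller initial estimate only costs a few
extra reallocations when it turns out to be too low, while an estimate that is
too high wastes memory for the whole run.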

I built off of ab/commit-graph-progress, because my patch involves lines
close to those changes.

Thanks, -Stolee

Derrick Stolee (2):
  commit-graph: clean up leaked memory during write
  commit-graph: reduce initial oid allocation

 commit-graph.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)


base-commit: 6b89a34c89fc763292f06012318b852b74825619
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-42%2Fderrickstolee%2Fcommit-graph-leak-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-42/derrickstolee/commit-graph-leak-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/42
-- 
gitgitgadget


* [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 14:58 [PATCH 0/2] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
@ 2018-10-02 14:58 ` Derrick Stolee via GitGitGadget
  2018-10-02 15:40   ` Martin Ågren
  2018-10-02 14:58 ` [PATCH 2/2] commit-graph: reduce initial oid allocation Derrick Stolee via GitGitGadget
  2018-10-03 17:12 ` [PATCH v2 0/3] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
  2 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-10-02 14:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method in commit-graph.c leaks some lists
and strings during execution. In addition, a list of strings is
leaked in write_commit_graph_reachable(). Clean these up so our
memory checking is cleaner.

Running 'valgrind --leak-check=full git commit-graph write
--reachable' demonstrates these leaks and how they are fixed after
this change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 2a24eb8b5a..7226bd6b58 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -698,6 +698,8 @@ void write_commit_graph_reachable(const char *obj_dir, int append,
 	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
 	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+
+	string_list_clear(&list, 0);
 }
 
 void write_commit_graph(const char *obj_dir,
@@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
-	if (safe_create_leading_directories(graph_name))
+	if (safe_create_leading_directories(graph_name)) {
+		UNLEAK(graph_name);
 		die_errno(_("unable to create leading directories of %s"),
 			  graph_name);
+	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
@@ -893,6 +897,8 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+	free(graph_name);
+	free(commits.list);
 	free(oids.list);
 	oids.alloc = 0;
 	oids.nr = 0;
-- 
gitgitgadget



* [PATCH 2/2] commit-graph: reduce initial oid allocation
  2018-10-02 14:58 [PATCH 0/2] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
  2018-10-02 14:58 ` [PATCH 1/2] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
@ 2018-10-02 14:58 ` Derrick Stolee via GitGitGadget
  2018-10-03 17:12 ` [PATCH v2 0/3] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 20+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-10-02 14:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While writing a commit-graph file, we store the full list of
commits in a flat list. We use this list for sorting and ensuring
we are closed under reachability.

The initial allocation assumed that (at most) one in four objects
is a commit. This is a dramatic over-count for many repos,
especially large ones. Since we grow the list dynamically, reduce
this count by a factor of eight. We still set it to a minimum of
1024 before allocating.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7226bd6b58..a24cceb55f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -721,7 +721,7 @@ void write_commit_graph(const char *obj_dir,
 	struct progress *progress = NULL;
 
 	oids.nr = 0;
-	oids.alloc = approximate_object_count() / 4;
+	oids.alloc = approximate_object_count() / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
 
-- 
gitgitgadget


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 14:58 ` [PATCH 1/2] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
@ 2018-10-02 15:40   ` Martin Ågren
  2018-10-02 17:59     ` Stefan Beller
  0 siblings, 1 reply; 20+ messages in thread
From: Martin Ågren @ 2018-10-02 15:40 UTC (permalink / raw)
  To: gitgitgadget; +Cc: Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, 2 Oct 2018 at 17:01, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/commit-graph.c b/commit-graph.c
> index 2a24eb8b5a..7226bd6b58 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -698,6 +698,8 @@ void write_commit_graph_reachable(const char *obj_dir, int append,
>         string_list_init(&list, 1);
>         for_each_ref(add_ref_to_list, &list);
>         write_commit_graph(obj_dir, NULL, &list, append, report_progress);
> +
> +       string_list_clear(&list, 0);
>  }

Nit: The blank line adds some asymmetry, IMVHO.

>  void write_commit_graph(const char *obj_dir,
> @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
>         compute_generation_numbers(&commits, report_progress);
>
>         graph_name = get_commit_graph_filename(obj_dir);
> -       if (safe_create_leading_directories(graph_name))
> +       if (safe_create_leading_directories(graph_name)) {
> +               UNLEAK(graph_name);
>                 die_errno(_("unable to create leading directories of %s"),
>                           graph_name);
> +       }

Do you really need this hunk? In my testing with LeakSanitizer and
valgrind, I don't need this hunk to be leak-free. Generally speaking, it
seems impossible to UNLEAK when dying, since we don't know what we have
allocated higher up in the call-stack.

Without this hunk, this patch can have my

Reviewed-by: Martin Ågren <martin.agren@gmail.com>

as I've verified the leaks before and after. With this hunk, I am
puzzled and feel uneasy, both about having to UNLEAK before dying and
about having to UNLEAK outside of builtin/.

> +       free(graph_name);
> +       free(commits.list);
>         free(oids.list);
>         oids.alloc = 0;
>         oids.nr = 0;

Both `commits` and `oids` are on the stack here, so cleaning up one more
than the other is a bit asymmetrical. Also, if we try to zero the counts
-- which seems unnecessary to me, but which is not new with this patch --
we should perhaps use `FREE_AND_NULL` too. But personally, I would just
use `free` and leave `nr` and `alloc` at whatever values they happen to
have.
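
The two options can be put side by side in a small sketch. Everything here is
hypothetical: `struct packed_list` stands in for the stack-allocated lists,
and FREE_AND_NULL_SKETCH is a local imitation of a free-then-reset macro, not
git's own definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Local imitation of a free-and-reset macro; not git's own definition. */
#define FREE_AND_NULL_SKETCH(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical stand-in for the stack-allocated lists in write_commit_graph(). */
struct packed_list {
	int *list;
	size_t nr;
	size_t alloc;
};

/* Option 1: plain free(); nr, alloc and the pointer keep their stale values. */
void release_plain(struct packed_list *l)
{
	free(l->list);
}

/* Option 2: if the counts are zeroed, clear the pointer too for consistency. */
void release_and_reset(struct packed_list *l)
{
	FREE_AND_NULL_SKETCH(l->list);
	l->nr = 0;
	l->alloc = 0;
}
```

Option 1 is enough when the struct goes out of scope right afterwards; option 2
only pays off if the struct might be reused after the release.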

Martin


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 15:40   ` Martin Ågren
@ 2018-10-02 17:59     ` Stefan Beller
  2018-10-02 19:08       ` Martin Ågren
  2018-10-02 22:37       ` [PATCH 1/2] commit-graph: clean up leaked memory during write Jeff King
  0 siblings, 2 replies; 20+ messages in thread
From: Stefan Beller @ 2018-10-02 17:59 UTC (permalink / raw)
  To: Martin Ågren; +Cc: gitgitgadget, git, Junio C Hamano, Derrick Stolee

On Tue, Oct 2, 2018 at 8:40 AM Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Tue, 2 Oct 2018 at 17:01, Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > diff --git a/commit-graph.c b/commit-graph.c
> > index 2a24eb8b5a..7226bd6b58 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -698,6 +698,8 @@ void write_commit_graph_reachable(const char *obj_dir, int append,
> >         string_list_init(&list, 1);
> >         for_each_ref(add_ref_to_list, &list);
> >         write_commit_graph(obj_dir, NULL, &list, append, report_progress);
> > +
> > +       string_list_clear(&list, 0);
> >  }
>
> Nit: The blank line adds some asymmetry, IMVHO.

I think these blank lines are super common, as in:

    {
      declarations;

      multiple;
      lines(of);
      code;

      cleanup;
      and_frees;
    }

(c.f. display_table in column.c, which I admit to have
cherry-picked as an example).

While in nit territory, I would rather move the string list init
into the first block:

  {
    struct string_list list = STRING_LIST_INIT_DUP;

    for_each_ref(add_ref_to_list, &list);
    write_commit_graph(obj_dir, NULL, &list, append);

    string_list_clear(&list, 0);
  }




>
> >  void write_commit_graph(const char *obj_dir,
> > @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
> >         compute_generation_numbers(&commits, report_progress);
> >
> >         graph_name = get_commit_graph_filename(obj_dir);
> > -       if (safe_create_leading_directories(graph_name))
> > +       if (safe_create_leading_directories(graph_name)) {
> > +               UNLEAK(graph_name);
> >                 die_errno(_("unable to create leading directories of %s"),
> >                           graph_name);
> > +       }
>
> Do you really need this hunk?

graph_name is produced via xstrfmt in get_commit_graph_filename,
so it needs to be free'd in any return/exit path.

> In my testing with LeakSanitizer and
> valgrind, I don't need this hunk to be leak-free.


> Generally speaking, it
> seems impossible to UNLEAK when dying, since we don't know what we have
> allocated higher up in the call-stack.

I do not understand; I thought UNLEAK was specifically for the purpose of
die() calls without imposing extra overhead; rereading 0e5bba53af
(add UNLEAK annotation for reducing leak false positives, 2017-09-08)
doesn't provide an example for prematurely die()ing, only for regular
program exit.

> Reviewed-by: Martin Ågren <martin.agren@gmail.com>
>
> as I've verified the leaks before and after. With this hunk, I am
> puzzled and feel uneasy, both about having to UNLEAK before dying and
> about having to UNLEAK outside of builtin/.

I am not uneasy about an UNLEAK before dying, but about dying outside
builtin/ in general (but having a die call accompanied by UNLEAK seems
to be the right thing). Can you explain the worries you have regarding the
allocations on the call stack, as xstrfmt is allocating on the heap and we
only UNLEAK the pointer to that?

Stefan


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 17:59     ` Stefan Beller
@ 2018-10-02 19:08       ` Martin Ågren
  2018-10-02 19:44         ` Stefan Beller
  2018-10-02 22:37       ` [PATCH 1/2] commit-graph: clean up leaked memory during write Jeff King
  1 sibling, 1 reply; 20+ messages in thread
From: Martin Ågren @ 2018-10-02 19:08 UTC (permalink / raw)
  To: Stefan Beller
  Cc: gitgitgadget, Git Mailing List, Junio C Hamano, Derrick Stolee

On Tue, 2 Oct 2018 at 19:59, Stefan Beller <sbeller@google.com> wrote:
> > > +
> > > +       string_list_clear(&list, 0);
> > >  }
> >
> > Nit: The blank line adds some asymmetry, IMVHO.
>
> I think these blank lines are super common, as in:
>
>     {
>       declarations;
>
>       multiple;
>       lines(of);
>       code;
>
>       cleanup;
>       and_frees;
>     }
>
> (c.f. display_table in column.c, which I admit to have
> cherry-picked as an example).
>
> While in nit territory, I would rather move the string list init
> into the first block:
>
>   {
>     struct string_list list = STRING_LIST_INIT_DUP;
>
>     for_each_ref(add_ref_to_list, &list);
>     write_commit_graph(obj_dir, NULL, &list, append);
>
>     string_list_clear(&list, 0);
>   }

Now this looks very symmetrical. :-)

> > >  void write_commit_graph(const char *obj_dir,
> > > @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
> > >         compute_generation_numbers(&commits, report_progress);
> > >
> > >         graph_name = get_commit_graph_filename(obj_dir);
> > > -       if (safe_create_leading_directories(graph_name))
> > > +       if (safe_create_leading_directories(graph_name)) {
> > > +               UNLEAK(graph_name);
> > >                 die_errno(_("unable to create leading directories of %s"),
> > >                           graph_name);
> > > +       }
> >
> > Do you really need this hunk?
>
> graph_name is produced via xstrfmt in get_commit_graph_filename,
> so it needs to be free'd in any return/exit path.

Agreed. Although I am questioning that `die()` and its siblings count.

> > In my testing with LeakSanitizer and
> > valgrind, I don't need this hunk to be leak-free.
>
>
> > Generally speaking, it
> > seems impossible to UNLEAK when dying, since we don't know what we have
> > allocated higher up in the call-stack.
>
> I do not understand; I thought UNLEAK was specifically for the purpose of
> die() calls without imposing extra overhead; rereading 0e5bba53af
> (add UNLEAK annotation for reducing leak false positives, 2017-09-08)
> doesn't provide an example for prematurely die()ing, only for regular
> program exit.
>
> > [...] With this hunk, I am
> > puzzled and feel uneasy, both about having to UNLEAK before dying and
> > about having to UNLEAK outside of builtin/.
>
> I am not uneasy about an UNLEAK before dying, but about dying outside
> builtin/ in general

Yeah, not dying would be even better (out of scope for this patch).

> (but having a die call accompanied by UNLEAK seems
> to be the right thing). Can you explain the worries you have regarding the
> allocations on the call stack, as xstrfmt is allocating on the heap and we
> only UNLEAK the pointer to that?

I think we agree that leaking things "allocat[ed] on the call stack"
isn't much of a worry. The reason I mentioned the call stack is that
we've got any number of calls behind us on it, and we might have made
all sorts of allocations on the heap, and at this point, we have no
idea about what we should be UNLEAK-ing.

My worry is that one of these would seem to be true:

* UNLEAK is unsuitable for the job. Whenever we have a `die()` as we do
  here, we can UNLEAK the variables we know of, but we can't do anything
  about the allocations we have made higher up the call-chain. Our test
  suite obviously provokes lots of calls to `die()` -- imagine that each
  of those leaves a few leaked allocations behind. We'd have a semi-huge
  number of leaks being reported. While we could mark with UNLEAK to
  reduce that number, we wouldn't be able to bring the number of leaks
  down to anywhere near manageable where we'd be able to find the last
  few true positives.

* We add code with no purpose. In this case, we're not talking a lot of
  lines, but across the code base, if they bring no gain, they are bound
  to provide a negative net value given enough time.

Martin


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 19:08       ` Martin Ågren
@ 2018-10-02 19:44         ` Stefan Beller
  2018-10-02 22:34           ` Jeff King
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Beller @ 2018-10-02 19:44 UTC (permalink / raw)
  To: Martin Ågren; +Cc: gitgitgadget, git, Junio C Hamano, Derrick Stolee

On Tue, Oct 2, 2018 at 12:09 PM Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Tue, 2 Oct 2018 at 19:59, Stefan Beller <sbeller@google.com> wrote:
> > > > +
> > > > +       string_list_clear(&list, 0);
> > > >  }
> > >
> > > Nit: The blank line adds some asymmetry, IMVHO.
> >
> > I think these blank lines are super common, as in:
> >
> >     {
> >       declarations;
> >
> >       multiple;
> >       lines(of);
> >       code;
> >
> >       cleanup;
> >       and_frees;
> >     }
> >
> > (c.f. display_table in column.c, which I admit to have
> > cherry-picked as an example).
> >
> > While in nit territory, I would rather move the string list init
> > into the first block:
> >
> >   {
> >     struct string_list list = STRING_LIST_INIT_DUP;
> >
> >     for_each_ref(add_ref_to_list, &list);
> >     write_commit_graph(obj_dir, NULL, &list, append);
> >
> >     string_list_clear(&list, 0);
> >   }
>
> Now this looks very symmetrical. :-)
>
> > > >  void write_commit_graph(const char *obj_dir,
> > > > @@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
> > > >         compute_generation_numbers(&commits, report_progress);
> > > >
> > > >         graph_name = get_commit_graph_filename(obj_dir);
> > > > -       if (safe_create_leading_directories(graph_name))
> > > > +       if (safe_create_leading_directories(graph_name)) {
> > > > +               UNLEAK(graph_name);
> > > >                 die_errno(_("unable to create leading directories of %s"),
> > > >                           graph_name);
> > > > +       }
> > >
> > > Do you really need this hunk?
> >
> > graph_name is produced via xstrfmt in get_commit_graph_filename,
> > so it needs to be free'd in any return/exit path.
>
> Agreed. Although I am questioning that `die()` and its siblings count.
>
> > > In my testing with LeakSanitizer and
> > > valgrind, I don't need this hunk to be leak-free.
> >
> >
> > > Generally speaking, it
> > > seems impossible to UNLEAK when dying, since we don't know what we have
> > > allocated higher up in the call-stack.
> >
> > I do not understand; I thought UNLEAK was specifically for the purpose of
> > die() calls without imposing extra overhead; rereading 0e5bba53af
> > (add UNLEAK annotation for reducing leak false positives, 2017-09-08)
> > doesn't provide an example for prematurely die()ing, only for regular
> > program exit.
> >
> > > [...] With this hunk, I am
> > > puzzled and feel uneasy, both about having to UNLEAK before dying and
> > > about having to UNLEAK outside of builtin/.
> >
> > I am not uneasy about an UNLEAK before dying, but about dying outside
> > builtin/ in general
>
> Yeah, not dying would be even better (out of scope for this patch).
>
> > (but having a die call accompanied by UNLEAK seems
> > to be the right thing). Can you explain the worries you have regarding the
> > allocations on the call stack, as xstrfmt is allocating on the heap and we
> > only UNLEAK the pointer to that?
>
> I think we agree that leaking things "allocat[ed] on the call stack"
> isn't much of a worry. The reason I mentioned the call stack is that
> we've got any number of calls behind us on it, and we might have made
> all sorts of allocations on the heap, and at this point, we have no
> idea about what we should be UNLEAK-ing.

Wouldn't that be the responsibility of each function to make sure things
are UNLEAK'd or free'd before the function is either over or stopped
intermittently (by a subroutine dying) ?

In an ideal world we'd only ever exit/die in the functions high up
the call chain (which are in builtin/) and all other code would gracefully
return error codes or messages instead or even cope with some failure
conditions?

> My worry is that one of these would seem to be true:
>
> * UNLEAK is unsuitable for the job. Whenever we have a `die()` as we do
>   here, we can UNLEAK the variables we know of, but we can't do anything
>   about the allocations we have made higher up the call-chain.

IMHO that is the issue of the functions higher up the call chain and ought
to not affect this patch. By doing the right thing here locally the code base
will approach a good state eventually.

> Our test
>   suite obviously provokes lots of calls to `die()` -- imagine that each
>   of those leaves a few leaked allocations behind. We'd have a semi-huge
>   number of leaks being reported. While we could mark with UNLEAK to
>   reduce that number, we wouldn't be able to bring the number of leaks
>   down to anywhere near manageable where we'd be able to find the last
>   few true positives.

Makes sense.

> * We add code with no purpose. In this case, we're not talking a lot of
>   lines, but across the code base, if they bring no gain, they are bound
>   to provide a negative net value given enough time.

I see. I did not estimate its negative impact to be high enough, as the
UNLEAK near a die() call was an obvious good thing (locally).

I don't know what the best way to proceed is in this case.

Thanks,
Stefan


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 19:44         ` Stefan Beller
@ 2018-10-02 22:34           ` Jeff King
  2018-10-02 22:44             ` Stefan Beller
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff King @ 2018-10-02 22:34 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Martin Ågren, gitgitgadget, git, Junio C Hamano, Derrick Stolee

On Tue, Oct 02, 2018 at 12:44:09PM -0700, Stefan Beller wrote:

> > My worry is that one of these would seem to be true:
> >
> > * UNLEAK is unsuitable for the job. Whenever we have a `die()` as we do
> >   here, we can UNLEAK the variables we know of, but we can't do anything
> >   about the allocations we have made higher up the call-chain.
> 
> IMHO that is the issue of the functions higher up the call chain and ought
> to not affect this patch. By doing the right thing here locally the code base
> will approach a good state eventually.

But it's impossible. If I do this:

  foo = xstrdup(bar);
  subfunction(foo);

then I cannot protect myself from leaking "foo" when subfunction() calls
die(). It must be valid when I enter the function, and I have no
opportunity to run code when it leaves (because it never does).

> > * We add code with no purpose. In this case, we're not talking a lot of
> >   lines, but across the code base, if they bring no gain, they are bound
> >   to provide a negative net value given enough time.
> 
> I see. I did not estimate its negative impact to be high enough, as the
> UNLEAK near a die() call was an obvious good thing (locally).
> 
> I don't know what the best way to proceed is in this case.

My preference is to avoid them in the name of simplicity. If you're
using "make SANITIZE=leak test" to check for leaks, it will skip these
cases. If you're using valgrind, I think these may be reported as
"reachable". But that number already isn't useful for finding real
leaks, because it includes cases like the "foo" above as well as
program-lifetime globals.

The only argument (IMHO) for such an UNLEAK() is that it annotates the
location for when somebody later changes the function to "return -1"
instead of dying. But if we are going to do such annotation, we may as
well actually call free(), which is what the "return" version will
ultimately have to do.

I'd actually be _more_ favorable to calling free() instead of UNLEAK()
there, but I'm still mildly negative, just because it may go stale (and
our leak-checking wouldn't usefully notice these cases). Anybody
converting that die() to a return needs to re-analyze the function for
what might need to be released (and that includes non-memory bits like
descriptors, too).
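
As a rough illustration of what such a conversion involves, here is a sketch
of a "return -1" version in which every exit path releases what was acquired,
including a non-memory resource. The names make_graph_name() and
write_graph_sketch() are hypothetical and the file handling is deliberately
simplistic; this is not the code in commit-graph.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<dir>/graph" on the heap; caller must free. */
char *make_graph_name(const char *dir)
{
	size_t len = strlen(dir) + sizeof("/graph");
	char *name = malloc(len);
	if (name)
		snprintf(name, len, "%s/graph", dir);
	return name;
}

/*
 * Returns 0 on success, -1 on failure. Unlike a die()-ing version, every
 * exit path has to release what was acquired: the name and the stream.
 */
int write_graph_sketch(const char *dir)
{
	int ret = -1;
	FILE *f = NULL;
	char *name = make_graph_name(dir);

	if (!name)
		goto out;
	f = fopen(name, "w");
	if (!f)
		goto out; /* a die()-ing version would have exited here */
	if (fputs("graph\n", f) == EOF)
		goto out;
	ret = 0;
out:
	if (f && fclose(f) == EOF)
		ret = -1;
	free(name); /* free(NULL) is a no-op, so this is safe on all paths */
	return ret;
}
```

The single cleanup label is what a stale UNLEAK() or free() next to the old
die() would not have provided: it forces a fresh look at everything the
function holds at the point of failure.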

-Peff


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 17:59     ` Stefan Beller
  2018-10-02 19:08       ` Martin Ågren
@ 2018-10-02 22:37       ` Jeff King
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff King @ 2018-10-02 22:37 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Martin Ågren, gitgitgadget, git, Junio C Hamano, Derrick Stolee

On Tue, Oct 02, 2018 at 10:59:28AM -0700, Stefan Beller wrote:

> > Generally speaking, it
> > seems impossible to UNLEAK when dying, since we don't know what we have
> > allocated higher up in the call-stack.
> 
> I do not understand; I thought UNLEAK was specifically for the purpose of
> die() calls without imposing extra overhead; rereading 0e5bba53af
> (add UNLEAK annotation for reducing leak false positives, 2017-09-08)
> doesn't provide an example for prematurely die()ing, only for regular
> program exit.

I responded elsewhere, but as the author of UNLEAK, let me comment here:
it was intended only for program exit. That's why there are no such
examples. :)

If you're using it anywhere except the return from a cmd_* function, or
a static-local helper that's called from a cmd_*, you should probably
actually be freeing the memory.
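
For readers unfamiliar with the mechanism, the idea can be sketched as
follows. This is a simplified imitation, not git's actual code:
UNLEAK_SKETCH, unleak_sketch() and `suppressed` are hypothetical names. The
assumption is that the annotation copies the variable's bytes into a list that
stays reachable until exit, so a leak checker sees the allocation as still
referenced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Singly linked list that stays reachable for the life of the program. */
struct suppressed_item {
	struct suppressed_item *next;
	size_t len;
	unsigned char data[]; /* copy of the annotated variable's bytes */
};

struct suppressed_item *suppressed;

/* Copy `len` bytes of a variable (typically a pointer) into the list. */
void unleak_sketch(const void *var, size_t len)
{
	struct suppressed_item *item = malloc(sizeof(*item) + len);
	if (!item)
		return;
	item->next = suppressed;
	item->len = len;
	memcpy(item->data, var, len);
	suppressed = item;
}

#define UNLEAK_SKETCH(var) unleak_sketch(&(var), sizeof(var))

/* Hypothetical cmd_*-style entry point: annotate right before returning. */
int cmd_example(void)
{
	char *buf = malloc(64);
	if (!buf)
		return 1;
	strcpy(buf, "data that lives until exit");
	UNLEAK_SKETCH(buf); /* not freed on purpose; the process is about to exit */
	return 0;
}
```

Nothing is released here; the annotation only keeps a reference alive. That is
why it fits the last lines of a command and not library code that may be
called again.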

-Peff


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 22:34           ` Jeff King
@ 2018-10-02 22:44             ` Stefan Beller
  2018-10-03 12:04               ` Derrick Stolee
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Beller @ 2018-10-02 22:44 UTC (permalink / raw)
  To: Jeff King
  Cc: Martin Ågren, gitgitgadget, git, Junio C Hamano, Derrick Stolee

>
> My preference is to avoid them in the name of simplicity. If you're
> using "make SANITIZE=leak test" to check for leaks, it will skip these
> cases. If you're using valgrind, I think these may be reported as
> "reachable". But that number already isn't useful for finding real
> leaks, because it includes cases like the "foo" above as well as
> program-lifetime globals.
>
> The only argument (IMHO) for such an UNLEAK() is that it annotates the
> location for when somebody later changes the function to "return -1"
> instead of dying. But if we are going to do such annotation, we may as
> well actually call free(), which is what the "return" version will
> ultimately have to do.

Heh, that was part of my reasoning why we'd want to have *something*.

> I'd actually be _more_ favorable to calling free() instead of UNLEAK()
> there, but I'm still mildly negative, just because it may go stale (and
> our leak-checking wouldn't usefully notice these cases). Anybody
> converting that die() to a return needs to re-analyze the function for
> what might need to be released (and that includes non-memory bits like
> descriptors, too).

Sounds reasonable, so then the consensus (between Martin, you and me)
is to drop the UNLEAK.


* Re: [PATCH 1/2] commit-graph: clean up leaked memory during write
  2018-10-02 22:44             ` Stefan Beller
@ 2018-10-03 12:04               ` Derrick Stolee
  2018-10-03 15:36                 ` [PATCH 0/2] commit-graph: more leak fixes Martin Ågren
  0 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2018-10-03 12:04 UTC (permalink / raw)
  To: Stefan Beller, Jeff King
  Cc: Martin Ågren, gitgitgadget, git, Junio C Hamano, Derrick Stolee

On 10/2/2018 6:44 PM, Stefan Beller wrote:
>> My preference is to avoid them in the name of simplicity. If you're
>> using "make SANITIZE=leak test" to check for leaks, it will skip these
>> cases. If you're using valgrind, I think these may be reported as
>> "reachable". But that number already isn't useful for finding real
>> leaks, because it includes cases like the "foo" above as well as
>> program-lifetime globals.
>>
>> The only argument (IMHO) for such an UNLEAK() is that it annotates the
>> location for when somebody later changes the function to "return -1"
>> instead of dying. But if we are going to do such annotation, we may as
>> well actually call free(), which is what the "return" version will
>> ultimately have to do.
> Heh, that was part of my reasoning why we'd want to have *something*.
>
>> I'd actually be _more_ favorable to calling free() instead of UNLEAK()
>> there, but I'm still mildly negative, just because it may go stale (and
>> our leak-checking wouldn't usefully notice these cases). Anybody
>> converting that die() to a return needs to re-analyze the function for
>> what might need to be released (and that includes non-memory bits like
>> descriptors, too).
> Sounds reasonable, so then the consensus (between Martin, you and me)
> is to drop the UNLEAK.
Thanks for the discussion here. I'll drop the UNLEAK for now and think 
about how to remove the die() calls from commit-graph.c in a later series.

Thanks,
-Stolee


* [PATCH 0/2] commit-graph: more leak fixes
  2018-10-03 12:04               ` Derrick Stolee
@ 2018-10-03 15:36                 ` Martin Ågren
  2018-10-03 15:36                   ` [PATCH 1/2] commit-graph: free `struct packed_git` after closing it Martin Ågren
                                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Martin Ågren @ 2018-10-03 15:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Stefan Beller, Jeff King, gitgitgadget, git, Junio C Hamano,
	Derrick Stolee

Hi Derrick,

These two patches on top of yours make the test suite (i.e., the subset
of it that I run) leak-free with respect to builtin/commit-graph.c and
commit-graph.c.

The first could be squashed into your patch 1/2. It touches the same
function, but it requires a different usage to trigger, so squashing it
in would require broadening the scope. I understand if you don't want to
do that.

If you want to pick these up as part of your re-roll in any way, shape
or form, go ahead. If not, they can go in separately, either in parallel
or after your series lands. Whatever the destiny of this posting, I'll
follow through as appropriate.

Martin

Martin Ågren (2):
  commit-graph: free `struct packed_git` after closing it
  builtin/commit-graph.c: UNLEAK variables

 builtin/commit-graph.c | 11 ++++++-----
 commit-graph.c         |  1 +
 2 files changed, 7 insertions(+), 5 deletions(-)

-- 
2.19.0.329.g76f2f5c1e3



* [PATCH 1/2] commit-graph: free `struct packed_git` after closing it
  2018-10-03 15:36                 ` [PATCH 0/2] commit-graph: more leak fixes Martin Ågren
@ 2018-10-03 15:36                   ` Martin Ågren
  2018-10-03 15:36                   ` [PATCH 2/2] builtin/commit-graph.c: UNLEAK variables Martin Ågren
  2018-10-03 16:19                   ` [PATCH 0/2] commit-graph: more leak fixes Derrick Stolee
  2 siblings, 0 replies; 20+ messages in thread
From: Martin Ågren @ 2018-10-03 15:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Stefan Beller, Jeff King, gitgitgadget, git, Junio C Hamano,
	Derrick Stolee

`close_pack(p)` does not free the memory which `p` points to, so follow
up with a call to `free(p)`. All other users of `close_pack()` look ok.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 commit-graph.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/commit-graph.c b/commit-graph.c
index 3d644fddc0..9b481bcd06 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -766,6 +766,7 @@ void write_commit_graph(const char *obj_dir,
 				die(_("error opening index for %s"), packname.buf);
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
+			free(p);
 		}
 		stop_progress(&oids.progress);
 		strbuf_release(&packname);
-- 
2.19.0.329.g76f2f5c1e3
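
The distinction this patch relies on (closing a pack releases what the
pack holds, but not the struct itself) can be sketched in isolation.
This is only an illustration under stated assumptions: `struct fake_pack`,
`fake_pack_open()`, `fake_pack_close()` and `demo_close_then_free()` are
hypothetical stand-ins, not Git's `struct packed_git` or `close_pack()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pack handle; not Git's real definition. */
struct fake_pack {
	char *index_data; /* resource held by the pack, released on close */
	int is_open;
};

/* Allocate the handle and the resource it holds. Returns NULL on failure. */
struct fake_pack *fake_pack_open(const char *name)
{
	struct fake_pack *p = malloc(sizeof(*p));
	size_t len;

	if (!p)
		return NULL;
	len = strlen(name) + 1;
	p->index_data = malloc(len);
	if (!p->index_data) {
		free(p);
		return NULL;
	}
	memcpy(p->index_data, name, len);
	p->is_open = 1;
	return p;
}

/* Release what the handle holds, but deliberately not the handle itself. */
void fake_pack_close(struct fake_pack *p)
{
	free(p->index_data);
	p->index_data = NULL;
	p->is_open = 0;
}

/* Mirrors the patched loop body: close, then free the handle. */
int demo_close_then_free(void)
{
	struct fake_pack *p = fake_pack_open("pack-1234.idx");
	int still_open;

	if (!p)
		return -1;
	fake_pack_close(p);
	still_open = p->is_open; /* 0: closed, yet `p` is still allocated */
	free(p);                 /* without this call the struct would leak */
	return still_open;
}
```

Running a caller of `demo_close_then_free()` under a leak checker with
the `free(p)` line removed should report one lost block per handle,
which is the shape of the leak described above.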


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] builtin/commit-graph.c: UNLEAK variables
  2018-10-03 15:36                 ` [PATCH 0/2] commit-graph: more leak fixes Martin Ågren
  2018-10-03 15:36                   ` [PATCH 1/2] commit-graph: free `struct packed_git` after closing it Martin Ågren
@ 2018-10-03 15:36                   ` Martin Ågren
  2018-10-03 16:19                   ` [PATCH 0/2] commit-graph: more leak fixes Derrick Stolee
  2 siblings, 0 replies; 20+ messages in thread
From: Martin Ågren @ 2018-10-03 15:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Stefan Beller, Jeff King, gitgitgadget, git, Junio C Hamano,
	Derrick Stolee

`graph_verify()`, `graph_read()` and `graph_write()` do the hard work of
`cmd_commit_graph()`. As soon as these return, so does
`cmd_commit_graph()`.

`strbuf_getline()` may allocate memory in the strbuf, yet return EOF.
We need to release the strbuf or UNLEAK it. Go for the latter since we
are close to returning from `graph_write()`.

`graph_write()` also fails to free the strings in the string list. They
have been added to the list with `strdup_strings` set to 0. We could
flip `strdup_strings` before clearing the list, which is our usual hack
in situations like this. But since we are about to exit, let's just
UNLEAK the whole string list instead.

UNLEAK `graph` in `graph_verify`. While at it, and for consistency,
UNLEAK in `graph_read()` as well, and remove an unnecessary UNLEAK just
before dying.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 builtin/commit-graph.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index bc0fa9ba52..66f12eb009 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -64,6 +64,7 @@ static int graph_verify(int argc, const char **argv)
 	if (!graph)
 		return 0;
 
+	UNLEAK(graph);
 	return verify_commit_graph(the_repository, graph);
 }
 
@@ -89,10 +90,8 @@ static int graph_read(int argc, const char **argv)
 	graph_name = get_commit_graph_filename(opts.obj_dir);
 	graph = load_commit_graph_one(graph_name);
 
-	if (!graph) {
-		UNLEAK(graph_name);
+	if (!graph)
 		die("graph file %s does not exist", graph_name);
-	}
 
 	FREE_AND_NULL(graph_name);
 
@@ -115,7 +114,7 @@ static int graph_read(int argc, const char **argv)
 		printf(" large_edges");
 	printf("\n");
 
-	free_commit_graph(graph);
+	UNLEAK(graph);
 
 	return 0;
 }
@@ -166,6 +165,8 @@ static int graph_write(int argc, const char **argv)
 			pack_indexes = &lines;
 		if (opts.stdin_commits)
 			commit_hex = &lines;
+
+		UNLEAK(buf);
 	}
 
 	write_commit_graph(opts.obj_dir,
@@ -174,7 +175,7 @@ static int graph_write(int argc, const char **argv)
 			   opts.append,
 			   1);
 
-	string_list_clear(&lines, 0);
+	UNLEAK(lines);
 	return 0;
 }
 
-- 
2.19.0.329.g76f2f5c1e3
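
The "flip `strdup_strings` before clearing" hack mentioned in the commit
message can be sketched with a toy list. This is a simplified model, not
Git's string-list code: `struct mini_list`, `mini_list_append()`,
`mini_list_clear()` and `demo_flip_hack()` are hypothetical names, and
only the ownership flag is modeled.

```c
#include <stdlib.h>
#include <string.h>

/* Toy list; the flag says whether the list owns (and frees) its strings. */
struct mini_list {
	char **items;
	size_t nr, alloc;
	int strdup_strings;
};

/* Append `s`, copying it only in "dup" mode. Returns 0 on success. */
int mini_list_append(struct mini_list *l, char *s)
{
	if (l->nr == l->alloc) {
		size_t n = l->alloc ? l->alloc * 2 : 4;
		char **tmp = realloc(l->items, n * sizeof(*tmp));

		if (!tmp)
			return -1;
		l->items = tmp;
		l->alloc = n;
	}
	if (l->strdup_strings) {
		size_t len = strlen(s) + 1;
		char *copy = malloc(len);

		if (!copy)
			return -1;
		memcpy(copy, s, len);
		s = copy;
	}
	l->items[l->nr++] = s;
	return 0;
}

/* Free the array, plus the strings if owned. Returns strings freed. */
size_t mini_list_clear(struct mini_list *l)
{
	size_t freed = 0, i;

	if (l->strdup_strings)
		for (i = 0; i < l->nr; i++, freed++)
			free(l->items[i]);
	free(l->items);
	l->items = NULL;
	l->nr = l->alloc = 0;
	return freed;
}

/* Heap strings are handed to a "nodup" list, then the flag is flipped. */
size_t demo_flip_hack(void)
{
	struct mini_list lines = { NULL, 0, 0, 0 };
	const char *inputs[] = { "refs/heads/a", "refs/heads/b" };
	size_t i;

	for (i = 0; i < 2; i++) {
		size_t len = strlen(inputs[i]) + 1;
		char *heap = malloc(len);

		if (!heap)
			break;
		memcpy(heap, inputs[i], len);
		if (mini_list_append(&lines, heap) < 0) {
			free(heap);
			break;
		}
	}
	lines.strdup_strings = 1; /* the flip: the list now frees them */
	return mini_list_clear(&lines);
}
```

Without the flip, clearing a "nodup" list frees only the array and the
heap strings stay allocated. The patch sidesteps the question with
UNLEAK because the process is about to exit anyway.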


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] commit-graph: more leak fixes
  2018-10-03 15:36                 ` [PATCH 0/2] commit-graph: more leak fixes Martin Ågren
  2018-10-03 15:36                   ` [PATCH 1/2] commit-graph: free `struct packed_git` after closing it Martin Ågren
  2018-10-03 15:36                   ` [PATCH 2/2] builtin/commit-graph.c: UNLEAK variables Martin Ågren
@ 2018-10-03 16:19                   ` Derrick Stolee
  2018-10-03 16:24                     ` Martin Ågren
  2 siblings, 1 reply; 20+ messages in thread
From: Derrick Stolee @ 2018-10-03 16:19 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Stefan Beller, Jeff King, gitgitgadget, git, Junio C Hamano,
	Derrick Stolee

On 10/3/2018 11:36 AM, Martin Ågren wrote:
> Hi Derrick,
>
> These two patches on top of yours make the test suite (i.e., the subset
> of it that I run) leak-free with respect to builtin/commit-graph.c and
> commit-graph.c.

Thanks!

> The first could be squashed into your patch 1/2. It touches the same
> function, but it requires a different usage to trigger, so squashing it
> in would require broadening the scope. I understand if you don't want to
> do that.

I'm fine with squashing it in with both our sign-offs. It is the same
idea, it just requires a different set of arguments to hit it. I'll 
adjust the commit message as necessary.

> If you want to pick these up as part of your re-roll in any way, shape
> or form, go ahead. If not, they can go in separately, either in parallel
> or after your series lands. Whatever the destiny of this posting, I'll
> follow through as appropriate.

I'll add your PATCH 2/2 to my v2. Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/2] commit-graph: more leak fixes
  2018-10-03 16:19                   ` [PATCH 0/2] commit-graph: more leak fixes Derrick Stolee
@ 2018-10-03 16:24                     ` Martin Ågren
  0 siblings, 0 replies; 20+ messages in thread
From: Martin Ågren @ 2018-10-03 16:24 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Stefan Beller, Jeff King, gitgitgadget, Git Mailing List,
	Junio C Hamano, Derrick Stolee

On Wed, 3 Oct 2018 at 18:19, Derrick Stolee <stolee@gmail.com> wrote:
> I'm fine with squashing it in with both our sign-offs. It is the same
> idea, it just requires a different set of arguments to hit it. I'll
> adjust the commit message as necessary.

> I'll add your PATCH 2/2 to my v2. Thanks!

Cool, thanks a lot.

Martin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 0/3] Clean up leaks in commit-graph.c
  2018-10-02 14:58 [PATCH 0/2] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
  2018-10-02 14:58 ` [PATCH 1/2] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
  2018-10-02 14:58 ` [PATCH 2/2] commit-graph: reduce initial oid allocation Derrick Stolee via GitGitGadget
@ 2018-10-03 17:12 ` Derrick Stolee via GitGitGadget
  2018-10-03 17:12   ` [PATCH v2 1/3] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
                     ` (2 more replies)
  2 siblings, 3 replies; 20+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-10-03 17:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

While looking at the commit-graph code, I noticed some memory leaks. These
can be found by running

valgrind --leak-check=full ./git commit-graph write --reachable

The impact of these leaks is small, as we never call
write_commit_graph_reachable in a loop, but it is best to be diligent here.

While looking at memory consumption within write_commit_graph(), I noticed
that we initialize our oid list with "object count / 4", which seems to be a
huge over-count. I reduce this by a factor of eight.

I built off of ab/commit-graph-progress, because my patch involves lines
close to those changes.

V2 includes feedback from V1 along with Martin's additional patches.

Thanks, -Stolee

Derrick Stolee (2):
  commit-graph: clean up leaked memory during write
  commit-graph: reduce initial oid allocation

Martin Ågren (1):
  builtin/commit-graph.c: UNLEAK variables

 builtin/commit-graph.c | 11 ++++++-----
 commit-graph.c         | 16 ++++++++++------
 2 files changed, 16 insertions(+), 11 deletions(-)


base-commit: 6b89a34c89fc763292f06012318b852b74825619
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-42%2Fderrickstolee%2Fcommit-graph-leak-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-42/derrickstolee/commit-graph-leak-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/42

Range-diff vs v1:

 1:  6906c25415 ! 1:  ba65680b3d commit-graph: clean up leaked memory during write
     @@ -7,17 +7,29 @@
          leaked in write_commit_graph_reachable(). Clean these up so our
          memory checking is cleaner.
      
     -    Running 'valgrind --leak-check=full git commit-graph write
     -    --reachable' demonstrates these leaks and how they are fixed after
     -    this change.
     +    Further, if we use a list of pack-files to find the commits, we
     +    can leak the packed_git structs after scanning them for commits.
      
     +    Running the following commands demonstrates the leak before and
     +    the fix after:
     +
     +    * valgrind --leak-check=full ./git commit-graph write --reachable
     +    * valgrind --leak-check=full ./git commit-graph write --stdin-packs
     +
     +    Signed-off-by: Martin Ågren <martin.agren@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
      diff --git a/commit-graph.c b/commit-graph.c
      --- a/commit-graph.c
      +++ b/commit-graph.c
      @@
     - 	string_list_init(&list, 1);
     + void write_commit_graph_reachable(const char *obj_dir, int append,
     + 				  int report_progress)
     + {
     +-	struct string_list list;
     ++	struct string_list list = STRING_LIST_INIT_DUP;
     + 
     +-	string_list_init(&list, 1);
       	for_each_ref(add_ref_to_list, &list);
       	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
      +
     @@ -25,6 +37,14 @@
       }
       
       void write_commit_graph(const char *obj_dir,
     +@@
     + 				die(_("error opening index for %s"), packname.buf);
     + 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
     + 			close_pack(p);
     ++			free(p);
     + 		}
     + 		stop_progress(&oids.progress);
     + 		strbuf_release(&packname);
      @@
       	compute_generation_numbers(&commits, report_progress);
       
     @@ -45,5 +65,8 @@
      +	free(graph_name);
      +	free(commits.list);
       	free(oids.list);
     - 	oids.alloc = 0;
     - 	oids.nr = 0;
     +-	oids.alloc = 0;
     +-	oids.nr = 0;
     + }
     + 
     + #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
 -:  ---------- > 2:  13032d8475 builtin/commit-graph.c: UNLEAK variables
 2:  e29a0eaf03 = 3:  1002fd34fc commit-graph: reduce initial oid allocation

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/3] commit-graph: clean up leaked memory during write
  2018-10-03 17:12 ` [PATCH v2 0/3] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
@ 2018-10-03 17:12   ` Derrick Stolee via GitGitGadget
  2018-10-03 17:12   ` [PATCH v2 2/3] builtin/commit-graph.c: UNLEAK variables Martin Ågren via GitGitGadget
  2018-10-03 17:12   ` [PATCH v2 3/3] commit-graph: reduce initial oid allocation Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 20+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-10-03 17:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The write_commit_graph() method in commit-graph.c leaks some lists
and strings during execution. In addition, a list of strings is
leaked in write_commit_graph_reachable(). Clean these up so our
memory checking is cleaner.

Further, if we use a list of pack-files to find the commits, we
can leak the packed_git structs after scanning them for commits.

Running the following commands demonstrates the leak before and
the fix after:

* valgrind --leak-check=full ./git commit-graph write --reachable
* valgrind --leak-check=full ./git commit-graph write --stdin-packs

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 2a24eb8b5a..ceca6026b0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -693,11 +693,12 @@ static int add_ref_to_list(const char *refname,
 void write_commit_graph_reachable(const char *obj_dir, int append,
 				  int report_progress)
 {
-	struct string_list list;
+	struct string_list list = STRING_LIST_INIT_DUP;
 
-	string_list_init(&list, 1);
 	for_each_ref(add_ref_to_list, &list);
 	write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+
+	string_list_clear(&list, 0);
 }
 
 void write_commit_graph(const char *obj_dir,
@@ -764,6 +765,7 @@ void write_commit_graph(const char *obj_dir,
 				die(_("error opening index for %s"), packname.buf);
 			for_each_object_in_pack(p, add_packed_commits, &oids, 0);
 			close_pack(p);
+			free(p);
 		}
 		stop_progress(&oids.progress);
 		strbuf_release(&packname);
@@ -846,9 +848,11 @@ void write_commit_graph(const char *obj_dir,
 	compute_generation_numbers(&commits, report_progress);
 
 	graph_name = get_commit_graph_filename(obj_dir);
-	if (safe_create_leading_directories(graph_name))
+	if (safe_create_leading_directories(graph_name)) {
+		UNLEAK(graph_name);
 		die_errno(_("unable to create leading directories of %s"),
 			  graph_name);
+	}
 
 	hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
@@ -893,9 +897,9 @@ void write_commit_graph(const char *obj_dir,
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
+	free(graph_name);
+	free(commits.list);
 	free(oids.list);
-	oids.alloc = 0;
-	oids.nr = 0;
 }
 
 #define VERIFY_COMMIT_GRAPH_ERROR_HASH 2
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] builtin/commit-graph.c: UNLEAK variables
  2018-10-03 17:12 ` [PATCH v2 0/3] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
  2018-10-03 17:12   ` [PATCH v2 1/3] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
@ 2018-10-03 17:12   ` Martin Ågren via GitGitGadget
  2018-10-03 17:12   ` [PATCH v2 3/3] commit-graph: reduce initial oid allocation Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 20+ messages in thread
From: Martin Ågren via GitGitGadget @ 2018-10-03 17:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

From: Martin Ågren <martin.agren@gmail.com>

`graph_verify()`, `graph_read()` and `graph_write()` do the hard work of
`cmd_commit_graph()`. As soon as these return, so does
`cmd_commit_graph()`.

`strbuf_getline()` may allocate memory in the strbuf, yet return EOF.
We need to release the strbuf or UNLEAK it. Go for the latter since we
are close to returning from `graph_write()`.

`graph_write()` also fails to free the strings in the string list. They
have been added to the list with `strdup_strings` set to 0. We could
flip `strdup_strings` before clearing the list, which is our usual hack
in situations like this. But since we are about to exit, let's just
UNLEAK the whole string list instead.

UNLEAK `graph` in `graph_verify`. While at it, and for consistency,
UNLEAK in `graph_read()` as well, and remove an unnecessary UNLEAK just
before dying.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/commit-graph.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index bc0fa9ba52..66f12eb009 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -64,6 +64,7 @@ static int graph_verify(int argc, const char **argv)
 	if (!graph)
 		return 0;
 
+	UNLEAK(graph);
 	return verify_commit_graph(the_repository, graph);
 }
 
@@ -89,10 +90,8 @@ static int graph_read(int argc, const char **argv)
 	graph_name = get_commit_graph_filename(opts.obj_dir);
 	graph = load_commit_graph_one(graph_name);
 
-	if (!graph) {
-		UNLEAK(graph_name);
+	if (!graph)
 		die("graph file %s does not exist", graph_name);
-	}
 
 	FREE_AND_NULL(graph_name);
 
@@ -115,7 +114,7 @@ static int graph_read(int argc, const char **argv)
 		printf(" large_edges");
 	printf("\n");
 
-	free_commit_graph(graph);
+	UNLEAK(graph);
 
 	return 0;
 }
@@ -166,6 +165,8 @@ static int graph_write(int argc, const char **argv)
 			pack_indexes = &lines;
 		if (opts.stdin_commits)
 			commit_hex = &lines;
+
+		UNLEAK(buf);
 	}
 
 	write_commit_graph(opts.obj_dir,
@@ -174,7 +175,7 @@ static int graph_write(int argc, const char **argv)
 			   opts.append,
 			   1);
 
-	string_list_clear(&lines, 0);
+	UNLEAK(lines);
 	return 0;
 }
 
-- 
gitgitgadget
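
For readers unfamiliar with UNLEAK, the general idea can be sketched as
follows. My understanding, which is an assumption and not something
stated in this thread, is that in builds with leak suppression enabled
the macro copies the variable's bytes into a globally reachable list so
that leak checkers treat the pointed-to memory as still referenced, and
that it is a no-op otherwise. The names `sketch_unleak_memory()`,
`SKETCH_UNLEAK` and `sketch_unleak_count()` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One node per annotated variable; holds a copy of the variable's bytes. */
struct leak_root {
	struct leak_root *next;
	size_t len;
	unsigned char data[]; /* flexible array member (C99) */
};

/* Reachable from a global, so whatever the copies point to is not "lost". */
static struct leak_root *leak_roots;

/* Record a byte-for-byte copy of the variable at `ptr`. Returns 0 on success. */
int sketch_unleak_memory(const void *ptr, size_t len)
{
	struct leak_root *root = malloc(sizeof(*root) + len);

	if (!root)
		return -1;
	memcpy(root->data, ptr, len);
	root->len = len;
	root->next = leak_roots;
	leak_roots = root;
	return 0;
}

/* Takes the variable itself, so pointers and structs are both covered. */
#define SKETCH_UNLEAK(var) sketch_unleak_memory(&(var), sizeof(var))

/* Number of variables annotated so far. */
size_t sketch_unleak_count(void)
{
	size_t n = 0;
	const struct leak_root *r;

	for (r = leak_roots; r; r = r->next)
		n++;
	return n;
}
```

Because the whole variable is copied, annotating a struct such as the
string list in `graph_write()` keeps every pointer stored inside it
visible to the checker, which is why a single annotation is enough
there.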


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/3] commit-graph: reduce initial oid allocation
  2018-10-03 17:12 ` [PATCH v2 0/3] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
  2018-10-03 17:12   ` [PATCH v2 1/3] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
  2018-10-03 17:12   ` [PATCH v2 2/3] builtin/commit-graph.c: UNLEAK variables Martin Ågren via GitGitGadget
@ 2018-10-03 17:12   ` Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 20+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-10-03 17:12 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

While writing a commit-graph file, we store the full list of
commits in a flat list. We use this list for sorting and ensuring
we are closed under reachability.

The initial allocation assumed that (at most) one in four objects
is a commit. This is a dramatic over-count for many repos,
especially large ones. Since we grow the list dynamically, reduce
this count by a factor of eight. We still set it to a minimum of
1024 before allocating.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index ceca6026b0..e773703e1d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -720,7 +720,7 @@ void write_commit_graph(const char *obj_dir,
 	struct progress *progress = NULL;
 
 	oids.nr = 0;
-	oids.alloc = approximate_object_count() / 4;
+	oids.alloc = approximate_object_count() / 32;
 	oids.progress = NULL;
 	oids.progress_done = 0;
 
-- 
gitgitgadget
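
The sizing arithmetic in this patch can be worked through with a small
sketch. The helper names are hypothetical, and the growth step
`(x + 16) * 3 / 2` is an assumption about how the list grows once it is
full; the thread itself only says the list grows dynamically and that
the initial size has a floor of 1024.

```c
#include <stddef.h>

/* Initial capacity: object count over `divisor` (nonzero), floor of 1024. */
size_t initial_oid_alloc(size_t approx_objects, size_t divisor)
{
	size_t alloc = approx_objects / divisor;

	return alloc < 1024 ? 1024 : alloc;
}

/* Assumed growth step for a full list (roughly 1.5x plus a constant). */
size_t grow_alloc(size_t alloc)
{
	return (alloc + 16) * 3 / 2;
}

/* How many growth steps it takes to go from `alloc` to at least `needed`. */
unsigned growth_steps(size_t alloc, size_t needed)
{
	unsigned steps = 0;

	while (alloc < needed) {
		alloc = grow_alloc(alloc);
		steps++;
	}
	return steps;
}
```

For a repository with about 1,000,000 objects, dividing by 4 reserves
250,000 slots up front, while dividing by 32 reserves 31,250. If the
commit count turns out to be higher, the cost is a handful of
reallocations, since the capacity grows geometrically under the assumed
growth step.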

^ permalink raw reply related	[flat|nested] 20+ messages in thread

Thread overview: 20+ messages
2018-10-02 14:58 [PATCH 0/2] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
2018-10-02 14:58 ` [PATCH 1/2] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
2018-10-02 15:40   ` Martin Ågren
2018-10-02 17:59     ` Stefan Beller
2018-10-02 19:08       ` Martin Ågren
2018-10-02 19:44         ` Stefan Beller
2018-10-02 22:34           ` Jeff King
2018-10-02 22:44             ` Stefan Beller
2018-10-03 12:04               ` Derrick Stolee
2018-10-03 15:36                 ` [PATCH 0/2] commit-graph: more leak fixes Martin Ågren
2018-10-03 15:36                   ` [PATCH 1/2] commit-graph: free `struct packed_git` after closing it Martin Ågren
2018-10-03 15:36                   ` [PATCH 2/2] builtin/commit-graph.c: UNLEAK variables Martin Ågren
2018-10-03 16:19                   ` [PATCH 0/2] commit-graph: more leak fixes Derrick Stolee
2018-10-03 16:24                     ` Martin Ågren
2018-10-02 22:37       ` [PATCH 1/2] commit-graph: clean up leaked memory during write Jeff King
2018-10-02 14:58 ` [PATCH 2/2] commit-graph: reduce initial oid allocation Derrick Stolee via GitGitGadget
2018-10-03 17:12 ` [PATCH v2 0/3] Clean up leaks in commit-graph.c Derrick Stolee via GitGitGadget
2018-10-03 17:12   ` [PATCH v2 1/3] commit-graph: clean up leaked memory during write Derrick Stolee via GitGitGadget
2018-10-03 17:12   ` [PATCH v2 2/3] builtin/commit-graph.c: UNLEAK variables Martin Ågren via GitGitGadget
2018-10-03 17:12   ` [PATCH v2 3/3] commit-graph: reduce initial oid allocation Derrick Stolee via GitGitGadget
