All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 3/3] dir: fix problematic API to avoid memory leaks
Date: Sun, 16 Aug 2020 05:11:54 -0400	[thread overview]
Message-ID: <20200816091154.GC1221900@coredump.intra.peff.net> (raw)
In-Reply-To: <b9310e9941e91336258edd97a913e5908180720e.1597561152.git.gitgitgadget@gmail.com>

On Sun, Aug 16, 2020 at 06:59:11AM +0000, Elijah Newren via GitGitGadget wrote:

> Digging further, I found that despite the pretty clear documentation
> near the top of dir.h that folks were supposed to call clear_directory()
> when the user no longer needed the dir_struct, there were four callers
> that didn't bother doing that at all.  However, two of them clearly
> thought about leaks since they had an UNLEAK(dir) directive, which to me
> suggests that the method to free the data was too unclear.  I suspect
> the non-obviousness of the API and its holes led folks to avoid it,
> which then snowballed into further problems with the entries[],
> ignored[], parent_hashmap, and recursive_hashmap problems.

The UNLEAK() ones are sort-of my fault, and are a combination of:

  - The commit adding them says "in some cases (e.g., dir_struct) we
    don't even have a function which knows how to free all of the struct
    members". I'm not sure if why I didn't fix clear_directory() as you
    did. I may not have known about it, or I may have been lazy. Or more
    likely I was simply holding the UNLEAK() hammer, so it looked like a
    nail. ;)

  - My focus was on making leak-checker output cleaner. And it wasn't
    clear for cases where we're about to exit the program whether we
    should be actually freeing (which has small but non-zero cost) or
    just annotating (which is zero-cost, but clearly has confused some
    people about how UNLEAK() was meant to be used). I think I'm leaning
    these days towards "free if it is easy to do so".

So this definitely seems like a good direction to me.

> Rename clear_directory() to dir_free() to be more in line with other
> data structures in git, and introduce a dir_init() to handle the
> suggested memsetting of dir_struct to all zeroes.  I hope that a name
> like "dir_free()" is more clear, and that the presence of dir_init()
> will provide a hint to those looking at the code that there may be a
> corresponding dir_free() that they need to call.

I think having a pair of matched calls is good. I _thought_ we had
established a pattern that we should prefer "clear" to "free" for cases
where the struct itself its not freed. But grepping around, I see there
are a few exceptions (hashmap_free() is the big one, and then
oidmap_free() which is built around it seems to have inherited it).

The rest seem to follow that pattern, though: attr_check_free,
cache_tree_free, and submodule_cache_free all actually free the pointer
passed in. And "git grep '_clear(' *.h" shows lots of
clear-but-don't-free examples.

Would dir_clear() make the pairing more obvious, but keep the clear
name?

(I also wouldn't be opposed to changing hashmap and oidmap to use the
name "clear", but that's obviously a separate patch).

>  builtin/add.c          |  4 ++--
>  builtin/check-ignore.c |  4 ++--
>  builtin/clean.c        |  8 ++++----
>  builtin/grep.c         |  3 ++-
>  builtin/ls-files.c     |  4 ++--
>  builtin/stash.c        |  4 ++--
>  dir.c                  |  7 ++++++-
>  dir.h                  | 19 ++++++++++---------
>  merge.c                |  3 ++-
>  wt-status.c            |  4 ++--
>  10 files changed, 34 insertions(+), 26 deletions(-)

That patch itself looks good except for two minor points:

>  /* Frees memory within dir which was allocated.  Does not free dir itself. */
> -void clear_directory(struct dir_struct *dir)
> +void dir_free(struct dir_struct *dir)
>  {
>  	int i, j;
>  	struct exclude_list_group *group;

As I mentioned in my previous email, I think it would be nice if this
called dir_init() at the end, so that the struct is always in a
consistent state.

> diff --git a/dir.h b/dir.h
> index 7d76d0644f..7c55c1a460 100644
> --- a/dir.h
> +++ b/dir.h
> [...]
> - * - Use `dir.entries[]`.
> + * - Use `dir.entries[]` and `dir.ignored[]`.
>   *
>   * - Call `clear_directory()` when the contained elements are no longer in use.
>   *

This last line should become dir_free() / dir_clear().

-Peff

  reply	other threads:[~2020-08-16  9:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-16  6:59 [PATCH 0/3] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-16  6:59 ` [PATCH 1/3] dir: fix leak of parent_hashmap and recursive_hashmap Elijah Newren via GitGitGadget
2020-08-16  8:43   ` Jeff King
2020-08-17 16:57     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 2/3] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-16  8:54   ` Jeff King
2020-08-17 16:58     ` Elijah Newren
2020-08-16  6:59 ` [PATCH 3/3] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-16  9:11   ` Jeff King [this message]
2020-08-17 17:19     ` Elijah Newren
2020-08-17 19:00       ` Jeff King
2020-08-18 22:58 ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 1/2] dir: make clear_directory() free all relevant memory Elijah Newren via GitGitGadget
2020-08-18 22:58   ` [PATCH v2 2/2] dir: fix problematic API to avoid memory leaks Elijah Newren via GitGitGadget
2020-08-19 13:51   ` [PATCH v2 0/2] Clean up some memory leaks in and around dir.c Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200816091154.GC1221900@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.