git.vger.kernel.org archive mirror
* [PATCH 0/2] UNLEAK style fixes
@ 2020-08-13 15:54 Jeff King
  2020-08-13 15:55 ` [PATCH 1/2] stop calling UNLEAK() before die() Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jeff King @ 2020-08-13 15:54 UTC (permalink / raw)
  To: git

Although we introduced UNLEAK() long ago, I don't know that anybody has
really made a concerted effort to annotate enough variables to make
running a leak-checker useful. So I haven't paid too much attention to
its use.

But a few people have added some annotations, and I think some of them
aren't great examples. So I decided to clean them up. This by definition
has no impact on regular builds (since UNLEAK is a noop there), and even
in leak-checking builds it should cause no behavior change.

Another category that I was tempted to change is when variables _could_
be freed, but we just don't bother to do so. E.g., at the end of
bugreport.c, we have:

  UNLEAK(buffer);
  UNLEAK(report_path);
  return !!launch_editor(report_path.buf, NULL, NULL);

Using UNLEAK(report_path) makes sense; we can't free it because we're
passing it to a function that runs until program end. But we _could_
free "buffer" here, which isn't otherwise used again (i.e., that could
be strbuf_release() instead of UNLEAK).

But that does have a run-time cost (we'd actually free the memory, even
though we could just exit and let the OS handle it). My guess is that
it's not a measurable cost, and the code might be cleaner to actually
clean up instead of sprinkling more UNLEAKs around. But until we're
actually pushing forward with a real effort to get a leak-checker
running clean, I don't see much point in doing one or the other.
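
To make the trade-off concrete, here is a rough sketch of the two options
for a variable that is no longer needed at the end of a command. The
`toy_buf` type, its helpers, and the no-op `UNLEAK()` definition are
simplified stand-ins made up for illustration; they are not Git's strbuf
API or Git's real macro.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a regular (non-leak-checking) build: the annotation is a no-op. */
#define UNLEAK(var) do {} while (0)

/* Hypothetical, simplified growable buffer; not Git's struct strbuf. */
struct toy_buf {
	char *buf;
	size_t len;
};

static void toy_buf_set(struct toy_buf *b, const char *s)
{
	free(b->buf);
	b->len = strlen(s);
	b->buf = malloc(b->len + 1);
	assert(b->buf);
	memcpy(b->buf, s, b->len + 1);
}

static void toy_buf_release(struct toy_buf *b)
{
	free(b->buf);
	b->buf = NULL;
	b->len = 0;
}

/* Option 1: annotate, and let the OS reclaim the memory at exit. */
static int finish_with_unleak(struct toy_buf *scratch)
{
	UNLEAK(*scratch);
	return scratch->buf != NULL;	/* 1: memory is still allocated */
}

/* Option 2: actually free it, at a small run-time cost. */
static int finish_with_release(struct toy_buf *scratch)
{
	toy_buf_release(scratch);
	return scratch->buf != NULL;	/* 0: memory was freed */
}
```

Neither function is meant as a recommendation; the sketch only shows that
the second option does real work where the first does none.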

(As a side note, if we want to declare UNLEAK() a failure because nobody
cares enough to really use it, I'm OK with that, too).

  [1/2]: stop calling UNLEAK() before die()
  [2/2]: ls-remote: simplify UNLEAK() usage

 bugreport.c         | 4 +---
 builtin/ls-remote.c | 8 +++-----
 midx.c              | 8 ++------
 3 files changed, 6 insertions(+), 14 deletions(-)

-Peff


* [PATCH 1/2] stop calling UNLEAK() before die()
  2020-08-13 15:54 [PATCH 0/2] UNLEAK style fixes Jeff King
@ 2020-08-13 15:55 ` Jeff King
  2020-08-13 18:08   ` Derrick Stolee
  2020-08-13 15:55 ` [PATCH 2/2] ls-remote: simplify UNLEAK() usage Jeff King
  2020-08-13 19:32 ` [PATCH 0/2] UNLEAK style fixes Eric Sunshine
  2 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-08-13 15:55 UTC (permalink / raw)
  To: git

The point of UNLEAK() is to make a reference to a variable that is about
to go out of scope so that leak-checkers will consider it to be
not-leaked. Doing so right before die() is therefore pointless; even
though we are about to exit the program, the variable will still be on
the stack and accessible to leak-checkers.

These annotations aren't really hurting anything, but they clutter the
code and set a bad example of how to use UNLEAK().

Signed-off-by: Jeff King <peff@peff.net>
---
 bugreport.c | 4 +---
 midx.c      | 8 ++------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/bugreport.c b/bugreport.c
index 09579e268d..7ca0fba1b8 100644
--- a/bugreport.c
+++ b/bugreport.c
@@ -175,10 +175,8 @@ int cmd_main(int argc, const char **argv)
 	/* fopen doesn't offer us an O_EXCL alternative, except with glibc. */
 	report = open(report_path.buf, O_CREAT | O_EXCL | O_WRONLY, 0666);
 
-	if (report < 0) {
-		UNLEAK(report_path);
+	if (report < 0)
 		die(_("couldn't create a new file at '%s'"), report_path.buf);
-	}
 
 	if (write_in_full(report, buffer.buf, buffer.len) < 0)
 		die_errno(_("unable to write to %s"), report_path.buf);
diff --git a/midx.c b/midx.c
index a5fb797ede..737420f157 100644
--- a/midx.c
+++ b/midx.c
@@ -807,11 +807,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	int result = 0;
 
 	midx_name = get_midx_filename(object_dir);
-	if (safe_create_leading_directories(midx_name)) {
-		UNLEAK(midx_name);
+	if (safe_create_leading_directories(midx_name))
 		die_errno(_("unable to create leading directories of %s"),
 			  midx_name);
-	}
 
 	if (m)
 		packs.m = m;
@@ -1051,10 +1049,8 @@ void clear_midx_file(struct repository *r)
 		r->objects->multi_pack_index = NULL;
 	}
 
-	if (remove_path(midx)) {
-		UNLEAK(midx);
+	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
-	}
 
 	free(midx);
 }
-- 
2.28.0.573.gec6564704b



* [PATCH 2/2] ls-remote: simplify UNLEAK() usage
  2020-08-13 15:54 [PATCH 0/2] UNLEAK style fixes Jeff King
  2020-08-13 15:55 ` [PATCH 1/2] stop calling UNLEAK() before die() Jeff King
@ 2020-08-13 15:55 ` Jeff King
  2020-08-13 18:11   ` Derrick Stolee
  2020-08-13 19:32 ` [PATCH 0/2] UNLEAK style fixes Eric Sunshine
  2 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-08-13 15:55 UTC (permalink / raw)
  To: git

We UNLEAK() the "sorting" list created by parsing command-line options
(which is essentially used until the program exits). But we do so right
before leaving the cmd_ls_remote() function, which means we have to hit
all of the exits. But the point of UNLEAK() is that it's an annotation
which doesn't impact the variable itself. We can mark it as soon as
we're done writing its value, and then we only have to do so once.

This gives us a minor code reduction, and serves as a better example of
how UNLEAK() can be used.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/ls-remote.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index ea91679f33..092917eca2 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -83,6 +83,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 	dest = argv[0];
 
+	UNLEAK(sorting);
+
 	if (argc > 1) {
 		int i;
 		pattern = xcalloc(argc, sizeof(const char *));
@@ -107,7 +109,6 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 
 	if (get_url) {
 		printf("%s\n", *remote->url);
-		UNLEAK(sorting);
 		return 0;
 	}
 
@@ -122,10 +123,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
 		repo_set_hash_algo(the_repository, hash_algo);
 	}
-	if (transport_disconnect(transport)) {
-		UNLEAK(sorting);
+	if (transport_disconnect(transport))
 		return 1;
-	}
 
 	if (!dest && !quiet)
 		fprintf(stderr, "From %s\n", *remote->url);
@@ -150,7 +149,6 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		status = 0; /* we found something */
 	}
 
-	UNLEAK(sorting);
 	ref_array_clear(&ref_array);
 	return status;
 }
-- 
2.28.0.573.gec6564704b


* Re: [PATCH 1/2] stop calling UNLEAK() before die()
  2020-08-13 15:55 ` [PATCH 1/2] stop calling UNLEAK() before die() Jeff King
@ 2020-08-13 18:08   ` Derrick Stolee
  2020-08-14 10:17     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2020-08-13 18:08 UTC (permalink / raw)
  To: Jeff King, git

On 8/13/2020 11:55 AM, Jeff King wrote:
> The point of UNLEAK() is to make a reference to a variable that is about
> to go out of scope so that leak-checkers will consider it to be
> not-leaked. Doing so right before die() is therefore pointless; even
> though we are about to exit the program, the variable will still be on
> the stack and accessible to leak-checkers.
> 
> These annotations aren't really hurting anything, but they clutter the
> code and set a bad example of how to use UNLEAK().

Good justification. I'll stop being a bad example ;)

Thanks,
-Stolee




* Re: [PATCH 2/2] ls-remote: simplify UNLEAK() usage
  2020-08-13 15:55 ` [PATCH 2/2] ls-remote: simplify UNLEAK() usage Jeff King
@ 2020-08-13 18:11   ` Derrick Stolee
  0 siblings, 0 replies; 9+ messages in thread
From: Derrick Stolee @ 2020-08-13 18:11 UTC (permalink / raw)
  To: Jeff King, git

On 8/13/2020 11:55 AM, Jeff King wrote:
> We UNLEAK() the "sorting" list created by parsing command-line options
> (which is essentially used until the program exits). But we do so right
> before leaving the cmd_ls_remote() function, which means we have to hit
> all of the exits. But the point of UNLEAK() is that it's an annotation
> which doesn't impact the variable itself. We can mark it as soon as
> we're done writing its value, and then we only have to do so once.
> 
> This gives us a minor code reduction, and serves as a better example of
> how UNLEAK() can be used.

LGTM. In hindsight, it's obvious that UNLEAK() can be called before we are
done using a variable. Otherwise, we'd just free() it instead.
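
A minimal sketch of that pattern, assuming a leak-checking build in which
the annotation keeps a reference alive: the variable is annotated once,
right after its last write, and every later exit can return without
further ceremony. The `leak_root` list, `unleak_memory()`, `struct
sort_key`, and `list_refs()` are hypothetical stand-ins written for this
example, not the code from git-compat-util.h or builtin/ls-remote.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in: stash a copy of the variable's bytes in a global list. */
static struct leak_root {
	struct leak_root *next;
	unsigned char data[];
} *leak_roots;

static void unleak_memory(const void *ptr, size_t len)
{
	struct leak_root *root = malloc(sizeof(*root) + len);
	assert(root);
	memcpy(root->data, ptr, len);
	root->next = leak_roots;
	leak_roots = root;
}
#define UNLEAK(var) unleak_memory(&(var), sizeof(var))

/* Hypothetical stand-in for the list built while parsing options. */
struct sort_key {
	struct sort_key *next;
	const char *name;
};

static int list_refs(int get_url, int disconnect_fails)
{
	struct sort_key *sorting = calloc(1, sizeof(*sorting));
	assert(sorting);
	sorting->name = "refname";

	/* Done writing `sorting`: annotate once, then keep reading it. */
	UNLEAK(sorting);

	if (get_url)
		return 0;	/* early exit, no annotation needed here */
	if (disconnect_fails)
		return 1;	/* nor here */
	return strcmp(sorting->name, "refname") ? 2 : 0;
}
```

The copy held in the global list is what lets a leak-checker still find
the allocation after `sorting` itself has gone out of scope.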

-Stolee




* Re: [PATCH 0/2] UNLEAK style fixes
  2020-08-13 15:54 [PATCH 0/2] UNLEAK style fixes Jeff King
  2020-08-13 15:55 ` [PATCH 1/2] stop calling UNLEAK() before die() Jeff King
  2020-08-13 15:55 ` [PATCH 2/2] ls-remote: simplify UNLEAK() usage Jeff King
@ 2020-08-13 19:32 ` Eric Sunshine
  2020-08-14 10:34   ` Jeff King
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Sunshine @ 2020-08-13 19:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Thu, Aug 13, 2020 at 11:54 AM Jeff King <peff@peff.net> wrote:
> (As a side note, if we want to declare UNLEAK() a failure because nobody
> cares enough to really use it, I'm OK with that, too).

Perhaps the reason that UNLEAK() has not been particularly successful,
in general, is that it requires extra knowledge and reasoning to know
when to use it and how to do so properly. Couple that with the fact
that the scope of cases where it can be used is quite narrow compared
to the sum total of all code in the project for which we simply free resources
when we're done with them. So, it's hard to keep the specialized
UNLEAK() knowledge in one's head.

Speaking from personal experience, the several times I have had to
deal with UNLEAK(), I had to re-learn it from scratch each time. That
meant studying the header comment, studying the implementation, and
studying existing callers before things "clicked" enough to be able to
feel confident about how to use it (assuming it wasn't false
confidence).

Even today, reading this patch series, I had to go through all that
again just to understand the changes made by the patches, and
especially the commit message of patch [1/2]. It took several
re-reads, plus re-examining UNLEAK() documentation, plus looking at
the UNLEAK() implementation a couple times before the [1/2] commit
message finally "clicked".

That all represents a lot of cognitive overhead versus the common
practice of simply freeing resources when you're done with them, which
requires no extra cognitive load since it is something we think about
_always_ when working with a language like C with no built-in garbage
collection.

So, I for one would not be especially sad to see UNLEAK() retired.

(The patch series itself looked fine and made sense once I had
re-acquired the necessary UNLEAK() knowledge.)


* Re: [PATCH 1/2] stop calling UNLEAK() before die()
  2020-08-13 18:08   ` Derrick Stolee
@ 2020-08-14 10:17     ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2020-08-14 10:17 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Thu, Aug 13, 2020 at 02:08:45PM -0400, Derrick Stolee wrote:

> On 8/13/2020 11:55 AM, Jeff King wrote:
> > The point of UNLEAK() is to make a reference to a variable that is about
> > to go out of scope so that leak-checkers will consider it to be
> > not-leaked. Doing so right before die() is therefore pointless; even
> > though we are about to exit the program, the variable will still be on
> > the stack and accessible to leak-checkers.
> > 
> > These annotations aren't really hurting anything, but they clutter the
> > code and set a bad example of how to use UNLEAK().
> 
> Good justification. I'll stop being a bad example ;)

To be fair, it seems clear that UNLEAK() as a concept is rather
confusing. I really never intended anybody to start sprinkling it around
the code. It was meant to be a tool for folks who are interested in
running leak-checkers to do in-code annotations (for "yes, I know this
leaks but not until the program effectively ends").

I certainly don't mind if people writing new code preemptively annotate
this kind of leak. But I also wouldn't really encourage authors to put a
lot of effort into it, given the current state of the annotations.

-Peff


* Re: [PATCH 0/2] UNLEAK style fixes
  2020-08-13 19:32 ` [PATCH 0/2] UNLEAK style fixes Eric Sunshine
@ 2020-08-14 10:34   ` Jeff King
  2020-08-14 16:23     ` Eric Sunshine
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-08-14 10:34 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Aug 13, 2020 at 03:32:56PM -0400, Eric Sunshine wrote:

> On Thu, Aug 13, 2020 at 11:54 AM Jeff King <peff@peff.net> wrote:
> > (As a side note, if we want to declare UNLEAK() a failure because nobody
> > cares enough to really use it, I'm OK with that, too).
> 
> Perhaps the reason that UNLEAK() has not been particularly successful,
> in general, is that it requires extra knowledge and reasoning to know
> when to use it and how to do so properly. Couple that with the fact
> that the scope of cases where it can be used is quite narrow compared
> to the sum total of all code in the project for which we simply free resources
> when we're done with them. So, it's hard to keep the specialized
> UNLEAK() knowledge in one's head.
> 
> Speaking from personal experience, the several times I have had to
> deal with UNLEAK(), I had to re-learn it from scratch each time. That
> meant studying the header comment, studying the implementation, and
> studying existing callers before things "clicked" enough to be able to
> feel confident about how to use it (assuming it wasn't false
> confidence).

I think this is really the meat of it. I never intended UNLEAK() to be
something people dealt with unless they were trying to get LSAN or
valgrind to run without complaining.

> That all represents a lot of cognitive overhead versus the common
> practice of simply freeing resources when you're done with them, which
> requires no extra cognitive load since it is something we think about
> _always_ when working with a language like C with no built-in garbage
> collection.

To be clear, I have no problem with _actually_ freeing resources if
that's an option. The point of UNLEAK() was:

  - to help with structs that don't have an easy way to free all
    elements (e.g., rev_info)

  - to preempt arguments about whether calling free(buf) right before
    program exit is wasted effort, whereas UNLEAK() is truly
    zero-cost for non-leak-checking builds.

  - to avoid asking people to rewrite:

      return foo(bar);

    into:

      ret = foo(bar);
      free(bar);
      return ret;
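
As a rough sketch of that last point, the two shapes look like this side
by side. The no-op `UNLEAK()` below stands in for a regular build, and
`foo()`, `finish_by_freeing()`, and `finish_by_annotating()` are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a regular build, where the annotation compiles to nothing. */
#define UNLEAK(var) do {} while (0)

/* Hypothetical helper standing in for foo() in the text above. */
static int foo(const char *bar)
{
	assert(bar);
	return (int)strlen(bar);
}

/* Freeing explicitly needs a temporary to hold the result. */
static int finish_by_freeing(char *bar)
{
	int ret = foo(bar);
	free(bar);
	return ret;
}

/* Annotating keeps the one-line return; bar stays allocated until exit. */
static int finish_by_annotating(char *bar)
{
	UNLEAK(bar);
	return foo(bar);
}
```

Both return the same value; the difference is only whether bar is released
before the function returns or left for the OS to reclaim.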

So we could go that direction, but I'd wait on it until somebody feels
like sinking some time into getting us leak-checker-clean.

In the meantime, I have a slight preference to leave UNLEAK() there as a
potential tool for somebody digging into leak-checkers. But we almost
certainly shouldn't be asking new authors to use it in reviews, etc.
TBH, I'm not sure why people started sprinkling UNLEAK() around in the
first place. ;)

-Peff


* Re: [PATCH 0/2] UNLEAK style fixes
  2020-08-14 10:34   ` Jeff King
@ 2020-08-14 16:23     ` Eric Sunshine
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Sunshine @ 2020-08-14 16:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Fri, Aug 14, 2020 at 6:35 AM Jeff King <peff@peff.net> wrote:
> On Thu, Aug 13, 2020 at 03:32:56PM -0400, Eric Sunshine wrote:
> > That all represents a lot of cognitive overhead versus the common
> > practice of simply freeing resources when you're done with them, which
> > requires no extra cognitive load since it is something we think about
> > _always_ when working with a language like C with no built-in garbage
> > collection.
>
> In the meantime, I have a slight preference to leave UNLEAK() there as a
> potential tool for somebody digging into leak-checkers. But we almost
> certainly shouldn't be asking new authors to use it in reviews, etc.

I don't think it works that way in practice, though. There are enough
UNLEAK()'s sprinkled around that anyone working on or around code with
an existing UNLEAK() is compelled to understand/[re-]study it in order
to avoid breaking existing uses and/or to correctly mirror existing
uses when dealing with new resource allocations.

The same applies to patches. As a reviewer, I have two choices when I
see UNLEAK(): either I ignore it because I don't have the specialized
knowledge in my head (which makes me feel like my review is
ineffective), or I re-acquire the knowledge. And it's not just patches
like the ones in this series which are actively adjusting UNLEAK()
callers, but any patch which adds or removes an UNLEAK() corresponding
to the central meaty changes of the patch, or even a patch in which
UNLEAK() appears only in context lines, or even patches which don't
contain any UNLEAK() calls, but the source file to which the patch
applies does use UNLEAK(), if the reviewer consults the original
source code in addition to the patch.

> TBH, I'm not sure why people started sprinkling UNLEAK() around in the
> first place. ;)

For the same reason that people are concerned about calling free() or
otherwise releasing or unlocking resources which they have acquired:
they're trying to be responsible. When a programmer sees UNLEAK()
being used in or around the code being changed, he or she will attempt
to maintain the fidelity of the existing code by being careful to
mimic existing nearby resource handling practices.

