Whether to keep using UNLEAK() in built-ins (was: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK())

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Andrzej Hunt" <ajrhunt@google.com>,
	"Martin Ågren" <martin.agren@gmail.com>
Subject: Whether to keep using UNLEAK() in built-ins (was: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK())
Date: Fri, 18 Feb 2022 13:35:12 +0100	[thread overview]
Message-ID: <220218.861r00ib86.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <xmqqtucyslz9.fsf@gitster.g>

On Wed, Feb 16 2022, Junio C Hamano wrote:

[CC-ing some people using/interested in UNLEAK()]

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> Convert the UNLEAK() added in 0e5bba53af7 (add UNLEAK annotation for
>>> reducing leak false positives, 2017-09-08) to release the memory using
>>> strbuf_release() instead.
>>>
>>> The tests being marked as passing with
>>> "TEST_PASSES_SANITIZE_LEAK=true" already passed before due to the
>>> UNLEAK(), but now they really don't leak memory, so let's mark them as
>>> such.
>>
>> That smells like a brave move.
>>
>> Specifically, the cited commit turned an existing strbuf_release()
>> on &err into UNLEAK().  If that and the other strbuf (sb) were so
>> easily releasable, why didn't we do so back then already?
>
> I suspect that the answer to the above question is because these
> allocations are in the top-level cmd_commit() function, which is
> never called recursively or repeatedly as a subroutine.  The only
> significant thing that happens after we return from it is to exit.
>
> In such a code path, marking a variable as UNLEAK() is a better
> thing to do than calling strbuf_release().  Both will work as a way
> to squelch sanitizers from reporting a leak that does not matter,
> but calling strbuf_release() means we'd spend extra cycles to return
> pieces of memory to the pool, even though we know that the pool
> itself will be cleaned immediately later at exit.
>
> We already have UNLEAK to tell sanitizers not to distract us from
> spotting and plugging real leaks by reporting these apparent leaks
> that do not matter.  It is of somewhat dubious value to do a "we
> care too much about pleasing sanitizer and spend extra cycles at
> runtime while real users are doing real work" change.

We've had several discussions about the utility of UNLEAK() as I've been
submitting these patches, and I thought that if we weren't 100% on the
same page it was at least clear what I was going for here.

Per https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ the
real goal I have in mind here is to use the built-ins as a stand-in for
testing whether the underlying APIs are leak-free.

Because of that having to reason about UNLEAK() doing the right thing or
not is just unneeded distraction. Yes for a "struct strbuf" it won't
matter, but most of what we UNLEAK() is more complex stuff like "struct
rev_info". We won't really make headway making revision.c not leak
memory without using "git log" et al as the test subjects for whether
that API leaks.

We are also freeing most of this already even for built-ins, e.g. see
(not all of these are applicable obviously, but per the numbers enough
are to make the point):

    $ git grep '\bstrbuf_release\(' -- builtin | wc -l
    456
    $ git grep '\bUNLEAK\(' -- builtin | wc -l
    29

So one goal I've got with these patches is to eventually get rid of
UNLEAK() entirely. We only use it in these few instances:

    $ git grep '\bUNLEAK\(' -- '*.c' | wc -l
    31

Per the above we'd want to convert any that deal with the complex
structures that are the big source of leaks to doing real releases for
testing the APIs. That'll leave only a handful of remaining legitimate
uses, which I don't think are worth keeping some UNLEAK() API around
for, v.s. just freeing them.

I think we've also somewhat been talking past each other in past
exchanges (including me with Jeff King) about what you call "real
leaks".

When I'm referring to memory leaks I'm talking about them in the more
inclusive sense explained in this valgrind documentation:
https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks;
Basically "a malloc() not followed by a free() is a leak".

But you and Jeff King have (I think) commonly used that term to exclude
what's called "still reachable" in that table. Those *are* memory leaks,
but as explained in that table just ones that arguably don't matter. Or
at least ones most people tracing leaks don't want reported by default.

UNLEAK() is just a mechanism for moving a memory leak from another
category into "still reachable". That will make LSAN blind to it, and
valgrind in many cases due to it being a heap tracer. But it & other
memory leaking tools *will* report even on those if given the right
options.

So I've found it useful to get rid of UNLEAK() in some cases for those,
because even if LSAN runs clean it's useful to run valgrind in its more
pedantic tracing modes to see even the "still unreachable", and if we
should care about it or not.

Some of the leaks in those category *are* "real leaks". I.e. you can
have an ever-growing structure with global reach that's always "still
reachable", but when using the API would eventually OOM you.

So while it's a useful heuristic for spotting "will be cleaned up at
exit anyway" you can't rely on it, so I've been looking at them anyway.

So being able to just free() them so I can permanently ignore them as I
fix more leaks would be useful to me, so I hope you'll agree on just
talking this as-is, thanks! :)