git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: "Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Phillip Wood" <phillip.wood@dunelm.org.uk>,
	"Daniel Li" <dan@danielyli.com>,
	git@vger.kernel.org
Subject: the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix)
Date: Fri, 20 May 2022 17:23:01 +0200	[thread overview]
Message-ID: <220520.86pmk81a9z.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <6af1aed1-ab13-ee0e-e979-d2f826ec776a@web.de>


On Sun, May 01 2022, René Scharfe wrote:

> Am 30.04.22 um 18:32 schrieb Carlo Marcelo Arenas Belón:
>> On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote:
>>> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
>>> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
>>> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
>>> that mechanism mandatory.
>>>
>>> git format-patch only sets no_free when --output is given, causing it to
>>> forget pathspecs after the first commit.  Set no_free unconditionally
>>> instead.
>>
>> I remember when I saw the first commit long ago, and thought; well that is
>> very round about way to reintroduce the UNLEAK removal that might have made
>> it visible.
>>
>> Haven't looked too closely, but considering that we were warned[1] the
>> interface was hard to use and might cause problems later and it did.
>>
>> wouldn't it a better and more secure solution to UNLEAK and remove all this
>> code, at least until it could be refactored cleanly, of course?
>
> Silently self-destructing pathspecs are a safety hazard indeed.
>
> no_free also affects freeing ignore_regex and parseopts, and even
> closing the output file.  I don't know about the file, but leaking the
> first two is harmless.  So removing the flag is safe as long as we make
> sure the output file is closed as needed.
>
> A safe diff_free() would only be called a particular diffopt once, when
> it's no longer needed.  It could check for reuse by setting a flag the
> first time, like in the patch below.  1426 tests in 163 test scripts
> fail for me with it applied on top of the regression fixes from this
> thread.
>
> Removing the diff_free() calls from diff.c::diff_flush() and
> log-tree.c::log_tree_commit() reduces this to just one or two in t7527
> (seems to be flaky).  Perhaps this is still salvageable?

Thanks both for handling this, and sorry that I was away at the time.

AFAICT the current status in this area is that with 2cc712324d5 (Merge
branch 'rs/fast-export-pathspec-fix', 2022-05-04) and 5048b20d1c2 (Merge
branch 'rs/format-patch-pathspec-fix', 2022-05-04) merged the known bugs
related to this have been fixed, along with 3da993f2e63 (Merge branch
'jc/diff-tree-stdin-fix', 2022-04-28).

"This" being my e900d494dcf (diff: add an API for deferred freeing,
2021-02-11), and 244c27242f4 (diff.[ch]: have diff_free() call
clear_pathspec(opts.pathspec), 2022-02-16) for the diff-tree case.

Not coincidentally around the same time my ab/plug-leak-in-revisions got
un-marked for "next" from [1] to [2], and I'm looking for a path forward
for this whole thing...

1. https://lore.kernel.org/git/xmqqbkwyz78z.fsf@gitster.g/
2. https://lore.kernel.org/git/xmqqwnfcskw2.fsf@gitster.g/

>> [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/
>
>
> ---
>  diff.c | 3 +++
>  diff.h | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/diff.c b/diff.c
> index ef7159968b..01296829b5 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6458,10 +6458,13 @@ void diff_free(struct diff_options *options)
>  	if (options->no_free)
>  		return;
>
> +	if (options->is_dead)
> +		BUG("double diff_free() on %p", (void *)options);
>  	diff_free_file(options);
>  	diff_free_ignore_regex(options);
>  	clear_pathspec(&options->pathspec);
>  	FREE_AND_NULL(options->parseopts);
> +	options->is_dead = 1;
>  }
>
>  void diff_flush(struct diff_options *options)
> diff --git a/diff.h b/diff.h
> index 8ae18e5ab1..c31d32ba19 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -398,6 +398,7 @@ struct diff_options {
>  	struct strmap *additional_path_headers;
>
>  	int no_free;
> +	int is_dead;
>  };
>
>  unsigned diff_filter_bit(char status);

Yes, that's quite scary. It shows that in general diff_free() isn't
reentrant-safe, but that we do call it repeatedly again.

However if we patch it like this instead we can see that (gulp!) we just
barely putter along, according to our test coverage at least. I.e. we
don't end up calling the parts of it that would be unsafe to call again:
	
	diff --git a/diff.c b/diff.c
	index ef7159968b6..0fe8bc5fade 100644
	--- a/diff.c
	+++ b/diff.c
	@@ -6438,14 +6438,23 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
	 
	 static void diff_free_file(struct diff_options *options)
	 {
	-	if (options->close_file)
	+	if (options->close_file) {
	+		if (options->is_dead)
	+			BUG("double diff_free() on %p", (void *)options);
	 		fclose(options->file);
	+	}
	 }
	 
	 static void diff_free_ignore_regex(struct diff_options *options)
	 {
	 	int i;
	 
	+	if (!options->ignore_regex_nr && !options->ignore_regex)
	+		return;
	+
	+	if (options->is_dead)
	+		BUG("double diff_free() on %p", (void *)options);
	+
	 	for (i = 0; i < options->ignore_regex_nr; i++) {
	 		regfree(options->ignore_regex[i]);
	 		free(options->ignore_regex[i]);
	@@ -6462,6 +6471,7 @@ void diff_free(struct diff_options *options)
	 	diff_free_ignore_regex(options);
	 	clear_pathspec(&options->pathspec);
	 	FREE_AND_NULL(options->parseopts);
	+	options->is_dead = 1;
	 }
	 
	 void diff_flush(struct diff_options *options)
	@@ -6560,7 +6570,6 @@ void diff_flush(struct diff_options *options)
	 free_queue:
	 	free(q->queue);
	 	DIFF_QUEUE_CLEAR(q);
	-	diff_free(options);
	 
	 	/*
	 	 * Report the content-level differences with HAS_CHANGES;
	diff --git a/diff.h b/diff.h
	index 8ae18e5ab1e..c31d32ba192 100644
	--- a/diff.h
	+++ b/diff.h
	@@ -398,6 +398,7 @@ struct diff_options {
	 	struct strmap *additional_path_headers;
	 
	 	int no_free;
	+	int is_dead;
	 };
	 
	 unsigned diff_filter_bit(char status);

I'd really like to fix this properly, but AFAICT the best way to do that
is to:

 A. Get ab/plug-leak-in-revisions merged down
 B. Fix diff_free() on top of that

Before I knew of these bugs I'd already written patches to get rid of
that whole "no_free" business. In retrospect it was completely the wrong
thing to do, but in hindsight something like it was needed to fix those
leaks as long as we didn't have a revisions_release().

I.e. the tricky cases where I ended up needing to set "no_free" are ones
where all the complexity neatly goes away once we start releasing the
"struct rev_info" properly, as it contains the data we'd like to
diff_free() at the end.

How does that plan sound, and is there anything I've missed?

I could also re-roll ab/plug-leak-in-revisions to include a fix that
makes it safe in the interim, i.e.:

	diff --git a/diff.c b/diff.c
	index ef7159968b6..2bc7ee81e4e 100644
	--- a/diff.c
	+++ b/diff.c
	@@ -6438,8 +6438,12 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
	 
	 static void diff_free_file(struct diff_options *options)
	 {
	-	if (options->close_file)
	+	if (options->close_file) {
	 		fclose(options->file);
	+
	+		options->file = NULL;
	+		options->close_file = 0;
	+	}
	 }
	 
	 static void diff_free_ignore_regex(struct diff_options *options)
	@@ -6450,7 +6454,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
	 		regfree(options->ignore_regex[i]);
	 		free(options->ignore_regex[i]);
	 	}
	-	free(options->ignore_regex);
	+	options->ignore_regex_nr = 0;
	+	FREE_AND_NULL(options->ignore_regex);
	 }
	 
	 void diff_free(struct diff_options *options)

But as long as we're not adding new API users of it until the follow-up
after ab/plug-leak-in-revisions we should also be safe for now, but
perhaps it's prudent to do it anyway.

I *could* potentially produce a shorter series than
ab/plug-leak-in-revisions to narrowly try to remove "no_free" from
diff.c first, but it would basically need to first introduce a
release_revisions(), and without the other revisions API leaks being
fixed testing it would be much tricker. I'd really prefer not to do
that.

How does all that sound?

  reply	other threads:[~2022-05-20 15:42 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-30  2:22 Bug: `git show` honors path filters only for the first commit Daniel Li
2022-04-30  4:59 ` Junio C Hamano
2022-04-30  5:29   ` [PATCH] 2.36 show regression fix Junio C Hamano
2022-04-30 10:32     ` [PATCH] 2.36 format-patch " René Scharfe
2022-04-30 16:32       ` Carlo Marcelo Arenas Belón
2022-05-01  9:35         ` René Scharfe
2022-05-20 15:23           ` Ævar Arnfjörð Bjarmason [this message]
2022-05-20 17:23             ` the state of diff_free() and release_revisions() Junio C Hamano
2022-04-30 14:31     ` [PATCH] 2.36 fast-export regression fix René Scharfe
2022-04-30 20:50       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220520.86pmk81a9z.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=carenas@gmail.com \
    --cc=dan@danielyli.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).