* Bug: `git show` honors path filters only for the first commit @ 2022-04-30 2:22 Daniel Li 2022-04-30 4:59 ` Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Daniel Li @ 2022-04-30 2:22 UTC (permalink / raw) To: git git version: 2.36.0 OS: macOS Monterey 12.2.1 Installed via: homebrew Bug summary: When `git show` is invoked with more than one commit, it only respects the `<path>` filters for the first commit. This is best illustrated with an example. In git's own source repo, invoke `git show --oneline --name-only ecc7c8841d 961b130d20 d3115660b4` and observe the files these commits touch (the `--oneline --name-only` arguments are to make this example short; they're not relevant to the bug): $ git show --oneline --numstat ecc7c8841d 961b130d20 d3115660b4 ecc7c8841d repo_read_index: add config to expect files outside sparse patterns 2 0 Documentation/config.txt 27 0 Documentation/config/sparse.txt 1 0 cache.h 14 0 config.c 1 0 environment.c 2 1 sparse-index.c 19 0 t/t1090-sparse-checkout-scope.sh 961b130d20 branch: add --recurse-submodules option for branch creation 3 0 Documentation/config/advice.txt 26 11 Documentation/config/submodule.txt 18 1 Documentation/git-branch.txt 1 0 advice.c 1 0 advice.h 141 0 branch.c 29 0 branch.h 38 6 builtin/branch.c 38 0 builtin/submodule--helper.c 61 0 submodule-config.c 34 0 submodule-config.h 9 2 submodule.c 3 0 submodule.h 292 0 t/t3207-branch-submodule.sh d3115660b4 branch: add flags and config to inherit tracking 2 1 Documentation/config/branch.txt 17 7 Documentation/git-branch.txt 1 1 Documentation/git-checkout.txt 1 1 Documentation/git-switch.txt 42 7 branch.c 2 1 branch.h 4 2 builtin/branch.c 4 2 builtin/checkout.c 3 0 config.c 16 0 parse-options-cb.c 2 0 parse-options.h 10 1 t/t2017-checkout-orphan.sh 23 0 t/t2027-checkout-track.sh 28 0 t/t2060-switch.sh 33 0 t/t3200-branch.sh 17 0 t/t7201-co.sh Now invoke the same command but filtered on files under the `Documentation/` directory. Observe that this path is only respected for the first commit: $ git show --oneline --numstat ecc7c8841d 961b130d20 d3115660b4 -- Documentation ecc7c8841d repo_read_index: add config to expect files outside sparse patterns 2 0 Documentation/config.txt 27 0 Documentation/config/sparse.txt 961b130d20 branch: add --recurse-submodules option for branch creation 3 0 Documentation/config/advice.txt 26 11 Documentation/config/submodule.txt 18 1 Documentation/git-branch.txt 1 0 advice.c 1 0 advice.h 141 0 branch.c 29 0 branch.h 38 6 builtin/branch.c 38 0 builtin/submodule--helper.c 61 0 submodule-config.c 34 0 submodule-config.h 9 2 submodule.c 3 0 submodule.h 292 0 t/t3207-branch-submodule.sh d3115660b4 branch: add flags and config to inherit tracking 2 1 Documentation/config/branch.txt 17 7 Documentation/git-branch.txt 1 1 Documentation/git-checkout.txt 1 1 Documentation/git-switch.txt 42 7 branch.c 2 1 branch.h 4 2 builtin/branch.c 4 2 builtin/checkout.c 3 0 config.c 16 0 parse-options-cb.c 2 0 parse-options.h 10 1 t/t2017-checkout-orphan.sh 23 0 t/t2027-checkout-track.sh 28 0 t/t2060-switch.sh 33 0 t/t3200-branch.sh 17 0 t/t7201-co.sh The expected output should be: $ git show --oneline --numstat ecc7c8841d 961b130d20 d3115660b4 -- Documentation ecc7c8841d repo_read_index: add config to expect files outside sparse patterns 2 0 Documentation/config.txt 27 0 Documentation/config/sparse.txt 961b130d20 branch: add --recurse-submodules option for branch creation 3 0 Documentation/config/advice.txt 26 11 Documentation/config/submodule.txt 18 1 Documentation/git-branch.txt d3115660b4 branch: add flags and config to inherit tracking 2 1 Documentation/config/branch.txt 17 7 Documentation/git-branch.txt 1 1 Documentation/git-checkout.txt 1 1 Documentation/git-switch.txt Bonus: Surprisingly, the `<path>` *is* respected for commits that don't have any files satisfying the `<path>`. For example, the following command correctly excludes commit `f36d4f8316` from the output because it doesn't contain any files under `Documentation/`: $ git show --oneline --numstat ecc7c8841d f36d4f8316 961b130d20 d3115660b4 -- Documentation ecc7c8841d repo_read_index: add config to expect files outside sparse patterns 2 0 Documentation/config.txt 27 0 Documentation/config/sparse.txt 961b130d20 branch: add --recurse-submodules option for branch creation 3 0 Documentation/config/advice.txt 26 11 Documentation/config/submodule.txt 18 1 Documentation/git-branch.txt 1 0 advice.c 1 0 advice.h 141 0 branch.c 29 0 branch.h 38 6 builtin/branch.c 38 0 builtin/submodule--helper.c 61 0 submodule-config.c 34 0 submodule-config.h 9 2 submodule.c 3 0 submodule.h 292 0 t/t3207-branch-submodule.sh d3115660b4 branch: add flags and config to inherit tracking 2 1 Documentation/config/branch.txt 17 7 Documentation/git-branch.txt 1 1 Documentation/git-checkout.txt 1 1 Documentation/git-switch.txt 42 7 branch.c 2 1 branch.h 4 2 builtin/branch.c 4 2 builtin/checkout.c 3 0 config.c 16 0 parse-options-cb.c 2 0 parse-options.h 10 1 t/t2017-checkout-orphan.sh 23 0 t/t2027-checkout-track.sh 28 0 t/t2060-switch.sh 33 0 t/t3200-branch.sh 17 0 t/t7201-co.sh Cheers, Dan Li -- Daniel Li dan@danielyli.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Bug: `git show` honors path filters only for the first commit 2022-04-30 2:22 Bug: `git show` honors path filters only for the first commit Daniel Li @ 2022-04-30 4:59 ` Junio C Hamano 2022-04-30 5:29 ` [PATCH] 2.36 show regression fix Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Junio C Hamano @ 2022-04-30 4:59 UTC (permalink / raw) To: Daniel Li, Ævar Arnfjörð Bjarmason, Phillip Wood, René Scharfe Cc: git Daniel Li <dan@danielyli.com> writes: > git version: 2.36.0 > OS: macOS Monterey 12.2.1 > Installed via: homebrew I think this is the same regression as the recently talked about "diff-tree --stdin" aka "gitk" regression. https://lore.kernel.org/git/xmqq7d7bsu2n.fsf@gitster.g/ e900d494 (diff: add an API for deferred freeing, 2021-02-11), broke cmd_log_walk(), and we started to lose some setting that was parsed from the command line and stored in the diff_options structure after cmd_log_walk() runs just once. But "git show A B" runs the function once for each commit. A recent change in 2.36.0 made it worse by adding <pathspec> to the set of setting that gets lost after cmd_log_walk() runs once. Thanks for a report. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] 2.36 show regression fix 2022-04-30 4:59 ` Junio C Hamano @ 2022-04-30 5:29 ` Junio C Hamano 2022-04-30 10:32 ` [PATCH] 2.36 format-patch " René Scharfe 2022-04-30 14:31 ` [PATCH] 2.36 fast-export regression fix René Scharfe 0 siblings, 2 replies; 10+ messages in thread From: Junio C Hamano @ 2022-04-30 5:29 UTC (permalink / raw) To: Daniel Li Cc: Ævar Arnfjörð Bjarmason, Phillip Wood, René Scharfe, git This only surfaced as a regression after 2.36 release, but the breakage was already there with us for at least a year. e900d494 (diff: add an API for deferred freeing, 2021-02-11) introduced a mechanism to delay freeing resources held in diff_options struct that need to be kept as long as the struct will be reused to compute diff. "git log -p" was taught to utilize the mechanism but it was done with an incorrect assumption that the underlying helper function, cmd_log_walk(), is called only once, and it is OK to do the freeing at the end of it. Alas, for "git show A B", the function is called once for each commit given, so it is not OK to free the resources until we finish calling it for all the commits given from the command line. During 2.36 release cycle, we started clearing the <pathspec> as part of this freeing, which made the bug a lot more visible. Fix this breakage by tweaking how cmd_log_walk() frees the resources at the end and using a variant of it that does not immediately free the resources to show each commit object from the command line in "git show". Protect the fix with a few new tests. Reported-by: Daniel Li <dan@danielyli.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Junio C Hamano <gitster@pobox.com> writes: > Daniel Li <dan@danielyli.com> writes: > >> git version: 2.36.0 >> OS: macOS Monterey 12.2.1 >> Installed via: homebrew > > I think this is the same regression as the recently talked about > "diff-tree --stdin" aka "gitk" regression. > > https://lore.kernel.org/git/xmqq7d7bsu2n.fsf@gitster.g/ > > e900d494 (diff: add an API for deferred freeing, 2021-02-11), broke > cmd_log_walk(), and we started to lose some setting that was parsed > from the command line and stored in the diff_options structure after > cmd_log_walk() runs just once. But "git show A B" runs the function > once for each commit. A recent change in 2.36.0 made it worse by > adding <pathspec> to the set of setting that gets lost after > cmd_log_walk() runs once. > > Thanks for a report. builtin/log.c | 23 ++++++++++++++++++----- t/t4013-diff-various.sh | 19 +++++++++++++++++++ 2 files changed, 37 insertions(+), 5 deletions(-) diff --git a/builtin/log.c b/builtin/log.c index c211d66d1d..6696c4cfd0 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -417,7 +417,7 @@ static void finish_early_output(struct rev_info *rev) show_early_header(rev, "done", n); } -static int cmd_log_walk(struct rev_info *rev) +static int cmd_log_walk_no_free(struct rev_info *rev) { struct commit *commit; int saved_nrl = 0; @@ -444,7 +444,6 @@ static int cmd_log_walk(struct rev_info *rev) * and HAS_CHANGES being accumulated in rev->diffopt, so be careful to * retain that state information if replacing rev->diffopt in this loop */ - rev->diffopt.no_free = 1; while ((commit = get_revision(rev)) != NULL) { if (!log_tree_commit(rev, commit) && rev->max_count >= 0) /* @@ -469,8 +468,6 @@ static int cmd_log_walk(struct rev_info *rev) } rev->diffopt.degraded_cc_to_c = saved_dcctc; rev->diffopt.needed_rename_limit = saved_nrl; - rev->diffopt.no_free = 0; - diff_free(&rev->diffopt); if (rev->remerge_diff) { tmp_objdir_destroy(rev->remerge_objdir); @@ -484,6 +481,17 @@ static int cmd_log_walk(struct rev_info *rev) return diff_result_code(&rev->diffopt, 0); } +static int cmd_log_walk(struct rev_info *rev) +{ + int retval; + + rev->diffopt.no_free = 1; + retval = cmd_log_walk_no_free(rev); + rev->diffopt.no_free = 0; + diff_free(&rev->diffopt); + return retval; +} + static int git_log_config(const char *var, const char *value, void *cb) { const char *slot_name; @@ -680,6 +688,7 @@ int cmd_show(int argc, const char **argv, const char *prefix) count = rev.pending.nr; objects = rev.pending.objects; + rev.diffopt.no_free = 1; for (i = 0; i < count && !ret; i++) { struct object *o = objects[i].item; const char *name = objects[i].name; @@ -725,12 +734,16 @@ int cmd_show(int argc, const char **argv, const char *prefix) rev.pending.nr = rev.pending.alloc = 0; rev.pending.objects = NULL; add_object_array(o, name, &rev.pending); - ret = cmd_log_walk(&rev); + ret = cmd_log_walk_no_free(&rev); break; default: ret = error(_("unknown type: %d"), o->type); } } + + rev.diffopt.no_free = 0; + diff_free(&rev.diffopt); + free(objects); return ret; } diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index 750aee17ea..7a44d5d595 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -542,6 +542,25 @@ test_expect_success 'diff-tree --stdin with log formatting' ' test_cmp expect actual ' +test_expect_success 'show A B ... -- <pathspec>' ' + # side touches dir/sub, file0, and file3 + # master^ touches dir/sub, and file1 + # master^^ touches dir/sub, file0, and file2 + git show --name-only --format="<%s>" side master^ master^^ -- dir >actual && + cat >expect <<-\EOF && + <Side> + + dir/sub + <Third> + + dir/sub + <Second> + + dir/sub + EOF + test_cmp expect actual +' + test_expect_success 'diff -I<regex>: setup' ' git checkout master && test_seq 50 >file0 && -- 2.36.0-256-g547811d5a1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH] 2.36 format-patch regression fix 2022-04-30 5:29 ` [PATCH] 2.36 show regression fix Junio C Hamano @ 2022-04-30 10:32 ` René Scharfe 2022-04-30 16:32 ` Carlo Marcelo Arenas Belón 2022-04-30 14:31 ` [PATCH] 2.36 fast-export regression fix René Scharfe 1 sibling, 1 reply; 10+ messages in thread From: René Scharfe @ 2022-04-30 10:32 UTC (permalink / raw) To: Junio C Hamano, Ævar Arnfjörð Bjarmason Cc: Phillip Wood, Daniel Li, git e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a way to allow reusing diffopts: the no_free bit. 244c27242f (diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made that mechanism mandatory. git format-patch only sets no_free when --output is given, causing it to forget pathspecs after the first commit. Set no_free unconditionally instead. The existing test was unable to detect this breakage because it checks stderr for the absence of a certain string, but format-patch writes to stdout. Also the test was not checking the case of one commit modifying multiple files and a pathspec limiting the diff. Replace it with a more thorough one. Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/log.c | 9 ++------- t/t4014-format-patch.sh | 33 +++++++++++++++++++++++++++++++-- 2 files changed, 33 insertions(+), 9 deletions(-) diff --git a/builtin/log.c b/builtin/log.c index c211d66d1d..9acc130594 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1883,6 +1883,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) rev.diff = 1; rev.max_parents = 1; rev.diffopt.flags.recursive = 1; + rev.diffopt.no_free = 1; rev.subject_prefix = fmt_patch_subject_prefix; memset(&s_r_opt, 0, sizeof(s_r_opt)); s_r_opt.def = "HEAD"; @@ -2008,13 +2009,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) if (use_stdout) { setup_pager(); - } else if (rev.diffopt.close_file) { - /* - * The diff code parsed --output; it has already opened the - * file, but we must instruct it not to close after each diff. - */ - rev.diffopt.no_free = 1; - } else { + } else if (!rev.diffopt.close_file) { int saved; if (!output_directory) diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index 7dc5a5c736..fbec8ad2ef 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -926,11 +926,40 @@ test_expect_success 'format-patch --numstat should produce a patch' ' ' test_expect_success 'format-patch -- <path>' ' - git format-patch main..side -- file 2>error && - ! grep "Use .--" error + rm -f *.patch && + git checkout -b pathspec main && + + echo file_a 1 >file_a && + echo file_b 1 >file_b && + git add file_a file_b && + git commit -m pathspec_initial && + + echo file_a 2 >>file_a && + git add file_a && + git commit -m pathspec_a && + + echo file_b 2 >>file_b && + git add file_b && + git commit -m pathspec_b && + + echo file_a 3 >>file_a && + echo file_b 3 >>file_b && + git add file_a file_b && + git commit -m pathspec_ab && + + cat >expect <<-\EOF && + 0001-pathspec_initial.patch + 0002-pathspec_a.patch + 0003-pathspec_ab.patch + EOF + + git format-patch main..pathspec -- file_a >output && + test_cmp expect output && + ! grep file_b *.patch ' test_expect_success 'format-patch --ignore-if-in-upstream HEAD' ' + git checkout side && git format-patch --ignore-if-in-upstream HEAD ' -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] 2.36 format-patch regression fix 2022-04-30 10:32 ` [PATCH] 2.36 format-patch " René Scharfe @ 2022-04-30 16:32 ` Carlo Marcelo Arenas Belón 2022-05-01 9:35 ` René Scharfe 0 siblings, 1 reply; 10+ messages in thread From: Carlo Marcelo Arenas Belón @ 2022-04-30 16:32 UTC (permalink / raw) To: René Scharfe Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Phillip Wood, Daniel Li, git On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote: > e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a > way to allow reusing diffopts: the no_free bit. 244c27242f (diff.[ch]: > have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made > that mechanism mandatory. > > git format-patch only sets no_free when --output is given, causing it to > forget pathspecs after the first commit. Set no_free unconditionally > instead. I remember when I saw the first commit long ago, and thought; well that is very round about way to reintroduce the UNLEAK removal that might have made it visible. Haven't looked too closely, but considering that we were warned[1] the interface was hard to use and might cause problems later and it did. wouldn't it a better and more secure solution to UNLEAK and remove all this code, at least until it could be refactored cleanly, of course? Carlo [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] 2.36 format-patch regression fix 2022-04-30 16:32 ` Carlo Marcelo Arenas Belón @ 2022-05-01 9:35 ` René Scharfe 2022-05-20 15:23 ` the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix) Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 10+ messages in thread From: René Scharfe @ 2022-05-01 9:35 UTC (permalink / raw) To: Carlo Marcelo Arenas Belón Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason, Phillip Wood, Daniel Li, git Am 30.04.22 um 18:32 schrieb Carlo Marcelo Arenas Belón: > On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote: >> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a >> way to allow reusing diffopts: the no_free bit. 244c27242f (diff.[ch]: >> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made >> that mechanism mandatory. >> >> git format-patch only sets no_free when --output is given, causing it to >> forget pathspecs after the first commit. Set no_free unconditionally >> instead. > > I remember when I saw the first commit long ago, and thought; well that is > very round about way to reintroduce the UNLEAK removal that might have made > it visible. > > Haven't looked too closely, but considering that we were warned[1] the > interface was hard to use and might cause problems later and it did. > > wouldn't it a better and more secure solution to UNLEAK and remove all this > code, at least until it could be refactored cleanly, of course? Silently self-destructing pathspecs are a safety hazard indeed. no_free also affects freeing ignore_regex and parseopts, and even closing the output file. I don't know about the file, but leaking the first two is harmless. So removing the flag is safe as long as we make sure the output file is closed as needed. A safe diff_free() would only be called a particular diffopt once, when it's no longer needed. It could check for reuse by setting a flag the first time, like in the patch below. 1426 tests in 163 test scripts fail for me with it applied on top of the regression fixes from this thread. Removing the diff_free() calls from diff.c::diff_flush() and log-tree.c::log_tree_commit() reduces this to just one or two in t7527 (seems to be flaky). Perhaps this is still salvageable? > > Carlo > > [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/ --- diff.c | 3 +++ diff.h | 1 + 2 files changed, 4 insertions(+) diff --git a/diff.c b/diff.c index ef7159968b..01296829b5 100644 --- a/diff.c +++ b/diff.c @@ -6458,10 +6458,13 @@ void diff_free(struct diff_options *options) if (options->no_free) return; + if (options->is_dead) + BUG("double diff_free() on %p", (void *)options); diff_free_file(options); diff_free_ignore_regex(options); clear_pathspec(&options->pathspec); FREE_AND_NULL(options->parseopts); + options->is_dead = 1; } void diff_flush(struct diff_options *options) diff --git a/diff.h b/diff.h index 8ae18e5ab1..c31d32ba19 100644 --- a/diff.h +++ b/diff.h @@ -398,6 +398,7 @@ struct diff_options { struct strmap *additional_path_headers; int no_free; + int is_dead; }; unsigned diff_filter_bit(char status); -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix) 2022-05-01 9:35 ` René Scharfe @ 2022-05-20 15:23 ` Ævar Arnfjörð Bjarmason 2022-05-20 17:23 ` the state of diff_free() and release_revisions() Junio C Hamano 0 siblings, 1 reply; 10+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-05-20 15:23 UTC (permalink / raw) To: René Scharfe Cc: Carlo Marcelo Arenas Belón, Junio C Hamano, Phillip Wood, Daniel Li, git On Sun, May 01 2022, René Scharfe wrote: > Am 30.04.22 um 18:32 schrieb Carlo Marcelo Arenas Belón: >> On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote: >>> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a >>> way to allow reusing diffopts: the no_free bit. 244c27242f (diff.[ch]: >>> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made >>> that mechanism mandatory. >>> >>> git format-patch only sets no_free when --output is given, causing it to >>> forget pathspecs after the first commit. Set no_free unconditionally >>> instead. >> >> I remember when I saw the first commit long ago, and thought; well that is >> very round about way to reintroduce the UNLEAK removal that might have made >> it visible. >> >> Haven't looked too closely, but considering that we were warned[1] the >> interface was hard to use and might cause problems later and it did. >> >> wouldn't it a better and more secure solution to UNLEAK and remove all this >> code, at least until it could be refactored cleanly, of course? > > Silently self-destructing pathspecs are a safety hazard indeed. > > no_free also affects freeing ignore_regex and parseopts, and even > closing the output file. I don't know about the file, but leaking the > first two is harmless. So removing the flag is safe as long as we make > sure the output file is closed as needed. > > A safe diff_free() would only be called a particular diffopt once, when > it's no longer needed. It could check for reuse by setting a flag the > first time, like in the patch below. 1426 tests in 163 test scripts > fail for me with it applied on top of the regression fixes from this > thread. > > Removing the diff_free() calls from diff.c::diff_flush() and > log-tree.c::log_tree_commit() reduces this to just one or two in t7527 > (seems to be flaky). Perhaps this is still salvageable? Thanks both for handling this, and sorry that I was away at the time. AFAICT the current status in this area is that with 2cc712324d5 (Merge branch 'rs/fast-export-pathspec-fix', 2022-05-04) and 5048b20d1c2 (Merge branch 'rs/format-patch-pathspec-fix', 2022-05-04) merged the known bugs related to this have been fixed, along with 3da993f2e63 (Merge branch 'jc/diff-tree-stdin-fix', 2022-04-28). "This" being my e900d494dcf (diff: add an API for deferred freeing, 2021-02-11), and 244c27242f4 (diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) for the diff-tree case. Not coincidentally around the same time my ab/plug-leak-in-revisions got un-marked for "next" from [1] to [2], and I'm looking for a path forward for this whole thing... 1. https://lore.kernel.org/git/xmqqbkwyz78z.fsf@gitster.g/ 2. https://lore.kernel.org/git/xmqqwnfcskw2.fsf@gitster.g/ >> [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/ > > > --- > diff.c | 3 +++ > diff.h | 1 + > 2 files changed, 4 insertions(+) > > diff --git a/diff.c b/diff.c > index ef7159968b..01296829b5 100644 > --- a/diff.c > +++ b/diff.c > @@ -6458,10 +6458,13 @@ void diff_free(struct diff_options *options) > if (options->no_free) > return; > > + if (options->is_dead) > + BUG("double diff_free() on %p", (void *)options); > diff_free_file(options); > diff_free_ignore_regex(options); > clear_pathspec(&options->pathspec); > FREE_AND_NULL(options->parseopts); > + options->is_dead = 1; > } > > void diff_flush(struct diff_options *options) > diff --git a/diff.h b/diff.h > index 8ae18e5ab1..c31d32ba19 100644 > --- a/diff.h > +++ b/diff.h > @@ -398,6 +398,7 @@ struct diff_options { > struct strmap *additional_path_headers; > > int no_free; > + int is_dead; > }; > > unsigned diff_filter_bit(char status); Yes, that's quite scary. It shows that in general diff_free() isn't reentrant-safe, but that we do call it repeatedly again. However if we patch it like this instead we can see that (gulp!) we just barely putter along, according to our test coverage at least. I.e. we don't end up calling the parts of it that would be unsafe to call again: diff --git a/diff.c b/diff.c index ef7159968b6..0fe8bc5fade 100644 --- a/diff.c +++ b/diff.c @@ -6438,14 +6438,23 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o) static void diff_free_file(struct diff_options *options) { - if (options->close_file) + if (options->close_file) { + if (options->is_dead) + BUG("double diff_free() on %p", (void *)options); fclose(options->file); + } } static void diff_free_ignore_regex(struct diff_options *options) { int i; + if (!options->ignore_regex_nr && !options->ignore_regex) + return; + + if (options->is_dead) + BUG("double diff_free() on %p", (void *)options); + for (i = 0; i < options->ignore_regex_nr; i++) { regfree(options->ignore_regex[i]); free(options->ignore_regex[i]); @@ -6462,6 +6471,7 @@ void diff_free(struct diff_options *options) diff_free_ignore_regex(options); clear_pathspec(&options->pathspec); FREE_AND_NULL(options->parseopts); + options->is_dead = 1; } void diff_flush(struct diff_options *options) @@ -6560,7 +6570,6 @@ void diff_flush(struct diff_options *options) free_queue: free(q->queue); DIFF_QUEUE_CLEAR(q); - diff_free(options); /* * Report the content-level differences with HAS_CHANGES; diff --git a/diff.h b/diff.h index 8ae18e5ab1e..c31d32ba192 100644 --- a/diff.h +++ b/diff.h @@ -398,6 +398,7 @@ struct diff_options { struct strmap *additional_path_headers; int no_free; + int is_dead; }; unsigned diff_filter_bit(char status); I'd really like to fix this properly, but AFAICT the best way to do that is to: A. Get ab/plug-leak-in-revisions merged down B. Fix diff_free() on top of that Before I knew of these bugs I'd already written patches to get rid of that whole "no_free" business. In retrospect it was completely the wrong thing to do, but in hindsight something like it was needed to fix those leaks as long as we didn't have a revisions_release(). I.e. the tricky cases where I ended up needing to set "no_free" are ones where all the complexity neatly goes away once we start releasing the "struct rev_info" properly, as it contains the data we'd like to diff_free() at the end. How does that plan sound, and is there anything I've missed? I could also re-roll ab/plug-leak-in-revisions to include a fix that makes it safe in the interim, i.e.: diff --git a/diff.c b/diff.c index ef7159968b6..2bc7ee81e4e 100644 --- a/diff.c +++ b/diff.c @@ -6438,8 +6438,12 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o) static void diff_free_file(struct diff_options *options) { - if (options->close_file) + if (options->close_file) { fclose(options->file); + + options->file = NULL; + options->close_file = 0; + } } static void diff_free_ignore_regex(struct diff_options *options) @@ -6450,7 +6454,8 @@ static void diff_free_ignore_regex(struct diff_options *options) regfree(options->ignore_regex[i]); free(options->ignore_regex[i]); } - free(options->ignore_regex); + options->ignore_regex_nr = 0; + FREE_AND_NULL(options->ignore_regex); } void diff_free(struct diff_options *options) But as long as we're not adding new API users of it until the follow-up after ab/plug-leak-in-revisions we should also be safe for now, but perhaps it's prudent to do it anyway. I *could* potentially produce a shorter series than ab/plug-leak-in-revisions to narrowly try to remove "no_free" from diff.c first, but it would basically need to first introduce a release_revisions(), and without the other revisions API leaks being fixed testing it would be much tricker. I'd really prefer not to do that. How does all that sound? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: the state of diff_free() and release_revisions() 2022-05-20 15:23 ` the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix) Ævar Arnfjörð Bjarmason @ 2022-05-20 17:23 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2022-05-20 17:23 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: René Scharfe, Carlo Marcelo Arenas Belón, Phillip Wood, Daniel Li, git Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > AFAICT the current status in this area is that with 2cc712324d5 (Merge > branch 'rs/fast-export-pathspec-fix', 2022-05-04) and 5048b20d1c2 (Merge > branch 'rs/format-patch-pathspec-fix', 2022-05-04) merged the known bugs > related to this have been fixed, along with 3da993f2e63 (Merge branch > 'jc/diff-tree-stdin-fix', 2022-04-28). There is another possible known breakage around "diff -I --cc" reported, no? https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/ > Not coincidentally around the same time my ab/plug-leak-in-revisions got > un-marked for "next" from [1] to [2], and I'm looking for a path forward > for this whole thing... I think this was not because it had known bugs but only because we wanted to avoid distractions and too much churns in the tree. I think it is a good time to reactivate it; Phillip Wood took a serious look at the latest round, if I am reading [*] correctly. * https://lore.kernel.org/git/2cc2d026-1496-1ed9-838b-6fdf8cfba285@gmail.com/ Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] 2.36 fast-export regression fix 2022-04-30 5:29 ` [PATCH] 2.36 show regression fix Junio C Hamano 2022-04-30 10:32 ` [PATCH] 2.36 format-patch " René Scharfe @ 2022-04-30 14:31 ` René Scharfe 2022-04-30 20:50 ` Junio C Hamano 1 sibling, 1 reply; 10+ messages in thread From: René Scharfe @ 2022-04-30 14:31 UTC (permalink / raw) To: Junio C Hamano, Ævar Arnfjörð Bjarmason Cc: Phillip Wood, git, Daniel Li e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a way to allow reusing diffopts: the no_free bit. 244c27242f (diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made that mechanism mandatory. git fast-export doesn't set no_free, so path limiting stopped working after the first commit. Set the flag and add a basic test to make sure only changes to the specified files are exported. Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/fast-export.c | 1 + t/t9350-fast-export.sh | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/builtin/fast-export.c b/builtin/fast-export.c index a7d72697fb..1355b5a96d 100644 --- a/builtin/fast-export.c +++ b/builtin/fast-export.c @@ -1261,6 +1261,7 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix) revs.diffopt.format_callback = show_filemodify; revs.diffopt.format_callback_data = &paths_of_changed_objects; revs.diffopt.flags.recursive = 1; + revs.diffopt.no_free = 1; while ((commit = get_revision(&revs))) handle_commit(commit, &revs, &paths_of_changed_objects); diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh index 7b7a18dd2c..fc99703fc5 100755 --- a/t/t9350-fast-export.sh +++ b/t/t9350-fast-export.sh @@ -500,6 +500,13 @@ test_expect_success 'path limiting with import-marks does not lose unmodified fi grep file0 actual ' +test_expect_success 'path limiting works' ' + git fast-export simple -- file >actual && + sed -ne "s/^M .* //p" <actual | sort -u >actual.files && + echo file >expect && + test_cmp expect actual.files +' + test_expect_success 'avoid corrupt stream with non-existent mark' ' test_create_repo avoid_non_existent_mark && ( -- 2.35.3 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] 2.36 fast-export regression fix 2022-04-30 14:31 ` [PATCH] 2.36 fast-export regression fix René Scharfe @ 2022-04-30 20:50 ` Junio C Hamano 0 siblings, 0 replies; 10+ messages in thread From: Junio C Hamano @ 2022-04-30 20:50 UTC (permalink / raw) To: René Scharfe Cc: Ævar Arnfjörð Bjarmason, Phillip Wood, git, Daniel Li René Scharfe <l.s.r@web.de> writes: > e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a > way to allow reusing diffopts: the no_free bit. 244c27242f (diff.[ch]: > have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made > that mechanism mandatory. > > git fast-export doesn't set no_free, so path limiting stopped working > after the first commit. Set the flag and add a basic test to make sure > only changes to the specified files are exported. > > Signed-off-by: René Scharfe <l.s.r@web.de> > --- Both format-patch fix and this one look good to me. We have UNLEAK() in format-patch so that the fix will not result in additional false positives from the leak checker. But this one, we didn't have UNLEAK() so this may start tickling the leak checker again. We may need to add UNLEAK() at some point. Stopping a leak checker from complaining about a known singleton leak that is at the top-level (i.e. does not become bigger with the data) by spending extra cycles is pointless, compared to using UNLEAK() to do the same, and it is doubly misguided if it breaks the behaviour X-<. Will queue both, thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-05-20 17:23 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-30 2:22 Bug: `git show` honors path filters only for the first commit Daniel Li 2022-04-30 4:59 ` Junio C Hamano 2022-04-30 5:29 ` [PATCH] 2.36 show regression fix Junio C Hamano 2022-04-30 10:32 ` [PATCH] 2.36 format-patch " René Scharfe 2022-04-30 16:32 ` Carlo Marcelo Arenas Belón 2022-05-01 9:35 ` René Scharfe 2022-05-20 15:23 ` the state of diff_free() and release_revisions() (was: [PATCH] 2.36 format-patch regression fix) Ævar Arnfjörð Bjarmason 2022-05-20 17:23 ` the state of diff_free() and release_revisions() Junio C Hamano 2022-04-30 14:31 ` [PATCH] 2.36 fast-export regression fix René Scharfe 2022-04-30 20:50 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).