git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>, Jeff King <peff@peff.net>,
	Elijah Newren <newren@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v3 1/5] t4001: add a test comparing basename similarity and content similarity
Date: Fri, 12 Feb 2021 17:15:30 -0800
Message-ID: <xmqq4kigvkct.fsf@gitster.c.googlers.com>
In-Reply-To: <3e6af929d135ef2dc239e2f47f92a7e2e91cbd17.1612970140.git.gitgitgadget@gmail.com> (Elijah Newren via GitGitGadget's message of "Wed, 10 Feb 2021 15:15:36 +0000")

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
>
> Add a simple test where a removed file is similar to two different added
> files; one of them has the same basename, and the other has a slightly
> higher content similarity.  Without break detection, filename similarity
> of 100% trumps content similarity for pairing up related files.  For
> any filename similarity less than 100%, the opposite is true -- content
> similarity is all that matters.  Add a testcase that documents this.

I am not sure why it is the "opposite".  When contents are similar
to the same degree of 100%, we tiebreak with the filename.  We never
favor a pair between the same filename over a pair between different
filenames with better content similarity.

And when contents are similar to the same degree of less than 100%,
we do not favor a pair between the same filename over a pair between
different filenames, as long as they are similar to the same degree.

So, I do not think "opposite" is helping readers to understand what
is going on.
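
To make the ordering described above concrete, here is a minimal sketch, not the diffcore-rename code itself. It assumes each candidate destination is given as a line "<score> <same-basename flag> <path>"; the helper name pick_best and that input format are made up for illustration. Candidates are ranked by content similarity first, and the basename flag only decides between candidates with equal scores.

```shell
#!/bin/sh
# Hypothetical helper: read "<score> <same_basename 0|1> <path>" lines on
# stdin and print the path of the preferred rename destination.
# Primary key: content similarity score, highest first.
# Secondary key: same-basename flag, used only to break ties in score.
pick_best () {
	sort -t ' ' -k1,1nr -k2,2nr | head -n 1 | cut -d ' ' -f 3
}

# A better content score wins even against a same-basename candidate.
printf '%s\n' '78 1 file.txt' '88 0 file.md' | pick_best

# With equal content scores, the same-basename candidate is preferred.
printf '%s\n' '88 0 file.md' '88 1 other/file.txt' | pick_best
```

The first call prints file.md and the second prints other/file.txt, which matches the two cases spelled out above.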

> +test_expect_success 'basename similarity vs best similarity' '
> +	mkdir subdir &&
> +	test_write_lines line1 line2 line3 line4 line5 \
> +			 line6 line7 line8 line9 line10 >subdir/file.txt &&
> +	git add subdir/file.txt &&
> +	git commit -m "base txt" &&
> +
> +	git rm subdir/file.txt &&
> +	test_write_lines line1 line2 line3 line4 line5 \
> +			  line6 line7 line8 >file.txt &&
> +	test_write_lines line1 line2 line3 line4 line5 \
> +			  line6 line7 line8 line9 >file.md &&
> +	git add file.txt file.md &&
> +	git commit -a -m "rename" &&
> +	git diff-tree -r -M --name-status HEAD^ HEAD >actual &&
> +	# subdir/file.txt is 89% similar to file.md, 78% similar to file.txt,
> +	# but since same basenames are checked first...

I am not sure what the second line of this comment wants to imply
with the ellipses here.  Care to finish the sentence?

Or was the second line planned to be added when we start applying
the "check only the same filename first and see if we find a
better-than-reasonable match" heuristics but somehow survived
"rebase -i" and ended up here?

> +	cat >expected <<-\EOF &&
> +	R088	subdir/file.txt	file.md
> +	A	file.txt
> +	EOF
> +	test_cmp expected actual

Thanks.
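
For readers who want to check the percentages in the quoted test, the arithmetic can be sketched as below. It assumes the similarity is the number of bytes shared with the source divided by the size of the larger file, truncated to a whole percent as in the R<nnn> status; the helper names size_of_lines and similarity_pct are made up for illustration. Both new files are prefixes of the removed file, so every byte of them counts as shared.

```shell
#!/bin/sh
# Hypothetical helper: byte size of a file containing one "line<N>" line
# for each argument, as written by test_write_lines in the quoted test.
size_of_lines () {
	echo $(( $(printf 'line%s\n' "$@" | wc -c) ))
}

# Hypothetical helper: shared bytes over the larger of the two sizes,
# truncated to a whole percent (an assumption modelled on the R<nnn> score).
similarity_pct () {
	copied=$1 src=$2 dst=$3
	max=$src
	if test "$dst" -gt "$max"
	then
		max=$dst
	fi
	echo $(( copied * 100 / max ))
}

src=$(size_of_lines 1 2 3 4 5 6 7 8 9 10)   # removed subdir/file.txt
md=$(size_of_lines 1 2 3 4 5 6 7 8 9)       # added file.md
txt=$(size_of_lines 1 2 3 4 5 6 7 8)        # added file.txt

echo "sizes: src=$src md=$md txt=$txt"
echo "file.md:  $(similarity_pct "$md" "$src" "$md")"
echo "file.txt: $(similarity_pct "$txt" "$src" "$txt")"
```

The sizes come out as 61, 54 and 48 bytes, giving 54/61 (about 88.5%) for file.md and 48/61 (about 78.7%) for file.txt. Truncation yields the 88 seen in the expected R088 line, while the 89% in the test comment corresponds to rounding the same ratio.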
