From: Junio C Hamano
Date: Sat, 13 Feb 2021 17:32:40 -0800
Subject: Re: [PATCH v3 1/5] t4001: add a test comparing basename similarity and content similarity
To: Elijah Newren
Cc: Elijah Newren via GitGitGadget, Git Mailing List, Derrick Stolee, Jonathan Tan, Taylor Blau, Jeff King, Derrick Stolee
References: <3e6af929d135ef2dc239e2f47f92a7e2e91cbd17.1612970140.git.gitgitgadget@gmail.com>
X-Mailing-List: git@vger.kernel.org

I do not consider "the same file changed in place" the same as "we
seem to have lost a file in the old tree, ah, we found one that has
the same basename in a different directory" at all, so your argument
still does not make any sense to me, sorry.

On Sat, Feb 13, 2021 at 17:25, Elijah Newren wrote:
>
> On Sat, Feb 13, 2021 at 3:56 PM Junio C Hamano wrote:
> >
> > Elijah Newren writes:
> >
> > > This is not true. If src/main.c is 99% similar to src/foo.c, and is
> > > 0% similar to the src/main.c in the new commit, we match the old
> > > src/main.c to the new src/main.c despite it being far more similar
> > > to src/foo.c. Unless break detection is turned on, we do not allow
> > > content similarity to trump (full) filename equality.
> >
> > Absolutely. And we are talking about a new optimization that kicks
> > in only when there is no break or no copy detection going on, no?
>
> Yes, precisely, we are only considering cases without break
> detection...and thus we are considering cases where for the last 15
> years or more, sufficiently large filename similarity (an exact
> fullname match) trumps any level of content similarity.
> I think it is useful to note that while my optimization is adding
> more considerations that can overrule maximal content similarity, it
> is not the first such code choice to do that.
>
> But let me back up a bit...
>
> When I submitted the series, you and Stolee went into a long
> discussion about an optimization that I didn't submit, one that feels
> looser on "matching" than anything I submitted, and which I think
> might counter-intuitively reduce performance rather than aid it. (The
> performance side only comes into view in combination with later
> series, but it was why I harped so much since then on only comparing
> against at most one other file in the steps before full inexact rename
> detection.) I was quite surprised by the diversion, but it made it
> clear to me that my descriptions and commit messages were far too
> vague and could be read to imply a completely different algorithm than
> I intended. So, I tried to be far more careful in subsequent
> iterations by adding wider context and contrasts.
>
> Further, after I wrote various things to try to clarify the
> misunderstandings, I noticed that Stolee picked out one thing and
> stated that "This idea of optimizing first for 100% filename
> similarity is a good perspective on Git's rename detection algorithm."
> (see https://lore.kernel.org/git/57d30e7d-7727-8d98-e3ef-bcfeebf9edd3@gmail.com/)
> So, that particular point seemed to help him understand more, and
> thus might be useful extra context for others reading along now or in
> the future.
>
> Given all the above, I was trying to address earlier misunderstandings
> and provide more context. Perhaps I swung the pendulum too far and
> talked too much about other cases, or perhaps I just worded things
> poorly again. All I was attempting to do in the commit message was
> point out the multiple basic rules with filename and content
> similarity, to lay the groundwork for new rules that do alternative
> weightings.
>
> Anyway, I've added a few more tweaks to try to improve the wording for
> the next round I'll submit today. Given my track record so far, it
> would not be surprising if it still needed more tweaks.
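
For readers following the thread, the precedence being argued over can be sketched as a toy model. This is only an illustration under stated assumptions, not Git's diffcore-rename code: the function names `similarity` and `match_files` are hypothetical, the similarity measure is Python's `difflib` ratio rather than Git's own, the 0.5 threshold mirrors Git's default 50% rename threshold, and the "unique basename" rule is a simplified reading of "comparing against at most one other file in the steps before full inexact rename detection".

```python
import posixpath
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Toy content similarity in [0, 1]. Git uses a different measure."""
    return SequenceMatcher(None, a, b).ratio()


def match_files(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Map each old path to the new path it is paired with, or None.

    `old` and `new` map paths to file contents (hypothetical inputs).
    """
    pairs = {}

    # Rule 1: a path present on both sides is "changed in place".
    # Without break detection, content similarity is never consulted.
    for path in old:
        if path in new:
            pairs[path] = path

    deleted = [p for p in old if p not in new]
    added = [p for p in new if p not in old]

    # Rule 2 (the optimization under discussion, simplified): compare a
    # deleted file against at most one added file, the one sharing its
    # basename, and only when that basename is unique on both sides.
    for src in deleted:
        base = posixpath.basename(src)
        same_src = [p for p in deleted if posixpath.basename(p) == base]
        same_dst = [p for p in added if posixpath.basename(p) == base]
        if len(same_src) == 1 and len(same_dst) == 1:
            dst = same_dst[0]
            if similarity(old[src], new[dst]) >= threshold:
                pairs[src] = dst

    # Rule 3: full inexact detection for whatever is still unpaired.
    taken = set(pairs.values())
    for src in deleted:
        if src in pairs:
            continue
        candidates = [p for p in added if p not in taken]
        best = max(
            candidates,
            key=lambda p: similarity(old[src], new[p]),
            default=None,
        )
        if best is not None and similarity(old[src], new[best]) >= threshold:
            pairs[src] = best
            taken.add(best)
        else:
            pairs[src] = None
    return pairs
```

In this model, Rule 1 is the long-standing behavior Elijah describes (an old src/main.c pairs with the new src/main.c even if a new src/foo.c has identical content), while Rule 2 is where the disagreement lies: a file that moved to another directory can be paired by basename even when some other added file is a closer content match.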