From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id D99DA13E9 for ; Thu, 5 Sep 2019 08:12:20 +0000 (UTC) Received: from dcvr.yhbt.net (dcvr.yhbt.net [64.71.152.64]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1781C7DB for ; Thu, 5 Sep 2019 08:12:20 +0000 (UTC) Date: Thu, 5 Sep 2019 08:12:19 +0000 From: Eric Wong To: Doug Anderson Message-ID: <20190905081219.GA26003@dcvr> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: ksummit-discuss@lists.linuxfoundation.org Subject: Re: [Ksummit-discuss] Allowing something Change-Id (or something like it) in kernel commits List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Doug Anderson wrote: > Hi, > > As everyone is probably aware, when you use the gerrit code review > system all of your commits get an extra line in them that looks > something like: > > Change-Id: I6a007dfe91ee1077a437963cf26d91370fdd9556 > > The Linux kernel has always viewed these Change-Id tags as obnoxious > and useless spam. Anyone who accidentally leaves a Change-Id in their > patch when posting to the mailing list is told to please re-post their > patch without the Change-Id. In this email, I will attempt to argue > that the Linux kernel ought to relax this restriction and allow > (possibly even encourage) Change-Ids. Hello Doug, sometimes-present public-inbox author here, chiming in late, replying off Mailman-mangled archives... > To begin with, let me make sure we're on the same page about what > Change-Ids are. As I understand it: > > * A change ID is much alike a UUID. It is locally generated on a > developer's computer and is (in theory) unique across the universe. > > * When a developer keeps the same Change-Id across two patches they > are making the assertion that the two patches are either the same or > should be treated as two versions of the same logical change. For > instance, v1, v2, and v3 of the same patch should have the same > Change-Id. Even if v2 and v3 of the patch have different subjects and > touch different files, if they have the same Change-Id then the > developer is asserting that v3 should be considered a new version of > the same logical change as v2. If it helps to think about it, > Change-Id is used by gerrit servers to know that a new patch uploaded > should replace an older version with the same Change-Id. This sounds like the debates around rename-tracking in VCSes circa 2005. Other VCSes at the time had explicit rename tracking; git was (and still is :) powerful enough to infer renames without extra input from the user. Drawing inspiration from git from not having "arch-id:" tags or other cruft for rename-tracking; MY philosophy is to have "Really Powerful Search" being able to tie things together after-the-fact without user intervention. The "Really Powerful Search" part is something I've been working on in public-inbox off-and-on for a few years :) > Specifically, let me list the problems I'd like to solve: > > 1. If I see a commit in Linux, I would like to be able to easily find > all of the mailing list discussions relevant to that commit. I know > there are proposals about including the Message-Id of the final post > in the commit log and that is certainly better than nothing, but the > Message-Id will only get you a link to the final version of the patch. > If the relevant discussion happened on a previous version of that > patch then you need to find it yourself. This gets harder if the > patch changed subject, touched different files, if parts of the series > landed at different times, and if multiple people were involved in > posting different versions of the patch. If the commit in Linux has a > Change-Id then the old versions are logically linked and easier to > associated with one another. > > 2. If I do a search through old mailing list archives and I stumble > upon a patch that didn't land, I can more easily find different > versions of that patch if I have a Change-Id. Some of these different > versions may have relevant discussions that explains why the patch > didn't land. Finding these other patches without a Change-Id might be > hard, again because they may touch different files, have a different > subject, or have been posted by a different person. I'm curious if you're taking advantage some of the public-inbox search features (in lore.kernel.org) which is designed with patches in mind. 1. "Phrase searching" with double quotes can be combined with the "s:" prefix to search most commit titles/subjects (*). 2. The dfa:/dfb: prefixes could be an alternative to "git blame" for searching text (and works for pre-git patches) 3. dfpre:/dfpost:/dfblob: can find exact blob SHA-1s 4. date/author can also match the git commit authorship info 5. all or any combination of the above can be combined in a single search to improve results All the Xapian prefixes are documented at $INBOX_URL/_text/help/, e.g: I'm also planning to support searching for the output of "git patch-id --stable" in the future. public-inbox also lacks pairity with "mairix -t" (being able to grab an entire thread if on message in the thread matches). > At the moment using a Change-Id in the way I described would require > searching through mailing lists for the Change-Id string to find other > versions of the same patch. However, I would expect it would only be > a matter of time before tools like patchwork are able to use Change-Id > to associate one version of a patch with the next version. I would > also expect that allowing Change-Id to exist would allow someone to > (perhaps) create a gerrit instance that watched the kernel mailing > list and mirrored mailing list discussions in the GUI. In other > words, once such tools exist presumably Change-Id will be much more > useful: you will eventually be able to paste a Change-Id into a tool > and get links to all relevant discussion and related posts. (*) I'm also linking the phrase searching through cgit to link commit titles to mail discussions shown below: https://80x24.org/mirrors/git.git/commit/?id=b777f3fd619ac2af507ffd3e7c5fe0d6e36e81f2 (following the "xdiff: clamp function context indices in post-image" link brings the public-inbox.org/git discussion) That's all done with a Lua script supported via cgit: https://80x24.org/public-inbox.git/tree/examples/cgit-commit-filter.lua It works OK in most cases, but would need to be combined (as noted in 5.) for best results. I haven't gotten dfpre/dfpost/dfblob/dfa/dfb links working via cgit + Lua, yet. It's on my personal roadmap, some hacking on cgit will be needed to expose that info (and I keep getting sidetracked on other stuff) > The basic summary is that I'd like there to be some way to track a > logical patch over its lifetime. I don't believe there is a reliable > (non-heuristic) way to do this today and I think Change-Id provides a > nice solution. While we could come up with a new and different > solution (because Change-Id was not invented here), it feels like > adopting Change-Id is convenient and easy and provides a true benefit. > Change-Id works super well with the decentralized/email workflow for > patches and can be phased in over time (or it can stay optional > forever). This sounds exactly like the "arch-id:" comments GNU arch users used to pollute their sources with for rename tracking :) It's ugly and made redundant by something which understands the content. Change-Id also won't work for historical patches (including pre-git stuff). Even without relying on git blob IDs for search, matching on exact text/phrases is pretty good for pre-git patches, already. And back to rename tracking, one yet-to-be-realized promise from git's lack of explicit tracking was being able to track code movements (individual functions or hunks, not just entire files). Having a good search engine gets us there :)