* FORMAT_PATCH_NAME_MAX increase @ 2017-05-29 8:49 Laszlo Ersek 2017-05-30 1:34 ` Junio C Hamano 2017-05-30 11:36 ` Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 11+ messages in thread From: Laszlo Ersek @ 2017-05-29 8:49 UTC (permalink / raw) To: git Hi, would it be possible to - increase the FORMAT_PATCH_NAME_MAX macro from 64 to, say, 128? - Or else to introduce a new git-config knob for it? I have a small review-helper / interdiff script that matches patches from adjacent versions of a series against each other, based on subject line. (Using the formatted file name with the patch number stripped.) The project in question uses long common prefixes in subjects, and the current limit of 64 does not always ensure unicity (again, with the number stripped). Thank you, Laszlo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-29 8:49 FORMAT_PATCH_NAME_MAX increase Laszlo Ersek @ 2017-05-30 1:34 ` Junio C Hamano 2017-05-30 11:03 ` Laszlo Ersek 2017-05-30 11:36 ` Ævar Arnfjörð Bjarmason 1 sibling, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2017-05-30 1:34 UTC (permalink / raw) To: Laszlo Ersek; +Cc: git Laszlo Ersek <lersek@redhat.com> writes: > would it be possible to > > - increase the FORMAT_PATCH_NAME_MAX macro from 64 to, say, 128? > > - Or else to introduce a new git-config knob for it? It's open source, so both are "possible", but you are interested in learning if these are acceptable project-wide. > I have a small review-helper / interdiff script that matches patches > from adjacent versions of a series against each other, based on subject > line. (Using the formatted file name with the patch number stripped.) > The project in question uses long common prefixes in subjects, and the > current limit of 64 does not always ensure unicity (again, with the > number stripped). The original motivation of the design is that leading numbers are to ensure they are unique, and munging and truncating of title are to ensure that they are safe on the filesystem while being identifiable for eyeballing. And the current truncation limit may not work well for certain projects, failing the "identifiabl" goal. But I do not think that is true only for your custom script. For example, with such a project, output from "git shortlog" or the labels "gitk" gives to commits would not be very useful, because 2/3 of your display are filled with and wasted by the same long prefix to the left. The real issue is not that one particular "length limit" used by format-patch is too short; it is that the project convention wastes too many bytes for common things that can be left unsaid, and if we come up with a general way to address that, we'd make all of Git that summarises commits more useful, not just format-patch. So I think lengthening FORMAT_PATCH_NAME_MAX, whether it is done unconditionally or conditionally, is rather an ad-hoc hack than a real improvement. I cannot offhand guess what other places would suffer from such a project convention, because I do not work with such a project, but you may be able to come up with a list of various places in Git where the commit titles are used, and that if there were a mechanism to take these commit titles, pass them to your cutomization script, which abbreviates these "long common prefixes" to a shorter string, and to use the output from that script instead of the actual commit title, it would help all of these places. The list of commits "git merge" records when merging a side branch may benefit from the same mechanism. For example, if the titles of your commit title look like this: Subject: RedHat Kernel Maintenance: ipv4: add reference count Subject: RedHat Kernel Maintenance: net: llc: add lock_sock ... your custom script may be #!/bin/sh exec sed -e "s/^RedHat Kernel Maintenance: /RHKM-/" ... there may be more common patterns to shorten ... and the resulting output from format-patch might become 0001-RHKM-ipv4-add-reference-count.patch 0002-RHKM-net-llc-add-lock_sock.patch while "git shortlog" output and "gitk" display may also be helped by the same mechanism. It's OK for the _initial_ application of such a mechanism were to affect the names of files format-patch creates and nothing else. Interested parties can then use the same mechanism in other programs (like "gitk" and "git shortlog"), as long as that mechansim is designed well. And then the end users need to learn that mechanism only once. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 1:34 ` Junio C Hamano @ 2017-05-30 11:03 ` Laszlo Ersek 2017-05-30 13:28 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Laszlo Ersek @ 2017-05-30 11:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 05/30/17 03:34, Junio C Hamano wrote: > I cannot offhand guess what other places would suffer from such a > project convention, because I do not work with such a project, but > you may be able to come up with a list of various places in Git > where the commit titles are used, and that if there were a mechanism > to take these commit titles, pass them to your cutomization script, > which abbreviates these "long common prefixes" to a shorter string, > and to use the output from that script instead of the actual commit > title, it would help all of these places. The problem is that I can't really automate the subject munging. The concrete subjects in this case were: > OvmfPkg/QemuFwCfgLib: Implement SEV internal function for SEC phase > OvmfPkg/QemuFwCfgLib: Implement SEV internal functions for PEI phase > OvmfPkg/QemuFwCfgLib: Implement SEV internal function for Dxe phase which got formatted as > 0010-OvmfPkg-QemuFwCfgLib-Implement-SEV-internal-function.patch > 0011-OvmfPkg-QemuFwCfgLib-Implement-SEV-internal-function.patch > 0012-OvmfPkg-QemuFwCfgLib-Implement-SEV-internal-function.patch and these filenames differ only in the running counter on the left. The patch subjects themselves aren't overlong (the longest is 68 characters). At best I could "normalize away" the "OvmfPkg/QemuFwCfgLib" prefix, but that exact prefix is pretty accidental. Any standalone module (driver or library instance) in the edk2 project is supposed to be named like this in patch subjects, so all those prefixes would have to be normalized somehow. We generally try to limit subjects (and commit messages in general) to 74 columns. I think for one source of inspiration, we used the kernel documentation, when setting that limit. <https://www.kernel.org/doc/Documentation/process/submitting-patches.rst> says, under section 14, "The canonical patch format", > The canonical patch subject line is:: > > Subject: [PATCH 001/123] subsystem: summary phrase > > The canonical patch message body contains the following: > > [...] > > - The body of the explanation, line wrapped at 75 columns, which will > be copied to the permanent changelog to describe this patch. It does not specify the subject length, but perhaps we can apply the body line length to the subject as well. So, even in kernel land, if subjects up to 75 columns are permitted, but FORMAT_PATCH_NAME_MAX is 64, conflicts are possible, at least in theory, aren't they? With the numbers stripped, of course. Thanks, Laszlo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 11:03 ` Laszlo Ersek @ 2017-05-30 13:28 ` Junio C Hamano 0 siblings, 0 replies; 11+ messages in thread From: Junio C Hamano @ 2017-05-30 13:28 UTC (permalink / raw) To: Laszlo Ersek; +Cc: git Laszlo Ersek <lersek@redhat.com> writes: > The problem is that I can't really automate the subject munging. The > concrete subjects in this case were: > >> OvmfPkg/QemuFwCfgLib: Implement SEV internal function for SEC phase >> OvmfPkg/QemuFwCfgLib: Implement SEV internal functions for PEI phase >> OvmfPkg/QemuFwCfgLib: Implement SEV internal function for Dxe phase > ... > So, even in kernel land, if subjects up to 75 columns are permitted, but > FORMAT_PATCH_NAME_MAX is 64, conflicts are possible, at least in theory, > aren't they? With the numbers stripped, of course. Yup, configurable lengthening or unconditional lengthening to 75 or so do not sound _too_ bad. If I sounded like I was opposed to lengthening, that wasn't what I meant. It was more like "if you can meaningfully abbreviate, you may help not just format-patch filenames but other use cases, and you might even be able to get away without lengthening"; if there is no meaningful way to abbreviate, raising the max length may be the only workable solution. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-29 8:49 FORMAT_PATCH_NAME_MAX increase Laszlo Ersek 2017-05-30 1:34 ` Junio C Hamano @ 2017-05-30 11:36 ` Ævar Arnfjörð Bjarmason 2017-05-30 12:28 ` Laszlo Ersek 1 sibling, 1 reply; 11+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2017-05-30 11:36 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Git Mailing List On Mon, May 29, 2017 at 10:49 AM, Laszlo Ersek <lersek@redhat.com> wrote: > Hi, > > would it be possible to > > - increase the FORMAT_PATCH_NAME_MAX macro from 64 to, say, 128? > > - Or else to introduce a new git-config knob for it? > > I have a small review-helper / interdiff script that matches patches > from adjacent versions of a series against each other, based on subject > line. (Using the formatted file name with the patch number stripped.) > The project in question uses long common prefixes in subjects, and the > current limit of 64 does not always ensure unicity (again, with the > number stripped). I don't see why this shouldn't be made configurable, but more generally if you have a script like this it seems like a futile effort in general to just make the subject longer to solve the general case, consider: (cd /tmp/; rm -rf test; git init test; cd test && for i in {1..3}; do touch $i && git add $i && git commit -m"test"; done && git format-patch -2 && git reset --hard HEAD~2 && git am *patch) Which now yields: 0001-test.patch 0002-test.patch Git projects in general will have plenty of patches like this, e.g. "fix build" or "update tests" or whatever. Isn't a better solution for your script to e.g. key on git-patch-id? $ grep "^From " *patch 0001-test.patch:From 870e37afa1a5aeb7eef76e607345adcfd4a9022d Mon Sep 17 00:00:00 2001 0002-test.patch:From de8c37a1532a4f6ae71ffa65400479ba77438f3b Mon Sep 17 00:00:00 2001 $ cat *patch | git patch-id c71eb8f2c8c461ba6040668e9d79f996f5a04a61 870e37afa1a5aeb7eef76e607345adcfd4a9022d 735aff6fb601d7ce99506dc7701be3e8a9b5d38c de8c37a1532a4f6ae71ffa65400479ba77438f3b Other potential heuristics could be keying not just on the subject but on the subject + subject of the last N commits for each commit, which should give more of a unique key, or key on the whole commit message etc. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 11:36 ` Ævar Arnfjörð Bjarmason @ 2017-05-30 12:28 ` Laszlo Ersek 2017-05-30 12:33 ` Laszlo Ersek 0 siblings, 1 reply; 11+ messages in thread From: Laszlo Ersek @ 2017-05-30 12:28 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List On 05/30/17 13:36, Ævar Arnfjörð Bjarmason wrote: > On Mon, May 29, 2017 at 10:49 AM, Laszlo Ersek <lersek@redhat.com> wrote: >> Hi, >> >> would it be possible to >> >> - increase the FORMAT_PATCH_NAME_MAX macro from 64 to, say, 128? >> >> - Or else to introduce a new git-config knob for it? >> >> I have a small review-helper / interdiff script that matches patches >> from adjacent versions of a series against each other, based on subject >> line. (Using the formatted file name with the patch number stripped.) >> The project in question uses long common prefixes in subjects, and the >> current limit of 64 does not always ensure unicity (again, with the >> number stripped). > > I don't see why this shouldn't be made configurable, but more > generally if you have a script like this it seems like a futile effort > in general to just make the subject longer to solve the general case, > consider: > > (cd /tmp/; rm -rf test; git init test; cd test && for i in {1..3}; > do touch $i && git add $i && git commit -m"test"; done && git > format-patch -2 && git reset --hard HEAD~2 && git am *patch) > > Which now yields: > > 0001-test.patch > 0002-test.patch > > Git projects in general will have plenty of patches like this, e.g. > "fix build" or "update tests" or whatever. Isn't a better solution for > your script to e.g. key on git-patch-id? > > $ grep "^From " *patch > 0001-test.patch:From 870e37afa1a5aeb7eef76e607345adcfd4a9022d Mon > Sep 17 00:00:00 2001 > 0002-test.patch:From de8c37a1532a4f6ae71ffa65400479ba77438f3b Mon > Sep 17 00:00:00 2001 > $ cat *patch | git patch-id > c71eb8f2c8c461ba6040668e9d79f996f5a04a61 > 870e37afa1a5aeb7eef76e607345adcfd4a9022d > 735aff6fb601d7ce99506dc7701be3e8a9b5d38c > de8c37a1532a4f6ae71ffa65400479ba77438f3b > > Other potential heuristics could be keying not just on the subject but > on the subject + subject of the last N commits for each commit, which > should give more of a unique key, or key on the whole commit message > etc. > My use case is the following: - Contributor posts version 1 of his/her patch series to the mailing list, and also pushes the series to his/her publicly accessible git repo on some branch. I review it and suggest a number of improvements. - Contributor posts version 2 of his/her patch series to the mailing list, and pushes the series to another branch in his/her repo too. I want to review the v2 series incrementally; that is, I'd like to diff each individual v2 patch against its v1 counterpart. While patches may be dropped, added, replaced between patch set versions, the general case is that most patches remain in the series, and they preserve their original subjects. Thus, finding the v1 counterpart for a v2 patch, based on the subject, as formatted by git-format-patch into filenames, works reliably most of the time. Note that in such an incremental review, I specifically wish to compare patches against each other (i.e., I'd like to see diffs of diffs, AKA interdiffs), and not the source tree at two, v1<->v2, commits into the two series. The latter would only give me the difference between the v1 and v2 patch application results, and while that is valuable as well, I'd really like to diff the actual patches. Of course I could technically parse the Subject: header of the formatted patch files, and rename the files for interdiffing using full-length filenames. But then I'd have to care about filesystem safety and such, and that's already handled perfectly by git-format-patch itself. The only bit I'm missing is the arbitrary length in the formatted file names. (I had patched git earlier to use 128 for FORMAT_PATCH_NAME_MAX, but I'd like to stop carrying that change.) "git-patch-id" doesn't look applicable here, as I'd like to assign patches (across v1 and v2) to each other based on "purpose". I.e., the assignment should occur regardless of even functional changes between corresponding v1 and v2 patches. I need the coupling exactly so I can review those differences. I use the "join" utility on the number-stripped patch filenames to find the pairs of patches between v1 and v2 that should be compared. Thanks! Laszlo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 12:28 ` Laszlo Ersek @ 2017-05-30 12:33 ` Laszlo Ersek 2017-05-30 12:41 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 11+ messages in thread From: Laszlo Ersek @ 2017-05-30 12:33 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List (apologies for the self-followup:) On 05/30/17 14:28, Laszlo Ersek wrote: > Note that in such an incremental review, I specifically wish to compare > patches against each other (i.e., I'd like to see diffs of diffs, AKA > interdiffs), and not the source tree at two, v1<->v2, commits into the > two series. The latter would only give me the difference between the v1 > and v2 patch application results, and while that is valuable as well, > I'd really like to diff the actual patches. ... One (but not the only) reason for that is that I'd like to see the v1->v2 commit message improvements as well. I quite frequently suggest commit message improvements in review. Thanks Laszlo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 12:33 ` Laszlo Ersek @ 2017-05-30 12:41 ` Ævar Arnfjörð Bjarmason 2017-05-30 13:37 ` Junio C Hamano 0 siblings, 1 reply; 11+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2017-05-30 12:41 UTC (permalink / raw) To: Laszlo Ersek; +Cc: Git Mailing List On Tue, May 30, 2017 at 2:33 PM, Laszlo Ersek <lersek@redhat.com> wrote: > (apologies for the self-followup:) > > On 05/30/17 14:28, Laszlo Ersek wrote: > >> Note that in such an incremental review, I specifically wish to compare >> patches against each other (i.e., I'd like to see diffs of diffs, AKA >> interdiffs), and not the source tree at two, v1<->v2, commits into the >> two series. The latter would only give me the difference between the v1 >> and v2 patch application results, and while that is valuable as well, >> I'd really like to diff the actual patches. > > ... One (but not the only) reason for that is that I'd like to see the > v1->v2 commit message improvements as well. I quite frequently suggest > commit message improvements in review. This all makes sense, I think a way to make this length configurable is the easiest/best solution to this. Just curious do you know about https://github.com/trast/tbdiff ? If not it might have a high overlap with what you're doing. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 12:41 ` Ævar Arnfjörð Bjarmason @ 2017-05-30 13:37 ` Junio C Hamano 2017-05-30 14:35 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 11+ messages in thread From: Junio C Hamano @ 2017-05-30 13:37 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Laszlo Ersek, Git Mailing List Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Just curious do you know about https://github.com/trast/tbdiff ? If > not it might have a high overlap with what you're doing. Yes, that is a very good suggestion. You'd need to be able to actually apply the patches but the way I often do a review is very similar to (actually, I'd say it is identical workflow) Laszlo's, and it goes like this: $ git checkout topic ;# previous round $ git checkout master... ;# check out the fork point of previous one $ git am mbox ;# apply the updated one $ git tbdiff ..@{-1} @{-1}.. With the second step, the commit immediately before the previous round of patches were applied to is checked out as a detached HEAD, and then with the third step, the updated patches are applied. After these two steps, the history leading to HEAD is the latest patches, and the history leading to topic (which can be referred to as @{-1}, i.e. the branch we were previously on) is the previous round. "git tbdiff" takes two ranges and compares these two series of commits. The first one says "commits included in the branch we are previously on (i.e. topic), excluding the ones on the current HEAD", which means "the patches from the previous round", and the second one says "commits included in the current HEAD, excluding the ones on the previous branch (i.e. topic)", which means "the patches from this latest round". After that, I may conclude $ git checkout -B @{-1} to update the tip of 'topic' with the latest set of patches (the reason why I type @{-1} is because that can stay constant in the workflow, no matter what the actual topic branch is called). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 13:37 ` Junio C Hamano @ 2017-05-30 14:35 ` Ævar Arnfjörð Bjarmason 2017-05-30 15:06 ` Laszlo Ersek 0 siblings, 1 reply; 11+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2017-05-30 14:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Laszlo Ersek, Git Mailing List, Thomas Rast On Tue, May 30, 2017 at 3:37 PM, Junio C Hamano <gitster@pobox.com> wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Just curious do you know about https://github.com/trast/tbdiff ? If >> not it might have a high overlap with what you're doing. > > Yes, that is a very good suggestion. You'd need to be able to > actually apply the patches but the way I often do a review is very > similar to (actually, I'd say it is identical workflow) Laszlo's, > and it goes like this: > > $ git checkout topic ;# previous round > $ git checkout master... ;# check out the fork point of previous one > $ git am mbox ;# apply the updated one > $ git tbdiff ..@{-1} @{-1}.. I wonder if this tool seemingly everyone on-list uses should just be integrated into git.git. I only learned about it <2 weeks ago and it's been great. The diff output was a bit nonsensical in some cases because it uses python's diff engine instead of git's. Would you mind patches to just integrate it to git in python as-is, then we could slowly convert it to C as we're doing with everything else. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: FORMAT_PATCH_NAME_MAX increase 2017-05-30 14:35 ` Ævar Arnfjörð Bjarmason @ 2017-05-30 15:06 ` Laszlo Ersek 0 siblings, 0 replies; 11+ messages in thread From: Laszlo Ersek @ 2017-05-30 15:06 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason, Junio C Hamano, Thomas Rast Cc: Git Mailing List On 05/30/17 16:35, Ævar Arnfjörð Bjarmason wrote: > On Tue, May 30, 2017 at 3:37 PM, Junio C Hamano <gitster@pobox.com> wrote: >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >>> Just curious do you know about https://github.com/trast/tbdiff ? If >>> not it might have a high overlap with what you're doing. Thank you for the suggestion, it does exactly what I need. It solves the Assignment Problem a whole lot better than my current, crude script. https://en.wikipedia.org/wiki/Assignment_problem https://en.wikipedia.org/wiki/Hungarian_algorithm (Mildly amusing, to me anyway: I happen to be Hungarian.) The output of git-tbdiff is not exactly the way I like it (I like to review interdiffs colorized, and fed to $PAGER individually), but I can easily do that on top of the "--no-patches" output. >> Yes, that is a very good suggestion. You'd need to be able to >> actually apply the patches but the way I often do a review is very >> similar to (actually, I'd say it is identical workflow) Laszlo's, >> and it goes like this: >> >> $ git checkout topic ;# previous round >> $ git checkout master... ;# check out the fork point of previous one >> $ git am mbox ;# apply the updated one >> $ git tbdiff ..@{-1} @{-1}.. > > I wonder if this tool seemingly everyone on-list uses should just be > integrated into git.git. > > I only learned about it <2 weeks ago and it's been great. The diff > output was a bit nonsensical in some cases because it uses python's > diff engine instead of git's. Yes, I'd probably plug in git's diff engine (the patience algorithm at that). > > Would you mind patches to just integrate it to git in python as-is, > then we could slowly convert it to C as we're doing with everything > else. That would be wonderful. My preference would be to use core git features for solving the problem; as a second preference, it would be great if git-tbdiff were packaged & shipped by GNU/Linux distros. Thomas (I see you are on To:/Cc: now -- I meant to CC you, snarfing your email from the git.git history :) ), are you aware of any distros already packaging git-tbdiff? Still the best would be if git provided this feature out of the box; incremental review of rerolled series appears a common activity for any serious reviewer / subsystem co-maintainer. Thanks! Laszlo ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-05-30 15:06 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-05-29 8:49 FORMAT_PATCH_NAME_MAX increase Laszlo Ersek 2017-05-30 1:34 ` Junio C Hamano 2017-05-30 11:03 ` Laszlo Ersek 2017-05-30 13:28 ` Junio C Hamano 2017-05-30 11:36 ` Ævar Arnfjörð Bjarmason 2017-05-30 12:28 ` Laszlo Ersek 2017-05-30 12:33 ` Laszlo Ersek 2017-05-30 12:41 ` Ævar Arnfjörð Bjarmason 2017-05-30 13:37 ` Junio C Hamano 2017-05-30 14:35 ` Ævar Arnfjörð Bjarmason 2017-05-30 15:06 ` Laszlo Ersek
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.