* Re: [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword
       [not found] <20201221034452.307153-1-mattst88@gentoo.org>
@ 2020-12-22 22:40 ` Matt Turner
  2020-12-23 18:11   ` Beat Bolli
  2020-12-23 19:45   ` Junio C Hamano
  0 siblings, 2 replies; 6+ messages in thread
From: Matt Turner @ 2020-12-22 22:40 UTC (permalink / raw)
  To: gentoo-portage-dev; +Cc: git

tl;dr:

I want to handle conflicts automatically on lines like

> KEYWORDS="~alpha ~amd64 ~arm ~arm64 ~hppa ~ia64 ~mips ~ppc ~ppc64 ~riscv ~s390 ~sparc ~x86"

where conflicts frequently happen by adding/removing ~ before the
architecture names or adding/removing whole architectures. I don't
know if I should use a custom git merge driver or a custom git merge
strategy.


So the program in the patch below works, but it's not ideal, because
it rejects any hunks that don't touch the KEYWORDS=... assignment.

As I understand it, a custom git merge driver is intended to be used
to merge whole file formats, like JSON. As a result, you configure it
via gitattributes on a per-extension basis.

I really just want to make the default recursive git merge handle
KEYWORDS=... conflicts automatically, and I don't expect to be able to
make a git merge driver that can handle arbitrary conflicts in
*.ebuild files. The merge driver returns non-zero if it was unable
to resolve the conflicts, but when it does so git evidently doesn't
fall back and insert the typical <<< HEAD ... === ... >>> markers.
Maybe I could make my merge driver insert those like git normally
does? Seems like git's logic is probably a bit better about handling
some conflicts than my tool would be.

So... is a git merge strategy the thing I want? I don't know. There
doesn't seem to really be any documentation on writing git merge
strategies. I've only found [1] and [2].

Cc'ing git@vger.kernel.org, since I expect that's where the experts
are. Hopefully they have suggestions.


[1] https://stackoverflow.com/questions/23140240/git-how-do-i-add-a-custom-merge-strategy
[2] https://stackoverflow.com/questions/54528824/any-documentation-for-writing-a-custom-git-merge-strategy


On Sun, Dec 20, 2020 at 10:44 PM Matt Turner <mattst88@gentoo.org> wrote:
>
> Since the KEYWORDS=... assignment is a single line, git struggles to
> handle conflicts. When rebasing a series of commits that modify the
> KEYWORDS=... it's usually easier to throw them away and reapply on the
> new tree than it is to manually handle conflicts during the rebase.
>
> git allows a 'merge driver' program to handle conflicts; this program
> handles conflicts in the KEYWORDS=... assignment. E.g., given an ebuild
> with these keywords:
>
> KEYWORDS="~alpha ~amd64 ~arm ~arm64 ~hppa ~ia64 ~mips ~ppc ~ppc64 ~riscv ~s390 ~sparc ~x86"
>
> One developer drops the ~alpha keyword and pushes to gentoo.git, and
> another developer stabilizes hppa. Without this merge driver, git
> requires the second developer to manually resolve the conflict.  With
> the custom merge driver, it automatically resolves the conflict.
>
> gentoo.git/.git/config:
>
>         [core]
>                 ...
>                 attributesfile = ~/.gitattributes
>         [merge "keywords"]
>                 name = KEYWORDS merge driver
>                 driver = merge-driver-ekeyword %O %A %B
>
>  ~/.gitattributes:
>
>         *.ebuild merge=keywords
>
> Signed-off-by: Matt Turner <mattst88@gentoo.org>
> ---
> One annoying wart in the program is due to the fact that ekeyword
> won't work on any file not named *.ebuild. I make a symlink (and set up
> an atexit handler to remove it) to work around this. I'm not sure we
> could make ekeyword handle arbitrary filenames given its complex multi-
> argument parameter support. git merge files are named .merge_file_XXXXX
> according to git-unpack-file(1), so we could allow those. Thoughts?
>
>  bin/merge-driver-ekeyword | 125 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 125 insertions(+)
>  create mode 100755 bin/merge-driver-ekeyword
>
> diff --git a/bin/merge-driver-ekeyword b/bin/merge-driver-ekeyword
> new file mode 100755
> index 0000000..6e645a9
> --- /dev/null
> +++ b/bin/merge-driver-ekeyword
> @@ -0,0 +1,125 @@
> +#!/usr/bin/python
> +#
> +# Copyright 2020 Gentoo Authors
> +# Distributed under the terms of the GNU General Public License v2 or later
> +
> +"""
> +Custom git merge driver for handling conflicts in KEYWORDS assignments
> +
> +See https://git-scm.com/docs/gitattributes#_defining_a_custom_merge_driver
> +"""
> +
> +import atexit
> +import difflib
> +import os
> +import shutil
> +import sys
> +
> +from typing import List, Optional, Tuple
> +
> +from gentoolkit.ekeyword import ekeyword
> +
> +
> +def keyword_array(keyword_line: str) -> List[str]:
> +    # Find indices of string inside the double-quotes
> +    i1: int = keyword_line.find('"') + 1
> +    i2: int = keyword_line.rfind('"')
> +
> +    # Split into array of KEYWORDS
> +    return keyword_line[i1:i2].split(' ')
> +
> +
> +def keyword_line_changes(old: str, new: str) -> List[Tuple[Optional[str],
> +                                                           Optional[str]]]:
> +    a: List[str] = keyword_array(old)
> +    b: List[str] = keyword_array(new)
> +
> +    s = difflib.SequenceMatcher(a=a, b=b)
> +
> +    changes = []
> +    for tag, i1, i2, j1, j2 in s.get_opcodes():
> +        if tag == 'replace':
> +            changes.append((a[i1:i2], b[j1:j2]),)
> +        elif tag == 'delete':
> +            changes.append((a[i1:i2], None),)
> +        elif tag == 'insert':
> +            changes.append((None, b[j1:j2]),)
> +        else:
> +            assert tag == 'equal'
> +    return changes
> +
> +
> +def keyword_changes(ebuild1: str, ebuild2: str) -> List[Tuple[Optional[str],
> +                                                              Optional[str]]]:
> +    with open(ebuild1) as e1, open(ebuild2) as e2:
> +        lines1 = e1.readlines()
> +        lines2 = e2.readlines()
> +
> +        diff = difflib.unified_diff(lines1, lines2, n=0)
> +        assert next(diff) == '--- \n'
> +        assert next(diff) == '+++ \n'
> +
> +        hunk: int = 0
> +        old: str = ''
> +        new: str = ''
> +
> +        for line in diff:
> +            if line.startswith('@@ '):
> +                if hunk > 0: break
> +                hunk += 1
> +            elif line.startswith('-'):
> +                if old or new: break
> +                old = line
> +            elif line.startswith('+'):
> +                if not old or new: break
> +                new = line
> +        else:
> +            if 'KEYWORDS=' in old and 'KEYWORDS=' in new:
> +                return keyword_line_changes(old, new)
> +        return None
> +
> +
> +def apply_keyword_changes(ebuild: str,
> +                          changes: List[Tuple[Optional[str],
> +                                              Optional[str]]]) -> int:
> +    # ekeyword will only modify files named *.ebuild, so make a symlink
> +    ebuild_symlink = ebuild + '.ebuild'
> +    os.symlink(ebuild, ebuild_symlink)
> +    atexit.register(lambda: os.remove(ebuild_symlink))
> +
> +    for removals, additions in changes:
> +        args = []
> +        for rem in removals or []:
> +            # Drop leading '~' and '-' characters and prepend '^'
> +            i = 1 if rem[0] in ('~', '-') else 0
> +            args.append('^' + rem[i:])
> +        if additions:
> +            args.extend(additions)
> +        args.append(ebuild_symlink)
> +
> +        result = ekeyword.main(args)
> +        if result != 0:
> +            return result
> +    return 0
> +
> +
> +def main(argv):
> +    if len(argv) != 4:
> +        sys.exit(-1)
> +
> +    O = argv[1] # %O - filename of original
> +    A = argv[2] # %A - filename of our current version
> +    B = argv[3] # %B - filename of the other branch's version
> +
> +    # Get changes from %O to %B
> +    changes = keyword_changes(O, B)
> +    if not changes:
> +        sys.exit(-1)
> +
> +    # Apply O -> B changes to A
> +    result: int = apply_keyword_changes(A, changes)
> +    sys.exit(result)
> +
> +
> +if __name__ == "__main__":
> +    main(sys.argv)
> --
> 2.26.2
>
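
The keyword diffing in the patch above can be tried in isolation. What
follows is a minimal sketch, not the patch itself: it assumes the
keywords have already been split into lists, and the function name
keyword_list_changes is made up for illustration. It relies only on
difflib.SequenceMatcher.get_opcodes() from the standard library.

```python
import difflib
from typing import List, Optional, Tuple

# (removed keywords, added keywords); either side may be None.
Change = Tuple[Optional[List[str]], Optional[List[str]]]


def keyword_list_changes(old: List[str], new: List[str]) -> List[Change]:
    """Return the keyword groups removed and added to turn `old` into `new`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes: List[Change] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'replace':
            changes.append((old[i1:i2], new[j1:j2]))
        elif tag == 'delete':
            changes.append((old[i1:i2], None))
        elif tag == 'insert':
            changes.append((None, new[j1:j2]))
        # 'equal' runs carry no change and are skipped
    return changes


# Hypothetical example data: ~alpha is dropped and hppa is stabilized.
base = "~alpha ~amd64 ~hppa ~x86".split()
theirs = "~amd64 hppa ~x86".split()
print(keyword_list_changes(base, theirs))
# [(['~alpha'], None), (['~hppa'], ['hppa'])]
```

Note that a pure insertion yields None on the removal side, so any
consumer of this list has to tolerate None in either position.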


* Re: [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword
  2020-12-22 22:40 ` [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword Matt Turner
@ 2020-12-23 18:11   ` Beat Bolli
  2020-12-23 19:46     ` Junio C Hamano
  2020-12-23 19:45   ` Junio C Hamano
  1 sibling, 1 reply; 6+ messages in thread
From: Beat Bolli @ 2020-12-23 18:11 UTC (permalink / raw)
  To: Matt Turner, gentoo-portage-dev; +Cc: git

On 22.12.20 23:40, Matt Turner wrote:
> tl;dr:
> 
> I want to handle conflicts automatically on lines like
> 
>> KEYWORDS="~alpha ~amd64 ~arm ~arm64 ~hppa ~ia64 ~mips ~ppc ~ppc64 ~riscv ~s390 ~sparc ~x86"
> 
> where conflicts frequently happen by adding/removing ~ before the
> architecture names or adding/removing whole architectures. I don't
> know if I should use a custom git merge driver or a custom git merge
> strategy.

You can probably put each of the keywords on a separate line:

KEYWORDS="
~alpha
~amd64
~arm
~arm64
~hppa
~ia64
~mips
~ppc
~ppc64
~riscv
~s390
~sparc
~x86
"

The shell should handle both forms about the same.

(I'm not a Gentoo user, just talking about my general shell experience)

Regards
Beat
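
A small sketch of why the two layouts carry the same information: the
helper below (the name keywords is made up for illustration) extracts
the quoted value and splits it on any run of whitespace, which
approximates how a shell splits an unquoted expansion with the default
IFS. Whether Gentoo policy or ekeyword accepts a multi-line KEYWORDS
assignment is not established in this thread and is not assumed here.

```python
from typing import List

single_line = 'KEYWORDS="~alpha ~amd64 ~arm"'
multi_line = 'KEYWORDS="\n~alpha\n~amd64\n~arm\n"'


def keywords(assignment: str) -> List[str]:
    """Return the keywords between the first and last double quote."""
    value = assignment[assignment.find('"') + 1:assignment.rfind('"')]
    # split() without arguments splits on spaces, tabs and newlines alike
    # and drops the empty strings at either end.
    return value.split()


print(keywords(single_line) == keywords(multi_line))
```

Code that splits on a literal space instead, as in split(' '), would
see the multi-line value as a single token, so any tool parsing the
line that way would need adjusting before the layout is changed.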


* Re: [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword
  2020-12-22 22:40 ` [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword Matt Turner
  2020-12-23 18:11   ` Beat Bolli
@ 2020-12-23 19:45   ` Junio C Hamano
  2020-12-24  4:47     ` Matt Turner
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2020-12-23 19:45 UTC (permalink / raw)
  To: Matt Turner; +Cc: gentoo-portage-dev, git

Matt Turner <mattst88@gentoo.org> writes:

> I want to handle conflicts automatically on lines like
>
>> KEYWORDS="~alpha ~amd64 ~arm ~arm64 ~hppa ~ia64 ~mips ~ppc ~ppc64 ~riscv ~s390 ~sparc ~x86"
>
> where conflicts frequently happen by adding/removing ~ before the
> architecture names or adding/removing whole architectures. I don't
> know if I should use a custom git merge driver or a custom git merge
> strategy.

A merge strategy is about how the changes at the tree level are
handled.  A merge driver is given three blobs (original, your
version, and their version) and comes up with a merged blob.

In your case, you'd want a custom merge driver if you want to handle
word changes on a single line, because the default text merge driver
is pretty much line oriented.
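
To make the "three blobs in, one blob out" idea concrete for a single
KEYWORDS line, here is a minimal sketch of a word-level three-way
merge. It is an illustration under stated assumptions, not the driver
from the patch: the names parse and merge_keywords are hypothetical,
the result is simply sorted by architecture name (the real ordering
rules used by ekeyword may differ), and a keyword's state is taken to
be its leading '~' or '-' or the absence of one.

```python
from typing import Dict, List, Optional


def parse(keywords: List[str]) -> Dict[str, str]:
    """Map architecture name to its prefix: '' stable, '~' testing, '-' disabled."""
    table = {}
    for kw in keywords:
        prefix = kw[0] if kw[0] in '~-' else ''
        table[kw[len(prefix):]] = prefix
    return table


def merge_keywords(base: List[str], ours: List[str],
                   theirs: List[str]) -> Optional[List[str]]:
    """Three-way merge per architecture; return None on a real conflict."""
    b, o, t = parse(base), parse(ours), parse(theirs)
    merged = {}
    for arch in sorted(set(b) | set(o) | set(t)):
        vb, vo, vt = b.get(arch), o.get(arch), t.get(arch)
        if vo == vt:        # both sides agree (possibly both removed it)
            pick = vo
        elif vo == vb:      # only their side changed this architecture
            pick = vt
        elif vt == vb:      # only our side changed this architecture
            pick = vo
        else:               # both sides changed it differently
            return None
        if pick is not None:
            merged[arch] = pick
    return [prefix + arch for arch, prefix in merged.items()]


# Hypothetical example: we drop ~alpha, they stabilize hppa.
base = "~alpha ~amd64 ~hppa ~x86".split()
ours = "~amd64 ~hppa ~x86".split()
theirs = "~alpha ~amd64 hppa ~x86".split()
print(merge_keywords(base, ours, theirs))
# ['~amd64', 'hppa', '~x86']
```

A line-oriented merge sees the same inputs as one changed line on each
side and reports a conflict; working per architecture is what lets the
two independent edits combine.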



* Re: [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword
  2020-12-23 18:11   ` Beat Bolli
@ 2020-12-23 19:46     ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2020-12-23 19:46 UTC (permalink / raw)
  To: Beat Bolli; +Cc: Matt Turner, gentoo-portage-dev, git

Beat Bolli <dev+git@drbeat.li> writes:

> You can probably put each of the keywords on a separate line:
>
> KEYWORDS="
> ~alpha
> ~amd64
> ~arm
> ~arm64
> ~hppa
> ~ia64
> ~mips
> ~ppc
> ~ppc64
> ~riscv
> ~s390
> ~sparc
> ~x86
> "
>
> The shell should handle both forms about the same.

I agree that it is a more practical approach than writing a one-off
merge driver.


* Re: [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword
  2020-12-23 19:45   ` Junio C Hamano
@ 2020-12-24  4:47     ` Matt Turner
  2020-12-24  6:13       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Matt Turner @ 2020-12-24  4:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: gentoo-portage-dev, git

On Wed, Dec 23, 2020 at 2:46 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matt Turner <mattst88@gentoo.org> writes:
>
> > I want to handle conflicts automatically on lines like
> >
> >> KEYWORDS="~alpha ~amd64 ~arm ~arm64 ~hppa ~ia64 ~mips ~ppc ~ppc64 ~riscv ~s390 ~sparc ~x86"
> >
> > where conflicts frequently happen by adding/removing ~ before the
> > architecture names or adding/removing whole architectures. I don't
> > know if I should use a custom git merge driver or a custom git merge
> > strategy.
>
> A merge strategy is about how the changes at the tree level are
> handled.  A merge driver is given three blobs (original, your
> version, and their version) and comes up with a merged blob.
>
> In your case, you'd want a custom merge driver if you want to handle
> word changes on a single line, because the default text merge driver
> is pretty much line oriented.

Thanks, that makes sense. The merge driver I've written seems to work
great for handling the KEYWORDS=... line.

If users could more simply opt into using it (e.g., on the command
line rather than enabling it via ~/.gitattributes) I think it would be
fine to use. Better yet, is there a way git can be configured to
fall back to another merge driver if the first returns a non-zero
status due to unresolved conflicts? For example, if there are changes
to other lines, how can I fall back to another merge driver?

Thank you for your advice!


* Re: [RFC PATCH gentoolkit] bin: Add merge-driver-ekeyword
  2020-12-24  4:47     ` Matt Turner
@ 2020-12-24  6:13       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2020-12-24  6:13 UTC (permalink / raw)
  To: Matt Turner; +Cc: gentoo-portage-dev, git

Matt Turner <mattst88@gentoo.org> writes:

> ... is there a way git can be configured to
> fall back to another merge driver if the first returns a non-zero
> status due to unresolved conflicts? For example, if there are changes
> to other lines, how can I fall back to another merge driver?

There is no "fallback", but a merge driver should be able to first
run another merge driver (e.g. "git merge-file" or the "merge"
command from the RCS suite of programs would be line-oriented 3-way
drivers suitable for text files) and then fix up the leftover bits.

If your users don't want to contaminate the .gitattributes file that
is recorded in-tree, they can also use .git/info/attributes to locally
configure Git to use such a driver.
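
The "run a line-oriented merge first, then fix up the leftover bits"
approach could look roughly like the sketch below. Everything except
the `git merge-file -p` invocation is an assumption made for
illustration: the names line_merge, fix_up and naive_resolve are
hypothetical, the conflict markers are assumed to be in the default
(non-diff3) style, and naive_resolve is a deliberately simple
placeholder that treats keywords as opaque tokens and cannot detect
contradictory edits to the same architecture.

```python
import re
import subprocess
from typing import Callable, Optional, Tuple

# One conflict region in the default marker style:
# <<<<<<< label / our lines / ======= / their lines / >>>>>>> label
CONFLICT = re.compile(
    r'^<{7}[^\n]*\n(.*?)^={7}\n(.*?)^>{7}[^\n]*\n',
    re.DOTALL | re.MULTILINE)

Resolver = Callable[[str, str, str], Optional[str]]


def line_merge(ours: str, base: str, theirs: str) -> str:
    """Run git's line-oriented merge on three paths; -p prints the result."""
    proc = subprocess.run(['git', 'merge-file', '-p', ours, base, theirs],
                          stdout=subprocess.PIPE, universal_newlines=True)
    if proc.returncode > 127:  # a negative exit status shows up as > 127
        raise RuntimeError('git merge-file failed')
    return proc.stdout


def fix_up(merged: str, base_text: str, resolve: Resolver) -> Tuple[str, int]:
    """Resolve regions holding one KEYWORDS= line per side; keep the rest.

    Returns the new text and the number of regions still in conflict.
    """
    base_lines = [l for l in base_text.splitlines(True) if 'KEYWORDS=' in l]
    unresolved = 0

    def replace(match):
        nonlocal unresolved
        ours, theirs = match.group(1), match.group(2)
        single = ours.count('\n') == 1 and theirs.count('\n') == 1
        if (single and len(base_lines) == 1
                and 'KEYWORDS=' in ours and 'KEYWORDS=' in theirs):
            result = resolve(base_lines[0], ours, theirs)
            if result is not None:
                return result
        unresolved += 1
        return match.group(0)  # leave the markers for the user

    return CONFLICT.sub(replace, merged), unresolved


def naive_resolve(base: str, ours: str, theirs: str) -> Optional[str]:
    """Placeholder: keep a token unless exactly one side removed it."""
    def toks(line):
        return set(line[line.find('"') + 1:line.rfind('"')].split())
    b, o, t = toks(base), toks(ours), toks(theirs)
    kept = (o & t) | (o - b) | (t - b)
    ordered = sorted(kept, key=lambda w: w.lstrip('~-'))
    return 'KEYWORDS="{}"\n'.format(' '.join(ordered))


# Hypothetical merge-file output and base file contents.
merged = ('EAPI=7\n<<<<<<< A\nKEYWORDS="~amd64 ~hppa"\n=======\n'
          'KEYWORDS="~alpha ~amd64 hppa"\n>>>>>>> B\nSLOT="0"\n')
base = 'EAPI=7\nKEYWORDS="~alpha ~amd64 ~hppa"\nSLOT="0"\n'
text, left = fix_up(merged, base, naive_resolve)
print(text, left)
```

A driver built this way would write the resulting text over the %A
file and exit with a non-zero status whenever any region is left
unresolved, so conflicts outside the KEYWORDS line still reach the
user with the usual markers.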

