All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Lars Schneider" <lars.schneider@autodesk.com>,
	git <git@vger.kernel.org>, "Johannes Sixt" <j6t@kdbg.org>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de
Subject: Re: [PATCH v7 0/7] convert: add support for different encodings
Date: Sun, 25 Feb 2018 20:44:46 -0500	[thread overview]
Message-ID: <20180226014445.GB8677@sigill.intra.peff.net> (raw)
In-Reply-To: <FDF4DEB8-E71A-4BFC-9437-678C8F65BBDC@gmail.com>

On Sat, Feb 24, 2018 at 04:18:36PM +0100, Lars Schneider wrote:

> > We always use the in-repo contents when generating 'diff'.  I think
> > by "attribute to be used in diff", what you are reallying after is
> > to convert the in-repo contents to that encoding _BEFORE_ running
> > 'diff' on it.  E.g. in-repo UTF-16 that can have NUL bytes all over
> > the place will not diff well with the xdiff machinery, but if you
> > first convert it to UTF-8 and have xdiff work on it, you can get
> > reasonable result out of it.  It is unclear what encoding you want
> > your final diff output in (it is equally valid in such a set-up to
> > desire your patch output in UTF-16 or UTF-8), but assuming that you
> > want UTF-8 in your patch output, perhaps we do not have to break
> > gitk users by hijacking the 'encoding' attribute.  Instead what you
> > want is a single bit that says between in-repo or working tree which
> > representation should be given to the xdiff machinery.
> 
> I fear that we could confuse users with an additional knob/bit that
> defines what we diff against. Git always diff'ed against in-repo 
> content and I feel it should stay that way.

Well, except for textconv. You can already do this:

  echo "foo diff=utf16" >.gitattributes
  git config diff.utf16.textconv 'iconv -f utf16 -t utf8'

We could make that easier to use and much more efficient by:

  1. Allowing a special syntax for textconv filters that kicks off an
     internal iconv.

  2. Providing baked-in config for utf16.

The patch below provides a sketch. But I think Torsten raised a good
point that you might want the encoding conversion to be independent of
other diff characteristics (so, e.g., you might say "this is utf16 but
once converted treat it like C code for finding funcnames, etc").

---
diff --git a/diff.c b/diff.c
index 21c3838b25..04032e059c 100644
--- a/diff.c
+++ b/diff.c
@@ -5968,6 +5968,21 @@ struct diff_filepair *diff_unmerge(struct diff_options *options, const char *pat
 	return pair;
 }
 
+static char *iconv_textconv(const char *encoding, struct diff_filespec *spec,
+			    size_t *outsize)
+{
+	char *ret;
+	int outsize_int; /* this really should be a size_t */
+
+	if (diff_populate_filespec(spec, 0))
+		die("unable to load content for %s", spec->path);
+	ret = reencode_string_len(spec->data, spec->size,
+				  "utf-8", /* should be log_output_encoding? */
+				  encoding, &outsize_int);
+	*outsize = outsize_int;
+	return ret;
+}
+
 static char *run_textconv(const char *pgm, struct diff_filespec *spec,
 		size_t *outsize)
 {
@@ -5978,6 +5993,9 @@ static char *run_textconv(const char *pgm, struct diff_filespec *spec,
 	struct strbuf buf = STRBUF_INIT;
 	int err = 0;
 
+	if (skip_prefix(pgm, "iconv:", &pgm))
+		return iconv_textconv(pgm, spec, outsize);
+
 	temp = prepare_temp_file(spec->path, spec);
 	*arg++ = pgm;
 	*arg++ = temp->name;
diff --git a/userdiff.c b/userdiff.c
index dbfb4e13cd..48fa7e8bdd 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -161,6 +161,7 @@ IPATTERN("css",
 	 "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
 	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
 ),
+{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS

  reply	other threads:[~2018-02-26  1:44 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-15 15:27 [PATCH v7 0/7] convert: add support for different encodings lars.schneider
2018-02-15 15:27 ` [PATCH v7 1/7] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-02-16 12:55   ` Ævar Arnfjörð Bjarmason
2018-02-16 18:45     ` Jeff King
2018-02-16 19:30       ` Junio C Hamano
2018-02-15 15:27 ` [PATCH v7 2/7] strbuf: add xstrdup_toupper() lars.schneider
2018-02-15 15:27 ` [PATCH v7 3/7] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-02-15 15:27 ` [PATCH v7 4/7] utf8: add function to detect a missing " lars.schneider
2018-02-15 15:27 ` [PATCH v7 5/7] convert: add 'working-tree-encoding' attribute lars.schneider
2018-02-15 15:27 ` [PATCH v7 6/7] convert: add tracing for " lars.schneider
2018-02-15 15:27 ` [PATCH v7 7/7] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-02-15 20:03 ` [PATCH v7 0/7] convert: add support for different encodings Junio C Hamano
2018-02-15 22:09   ` Jeff King
2018-02-16 18:55     ` Junio C Hamano
2018-02-16 19:25       ` Jeff King
2018-02-16 19:27         ` Jeff King
2018-02-16 19:41           ` Junio C Hamano
2018-02-21 18:06       ` Lars Schneider
2018-02-16 14:42   ` Lars Schneider
2018-02-16 16:58     ` Torsten Bögershausen
2018-02-22 20:00       ` Lars Schneider
2018-02-22 20:12         ` Jeff King
2018-02-23 16:35         ` Junio C Hamano
2018-02-23 20:11           ` Junio C Hamano
2018-02-24 15:18             ` Lars Schneider
2018-02-26  1:44               ` Jeff King [this message]
2018-02-26 17:35                 ` Torsten Bögershausen
2018-02-26 20:46                   ` Jeff King
2018-02-27 21:05                     ` Torsten Bögershausen
2018-02-27 21:25                       ` Jeff King
2018-02-27 21:55                         ` Junio C Hamano
2018-02-27 21:58                           ` Jeff King
2018-02-27 22:10                             ` Junio C Hamano
2018-02-27 22:20                               ` Jeff King
2018-02-28  8:20                         ` Torsten Bögershausen
2018-02-28 13:21                           ` Jeff King
2018-02-28 17:42                             ` Junio C Hamano
2018-03-01  7:49                               ` Jeff King
2018-03-04 10:16                             ` Torsten Bögershausen
2018-02-28 20:46                         ` Lars Schneider
2018-02-16 19:04     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180226014445.GB8677@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=lars.schneider@autodesk.com \
    --cc=larsxschneider@gmail.com \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sunshine@sunshineco.com \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.