From: Eric Sunshine <sunshine@sunshineco.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Git List <git@vger.kernel.org>,
Lars Schneider <larsxschneider@gmail.com>,
Rich Felker <dalias@libc.org>, Junio C Hamano <gitster@pobox.com>,
Kevin Daudt <me@ikke.info>
Subject: Re: [PATCH] utf8: handle systems that don't write BOM for UTF-16
Date: Sat, 9 Feb 2019 20:45:16 -0500
Message-ID: <CAPig+cRyzZMOM19ztgR_wqvk68P_1eNNVBBj5pbY=MhQm08WAw@mail.gmail.com>
In-Reply-To: <20190209200802.277139-1-sandals@crustytoothpaste.net>
On Sat, Feb 9, 2019 at 3:08 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> [...]
> Add a Makefile and #define knob, ICONV_NEEDS_BOM, that can be set if the
> iconv implementation has this behavior. When set, Git will write a BOM
> manually for UTF-16 and UTF-32 and then force the data to be written in
> UTF-16BE or UTF-32BE. We choose big-endian behavior here because the
> tests use the raw "UTF-16" encoding, which will be big-endian when the
> implementation requires this knob to be set.
The name ICONV_NEEDS_BOM makes it sound as if we must feed a BOM
_into_ 'iconv', which is quite confusing since the actual intention is
that 'iconv' doesn't emit a BOM and we need to make up for the
deficiency. Using a name such as ICONV_OMITS_BOM or ICONV_NEGLECTS_BOM
makes it somewhat clearer that there is some deficiency with which we
need to deal.
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> diff --git a/Makefile b/Makefile
> @@ -259,6 +259,9 @@ all::
> +# Define ICONV_NEEDS_BOM if your iconv implementation does not write a
> +# byte-order mark (BOM) when writing UTF-16 or UTF-32.
Not a big deal, but I wonder if it would be helpful to tack on "...,
in which case it outputs big-endian unconditionally." or something.
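For anyone wanting to see which camp their own iconv falls into, a quick check along these lines might help. It is only a sketch: the helper name utf16_bytes is made up for illustration, and the bytes printed depend entirely on the iconv implementation in use.

```shell
# Hypothetical helper (not part of the patch): hex-dump the bytes that
# iconv emits when converting "abc" to the plain "UTF-16" encoding.
utf16_bytes () {
	printf abc | iconv -f UTF-8 -t UTF-16 | od -An -tx1 | tr -d ' \n'
}

# 16 hex digits means a 2-byte BOM came first; 12 hex digits means no BOM.
utf16_bytes
echo
```

An implementation with the behavior described in the commit message would be expected to print 006100620063 (big-endian, no BOM), whereas one that writes a BOM prints fffe or feff up front.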
> diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
> @@ -6,6 +6,25 @@ test_description='working-tree-encoding conversion via gitattributes'
> +test_lazy_prereq NO_UTF16_BOM '
> + test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
> +'
> +
> +test_lazy_prereq NO_UTF32_BOM '
> + test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
> +'
> +
> +write_utf16 () {
> + test_have_prereq NO_UTF16_BOM && printf '\xfe\xff'
> + iconv -f UTF-8 -t UTF-16
> +
> +}
Stray blank line before the closing brace.
> +
> +write_utf32 () {
> + test_have_prereq NO_UTF32_BOM && printf '\x00\x00\xfe\xff'
> + iconv -f UTF-8 -t UTF-32
> +}
It probably doesn't matter much with these two tiny functions, but I
was wondering if it would make sense to maintain the &&-chain, perhaps
like this:
    if test_have_prereq NO_UTF32_BOM
    then
        printf '\x00\x00\xfe\xff'
    fi &&
    iconv -f UTF-8 -t UTF-32
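
Spelled out for both helpers, the &&-chained form might look roughly like the sketch below. Two things in it are assumptions and not part of the patch: the test_have_prereq stand-in, which exists only so the snippet runs outside the test suite, and the octal escapes, which are used here because POSIX printf does not guarantee support for \x.

```shell
# Stand-in for the test suite's test_have_prereq so this runs on its own
# (an assumption for illustration); it answers only the two probes that
# the patch defines and reports every other prerequisite as missing.
test_have_prereq () {
	case "$1" in
	NO_UTF16_BOM)
		test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6 ;;
	NO_UTF32_BOM)
		test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12 ;;
	*)
		return 1 ;;
	esac
}

# Emit a big-endian BOM by hand only when iconv omits it, then convert.
# An "if" with a false condition and no "else" exits 0, so the &&-chain
# still reaches iconv, while a failing printf would break the chain.
write_utf16 () {
	if test_have_prereq NO_UTF16_BOM
	then
		printf '\376\377'
	fi &&
	iconv -f UTF-8 -t UTF-16
}

write_utf32 () {
	if test_have_prereq NO_UTF32_BOM
	then
		printf '\0\0\376\377'
	fi &&
	iconv -f UTF-8 -t UTF-32
}
```

With either kind of iconv, "printf abc | write_utf16 | wc -c" should then report 8 bytes and the UTF-32 variant 16, since the BOM is present one way or the other.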