All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sunshine <sunshine@sunshineco.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: Git List <git@vger.kernel.org>,
	Lars Schneider <larsxschneider@gmail.com>,
	Rich Felker <dalias@libc.org>, Junio C Hamano <gitster@pobox.com>,
	Kevin Daudt <me@ikke.info>
Subject: Re: [PATCH] utf8: handle systems that don't write BOM for UTF-16
Date: Sat, 9 Feb 2019 20:45:16 -0500	[thread overview]
Message-ID: <CAPig+cRyzZMOM19ztgR_wqvk68P_1eNNVBBj5pbY=MhQm08WAw@mail.gmail.com> (raw)
In-Reply-To: <20190209200802.277139-1-sandals@crustytoothpaste.net>

On Sat, Feb 9, 2019 at 3:08 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> [...]
> Add a Makefile and #define knob, ICONV_NEEDS_BOM, that can be set if the
> iconv implementation has this behavior. When set, Git will write a BOM
> manually for UTF-16 and UTF-32 and then force the data to be written in
> UTF-16BE or UTF-32BE. We choose big-endian behavior here because the
> tests use the raw "UTF-16" encoding, which will be big-endian when the
> implementation requires this knob to be set.

The name ICONV_NEEDS_BOM makes it sound as if we must feed a BOM
_into_ 'iconv', which is quite confusing since the actual intention is
that 'iconv' doesn't emit a BOM and we need to make up for the
deficiency. Using a name such as ICONV_OMITS_BOM or ICONV_NEGLECTS_BOM
makes it somewhat clearer that there is some deficiency with which we
need to deal.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> diff --git a/Makefile b/Makefile
> @@ -259,6 +259,9 @@ all::
> +# Define ICONV_NEEDS_BOM if your iconv implementation does not write a
> +# byte-order mark (BOM) when writing UTF-16 or UTF-32.

Not a big deal, but I wonder if it would be helpful to tack on "...,
in which case it outputs big-endian unconditionally." or something.

> diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
> @@ -6,6 +6,25 @@ test_description='working-tree-encoding conversion via gitattributes'
> +test_lazy_prereq NO_UTF16_BOM '
> +       test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
> +'
> +
> +test_lazy_prereq NO_UTF32_BOM '
> +       test $(printf abc | iconv -f UTF-8 -t UTF-32 | wc -c) = 12
> +'
> +
> +write_utf16 () {
> +       test_have_prereq NO_UTF16_BOM && printf '\xfe\xff'
> +       iconv -f UTF-8 -t UTF-16
> +
> +}

Stray blank line before the closing brace.

> +
> +write_utf32 () {
> +       test_have_prereq NO_UTF32_BOM && printf '\x00\x00\xfe\xff'
> +       iconv -f UTF-8 -t UTF-32
> +}

It's probably doesn't matter much with these two tiny functions, but I
was wondering if it would make sense to maintain the &&-chain, perhaps
like this:

    if test test_have_prereq NO_UTF32_BOM
    then
        printf '\x00\x00\xfe\xff'
    fi &&
    iconv -f UTF-8 -t UTF-32

  reply	other threads:[~2019-02-10  1:45 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-07 21:59 t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Kevin Daudt
2019-02-08  0:17 ` brian m. carlson
2019-02-08  6:04   ` Rich Felker
2019-02-08 11:45     ` brian m. carlson
2019-02-08 11:55       ` Kevin Daudt
2019-02-08 13:51         ` brian m. carlson
2019-02-08 17:50           ` Junio C Hamano
2019-02-08 20:23             ` Kevin Daudt
2019-02-08 20:42               ` brian m. carlson
2019-02-08 23:12                 ` Junio C Hamano
2019-02-09  0:24                   ` brian m. carlson
2019-02-09 14:57                 ` Kevin Daudt
2019-02-09 20:08                   ` [PATCH] utf8: handle systems that don't write BOM for UTF-16 brian m. carlson
2019-02-10  1:45                     ` Eric Sunshine [this message]
2019-02-10 18:14                       ` brian m. carlson
2019-02-10  8:04                     ` Torsten Bögershausen
2019-02-10 18:55                       ` brian m. carlson
2019-02-11 17:14                         ` Junio C Hamano
2019-02-11  0:23                     ` [PATCH v2] " brian m. carlson
2019-02-11  1:16                       ` Eric Sunshine
2019-02-11  1:20                         ` brian m. carlson
2019-02-11  1:26                     ` [PATCH v3] " brian m. carlson
2019-02-11 21:43                       ` Kevin Daudt
2019-02-11 23:58                         ` brian m. carlson
2019-02-12  0:31                           ` Junio C Hamano
2019-02-12  0:53                             ` brian m. carlson
2019-02-12  2:43                               ` Junio C Hamano
2019-02-12  0:52                     ` [PATCH v4] " brian m. carlson
2019-02-08 16:13         ` t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Rich Felker
2019-02-09  8:09     ` Torsten Bögershausen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPig+cRyzZMOM19ztgR_wqvk68P_1eNNVBBj5pbY=MhQm08WAw@mail.gmail.com' \
    --to=sunshine@sunshineco.com \
    --cc=dalias@libc.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=me@ikke.info \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.