All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sunshine <sunshine@sunshineco.com>
To: Lars Schneider <lars.schneider@autodesk.com>
Cc: "Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Jeff King" <peff@peff.net>,
	patrick@luehne.de, "Lars Schneider" <larsxschneider@gmail.com>
Subject: Re: [PATCH v1] convert: add support for 'encoding' attribute
Date: Mon, 11 Dec 2017 13:39:27 -0500	[thread overview]
Message-ID: <CAPig+cQ6VSXXSYJOiZeTqUpwijVhvvUYzXF8U3KCBsOQ91HPZQ@mail.gmail.com> (raw)
In-Reply-To: <20171211155023.1405-1-lars.schneider@autodesk.com>

On Mon, Dec 11, 2017 at 10:50 AM,  <lars.schneider@autodesk.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> Git and its tools (e.g. git diff) expect all text files in UTF-8
> encoding. Git will happily accept content in all other encodings, too,
> but it might not be able to process the text (e.g. viewing diffs or
> changing line endings).
>
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will
> reverse the conversion.
>
> Reviewed-by: Patrick Lühne <patrick@luehne.de>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
> diff --git a/convert.c b/convert.c
> @@ -256,6 +257,149 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
> +static int encode_to_git(const char *path, const char *src, size_t src_len,
> +                        struct strbuf *buf, struct encoding *enc)
> +{
> +#ifndef NO_ICONV
> +       char *dst, *re_src;
> +       int dst_len, re_src_len;
> +
> +       /*
> +        * No encoding is specified or there is nothing to encode.
> +        * Tell the caller that the content was not modified.
> +        */
> +       if (!enc || (src && !src_len))
> +               return 0;
> +
> +       /*
> +        * Looks like we got called from "would_convert_to_git()".
> +        * This means Git wants to know if it would encode (= modify!)
> +        * the content. Let's answer with "yes", since an encoding was
> +        * specified.
> +        */
> +       if (!buf && !src)
> +               return 1;
> +
> +       if (enc->to_git == invalid_conversion) {
> +               enc->to_git = iconv_open(default_encoding, encoding->name);
> +               if (enc->to_git == invalid_conversion)
> +                       warning(_("unsupported encoding %s"), encoding->name);
> +       }
> +
> +       if (enc->to_worktree == invalid_conversion)
> +               enc->to_worktree = iconv_open(encoding->name, default_encoding);

Do you need to be calling iconv_close() somewhere on the result of the
iconv_open() calls? [Answering myself after reading the rest of the
patch: You're caching these opened 'iconv' descriptors, so you don't
plan on closing them.]

> + [...]
> +       /*
> +        * Encode dst back to ensure no information is lost. This wastes
> +        * a few cycles as most conversions are round trip conversion
> +        * safe. However, content that has an invalid encoding might not
> +        * match its original byte sequence after the UTF-8 conversion
> +        * round trip. Let's play safe here and check the round trip
> +        * conversion.
> +        */
> +       re_src = reencode_string_iconv(dst, dst_len, enc->to_worktree, &re_src_len);
> +       if (!re_src || strcmp(src, re_src)) {

You're using strcmp() as opposed to memcmp() because you expect
're_src' will unconditionally be UTF-8-encoded, right?

> +               die(_("encoding '%s' from %s to %s and back is not the same"),
> +                       path, enc->name, default_encoding);
> +       }
> +       free(re_src);
> +
> +       strbuf_attach(buf, dst, dst_len, dst_len + 1);
> +       return 1;
> +#else
> +       warning(_("cannot encode '%s' from %s to %s because "
> +               "your Git was not compiled with encoding support"),
> +               path, enc->name, default_encoding);
> +       return 0;
> +#endif
> +}

  reply	other threads:[~2017-12-11 18:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-11 15:50 [PATCH v1] convert: add support for 'encoding' attribute lars.schneider
2017-12-11 18:39 ` Eric Sunshine [this message]
2017-12-11 23:47   ` Lars Schneider
2017-12-11 23:58     ` Eric Sunshine
2017-12-12 10:58       ` Lars Schneider
2017-12-11 20:47 ` Johannes Sixt
2017-12-11 23:42   ` Lars Schneider
2017-12-12  0:59     ` Junio C Hamano
2017-12-12  7:15       ` Johannes Sixt
2017-12-12 10:55         ` Lars Schneider
2017-12-12 19:31           ` Junio C Hamano
2017-12-13 17:57             ` Lars Schneider
2017-12-13 18:11               ` Junio C Hamano
2017-12-13 23:02                 ` Lars Schneider
2017-12-14 23:01                   ` Junio C Hamano
2017-12-12  7:09     ` Johannes Sixt
2017-12-18 10:13   ` Torsten Bögershausen
2017-12-18 13:12     ` Jeff King
2017-12-23  8:08       ` Torsten Bögershausen
2017-12-29 13:28       ` [PATCH/RFC 0/2] git diff --UTF-8 tboegi
2017-12-29 13:28       ` [PATCH/RFC 1/2] convert_to_git(): checksafe becomes an integer tboegi
2017-12-29 13:28       ` [PATCH/RFC 2/2] git diff: Allow to reencode into UTF-8 tboegi
2018-02-26 17:27       ` [PATCH/RFC 1/1] Auto diff of UTF-16 files in UTF-8 tboegi
2018-02-26 18:43         ` Peter Krefting
2018-02-27 22:39         ` Jeff King
2017-12-18 18:02     ` [PATCH v1] convert: add support for 'encoding' attribute Junio C Hamano
2017-12-18 21:55     ` Johannes Sixt
2017-12-15  9:58 ` Jeff King
2017-12-18 10:54   ` Lars Schneider
2017-12-18 12:59     ` Jeff King
2017-12-17 17:14 ` Torsten Bögershausen
2017-12-28 16:14   ` Lars Schneider
2017-12-29 12:59     ` Torsten Bögershausen
2017-12-29 13:56       ` Lars Schneider
2018-01-03 19:15       ` Junio C Hamano
2018-01-03 20:45         ` Lars Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPig+cQ6VSXXSYJOiZeTqUpwijVhvvUYzXF8U3KCBsOQ91HPZQ@mail.gmail.com \
    --to=sunshine@sunshineco.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lars.schneider@autodesk.com \
    --cc=larsxschneider@gmail.com \
    --cc=patrick@luehne.de \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.