All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: lars.schneider@autodesk.com
Cc: git@vger.kernel.org, gitster@pobox.com, tboegi@web.de,
	patrick@luehne.de, Lars Schneider <larsxschneider@gmail.com>
Subject: Re: [PATCH v1] convert: add support for 'encoding' attribute
Date: Fri, 15 Dec 2017 04:58:38 -0500	[thread overview]
Message-ID: <20171215095838.GA3567@sigill.intra.peff.net> (raw)
In-Reply-To: <20171211155023.1405-1-lars.schneider@autodesk.com>

On Mon, Dec 11, 2017 at 04:50:23PM +0100, lars.schneider@autodesk.com wrote:

> From: Lars Schneider <larsxschneider@gmail.com>
> 
> Git and its tools (e.g. git diff) expect all text files in UTF-8
> encoding. Git will happily accept content in all other encodings, too,
> but it might not be able to process the text (e.g. viewing diffs or
> changing line endings).
> 
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will
> reverse the conversion.

This made me wonder what happens when you have a file that _was_ utf16,
and then you later convert it to utf8 and add a .gitattributes entry.

I tried this with your patch:

  git init repo
  cd repo
  
  echo foo | iconv -t utf16 >file
  git add file
  git commit -m utf16
  
  echo 'file encoding=utf16' >.gitattributes
  touch file ;# make it stat-dirty
  git commit -m convert
  
  git checkout HEAD^

That works OK, because we try to read the attributes from the
destination tree for a checkout. If you do this:

  echo 'file encoding=utf16' >.git/info/attributes
  git checkout HEAD^

then we get:

  warning: failed to encode 'file' from utf-8 to utf16

At least it figured out that it couldn't convert the content. It's
slightly troubling that it would try in the first place, though; are
there encoding pairs where we might accidentally generate nonsense?

It _should_ be uncommon, I think, to have a repo-level attribute set
like that. It's very common for us to use whatever happens to be in the
checked-out .gitattributes for some attributes (e.g., when doing a diff
of an older commit), but I think for checkout-related ones it's not.  So
I think it may generally work in practice. And certainly the line-ending
code would share any similar problems, but at least there the result is
less confusing than mojibake.

Playing around, I also managed to do this:

  echo 'file encoding=utf16' >.gitattributes
  echo foo >file

  # I did these with an old version of git that didn't
  # support the new attribute, so they blindly added the utf8 content.
  git add .
  git commit -m convert

  git.compile checkout HEAD^

which yielded:

  fatal: encoding 'file' from utf16 to utf-8 and back is not the same

Now obviously my situation is somewhat nonsense. I was trying to force
the in-repo representation to utf8, but ended up with a mismatched
working tree file. But what's somewhat troubling is that I couldn't
checkout _away_ from that state due to the die() in convert_to_git().
Which is in turn just there as part of refresh_index().

And indeed, other commands hit the same problem:

  $ git.compile diff
  fatal: encoding 'file' from utf16 to utf-8 and back is not the same

  $ git.compile checkout -f
  fatal: encoding 'file' from utf16 to utf-8 and back is not the same

It may make sense to die() during "git add ." (since we're actually
changing the index entry, and we don't want to put nonsense into a
tree). But I'm not sure it's the best thing for operations which just
want to read the content. For them, perhaps it would be more appropriate
to issue a warning and return the untouched content.

-Peff

  parent reply	other threads:[~2017-12-15  9:58 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-11 15:50 [PATCH v1] convert: add support for 'encoding' attribute lars.schneider
2017-12-11 18:39 ` Eric Sunshine
2017-12-11 23:47   ` Lars Schneider
2017-12-11 23:58     ` Eric Sunshine
2017-12-12 10:58       ` Lars Schneider
2017-12-11 20:47 ` Johannes Sixt
2017-12-11 23:42   ` Lars Schneider
2017-12-12  0:59     ` Junio C Hamano
2017-12-12  7:15       ` Johannes Sixt
2017-12-12 10:55         ` Lars Schneider
2017-12-12 19:31           ` Junio C Hamano
2017-12-13 17:57             ` Lars Schneider
2017-12-13 18:11               ` Junio C Hamano
2017-12-13 23:02                 ` Lars Schneider
2017-12-14 23:01                   ` Junio C Hamano
2017-12-12  7:09     ` Johannes Sixt
2017-12-18 10:13   ` Torsten Bögershausen
2017-12-18 13:12     ` Jeff King
2017-12-23  8:08       ` Torsten Bögershausen
2017-12-29 13:28       ` [PATCH/RFC 0/2] git diff --UTF-8 tboegi
2017-12-29 13:28       ` [PATCH/RFC 1/2] convert_to_git(): checksafe becomes an integer tboegi
2017-12-29 13:28       ` [PATCH/RFC 2/2] git diff: Allow to reencode into UTF-8 tboegi
2018-02-26 17:27       ` [PATCH/RFC 1/1] Auto diff of UTF-16 files in UTF-8 tboegi
2018-02-26 18:43         ` Peter Krefting
2018-02-27 22:39         ` Jeff King
2017-12-18 18:02     ` [PATCH v1] convert: add support for 'encoding' attribute Junio C Hamano
2017-12-18 21:55     ` Johannes Sixt
2017-12-15  9:58 ` Jeff King [this message]
2017-12-18 10:54   ` Lars Schneider
2017-12-18 12:59     ` Jeff King
2017-12-17 17:14 ` Torsten Bögershausen
2017-12-28 16:14   ` Lars Schneider
2017-12-29 12:59     ` Torsten Bögershausen
2017-12-29 13:56       ` Lars Schneider
2018-01-03 19:15       ` Junio C Hamano
2018-01-03 20:45         ` Lars Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171215095838.GA3567@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lars.schneider@autodesk.com \
    --cc=larsxschneider@gmail.com \
    --cc=patrick@luehne.de \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.