From: "Jakub Narębski" <jnareb@gmail.com>
To: "Torsten Bögershausen" <tboegi@web.de>, git@vger.kernel.org
Subject: Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows
Date: Fri, 31 Mar 2017 21:44:15 +0200
Message-ID: <bbd60ab1-1309-6b1e-9b7f-09764bab5ccd@gmail.com>
In-Reply-To: <264c72d0-9558-fa0d-e5ee-eaca894538be@web.de>

On 31.03.2017 at 14:38, Torsten Bögershausen wrote:
> On 30.03.17 21:35, Jakub Narębski wrote:
>> Hello,
>>
>> Recently I had to work on a project which uses legacy 8-bit encoding
>> (namely cp1250 encoding) instead of utf-8 for text files (LaTeX
>> documents).  My terminal, that is, Git Bash from Git for Windows, is set
>> up for utf-8.
>>
>> I wanted "git diff" and friends to return something sane on said
>> utf-8 terminal, instead of mojibake.  There is the 'encoding'
>> gitattribute... but it works only for the GUI ('git gui', that is).
>>
>> Therefore I have (ab)used the textconv facility to convert from the
>> cp1250 encoding of the files to the utf-8 encoding of the console.
>>
>> I have set the following in .gitattributes file:
>>
>>   ## LaTeX documents in cp1250 encoding
>>   *.tex text diff=mylatex
>>
>> The 'mylatex' driver is defined as:
>>
>>   [diff "mylatex"]
>>         xfuncname = "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
>>         wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
>>         textconv  = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t utf-8
>>         cachetextconv = true
>>
>> And everything would be all right... if not for the fact that Git appends
>> a spurious ^M to added lines in the `git diff` output.  The files use the
>> CRLF end-of-line convention (the native MS Windows one).
>>
>>   $ git diff test.tex
>>   diff --git a/test.tex b/test.tex
>>   index 029646e..250ab16 100644
>>   --- a/test.tex
>>   +++ b/test.tex
>>   @@ -1,4 +1,4 @@
>>   -\documentclass{article}
>>   +\documentclass{mwart}^M
>>   
>>    \usepackage[cp1250]{inputenc}
>>    \usepackage{polski}
>>
>> What gives?  Why is this ^M tacked onto the end of added lines,
>> while it is not present in deleted lines, nor in context lines?
>>
>> Puzzled.
>>
>> P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; pity
>> that it doesn't support the `encoding` attribute in core, together
>> with having an `i18n.outputEncoding`.
>
> Is there a chance you could give us a recipe for reproducing it?
> A complete test script, or something similar?
> (I don't want to speculate, if the invocation of iconv is the problem,
>  where stdout is not in "binary mode", or however this is called under Windows)

I'm sorry, I thought I had posted the whole recipe, but I missed some
details in the above description of the case.

First, the files are stored on the filesystem using CRLF eol (the DOS
end-of-line convention).  Due to `core.autocrlf` they are converted to LF
in blobs, that is, in the index and in the repository.

Second, a textconv filter that preserves end-of-line characters needs to
be configured.  I have used `iconv`, but I suspect that the problem would
also happen with `cat`.

In the .gitattributes file, or in .git/info/attributes, add, for example:

  *.tex text diff=myconv

In .git/config, configure the textconv filter, for example:

  [diff "myconv"]
         textconv  = iconv.exe -f cp1250 -t utf-8

Create a file whose name matches the attribute line and which uses the
CRLF end-of-line convention, and add it to Git (that is, to the index):

  $ printf "foo\r\n" >foo.tex
  $ git add foo.tex

Modify the file (also with CRLF):

  $ printf "bar\r\n" >foo.tex

Check the difference:

  $ git diff foo.tex
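
The steps above could be collected into a single script, roughly as
follows.  This is only a sketch under a few assumptions that are not in
the recipe itself: it works in a throwaway directory from `mktemp -d`,
it sets `core.autocrlf` to `true` (the exact value was not stated above),
it calls `iconv` without the `.exe` suffix, and it pipes the diff through
`cat -v` so that a carriage return, if present, shows up as ^M.

```shell
#!/bin/sh
# Sketch of a reproduction script; the directory, the autocrlf value and
# the use of `cat -v` are assumptions, not part of the original recipe.
set -e

repo=$(mktemp -d)                 # scratch repository, safe to delete
cd "$repo"
git init -q .

git config core.autocrlf true     # assumed value: CRLF in worktree, LF in blobs
git config diff.myconv.textconv "iconv -f cp1250 -t utf-8"
echo '*.tex text diff=myconv' >.gitattributes

printf 'foo\r\n' >foo.tex         # file with CRLF line ending
git add foo.tex                   # goes into the index

git ls-files --eol foo.tex        # show eol style recorded for index and worktree

printf 'bar\r\n' >foo.tex         # modify, again with CRLF

# Make any carriage return in the diff visible as ^M.
out=$(git diff foo.tex | cat -v)
printf '%s\n' "$out"
```

If the added line is printed with a trailing ^M while the removed line is
not, the script has reproduced what is described above.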

HTH
-- 
Jakub Narębski
