From: "Russell, Scott" <Scott.Russell2@ncr.com>
To: "brian m. carlson" <sandals@crustytoothpaste.net>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: RE: git clone corrupts file.
Date: Mon, 16 Aug 2021 15:24:28 +0000
Message-ID: <BN6PR15MB1426E99386269CCBDA888D51CBFD9@BN6PR15MB1426.namprd15.prod.outlook.com>
In-Reply-To: <YRbya0UO2+PvOjL5@camp.crustytoothpaste.net>

Brian,  

Thanks for your interest in this issue.  We have determined that two factors are involved.

1.  The corrupted files are in Unicode.  The .h file mentioned doesn't have to be Unicode (it could be ANSI), but we have other files that must be Unicode.  We use Unicode in quite a number of our text files.
2.  Git appears to corrupt the file by changing line endings.
          a.   GitHub has the correct file.  It displays correctly there.  When downloaded from GitHub in a browser, as either binary or text, it is not corrupted.
          b.   Git seems to change line endings as if the file were ANSI or another single-byte encoding, not Unicode.
          c.   Git has core.autocrlf set to false (git config core.autocrlf false), but apparently the setting is not being observed.
          d.   The .gitconfig file has a [core] section containing the entry autocrlf = false.
          e.   There is a .gitattributes file in the repo.
          f.   .gitattributes has entries, specified by file type, that cover the affected files:
                        *.h     text eol=crlf
                        *.ini   text eol=crlf

If you look at the 1st line of the binary view of the original file, it looks like this:

FF FE 2F 00 2F 00 7B 00   7B 00 4E 00 4F 00 5F 00
44 00 45 00 50 00 45 00  4E 00 44 00 45 00 4E 00 
43 00 49 00 45 00 53 00  7D 00 7D 00 0D 00 0A 00   	Note - Unicode CR LF  0D 00 0A 00   

2nd line 
2F 00 2F 00 20 00 4D 00  69 00 63 00 72 00 6F 00  etc.   
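
As a cross-check of the dump above, here is a minimal Python sketch that decodes those first-line bytes. It assumes the file is UTF-16 with a little-endian byte order mark (the leading FF FE); the variable names are only for illustration.

```python
# Bytes of the first line, copied from the hex dump above.
first_line = bytes.fromhex(
    "FF FE 2F 00 2F 00 7B 00 7B 00 4E 00 4F 00 5F 00 "
    "44 00 45 00 50 00 45 00 4E 00 44 00 45 00 4E 00 "
    "43 00 49 00 45 00 53 00 7D 00 7D 00 0D 00 0A 00"
)

# The "utf-16" codec reads the BOM (FF FE) to pick little-endian order
# and drops it from the decoded text.
text = first_line.decode("utf-16")

print(repr(text))  # '//{{NO_DEPENDENCIES}}\r\n'
```

The decoded text matches the first line of the file as it appears on GitHub, including the 2-byte CR LF.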

If you look at the file from git, it looks very similar.
However, git has put a non-Unicode (single-byte) CR LF at the end of the line, plus an extra NULL.  This extra byte throws the 2-byte Unicode encoding off and corrupts the next line.  On the line after that, another extra byte realigns the 2-byte encoding, so that line appears uncorrupted.
You can see that in my original email below: every other line is unreadable.

FF FE 2F 00 2F 00 7B 00   7B 00 4E 00 4F 00 5F 00
44 00 45 00 50 00 45 00  4E 00 44 00 45 00 4E 00 
43 00 49 00 45 00 53 00  7D 00 7D 00 0D 00 0D 0A   	Note - non-Unicode CR LF  0D 0A

2nd line 
00 2F 00 2F 00 20 00 4D 00  69 00 63 00 72 00 6F  etc.   
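
The one-byte shift can be reproduced with a short Python sketch. This is an illustration only: `naive_lf_to_crlf` is a hypothetical helper that mimics a single-byte LF to CR LF conversion. It is not git's actual code, and the sample text is just the start of the file shown above.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Insert 0x0D before every 0x0A byte that is not already preceded by
    0x0D, the way a converter that assumes a single-byte encoding would."""
    out = bytearray()
    prev = None
    for byte in data:
        if byte == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(byte)
        prev = byte
    return bytes(out)


# UTF-16LE with BOM: first line, 2-byte CR LF, start of the second line.
original = b"\xff\xfe" + "//{{NO_DEPENDENCIES}}\r\n// Micro".encode("utf-16-le")
converted = naive_lf_to_crlf(original)

print(original.hex(" "))
print(converted.hex(" "))

# Everything after the inserted 0D 0A is now off by one byte, so the
# 2-byte pairs of the second line are read as 00 2F instead of 2F 00.
eol = converted.index(b"\x0d\x0a")
garbled = converted[eol + 2 : eol + 6].decode("utf-16-le")
print(garbled)
```

In UTF-16LE, 0A is preceded by 00 rather than 0D, so the helper treats it as a bare LF and inserts one extra 0D. That is enough to misalign every 2-byte character until the next line ending.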

I would like git to honor autocrlf = false as directed.

It's important that we retain the 2-byte Unicode encoding in many of our files, and that git not add single-byte CR LF sequences to our 2-byte files.
We can't convert the files to another encoding for git's convenience.

Thanks, 

Scott Russell
Staff SW Engineer 
NCR Corporation 
Phone: +17706237512
Scott.Russell2@ncr.com  |  ncr.com
       

-----Original Message-----
From: brian m. carlson <sandals@crustytoothpaste.net> 
Sent: Friday, August 13, 2021 6:30 PM
To: Russell, Scott <Scott.Russell2@ncr.com>
Cc: git@vger.kernel.org
Subject: Re: git clone corrupts file.

On 2021-08-13 at 18:54:43, Russell, Scott wrote:
> File from git.
> 
> ਍⼀⼀ 䴀椀挀爀漀猀漀昀琀 嘀椀猀甀愀氀 䌀⬀⬀ 最攀渀攀爀愀琀攀搀 椀渀挀氀甀搀攀 昀椀氀攀⸀ഀഀ
> // Used by CamTest.rc
> ਍⼀⼀ഀഀ
> #define IDM_ABOUTBOX                    0x0010
> ਍⌀搀攀昀椀渀攀 䤀䐀䐀开䄀䈀伀唀吀䈀伀堀                    ㄀  ഀഀ
> 
> File in github.
> 
> //{{NO_DEPENDENCIES}}
> // Microsoft Visual C++ generated include file.
> // Used by CamTest.rc
> //

We're probably going to need a little more information about this.  My guess as to what's happening here is that the editor you're using to view the file is set to read files as UTF-16, but the repository has them stored in UTF-8, or (less likely) vice versa.

Can you tell us what editor or other tool you're using to view the file and what settings it's using for text encoding?  Can you tell us about any working-tree-encoding declarations in your .gitattributes?  You can use "git check-attr -a PATH" to see more information about that.

What code page are you using on your system?  Are you using PowerShell, CMD, or Git Bash?  If you're using Git Bash, what are your locale settings?
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
