* git clone corrupts file.
[not found] ` <BN6PR15MB14261C40E614CC11416388B4CBFA9@BN6PR15MB1426.namprd15.prod.outlook.com>
@ 2021-08-13 18:54 ` Russell, Scott
2021-08-13 22:30 ` brian m. carlson
0 siblings, 1 reply; 13+ messages in thread
From: Russell, Scott @ 2021-08-13 18:54 UTC (permalink / raw)
To: git
What did you do before the bug happened? git clone
What did you expect to happen? The cloned file would match the GitHub copy.
What happened instead? The file was corrupted and does not match the GitHub copy (see example).
What's different between what you expected and what actually happened? corruption
[System Info]
git version:
git version 2.31.1.windows.1
cpu: x86_64
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature:
fsmonitor--daemon
uname: Windows 10.0 17134
compiler info: gnuc: 10.2
libc info: no libc information available
$SHELL (typically, interactive shell): <unset>
[Enabled Hooks]
not run from a git repository - no hooks to show
File from git.
⼀⼀ 䴀椀挀爀漀猀漀昀琀 嘀椀猀甀愀氀 䌀⬀⬀ 最攀渀攀爀愀琀攀搀 椀渀挀氀甀搀攀 昀椀氀攀⸀ഀഀ
// Used by CamTest.rc
⼀⼀ഀഀ
#define IDM_ABOUTBOX 0x0010
⌀搀攀昀椀渀攀 䤀䐀䐀开䄀䈀伀唀吀䈀伀堀 ഀഀ
File in GitHub.
//{{NO_DEPENDENCIES}}
// Microsoft Visual C++ generated include file.
// Used by CamTest.rc
//
Thanks,
Scott Russell
Staff SW Engineer
NCR Corporation
Phone: +17706237512
mailto:Scott.Russell2@ncr.com | http://www.ncr.com/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: git clone corrupts file.
From: brian m. carlson @ 2021-08-13 22:30 UTC (permalink / raw)
To: Russell, Scott; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]
On 2021-08-13 at 18:54:43, Russell, Scott wrote:
> File from git.
>
> ⼀⼀ 䴀椀挀爀漀猀漀昀琀 嘀椀猀甀愀氀 䌀⬀⬀ 最攀渀攀爀愀琀攀搀 椀渀挀氀甀搀攀 昀椀氀攀⸀ഀഀ
> // Used by CamTest.rc
> ⼀⼀ഀഀ
> #define IDM_ABOUTBOX 0x0010
> ⌀搀攀昀椀渀攀 䤀䐀䐀开䄀䈀伀唀吀䈀伀堀 ഀഀ
>
> File in github.
>
> //{{NO_DEPENDENCIES}}
> // Microsoft Visual C++ generated include file.
> // Used by CamTest.rc
> //
We're probably going to need a little more information about this. My
guess as to what's happening here is that the editor you're using to
view the file is set to read files as UTF-16, but the repository has
them stored in UTF-8, or (less likely) vice versa.
Can you tell us what editor or other tool you're using to view the file
and what settings it's using for text encoding? Can you tell us about
any working-tree-encoding declarations in your .gitattributes? You can
use "git check-attr -a PATH" to see more information about that.
What code page are you using on your system? Are you using PowerShell,
CMD, or Git Bash? If you're using Git Bash, what are your locale
settings?
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
* RE: git clone corrupts file.
From: Russell, Scott @ 2021-08-16 15:24 UTC (permalink / raw)
To: brian m. carlson; +Cc: git
Brian,
Thanks for your interest in this issue. We have determined that the issue has two factors.
1. The corrupted files are in Unicode. Although the .h file mentioned doesn't have to be Unicode (it can be ANSI), we have other files that must be Unicode. We use Unicode in quite a number of our text files.
2. Git appears to corrupt the file by changing its line endings.
a. GitHub has the correct file, and it displays correctly there. When downloaded from GitHub in a browser, as binary or as text, it is not corrupted.
b. Git seems to change line endings as if the file were ANSI or another single-byte encoding, not Unicode.
c. Git has core.autocrlf set to false (git config core.autocrlf false), but apparently it is not being honored.
d. The .gitconfig file has a [core] section containing the entry autocrlf = false.
e. There is a .gitattributes file in the repo.
f. The .gitattributes file has entries, by file type, that match the affected files:
*.h text eol=crlf
*.ini text eol=crlf
If you look at the 1st line of the binary view of the original file, it looks like this:
FF FE 2F 00 2F 00 7B 00 7B 00 4E 00 4F 00 5F 00
44 00 45 00 50 00 45 00 4E 00 44 00 45 00 4E 00
43 00 49 00 45 00 53 00 7D 00 7D 00 0D 00 0A 00 Note - Unicode CR LF 0D 00 0A 00
2nd line
2F 00 2F 00 20 00 4D 00 69 00 63 00 72 00 6F 00 etc.
If you look at the file from git, it looks very similar.
However, git has put a non-Unicode CR LF at the end of the line,
plus an extra NULL. This extra NULL throws the 2-byte Unicode encoding off and corrupts the line. On the next line, the extra NULL realigns the 2-byte encoding, so that line appears uncorrupted.
You can see that in my original email below: every other line is unreadable.
FF FE 2F 00 2F 00 7B 00 7B 00 4E 00 4F 00 5F 00
44 00 45 00 50 00 45 00 4E 00 44 00 45 00 4E 00
43 00 49 00 45 00 53 00 7D 00 7D 00 0D 00 0D 0A Note - Unicode CR LF 0D 00 0A 00
2nd line
00 2F 00 2F 00 20 00 4D 00 69 00 63 00 72 00 6F etc.
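The every-other-line garbling can be reproduced in a few lines of Python. This is a sketch under one assumption: that a single stray byte sits in front of otherwise valid UTF-16LE text. The helper name decode_shifted is made up for illustration. Because each ASCII character is stored as its code followed by 00, a one-byte shift moves each character's byte into the high half of a 2-byte code unit, so "/" (U+002F) becomes U+2F00 and "M" (U+004D) becomes U+4D00, which matches the CJK-looking text in the "File from git." sample.

```python
def decode_shifted(text: str) -> str:
    """Encode text as UTF-16LE, prepend one stray byte, and decode it again.

    The stray byte pushes every following byte one position to the right,
    so each ASCII character's byte lands in the high half of a code unit.
    """
    data = b"\x00" + text.encode("utf-16-le")
    # Drop the trailing odd byte so only whole 2-byte code units are decoded.
    return data[:-1].decode("utf-16-le")


garbled = decode_shifted("// Microsoft")
# Print code points rather than the characters themselves, so the output
# does not depend on the console's encoding.
print(" ".join(f"U+{ord(ch):04X}" for ch in garbled))
```

A second stray byte one line later shifts the stream by two bytes in total, which puts the code units back in alignment; that is why only every other line is unreadable.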
I would like git to honor autocrlf = false as directed.
It's important that we retain 2-byte Unicode encoding in many of our files, and that git not add a single-byte CR LF into our 2-byte files.
We can't convert the files to another encoding for the convenience of git.
Thanks,
Scott Russell
* Re: git clone corrupts file.
From: Jeff King @ 2021-08-16 16:53 UTC (permalink / raw)
To: Russell, Scott; +Cc: brian m. carlson, git
On Mon, Aug 16, 2021 at 03:24:28PM +0000, Russell, Scott wrote:
> 1. The files corrupted are in Unicode. Though the .h file mentioned
> certainly doesn't have to be Unicode, it can be ANSI, we have
> other files that must be Unicode. We use Unicode in quite a
> number of our text files.
By Unicode, I'll assume you mean UTF-16, since your example below
appears to have a BOM marker at the beginning (FF FE).
Unlike UTF-8, UTF-16 is not a superset of ASCII, and thus can't be
treated as "text" by Git (e.g., the line ending byte is no longer just
hex "0A", but "00 0A").
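To make the "00 0A" point concrete, this Python sketch uses only the standard codecs to show how CR LF is encoded in a few encodings (the helper name crlf_bytes is illustrative). In the ASCII-superset encodings it is the two bytes 0D 0A; in UTF-16 and UTF-32 the same characters are padded with NUL bytes, so a byte-oriented tool sees an LF byte with no CR byte directly before it.

```python
def crlf_bytes(encoding: str) -> str:
    """Return the hex bytes that encode CR LF in the given codec."""
    return "\r\n".encode(encoding).hex(" ").upper()


for enc in ("utf-8", "latin-1", "cp1252", "utf-16-le", "utf-32-le"):
    print(f"{enc:>10}: {crlf_bytes(enc)}")
```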
> f. Entries in .gitattributes specified by type are specified for the affected files.
> *.h text eol=crlf
> *.ini text eol=crlf
So this is your problem. The "text" attribute is telling Git to treat
the file as text (which will handle any ASCII-superset encoding like
UTF-8, ISO8859-1, etc, but not others like UTF-16, UTF-32, EUC-JP, etc).
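A rough model of what "text eol=crlf" does on checkout: every LF byte that is not already preceded by a CR byte gets a CR inserted in front of it. This Python sketch is a simplification (Git's real conversion is C code with more cases) and the function name is hypothetical, but applied to UTF-16LE bytes it produces the "0D 00 0D 0A 00" pattern seen in the corrupted hex dump earlier in the thread: one extra byte, which misaligns everything after it.

```python
def lf_to_crlf_bytewise(data: bytes) -> bytes:
    """Insert a CR (0x0D) byte before every LF (0x0A) byte that lacks one.

    This treats the input as a plain byte stream, which is only correct
    for ASCII-superset encodings such as UTF-8 or Windows-1252.
    """
    out = bytearray()
    prev = None
    for byte in data:
        if byte == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(byte)
        prev = byte
    return bytes(out)


original = "}}\r\n//".encode("utf-16-le")
converted = lf_to_crlf_bytewise(original)
print(original.hex(" ").upper())   # 7D 00 7D 00 0D 00 0A 00 2F 00 2F 00
print(converted.hex(" ").upper())  # 7D 00 7D 00 0D 00 0D 0A 00 2F 00 2F 00
```

The UTF-16 line ending "0D 00 0A 00" has its LF byte preceded by 00, so the byte-wise rule fires and the file grows by one byte.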
Depending on what's in your repo and what you want to have happen,
you'll want to:
- remove that attribute, if all of your ".h" files are UTF-16
- if only some are UTF-16, you'll need to provide patterns that
distinguish between the two types by giving them different
attributes (e.g., "-text" should override for specific files)
- you can stop there if you don't need line-ending conversion for
UTF-16 files (and there may be little point; Git will treat them as
binary for the purposes of diffing, so matching the canonical in-repo
endings buys you little)
- if you do want to do line ending conversion (or any other
modifications on them), you can do so with a custom clean/smudge
filter (see the "filter" attribute in "git help attributes")
> I would like git to observe the autocrlf false as directed.
Hopefully the above explains it, but just to be clear, this isn't
autocrlf kicking in, but rather the "text" and "eol" attributes you've
specified.
> We can't convert the files to other encoding for convenience of git.
If you're happy enough not being able to get meaningful text diffs for
these files from Git, then the above should make your problem go away.
But an alternative workflow, if you really want UTF-16 in the working
tree, is to convert between UTF-8 and UTF-16 as the files go in and out
of the working tree. There's no built-in support for that, but you could
do it with a custom clean/smudge filter. That would let Git store UTF-8
internally, do diffs, etc.
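A custom clean/smudge filter of this kind could look roughly like the Python sketch below. It is only an illustration: the script and its clean/smudge arguments are hypothetical, it assumes working-tree files are UTF-16 (treating a file without a BOM as little-endian), and it has no error handling. Git runs a filter with the file content on stdin and takes the converted content from stdout.

```python
import codecs
import sys


def clean(data: bytes) -> bytes:
    """Working tree -> repository: UTF-16 to UTF-8."""
    if data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    elif data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    else:
        text = data.decode("utf-16-le")  # assumption: no BOM means little-endian
    return text.encode("utf-8")


def smudge(data: bytes) -> bytes:
    """Repository -> working tree: UTF-8 to UTF-16LE with a BOM."""
    return codecs.BOM_UTF16_LE + data.decode("utf-8").encode("utf-16-le")


# Only act as a filter when called as "script.py clean" or "script.py smudge".
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("clean", "smudge"):
    convert = clean if sys.argv[1] == "clean" else smudge
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read()))
```

With a hypothetical driver named utf16, the attribute would be "filter=utf16", and the filter.utf16.clean and filter.utf16.smudge config entries would run this script with the matching argument.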
One lighter alternative to that is to actually store UTF-16 in the
repository as you are now, but provide a textconv filter (see diff
attributes in "git help attributes") to convert it to UTF-8 on the fly
when showing a diff. You won't be able to apply such a diff, but they're
useful for human eyes.
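For the textconv route, Git runs the configured program with the path of a file as its single argument and uses whatever it prints on stdout for the diff. A minimal Python sketch of such a helper follows; the script is hypothetical, and it assumes UTF-16 files carry a BOM while everything else is UTF-8. It would be hooked up through a diff.<driver>.textconv config entry and a diff=<driver> attribute.

```python
import codecs
import sys


def textconv(path: str) -> str:
    """Return the file's content as text, decoding UTF-16 if a BOM is present."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16")
    return data.decode("utf-8", errors="replace")


# Only run when Git (or a user) passes exactly one file path.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.buffer.write(textconv(sys.argv[1]).encode("utf-8"))
```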
-Peff
* RE: git clone corrupts file.
From: Russell, Scott @ 2021-08-16 17:39 UTC (permalink / raw)
To: Jeff King; +Cc: brian m. carlson, git
Jeff,
Thanks for your reply.
We don't want any EOL handling of any file. That's why we specify autocrlf false.
We would like git not to do any CR LF conversion on any file, whether ANSI or Unicode. They had the right endings when we checked them in.
We don't want them adjusted.
Does removing the eol=crlf fix that?
You said: UTF-16 ... can't be treated as "text" by Git.
We can't make any changes to the files to suit git. We just need git to store and retrieve files as committed.
Will removing
eol=crlf
from the line
*.ini text eol=crlf
in the attributes file stop the issue?
If not, does .gitattributes allow a path, such that we could say
\config\Language Specific\* type - If these are Unicode, what would we say here? Can it not be text? Then binary?
*.ini text
Thanks,
Scott Russell
* Re: git clone corrupts file.
From: Jeff King @ 2021-08-16 18:49 UTC (permalink / raw)
To: Russell, Scott; +Cc: brian m. carlson, git
On Mon, Aug 16, 2021 at 05:39:12PM +0000, Russell, Scott wrote:
> We don't want any EOL handling of any file. That's why we specify autocrlf false.
Right, but it's not the whole story. autocrlf is an older and broader
mechanism for doing line-ending conversion. From its documentation in
"git help config":
core.autocrlf
Setting this variable to "true" is the same as setting the text
attribute to "auto" on all files and core.eol to "crlf".[...]
You obviously don't want that, but you _also_ don't want to set the text
and eol attributes on individual paths, either.
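The interplay between the attributes and the config settings can be summarized as a decision table. The Python below is a simplified reading of how Git combines the text and eol attributes with core.autocrlf and core.eol for a single path; it ignores the legacy crlf attribute and other corner cases, and the function names and return labels are invented for illustration. The point relevant here: once "text eol=crlf" is set, core.autocrlf=false never enters the decision.

```python
from typing import Optional

# Simplified model of Git's line-ending decision for one path; not Git's code.
# It ignores the legacy "crlf" attribute and other corner cases.


def text_eol_is_crlf(autocrlf: str, core_eol: str, native_crlf: bool) -> bool:
    """Line ending used for text files when no eol attribute is set."""
    if autocrlf == "true":
        return True       # core.eol is ignored when autocrlf is true...
    if autocrlf == "input":
        return False      # ...or input
    if core_eol == "native":
        return native_crlf
    return core_eol == "crlf"


def resolve_crlf_action(
    text: Optional[str],       # "set" (text), "unset" (-text), "auto" (text=auto), None
    eol: Optional[str],        # "crlf", "lf" or None (unspecified)
    autocrlf: str = "false",   # core.autocrlf: "true", "input" or "false"
    core_eol: str = "native",  # core.eol: "crlf", "lf" or "native"
    native_crlf: bool = True,  # assumption: CRLF is native, as on Git for Windows
) -> str:
    """Return "binary" (never convert), "text-crlf"/"text-lf" (always treat as
    text), or "auto-crlf"/"auto-lf" (convert only content that looks like text)."""
    if text == "unset":
        return "binary"            # -text wins; eol is ignored
    kind = "auto-" if text == "auto" else "text-"
    if eol is not None:
        return kind + eol          # an eol attribute alone also implies text
    if text in ("set", "auto"):
        crlf = text_eol_is_crlf(autocrlf, core_eol, native_crlf)
        return kind + ("crlf" if crlf else "lf")
    # No text and no eol attribute: only now does core.autocrlf matter.
    if autocrlf == "true":
        return "auto-crlf"
    if autocrlf == "input":
        return "auto-lf"
    return "binary"


print(resolve_crlf_action("set", "crlf", autocrlf="false"))  # text-crlf
print(resolve_crlf_action(None, None, autocrlf="false"))     # binary
```

The original setup ("*.h text eol=crlf" with autocrlf false) comes out as "text-crlf", while dropping the attributes or using "-text" gives "binary".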
> We would like git to not any cr lf conversion on any file. Whether
> they be ANSI or Unicode. They had the right endings when we checked
> them in.
> We don't want them adjusted.
>
> Does removing the eol = cr lf fix that?
That might be sufficient. You may need to drop "text" as well.
Otherwise core.eol will kick in and do conversions. (Sorry, I don't use
Windows and it has been a long time since I looked into these options,
so you may have to do some experimenting).
> You said: UTF-16 ... can't be treated as "text" by Git.
>
> We can't make any changes to the files to suit git. We just need git to store and retrieve files as committed.
Right. That's what it does by default (if you don't set any .gitattributes).
What I mean by "can't be treated as text" is that Git will not correctly
implement text features like CRLF conversion, nor diffs, for such an
encoding. It is effectively a binary file from Git's perspective.
> Will removing the
>
> eol=cr lf
>
> from the line
>
> *.ini text
>
> from the attributes file stop the issue?
>
> If not, does .gitattributes allow a path? Such that we could say
>
> \config\Language Specific\* type - If these are Unicode, what would we say here. Can it not be text? Then binary?
> *.ini text
If you simply drop the attributes entirely, Git will use its
auto-detection to determine whether a file is binary, which looks for
NULs (and UTF-16 files are generally full of them). So I suspect that
would do it. You can also provide the "-text" attribute to override that
and make sure no line-ending conversion is done.
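The NUL-based auto-detection can be approximated in Python. This sketch assumes the rule is "any NUL byte in roughly the first 8000 bytes means binary", which is the size Git's diff code inspects as far as I know; Git's exact checks differ somewhat between code paths, and the function name is hypothetical.

```python
# Assumed window size: Git's diff code inspects roughly the first 8000 bytes.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes, window: int = FIRST_FEW_BYTES) -> bool:
    """Approximate the NUL-byte test: any NUL in the window means binary."""
    return b"\x00" in data[:window]


ansi = "// Used by CamTest.rc\r\n".encode("cp1252")
utf16 = b"\xff\xfe" + "// Used by CamTest.rc\r\n".encode("utf-16-le")
print(looks_binary(ansi), looks_binary(utf16))  # False True
```

Every ASCII character in UTF-16 carries a 00 byte, so a UTF-16 file is detected as binary almost immediately.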
If you want to override a specific file, then yes, you can provide paths
(I don't recall offhand whether we allow backslashes in the patterns;
you may need to use forward slashes). You can also put the pattern "*"
in the "config/Language Specific/.gitattributes" to have it apply only
to that directory (and ones below it).
The patterns are the same as those in .gitignore files; see the section
"PATTERN FORMAT" in "git help ignore".
-Peff
* Re: git clone corrupts file.
From: Jeff King @ 2021-08-16 18:51 UTC (permalink / raw)
To: Russell, Scott; +Cc: brian m. carlson, git
On Mon, Aug 16, 2021 at 12:53:36PM -0400, Jeff King wrote:
> But an alternative workflow, if you really want UTF-16 in the working
> tree, is to convert between UTF-8 and UTF-16 as the files go in and out
> of the working tree. There's no built-in support for that, but you could
> do it with a custom clean/smudge filter. That would let Git store UTF-8
> internally, do diffs, etc.
Oh, by the way, I totally forgot that we added an internal version of
this, which is easier to configure and much more efficient. See the
"working-tree-encoding" attribute in "git help attributes".
Just in case you do want to go that route.
-Peff
* RE: git clone corrupts file.
From: Russell, Scott @ 2021-08-16 18:52 UTC (permalink / raw)
To: Jeff King; +Cc: brian m. carlson, git
Jeff,
Thanks for your response. I will try these suggestions. I suspect I can come to some solution.
Thanks,
Scott Russell
* RE: git clone corrupts file.
From: Russell, Scott @ 2021-08-16 18:53 UTC (permalink / raw)
To: Jeff King; +Cc: brian m. carlson, git
Okay, thanks. I will look for that.
Thanks,
Scott Russell
* Re: git clone corrupts file.
From: brian m. carlson @ 2021-08-16 21:50 UTC (permalink / raw)
To: Jeff King; +Cc: Russell, Scott, git
On 2021-08-16 at 18:51:04, Jeff King wrote:
> On Mon, Aug 16, 2021 at 12:53:36PM -0400, Jeff King wrote:
>
> > But an alternative workflow, if you really want UTF-16 in the working
> > tree, is to convert between UTF-8 and UTF-16 as the files go in and out
> > of the working tree. There's no built-in support for that, but you could
> > do it with a custom clean/smudge filter. That would let Git store UTF-8
> > internally, do diffs, etc.
>
> Oh, by the way, I totally forgot that we added an internal version of
> this, which is easier to configure and much more efficient. See the
> "working-tree-encoding" attribute in "git help attributes".
>
> Just in case you do want to go that route.
The specific information you need is located in the Git FAQ[0], but
roughly, you would probably want something like this:
*.h text eol=crlf working-tree-encoding=UTF-16LE-BOM
That means that when checked out, the file will be in the format that
legacy Windows programs prefer (CRLF with little-endian UTF-16 with a
BOM), but will be stored internally in Git with LF and UTF-8. That will
make things like git diff work much better, but still permit things to
be in the working tree as you wish.
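As a mental model of combining eol=crlf with working-tree-encoding=UTF-16LE-BOM, the Python sketch below imitates both directions of conversion. It is an approximation, not Git's code, and it rests on an assumption: that Git applies the line-ending conversion to the UTF-8 form, so CRLF handling never touches UTF-16 bytes. The function names are illustrative only.

```python
import codecs

# Rough model of "text eol=crlf working-tree-encoding=UTF-16LE-BOM".
# Assumption: line endings are converted on the UTF-8 side only.
# Function names are illustrative only.


def to_working_tree(blob: bytes) -> bytes:
    """Repository form (UTF-8, LF) -> checkout form (UTF-16LE + BOM, CRLF)."""
    text = blob.decode("utf-8")
    text = text.replace("\r\n", "\n").replace("\n", "\r\n")  # LF -> CRLF
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def to_repository(worktree: bytes) -> bytes:
    """Checkout form (UTF-16LE + BOM, CRLF) -> repository form (UTF-8, LF)."""
    if worktree.startswith(codecs.BOM_UTF16_LE):
        worktree = worktree[len(codecs.BOM_UTF16_LE):]
    text = worktree.decode("utf-16-le")
    return text.replace("\r\n", "\n").encode("utf-8")


blob = b"//{{NO_DEPENDENCIES}}\n// Used by CamTest.rc\n"
checkout = to_working_tree(blob)
print(checkout[:6].hex(" ").upper())    # FF FE 2F 00 2F 00
print(to_repository(checkout) == blob)  # True
```

The checkout form starts with the same FF FE 2F 00 2F 00 bytes as the original file's hex dump, and every line ends in the 2-byte-aligned 0D 00 0A 00.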
If you really don't want those to be modified at all, then you'd want to
write this:
*.h -text
However, Git will consider these files to be binary, since they are, and
git diff won't work on them without a textconv filter.
[0] https://git-scm.com/docs/gitfaq#windows-text-binary
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
* RE: git clone corrupts file.
From: Russell, Scott @ 2021-08-16 22:04 UTC (permalink / raw)
To: brian m. carlson, Jeff King; +Cc: git
Thanks Brian,
I appreciate the guidance. All our .h files can be converted to ANSI. I don't know why we seem to have just one saved as Unicode.
But it was a wake-up call, and it led to the discovery of other files that are not correct.
Upon reading the help on .gitattributes, I was reminded that Windows Visual Studio can save some .rc files as Unicode.
I think almost all are ANSI, but that leaves the possibility that any one saved as Unicode could unexpectedly fail to compile due to the conversion.
Our *.ini files are a mix: mostly ANSI, but more than a few are Unicode.
I don't know how to handle a mixture.
Perhaps I will have to specify
*.ini -text
Unless .gitattributes allows paths to be specified? In effect, use
Path/path/path/* text eol=crlf working-tree-encoding=UTF-16LE-BOM
And otherwise,
*.ini text - these would be ANSI if not in path/path/path
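One way to sort out such a mixture is to check each file for a byte order mark. The Python sketch below walks a directory and reports each matching file's apparent encoding; the script, the function names, and the "*.ini" default are illustrative assumptions, and a file without a BOM is only assumed to be ASCII, ANSI, or UTF-8. The resulting list could feed per-path .gitattributes entries.

```python
import codecs
import sys
from pathlib import Path

# Check longer BOMs first: the UTF-32LE BOM (FF FE 00 00) starts with
# the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_encoding(path: Path) -> str:
    """Guess an encoding from the byte order mark only."""
    with path.open("rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return "no BOM (ASCII, ANSI or UTF-8)"


def scan(root: str, pattern: str = "*.ini") -> dict:
    """Map every file under root matching pattern to its guessed encoding."""
    base = Path(root)
    return {
        str(p.relative_to(base)): sniff_encoding(p)
        for p in sorted(base.rglob(pattern))
        if p.is_file()
    }


# Usage: python sniff_bom.py <directory>   (hypothetical script name)
if __name__ == "__main__" and len(sys.argv) == 2:
    for name, encoding in scan(sys.argv[1]).items():
        print(f"{encoding:30} {name}")
```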
Thanks,
Scott Russell
* Re: git clone corrupts file.
From: brian m. carlson @ 2021-08-16 22:19 UTC (permalink / raw)
To: Russell, Scott; +Cc: Jeff King, git
On 2021-08-16 at 22:04:20, Russell, Scott wrote:
> Thanks Brian,
>
> I appreciate the guidance. All our .h files can be converted to ANSI. I don't know why we seemed to have just one saved as Unicode.
> But it was a wakeup, and led to discovery of other files not correct.
>
> Upon reading the help on .gitattributes, I was reminded that Windows Visual Studio can save some .rc files as Unicode.
> I think that most all are ANSI but that leaves the possible result that any one saved as Unicode could unexpectedly fail compiling due to the conversion.
I do want to specify a distinction here. You're referring to "Unicode"
and "ANSI", which traditionally mean, on Windows, little-endian UTF-16
with BOM and Windows-1252. You do not generally want Windows-1252, or
the encoding on which it's based, ISO-8859-1. Those are obsolete and
have been for well over a decade. It's unfortunate that many Windows
programs continue to use these terms, because neither "Unicode" nor
"ANSI" describes an actual character set according to IANA.
What is going to work best here is UTF-8 without a BOM. Most Windows
programs can handle that these days, but some still don't. If you try
to save things as "ANSI" without a working-tree-encoding and they aren't
completely ASCII files, then you will end up with some weird diff output
at the very least.
If the files are completely ASCII, then no working-tree-encoding is
necessary, because ASCII is a subset of UTF-8.
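The ASCII-subset point is easy to check with Python's built-in codecs: pure-ASCII text produces identical bytes in Windows-1252 ("ANSI") and UTF-8, while a single accented character does not. The sample strings are made up for illustration.

```python
ascii_text = "// Used by CamTest.rc\r\n"
accented = "caf\u00e9"  # "cafe" with an acute accent on the e

# Pure ASCII: identical bytes in Windows-1252 ("ANSI") and UTF-8.
print(ascii_text.encode("cp1252") == ascii_text.encode("utf-8"))  # True

# One non-ASCII character: the encodings diverge.
print(accented.encode("cp1252").hex(" "))  # 63 61 66 e9
print(accented.encode("utf-8").hex(" "))   # 63 61 66 c3 a9

# Windows-1252 bytes read as UTF-8 are invalid and get replaced.
print(accented.encode("cp1252").decode("utf-8", errors="replace") == "caf\ufffd")  # True
```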
> We have a mix of *.ini files which are a mix of mostly ANSI and more than a few others are Unicode.
> I don't know how to handle a mixture.
>
> Perhaps I will have to specify
>
> *.ini -text.
>
> Unless, does .gitattributes allow paths to be specified? In effect use the
>
> Path/path/path/* text eol=crlf working-tree-encoding=UTF-16LE-BOM
Yes, this syntax is allowed. See the gitattributes(5) manual page for
what's allowed. You can even do this:
dir/sub/path/*.ini text eol=crlf working-tree-encoding=UTF-16LE-BOM
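To see how a general rule and a per-directory rule interact, here is a small Python model of attribute lookup. It is a deliberately limited approximation with hypothetical helper names: it handles only simple patterns (no "**", no directory-only "dir/" patterns, no quoting), treats a pattern without a slash as matching the file name at any depth, and treats a pattern with a slash as anchored, with "*" never crossing a "/". It relies on the gitattributes rule that when several lines match, a later line overrides an earlier one for each attribute it sets.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Simplified gitattributes lookup: no "**", no directory-only ("dir/")
# patterns, no quoting, no macros. Helper names are illustrative.


def pattern_matches(pattern: str, path: str) -> bool:
    parts = PurePosixPath(path).parts
    if "/" not in pattern:
        # No slash: match the file name at any depth.
        return fnmatchcase(parts[-1], pattern)
    # A slash anchors the pattern; "*" never matches across "/".
    pat_parts = pattern.lstrip("/").split("/")
    return len(pat_parts) == len(parts) and all(
        fnmatchcase(name, pat) for name, pat in zip(parts, pat_parts)
    )


def attributes_for(path: str, lines: list) -> dict:
    """Apply every matching line in order; later lines override earlier ones."""
    attrs = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        pattern, *settings = line.split()
        if not pattern_matches(pattern, path):
            continue
        for setting in settings:
            if setting.startswith("-"):
                attrs[setting[1:]] = False          # e.g. -text
            elif "=" in setting:
                key, value = setting.split("=", 1)  # e.g. eol=crlf
                attrs[key] = value
            else:
                attrs[setting] = True               # e.g. text
    return attrs


rules = [
    "*.ini text",
    "dir/sub/path/*.ini text eol=crlf working-tree-encoding=UTF-16LE-BOM",
]
print(attributes_for("dir/sub/path/lang.ini", rules))
print(attributes_for("other/app.ini", rules))
```

Files in dir/sub/path pick up the UTF-16 settings from the later line, while every other .ini file keeps only "text" from the general rule.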
One thing I forgot to mention is that after modifying your
.gitattributes file, you'll want to run "git add --renormalize ." and
then commit both the .gitattributes file and any changes. Otherwise,
you may end up with files that don't end up converted the way that you
want.
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
* RE: git clone corrupts file.
From: Russell, Scott @ 2021-08-16 22:26 UTC (permalink / raw)
To: brian m. carlson; +Cc: Jeff King, git
Ok, thanks for all the help.
I think with the path in .gitattributes it will be fine.
dir/sub/path/*.ini text eol=crlf working-tree-encoding=UTF-16LE-BOM
I will give those a try and see how it works out. And special thanks for the advice on add --renormalize. I would never have done that.
Thanks,
Scott Russell
Thread overview: 13+ messages
[not found] <BN6PR15MB1426E50F03A0530CA9140F98CBFA9@BN6PR15MB1426.namprd15.prod.outlook.com>
[not found] ` <BN6PR15MB14261C40E614CC11416388B4CBFA9@BN6PR15MB1426.namprd15.prod.outlook.com>
2021-08-13 18:54 ` git clone corrupts file Russell, Scott
2021-08-13 22:30 ` brian m. carlson
2021-08-16 15:24 ` Russell, Scott
2021-08-16 16:53 ` Jeff King
2021-08-16 17:39 ` Russell, Scott
2021-08-16 18:49 ` Jeff King
2021-08-16 18:52 ` Russell, Scott
2021-08-16 18:51 ` Jeff King
2021-08-16 18:53 ` Russell, Scott
2021-08-16 21:50 ` brian m. carlson
2021-08-16 22:04 ` Russell, Scott
2021-08-16 22:19 ` brian m. carlson
2021-08-16 22:26 ` Russell, Scott