* Re: Cross-Platform Version Control @ 2009-05-12 15:06 Esko Luontola 2009-05-12 15:14 ` Shawn O. Pearce ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Esko Luontola @ 2009-05-12 15:06 UTC (permalink / raw) To: git A good start for making Git cross-platform would be storing the text encoding of every file name and commit message together with the commit. Currently, because Git is oblivious to the encodings and just considers them as a series of bytes, there is no way to make them cross-platform. As http://www.joelonsoftware.com/articles/Unicode.html says, "It does not make sense to have a string without knowing what encoding it uses." Without explicit encoding information, making a system that works even on the three main platforms, let alone in all countries and languages, is simply not possible. On the other hand, if the encoding is explicitly stated in the repository, then it is possible for platform- and locale-aware Git clients to handle the file names and commit messages in whatever way makes most sense for the platform (for example convert the file names to the platform's encoding, if it differs from the committer's platform encoding). Then it would also be possible to create a Mac version of Git, which compensates for Mac OS X's file system's file name encoding peculiarities. Also the system could then warn (on "git add") if the data does not look like it has been encoded with the stated encoding. If the platform's and the repository's encoding happen to be the same (which in reality might be possible only inside a small company where everybody is forced to use the same OS, configured by a single sysadmin), then no conversions need to be done. 
Also Git purists, who think that the byte sequence representing a file name is more important than the human readable version of the file name, may use some configuration switch that disables all conversions - but even then the current encoding should be stored together with the commit. Are there any plans on storing the encoding information of file names and commit messages in the Git repository? How much time would implementing it take? Any ideas on how to maintain backwards compatibility (for old commits that do not have the encoding information)? - Esko ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 15:06 Cross-Platform Version Control Esko Luontola @ 2009-05-12 15:14 ` Shawn O. Pearce 2009-05-12 16:13 ` Johannes Schindelin 2009-05-12 16:16 ` Jeff King 2009-05-12 18:28 ` Dmitry Potapov 2009-05-14 13:48 ` Cross-Platform Version Control Peter Krefting 2 siblings, 2 replies; 61+ messages in thread From: Shawn O. Pearce @ 2009-05-12 15:14 UTC (permalink / raw) To: Esko Luontola; +Cc: git Esko Luontola <esko.luontola@gmail.com> wrote: > Are there any plans on storing the encoding information of file names > and commit messages in the Git repository? Commit messages already store their encoding in an optional "encoding" header if the message isn't stored in UTF-8, or US-ASCII, which is a strict subset of UTF-8. As for file names, no plans; it's a sequence of bytes, but I think a lot of people wind up using some subset of US-ASCII for their file names, especially if their project is going to be cross platform. -- Shawn. ^ permalink raw reply [flat|nested] 61+ messages in thread
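To illustrate the optional "encoding" header Shawn mentions, here is a minimal Python sketch that decodes the message of a raw commit object body. The function name decode_commit_message and the sample commit (author, timestamps, message) are made up for illustration; only the layout (header lines, a blank line, then the message, with UTF-8 assumed when no "encoding" header is present) follows what is described above. This is not how Git itself implements it.

```python
def decode_commit_message(raw: bytes) -> str:
    """Decode the message part of a raw commit object body.

    Headers and message are separated by the first blank line. If an
    "encoding" header is present it names the message encoding;
    otherwise UTF-8 is assumed.
    """
    header_block, _, message = raw.partition(b"\n\n")
    encoding = "utf-8"
    for line in header_block.split(b"\n"):
        if line.startswith(b"encoding "):
            encoding = line[len(b"encoding "):].decode("ascii")
            break
    # errors="replace" keeps the sketch from raising on mislabeled data.
    return message.decode(encoding, errors="replace")


# Hypothetical commit body whose message is stored as ISO-8859-1.
sample = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1242140760 +0000\n"
    b"committer A U Thor <author@example.com> 1242140760 +0000\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    b"Lis\xe4\xe4 tiedosto\n"
)
print(decode_commit_message(sample))
```

Nothing comparable exists for file names: tree entries carry only the raw bytes of each name, which is the gap this thread is about.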
* Re: Cross-Platform Version Control 2009-05-12 15:14 ` Shawn O. Pearce @ 2009-05-12 16:13 ` Johannes Schindelin 2009-05-12 17:56 ` Esko Luontola 2009-05-12 16:16 ` Jeff King 1 sibling, 1 reply; 61+ messages in thread From: Johannes Schindelin @ 2009-05-12 16:13 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Esko Luontola, git Hi, On Tue, 12 May 2009, Shawn O. Pearce wrote: > Esko Luontola <esko.luontola@gmail.com> wrote: > > Are there any plans on storing the encoding information of file names > > and commit messages in the Git repository? > > Commit messages already store their encoding in an optional "encoding" > header if the message isn't stored in UTF-8, or US-ASCII, which is a > strict subset of UTF-8. > > As for file names, no plans, its a sequence of bytes, but I think a > lot of people wind up using some subset of US-ASCII for their file > names, especially if their project is going to be cross platform. Some context: this issue cropped up in msysGit, of course. As to storing all file names in UTF-8, my point about Unicode being not necessarily appropriate for everyone still stands. UTF-8 _might_ be the de-facto standard for Linux filesystems, but IMHO we should not take away the freedom for everybody to decide what they want their file names to be encoded as. However, I see that there might be a need to be able to encode the file names differently, such as on Windows. IMHO the best solution would be a config variable controlling the reencoding of file names. For some time, it looked as if two people were interested in implementing something like that (Peter and Robin IIRC), but efforts have stalled. Ciao, Dscho ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 16:13 ` Johannes Schindelin @ 2009-05-12 17:56 ` Esko Luontola 2009-05-12 20:38 ` Johannes Schindelin 0 siblings, 1 reply; 61+ messages in thread From: Esko Luontola @ 2009-05-12 17:56 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Shawn O. Pearce, git On 12.5.2009, at 19:13, Johannes Schindelin wrote: > As to storing all file names in UTF-8, my point about Unicode being not > necessarily appropriate for everyone still stands. > > UTF-8 _might_ be the de-facto standard for Linux filesystems, but IMHO > we should not take away the freedom for everybody to decide what they > want their file names to be encoded as. > > However, I see that there might be a need to be able to encode the file > names differently, such as on Windows. IMHO the best solution would be > a config variable controlling the reencoding of file names. Exactly. The system should not force the use of a specific encoding. It should only offer a recommendation, but also be fully compatible if the user uses some other encoding. That's why it's best to always store the information about what encoding was used. It shouldn't matter whether the data is encoded with ISO-8859-1, UTF-8, Shift_JIS, Big5 or some other encoding, as long as the encoding is stated explicitly. Then the reader of the data can best decide how to show that data on the current platform. A config variable defining which encoding should be used when committing the file names would make sense. Git should also try to autodetect which encoding is used in its current environment. In the case of UTF-8, you should also be able to specify which normalization form is used (http://www.unicode.org/unicode/reports/tr15/), or whether it is normalized at all. 
For example, it should be possible to configure Git so that when a file is checked out on Mac, its file name is converted to the current file system's encoding (UTF-8 NFD, I think), and when the file is committed on Mac, the file name is normalized back to the same UTF-8 form as is used on Linux (UTF-8 NFC). It would be nice to have config variables saying that all file names in this repository must use UTF-8 NFC, and all commit messages must use UTF-8 NFC (with Unix newlines). Then the Git client would autodetect the current environment's encoding, and convert the text, if necessary, to match the repository's encoding. - Esko ^ permalink raw reply [flat|nested] 61+ messages in thread
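The checkout/commit round trip proposed above can be sketched in a few lines of Python. The helper names to_repo_name and to_decomposed_name are hypothetical, the "repository stores NFC" policy is the one suggested in the message, and plain NFD is used only as an approximation of what a Mac file system reports (the real file system form is said to differ in details).

```python
import unicodedata


def to_repo_name(fs_name: str) -> str:
    """Name as it would be committed: NFC (assumed repository policy)."""
    return unicodedata.normalize("NFC", fs_name)


def to_decomposed_name(repo_name: str) -> str:
    """Decomposed form, approximating what a Mac file system hands back."""
    return unicodedata.normalize("NFD", repo_name)


composed = "p\u00e4iv\u00e4.txt"           # each 'ä' is one code point (NFC)
decomposed = to_decomposed_name(composed)  # 'a' followed by U+0308

print(composed.encode("utf-8"))    # bytes a Linux user would typically commit
print(decomposed.encode("utf-8"))  # bytes read back from a decomposing file system
print(composed == decomposed)      # same visible name, different strings
print(to_repo_name(decomposed) == composed)
```

The two spellings look identical on screen but differ byte for byte, which is why a byte-oriented Git treats them as two different files unless something normalizes them at the boundary.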
* Re: Cross-Platform Version Control 2009-05-12 17:56 ` Esko Luontola @ 2009-05-12 20:38 ` Johannes Schindelin 2009-05-12 21:16 ` Esko Luontola 0 siblings, 1 reply; 61+ messages in thread From: Johannes Schindelin @ 2009-05-12 20:38 UTC (permalink / raw) To: Esko Luontola; +Cc: Shawn O. Pearce, git Hi, On Tue, 12 May 2009, Esko Luontola wrote: > On 12.5.2009, at 19:13, Johannes Schindelin wrote: > >As to storing all file names in UTF-8, my point about Unicode being not > >necessarily appropriate for everyone still stands. > > > >UTF-8 _might_ be the de-facto standard for Linux filesystems, but IMHO > >we should not take away the freedom for everybody to decide what they > >want their file names to be encoded as. > > > >However, I see that there might be a need to be able to encode the file > >names differently, such as on Windows. IMHO the best solution would be > >a config variable controlling the reencoding of file names. > > Exactly. The system should not force the use of a specific encoding. It > should only offer a recommendation, but be also fully compatible if the > user uses some other encoding. > > That's why it's best to always store the information about what encoding > was used. It shouldn't matter, whether the data is encoded with > ISO-8859-1, UTF-8, Shift_JIS, Big5 or some other encoding, as long as it > is explicitly said that what the encoding is. Then the reader of the > data can best decide, how to show that data on the current platform. > > A config variable for defining, that what encoding should be used when > committing the file names, would make sense. Git should also try to > autodetect, that what encoding is used in its current environment. In > the case of UTF-8, you should also be able to specify which > normalization form is used > (http://www.unicode.org/unicode/reports/tr15/), or whether it is > normalized at all. 
> > For example, it should be possible to configure Git so, that when a file > is checked out on Mac, its file name is converted to the current file > system's encoding (UTF-8 NFD, I think), and when the file is committed > on Mac, the file name is normalized back to the same UTF-8 form as is > used on Linux (UTF-8 NFC). > > It would be nice to have config variables for saying, that all file > names in this repository must use UTF-8 NFC, and all commit messages > must use UTF-8 NFC (with Unix newlines). Then the Git client would > autodetect the current environment's encoding, and convert the text, if > necessary, to match the repository's encoding. That is a nice analysis. How about implementing it? Ciao, Dscho ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 20:38 ` Johannes Schindelin @ 2009-05-12 21:16 ` Esko Luontola 2009-05-13 0:23 ` Johannes Schindelin 0 siblings, 1 reply; 61+ messages in thread From: Esko Luontola @ 2009-05-12 21:16 UTC (permalink / raw) To: git; +Cc: Johannes Schindelin, Shawn O. Pearce Johannes Schindelin wrote on 12.5.2009 23:38: > That is a nice analysis. How about implementing it? > Is there somebody here who knows Git's code well and is motivated to implement this? I don't think that I would be capable, because of not having used C much, being new to Git's codebase and having too little time. But I can help with the requirements specification, interaction design and system testing. -- Esko Luontola www.orfjackal.net ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 21:16 ` Esko Luontola @ 2009-05-13 0:23 ` Johannes Schindelin 2009-05-13 5:34 ` Esko Luontola 0 siblings, 1 reply; 61+ messages in thread From: Johannes Schindelin @ 2009-05-13 0:23 UTC (permalink / raw) To: Esko Luontola; +Cc: git, Shawn O. Pearce Hi, On Wed, 13 May 2009, Esko Luontola wrote: > Johannes Schindelin wrote on 12.5.2009 23:38: > > That is a nice analysis. How about implementing it? > > > > Do we have here somebody, who knows Git's code well and is motivated to > implement this? > > I don't think that I would be capable, because of not having used C > much, being new to Git's codebase and having too little time. Well, that rather settles things, no? Ciao, Dscho ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 0:23 ` Johannes Schindelin @ 2009-05-13 5:34 ` Esko Luontola 2009-05-13 6:49 ` Alex Riesen 2009-05-13 10:15 ` Johannes Schindelin 0 siblings, 2 replies; 61+ messages in thread From: Esko Luontola @ 2009-05-13 5:34 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Shawn O. Pearce Johannes Schindelin wrote on 13.5.2009 3:23: > Well, that rather settles things, no? > There is a need for the feature, but it's unfortunate that the Git developers do not see its value. There are many users for whom using non-ASCII names is necessary (for example all of Asia and most of Europe), but now it seems that Bazaar is the only DVCS that handles encodings correctly: http://stackoverflow.com/questions/829682/what-dvcs-support-unicode-filenames Let's see if I have time later this or next year to work on it. At least it would be good practice in getting acquainted with a new codebase and learning C. But it would be better for someone else to do it, to get it done within a reasonable amount of time. I see that there are some tests in the /t directory. Which command will run all of them, how good coverage do the tests have, how reproducible and isolated are they, and how many seconds does it take to run all the tests? Is there some high-level documentation for new developers? -- Esko Luontola www.orfjackal.net ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 5:34 ` Esko Luontola @ 2009-05-13 6:49 ` Alex Riesen 2009-05-13 10:15 ` Johannes Schindelin 1 sibling, 0 replies; 61+ messages in thread From: Alex Riesen @ 2009-05-13 6:49 UTC (permalink / raw) To: Esko Luontola; +Cc: Johannes Schindelin, git, Shawn O. Pearce 2009/5/13 Esko Luontola <esko.luontola@gmail.com>: > Johannes Schindelin wrote on 13.5.2009 3:23: >> >> Well, that rather settles things, no? >> > > There is need for the feature, but it's unfortunate that the Git developers > do not see its value. There are many users for whom using non-ASCII names is > necessary (for example all of Asia and most of Europe), but now it seems > that Bazaar is the only DVCS that handles encodings correctly: > http://stackoverflow.com/questions/829682/what-dvcs-support-unicode-filenames Many Git developers just use systems which don't care about the file name encoding at all and just keep the names as they were. So the interoperability problem does not exist for them. So, they either don't need the feature, or can trivially avoid or work around any problems. > I see that there are some tests in the /t directory. Which command will run > all of them, how good coverage do the tests have, how reproducible and > isolated they are, how many seconds does it take to run all the tests? Is > there some high-level documentation for new developers? make test. See also t/README. We like them. I always run the test suite before deployment and sometimes run it just for fun (unless I have to run it on Windows). ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 5:34 ` Esko Luontola 2009-05-13 6:49 ` Alex Riesen @ 2009-05-13 10:15 ` Johannes Schindelin [not found] ` <43d8ce650905130340q596043d5g45b342b62fe20e8d@mail.gmail.com> 1 sibling, 1 reply; 61+ messages in thread From: Johannes Schindelin @ 2009-05-13 10:15 UTC (permalink / raw) To: Esko Luontola; +Cc: git, Shawn O. Pearce Hi, On Wed, 13 May 2009, Esko Luontola wrote: > Johannes Schindelin wrote on 13.5.2009 3:23: > > Well, that rather settles things, no? > > There is need for the feature, but it's unfortunate that the Git > developers do not see its value. I see a value. But it is not my itch. And since it is your itch and you said that you will not do anything about it (I don't count writing emails here ;-), I concluded that it settles the issue. Ciao, Dscho ^ permalink raw reply [flat|nested] 61+ messages in thread
* Cross-Platform Version Control [not found] ` <43d8ce650905130340q596043d5g45b342b62fe20e8d@mail.gmail.com> @ 2009-05-13 10:41 ` John Tapsell 2009-05-13 13:42 ` Jay Soffian 0 siblings, 1 reply; 61+ messages in thread From: John Tapsell @ 2009-05-13 10:41 UTC (permalink / raw) To: git 2009/5/13 Johannes Schindelin <Johannes.Schindelin@gmx.de>: > Hi, > > On Wed, 13 May 2009, Esko Luontola wrote: > >> Johannes Schindelin wrote on 13.5.2009 3:23: >> > Well, that rather settles things, no? >> >> There is need for the feature, but it's unfortunate that the Git >> developers do not see its value. > > I see a value. But it is not my itch. And since it is your itch and you > said that you will not do anything about it (I don't count writing emails > here ;-), I concluded that it settles the issue. I don't know why the git developers are being so hostile/dismissive, but I also hope that somebody volunteers to fix this. Esko, you have my moral support :-) John ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 10:41 ` John Tapsell @ 2009-05-13 13:42 ` Jay Soffian 2009-05-13 13:44 ` Alex Riesen 0 siblings, 1 reply; 61+ messages in thread From: Jay Soffian @ 2009-05-13 13:42 UTC (permalink / raw) To: John Tapsell; +Cc: git On Wed, May 13, 2009 at 6:41 AM, John Tapsell <johnflux@gmail.com> wrote: > I don't know why the git developers are being so hostile/dismisisve, Are you serious? j. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 13:42 ` Jay Soffian @ 2009-05-13 13:44 ` Alex Riesen 2009-05-13 13:50 ` Jay Soffian 0 siblings, 1 reply; 61+ messages in thread From: Alex Riesen @ 2009-05-13 13:44 UTC (permalink / raw) To: Jay Soffian; +Cc: John Tapsell, git 2009/5/13 Jay Soffian <jaysoffian@gmail.com>: > On Wed, May 13, 2009 at 6:41 AM, John Tapsell <johnflux@gmail.com> wrote: >> I don't know why the git developers are being so hostile/dismisisve, > > Are you serious? > ...because we'll kill you if aren't >:-E ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 13:44 ` Alex Riesen @ 2009-05-13 13:50 ` Jay Soffian 2009-05-13 13:57 ` John Tapsell 0 siblings, 1 reply; 61+ messages in thread From: Jay Soffian @ 2009-05-13 13:50 UTC (permalink / raw) To: Alex Riesen; +Cc: John Tapsell, git On Wed, May 13, 2009 at 9:44 AM, Alex Riesen <raa.lkml@gmail.com> wrote: > 2009/5/13 Jay Soffian <jaysoffian@gmail.com>: >> On Wed, May 13, 2009 at 6:41 AM, John Tapsell <johnflux@gmail.com> wrote: >>> I don't know why the git developers are being so hostile/dismisisve, >> >> Are you serious? >> > > ...because we'll kill you if aren't >:-E I'm just flabbergasted by some people's expectations. Perhaps John doesn't realize the git developers are all volunteers, and that it is never appropriate to criticize a volunteer. A "thank you for all your hard work on git" would have done nicely. j. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 13:50 ` Jay Soffian @ 2009-05-13 13:57 ` John Tapsell 2009-05-13 15:27 ` Nicolas Pitre ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: John Tapsell @ 2009-05-13 13:57 UTC (permalink / raw) To: Jay Soffian; +Cc: Alex Riesen, git 2009/5/13 Jay Soffian <jaysoffian@gmail.com>: > On Wed, May 13, 2009 at 9:44 AM, Alex Riesen <raa.lkml@gmail.com> wrote: >> 2009/5/13 Jay Soffian <jaysoffian@gmail.com>: >>> On Wed, May 13, 2009 at 6:41 AM, John Tapsell <johnflux@gmail.com> wrote: >>>> I don't know why the git developers are being so hostile/dismissive, >>> >>> Are you serious? >>> >> >> ...because we'll kill you if aren't >:-E > > I'm just flabbergasted by some people's expectations. Perhaps John > doesn't realize the git developers are all volunteers, and that it is > never appropriate to criticize a volunteer. A "thank you for all your > hard work on git" would have done nicely. I'm as much of an open source developer as anyone else here. I spend a huge amount of my time programming for KDE. But I've never told a user "well that settles it" because they won't code it themselves :-/ I certainly get a huge number of bug/wishes that I can't/won't code myself, but I try to be a bit more diplomatic about it. But then the kernel mailing lists tend to be a lot more.. direct.. than the kde mailing lists, so I guess it comes from that. Requiring people to have a thick skin and all that. John ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 13:57 ` John Tapsell @ 2009-05-13 15:27 ` Nicolas Pitre 2009-05-13 16:22 ` Johannes Schindelin 2009-05-13 17:24 ` Andreas Ericsson 2009-05-14 1:49 ` Miles Bader 2 siblings, 1 reply; 61+ messages in thread From: Nicolas Pitre @ 2009-05-13 15:27 UTC (permalink / raw) To: John Tapsell; +Cc: Jay Soffian, Alex Riesen, git On Wed, 13 May 2009, John Tapsell wrote: > I'm as much of an open source developer as anyone else here. I spend > a huge amount of my time programming for KDE. But I've never told a > user "well that settles it" because they won't code it themselves :-/ > I certainly get a huge number of bug/wishes that I can't/won't code > myself, but I try to be a bit more diplomatic about it. > But then the kernel mailing lists tend to be a lot more.. direct.. > than the kde mailing lists, so I guess it comes from that. Requiring > people to have a thick skin and all that. This is not the kernel mailing list. In fact this list is quite a bit friendlier and more accommodating than the kernel list. The remark alluded to above comes from _one_ of the git developers. And Dscho is apparently in a rather sad mood these days. While the substance of Dscho's remark is entirely pertinent, it would be wrong to use its form and style as a characterization of git developers in general. Nicolas ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 15:27 ` Nicolas Pitre @ 2009-05-13 16:22 ` Johannes Schindelin 0 siblings, 0 replies; 61+ messages in thread From: Johannes Schindelin @ 2009-05-13 16:22 UTC (permalink / raw) To: Nicolas Pitre; +Cc: John Tapsell, Jay Soffian, Alex Riesen, git Hi, On Wed, 13 May 2009, Nicolas Pitre wrote: > On Wed, 13 May 2009, John Tapsell wrote: > > > I'm as much of an open source developer as anyone else here. I spend > > a huge amount of my time programming for KDE. But I've never told a > > user "well that settles it" because they won't code it themselves :-/ > > I certaintly get a huge number of bug/wishes that I can't/won't code > > myself, but I try to be a bit more diplomatic about it. > > > > But then the kernel mailing lists tend to be a lot more.. direct.. > > than the kde mailing lists, so I guess it comes from that. Requiring > > people to have a thick skin and all that. > > This is not the kernel mailing list. In fact this list is quite > friendlier and accommodating that the kernel list. > > The remark alluded above comes from _one_ of the git developers. And > Dscho is apparently in a rather sad mood these days. While the substance > of Dscho's remark is entirely pertinent, it would be wrong to use its > form and style as a characterization of git developers in general. Even if I were in a better mood, the whole thread has a back story on an msysGit issue, and this led me to try to stop what I feared would become a rather long mail thread without much of an outcome, such as that infamous thread about MacOSX UTF-8 filename handling. Alas, it seems that Robin is willing to work on the issues, so my fears have been totally and completely unfounded. Ciao, Dscho ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 13:57 ` John Tapsell 2009-05-13 15:27 ` Nicolas Pitre @ 2009-05-13 17:24 ` Andreas Ericsson 2009-05-14 1:49 ` Miles Bader 2 siblings, 0 replies; 61+ messages in thread From: Andreas Ericsson @ 2009-05-13 17:24 UTC (permalink / raw) To: John Tapsell; +Cc: Jay Soffian, Alex Riesen, git John Tapsell wrote: > 2009/5/13 Jay Soffian <jaysoffian@gmail.com>: >> On Wed, May 13, 2009 at 9:44 AM, Alex Riesen <raa.lkml@gmail.com> wrote: >>> 2009/5/13 Jay Soffian <jaysoffian@gmail.com>: >>>> On Wed, May 13, 2009 at 6:41 AM, John Tapsell <johnflux@gmail.com> wrote: >>>>> I don't know why the git developers are being so hostile/dismisisve, >>>> Are you serious? >>>> >>> ...because we'll kill you if aren't >:-E >> I'm just flabbergasted by some people's expectations. Perhaps John >> doesn't realize the git developers are all volunteers, and that it is >> never appropriate to criticize a volunteer. A "thank you for all your >> hard work on git" would have done nicely. > > I'm as much of an open source developer as anyone else here. I spend > a huge amount of my time programming for KDE. But I've never told a > user "well that settles it" because they won't code it themselves :-/ > I certaintly get a huge number of bug/wishes that I can't/won't code > myself, but I try to be a bit more diplomatic about it. > But then the kernel mailing lists tend to be a lot more.. direct.. > than the kde mailing lists, so I guess it comes from that. Requiring > people to have a thick skin and all that. > I think much of the perceived malignancy stems from the fact that the git list has a high ratio of developer-to-luser mailings on it, being by nature a developer tool most of the time. When the unaware user appears on the list with demands rather than polite requests, they're treated that much harder. Especially by the developer who happens to be, as it were, the butt of the request. 
Personally, I've only ever found Dscho being anything but friendly on this list, and even then, I really didn't find it offensive. If viewed in a happy mood, it matches quite nicely with a Swedish sketch whose theme is "men ja ente bitter". It's often quite funny, really :-) -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Register now for Nordic Meet on Nagios, June 3-4 in Stockholm http://nordicmeetonnagios.op5.org/ Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 13:57 ` John Tapsell 2009-05-13 15:27 ` Nicolas Pitre 2009-05-13 17:24 ` Andreas Ericsson @ 2009-05-14 1:49 ` Miles Bader 2 siblings, 0 replies; 61+ messages in thread From: Miles Bader @ 2009-05-14 1:49 UTC (permalink / raw) To: John Tapsell; +Cc: Jay Soffian, Alex Riesen, git John Tapsell <johnflux@gmail.com> writes: > I'm as much of an open source developer as anyone else here. I spend > a huge amount of my time programming for KDE. But I've never told a > user "well that settles it" because they won't code it themselves :-/ FWIW, Johannes' use of "Well, that rather settles things, no?" in this thread didn't strike me as being rude or truly dismissive (even though it's literally so). It seemed more just a timely and to the point reminder that however fun it is to talk about random feature X, someone's gotta do the work if it's going to actually be implemented, and that the direction of git development very much follows the whims of those doing the actual hacking (perhaps more so than other projects). [and I don't even have particularly thick skin, I think -- I'm often very annoyed by brusqueness one sees on many developer mailing lists...] -Miles -- Acquaintance, n. A person whom we know well enough to borrow from, but not well enough to lend to. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 15:14 ` Shawn O. Pearce 2009-05-12 16:13 ` Johannes Schindelin @ 2009-05-12 16:16 ` Jeff King 2009-05-12 16:57 ` Johannes Schindelin 2009-05-13 16:26 ` Linus Torvalds 1 sibling, 2 replies; 61+ messages in thread From: Jeff King @ 2009-05-12 16:16 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Esko Luontola, git On Tue, May 12, 2009 at 08:14:03AM -0700, Shawn O. Pearce wrote: > As for file names, no plans, its a sequence of bytes, but I think a > lot of people wind up using some subset of US-ASCII for their file > names, especially if their project is going to be cross platform. Or they use a single encoding like utf8 so that there are no surprises. You can still run into normalization problems with filenames on some filesystems, though. Linus's name_hash code sets up the framework to handle "these two names are actually equivalent", but right now I think there is just code for handling case-sensitivity, not utf8 normalization (but I just skimmed the code, so I might be wrong). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
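The "these two names are actually equivalent" idea can be sketched as a lookup table keyed by a folded form of each name. The following Python is only an illustration under stated assumptions: name_key and NameIndex are hypothetical names, a dict stands in for whatever hash table Git's name_hash code really uses, and the key covers both case folding and the Unicode normalization that the message says is not implemented yet.

```python
import unicodedata


def name_key(name: str, ignore_case: bool = False) -> str:
    """Fold a file name to a key under which equivalent spellings collide."""
    key = unicodedata.normalize("NFC", name)
    if ignore_case:
        # Renormalize after case folding, since folding can change composition.
        key = unicodedata.normalize("NFC", key.casefold())
    return key


class NameIndex:
    """Remembers the first spelling seen for each equivalence class."""

    def __init__(self, ignore_case: bool = False):
        self.ignore_case = ignore_case
        self._by_key = {}

    def add(self, name: str):
        """Record name; return an earlier, differently spelled equivalent or None."""
        key = name_key(name, self.ignore_case)
        existing = self._by_key.get(key)
        if existing is None:
            self._by_key[key] = name
            return None
        return existing if existing != name else None


index = NameIndex(ignore_case=True)
print(index.add("Makefile"))             # first spelling is recorded
print(index.add("makefile"))             # differs only by case
print(index.add("a\u0308.txt"))          # decomposed spelling recorded first
print(repr(index.add("\u00e4.txt")))     # composed spelling of the same name
```

With ignore_case left at False the same structure only catches normalization differences, which matches the split between the two problems discussed in this thread.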
* Re: Cross-Platform Version Control 2009-05-12 16:16 ` Jeff King @ 2009-05-12 16:57 ` Johannes Schindelin 2009-05-13 16:26 ` Linus Torvalds 1 sibling, 0 replies; 61+ messages in thread From: Johannes Schindelin @ 2009-05-12 16:57 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, Esko Luontola, git Hi, On Tue, 12 May 2009, Jeff King wrote: > On Tue, May 12, 2009 at 08:14:03AM -0700, Shawn O. Pearce wrote: > > > As for file names, no plans, its a sequence of bytes, but I think a > > lot of people wind up using some subset of US-ASCII for their file > > names, especially if their project is going to be cross platform. > > Or they use a single encoding like utf8 so that there are no surprises. > You can still run into normalization problems with filenames on some > filesystems, though. Linus's name_hash code sets up the framework to > handle "these two names are actually equivalent", but right now I think > there is just code for handling case-sensitivity, not utf8 normalization > (but I just skimmed the code, so I might be wrong). Back then I actually started on a patch to make Git capable of determining UTF-8 equivalence, but at the same time somebody started such an annoying mail thread that I stopped working on the issue completely. Ciao, Dscho ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 16:16 ` Jeff King 2009-05-12 16:57 ` Johannes Schindelin @ 2009-05-13 16:26 ` Linus Torvalds 2009-05-13 17:12 ` Linus Torvalds 1 sibling, 1 reply; 61+ messages in thread From: Linus Torvalds @ 2009-05-13 16:26 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, Esko Luontola, git On Tue, 12 May 2009, Jeff King wrote: > > Or they use a single encoding like utf8 so that there are no surprises. > You can still run into normalization problems with filenames on some > filesystems, though. Linus's name_hash code sets up the framework to > handle "these two names are actually equivalent", but right now I think > there is just code for handling case-sensitivity, not utf8 normalization > (but I just skimmed the code, so I might be wrong). utf-8 normalization was one goal, and shouldn't be _that_ hard to do. But quite frankly, the index is only part of it, and probably not the worst part. The real pain of filename handling is all the "read tree recursively with readdir()" issues. Along with just an absolute sh*t-load of issues about what to do when people ended up using different versions of the "same" name in different branches. There's also the issue that "cross-platform" really can be a pretty damn big pain. What do you do for platforms that simply are pure shit? I realize that OS X people have a hard time accepting it, but OS X filesystems are generally total and utter crap - even more so than Windows. Yes, yes, you can tell OS X that case matters, but that's not the normal case - and what do you do with projects that simply _do_ care about case. The kernel is one such project. Sure, you can "encode" the filenames on such broken filesystems in a way that they'd be different - but that won't really help the project, since makefiles etc won't work anyway. So one reason I didn't bother with utf-8 is that the much more fundamental issues are simply in plain old 7-bit US-ASCII. 
That said, if the only issue is that you want to encode regular utf-8 in a coherent way (and ignore the case issues), then we could probably do that part fairly easily with a "convert_to_internal()" and "convert_to_filename()" thing that acts very much like the CRLF conversion (except on filenames, not data). And yes, it's probably worth doing, since we'd need that for fuller case support anyway. It's just a fair amount of churn - not fundamentally _hard_, but not trivial either. And it needs a _lot_ of care, and a fair amount of testing that is probably hard to do on sane filesystems (ie the case where the filesystem actually _changes_ the name is going to be hard to test on anything sane). Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 16:26 ` Linus Torvalds @ 2009-05-13 17:12 ` Linus Torvalds 2009-05-13 17:31 ` Andreas Ericsson ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Linus Torvalds @ 2009-05-13 17:12 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, Esko Luontola, git On Wed, 13 May 2009, Linus Torvalds wrote: > > utf-8 normalization was one goal, and shouldn't be _that_ hard to do. But > quite frankly, the index is only part of it, and probably not the worst > part. > > The real pain of filename handling is all the "read tree recursively with > readdir()" issues. Along with just an absolute sh*t-load of issues about > what to do when people ended up using different versions of the "same" > name in different branches. Btw, if people care mainly just about OS X, and don't worry so much about case, but about the idiotic and insane OS X behavior of turning UTF-8 filenames into that crazy NFD format, here's a simple patch that may be useful for that. There _will_ certainly be other places, but this handles the one big case of "read_directory_recursive()", and can turn NFD into the sane NFC format. Since OS X will then accept NFC (and internally turn it back to NFD) when you pass them as filenames, that means that converting the other way is not necessary. NOTE NOTE NOTE! This really just handles one case, and is not enough for any kind of general case. For example, it does NOT handle the case where you do git add filename_with_åäö explicitly, because if the "filename_with_åäö" is done using NFD (tab-completion etc), now git won't _match_ it with the filename it reads using readdir() any more (which got converted to NFC), so at a minimum we'd need to do that crazy NFD->NFC conversion in all the pathspecs too. See "get_pathspec()" in setup.c for that latter case. But with that, and this crazy thing, OS X users might be already a lot better off. Totally untested, of course. 
Oh, and somebody needs to fill in that convert_name_from_nfd_to_nfc() implementation. It's designed so that if it notices that the string is just plain US-ASCII, it can return 0 and no extra work is done. That, in turn, can easily be done by some simple and efficient pre-processing that checks that there are no high bits set (on a 64-bit platform, do it 8 characters at a time with a "& 0x8080808080808080"), so that the common case has barely any overhead at all. Use <stringprep.h> and stringprep_utf8_nfkc_normalize() or something to do the actual normalization if you find characters with the high bit set. And since I know that the OS X filesystems are so buggy as to not even do that whole NFD thing right, there is probably some OS-X specific "use this for filesystem names" conversion function. Hmm. Anybody want to take this on? It really shouldn't be too complex to get it working for the common case on just OS X. It's really the case sensitivity that is the biggest problem; if you ignore that for now, the problem space is _much_ smaller. In other words, I think we can reasonably easily support a subset of _common_ issues with some trivial patches like this. But getting it right in _all_ the cases is going to be much more work (there are lots of other uses of "readdir()" too, this one just happens to be one of the more central ones). Of course, it probably makes sense to have a whole "git_readdir()" that does this thing in general. That "create_full_path()" thing makes sense regardless, though, in that it also simplifies a lot of "baselen+len" usage in just "len".
Linus
---
 dir.c |   40 ++++++++++++++++++++++++++++++++--------
 1 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 6aae09a..4cbfc24 100644
--- a/dir.c
+++ b/dir.c
@@ -566,6 +566,30 @@ static int get_dtype(struct dirent *de, const char *path)
 }
 
 /*
+ * Take the readdir output, in (d_name,len), and append it to
+ * our base name in (fullname,baselen) with any required
+ * readdir fs->internal translation.
+ *
+ * Put the result in 'fullname', and return the final length.
+ *
+ * Right now we have no translation, and just do a memcpy()
+ * (the +1 is to copy the final NUL character too).
+ */
+static int create_full_path(char *fullname, int baselen, const char *d_name, int len)
+{
+#ifdef OS_X_IS_SOME_CRAZY_SHxAT
+	char temp[256], nlen;
+	nlen = convert_name_from_nfd_to_nfc(d_name, len, temp, sizeof(temp));
+	if (nlen) {
+		len = nlen;
+		d_name = temp;
+	}
+#endif
+	memcpy(fullname + baselen, d_name, len + 1);
+	return baselen + len;
+}
+
+/*
  * Read a directory tree. We currently ignore anything but
  * directories, regular files and symlinks. That's because git
  * doesn't handle them at all yet. Maybe that will change some
@@ -595,15 +619,15 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 			/* Ignore overly long pathnames! */
 			if (len + baselen + 8 > sizeof(fullname))
 				continue;
-			memcpy(fullname + baselen, de->d_name, len+1);
-			if (simplify_away(fullname, baselen + len, simplify))
+			len = create_full_path(fullname, baselen, de->d_name, len);
+			if (simplify_away(fullname, len, simplify))
 				continue;
 
 			dtype = DTYPE(de);
 			exclude = excluded(dir, fullname, &dtype);
 			if (exclude && (dir->flags & DIR_COLLECT_IGNORED)
-			    && in_pathspec(fullname, baselen + len, simplify))
-				dir_add_ignored(dir, fullname, baselen + len);
+			    && in_pathspec(fullname, len, simplify))
+				dir_add_ignored(dir, fullname, len);
 
 			/*
 			 * Excluded? If we don't explicitly want to show
@@ -630,9 +654,9 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 			default:
 				continue;
 			case DT_DIR:
-				memcpy(fullname + baselen + len, "/", 2);
+				memcpy(fullname + len, "/", 2);
 				len++;
-				switch (treat_directory(dir, fullname, baselen + len, simplify)) {
+				switch (treat_directory(dir, fullname, len, simplify)) {
 				case show_directory:
 					if (exclude != !!(dir->flags
 							& DIR_SHOW_IGNORED))
@@ -640,7 +664,7 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 					break;
 				case recurse_into_directory:
 					contents += read_directory_recursive(dir,
-						fullname, fullname, baselen + len, 0, simplify);
+						fullname, fullname, len, 0, simplify);
 					continue;
 				case ignore_directory:
 					continue;
@@ -654,7 +678,7 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 			if (check_only)
 				goto exit_early;
 			else
-				dir_add_name(dir, fullname, baselen + len);
+				dir_add_name(dir, fullname, len);
 		}
 exit_early:
 		closedir(fdir);

^ permalink raw reply related [flat|nested] 61+ messages in thread
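For anyone picking this up: the "no high bits set" fast path described in the message above might look roughly like the following. This is only a sketch. has_high_bit() is a made-up name, not an existing git function, and it reads 8 bytes at a time through memcpy() so that it does not depend on the string being aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if any byte in (s,len) has its high bit set, 0 if the
 * string is plain 7-bit US-ASCII. Hypothetical helper, not git code.
 */
static int has_high_bit(const char *s, size_t len)
{
	const uint64_t mask = UINT64_C(0x8080808080808080);
	size_t i = 0;

	/* Check 8 bytes per iteration; memcpy() avoids unaligned loads. */
	for (; i + 8 <= len; i += 8) {
		uint64_t word;
		memcpy(&word, s + i, sizeof(word));
		if (word & mask)
			return 1;
	}
	/* Check the remaining 0..7 bytes one at a time. */
	for (; i < len; i++) {
		if ((unsigned char)s[i] & 0x80)
			return 1;
	}
	return 0;
}
```

A convert_name_from_nfd_to_nfc() implementation could presumably call something like this first and return 0 right away when it reports 0, so that pure-ASCII names never reach the normalization code.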
* Re: Cross-Platform Version Control 2009-05-13 17:12 ` Linus Torvalds @ 2009-05-13 17:31 ` Andreas Ericsson 2009-05-13 17:46 ` Linus Torvalds 2009-05-13 20:57 ` Matthias Andree 2 siblings, 0 replies; 61+ messages in thread From: Andreas Ericsson @ 2009-05-13 17:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Shawn O. Pearce, Esko Luontola, git Linus Torvalds wrote: > > Of course, it probably makes sense to have a whole "git_readdir()" that > does this thing in general. That "create_full_path()" thing makes sense > regardless, though, in that it also simplifies a lot of "baselen+len" > usage in just "len". > In a flash of premonitory insight, libgit2 has gitfo_foreach_dirent(path, callback) which would probably be well suited for this kind of thing. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Register now for Nordic Meet on Nagios, June 3-4 in Stockholm http://nordicmeetonnagios.op5.org/ Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 17:12 ` Linus Torvalds 2009-05-13 17:31 ` Andreas Ericsson @ 2009-05-13 17:46 ` Linus Torvalds 2009-05-13 18:26 ` Martin Langhoff 2009-05-13 20:57 ` Matthias Andree 2 siblings, 1 reply; 61+ messages in thread From: Linus Torvalds @ 2009-05-13 17:46 UTC (permalink / raw) To: Jeff King; +Cc: Shawn O. Pearce, Esko Luontola, git On Wed, 13 May 2009, Linus Torvalds wrote: > > Of course, it probably makes sense to have a whole "git_readdir()" that > does this thing in general. Actually, the more I think about that, the less true I think it is. It _sounds_ like a nice simplification ("just do it once in readdir, and forget about it everywhere else"), but it's in fact a stupid thing to do. Why? If we _ever_ want to fix this in the general case, then the code that does the readdir() will actually have to remember both the "raw filesystem" form _and_ the "cleaned-up utf-8 form". Why? Because when we do readdir(), we'll also do 'lstat()' on the end result to check the types, and opendir() in case it's a directory and we then want to do things recursively etc. And that happens to work on OS X (because we can use our "fixed" filename for lstat too), but it does not work in the general case. And you can say "well, just do the stat inside the wrapped readdir()", but that doesn't work _either_, since - we don't want to do the lstat() if it's unnecessary. Even if we don't have "de->d_type" information, we can often avoid the need for it, if we can tell that the name isn't interesting (due to being ignored). Avoiding the lstat is a huge performance issue for cold-cache cases. It's basically a seek. So we really want to do the lstat() later, which implies that the caller needs to know _both_ the original "real" filesystem name _and_ the converted one.
- it doesn't handle the opendir() case anyway - so the end result is that a real implementation will _always_ need to carry around both the "filesystem view" filename _and_ the "what we've converted it into". Now, the point of the patch I sent out was that for the specific case of OS X, which does UTF-8 conversions (wrong) but also is happy to get our properly normalized name, we don't care. So my patch is "correct" for that special case - and so would a plain readdir() wrapper be. But my patch is _also_ correct for the case where a readdir() wrapper would do the wrong thing. My patch doesn't _handle_ it (since it doesn't change the code to pass both "filesystem view" and "cleaned-up view" pathnames), but the patch I sent out also doesn't make it any harder to do right. In contrast, doing a readdir() wrapper makes it much harder to do right later, because it's just doing the conversion at the wrong level (you could make that "wrapper" return both the original and the fixed filename, but at that point the wrapper doesn't really help - you might as well just have the "convert" function, and it would be a hell of a lot more obvious what is really going on). So I take it back. A readdir() wrapper is not a good idea. It gets us a tiny bit of the way, but it would actually take us a step back from the "real" solution. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
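To make the "carry both names around" point concrete, here is a rough sketch of what a caller-side structure could look like. Everything in it is hypothetical: struct name_pair, convert_fn, toy_upcase() and make_name_pair() are illustration names, not git code. The converter convention (return the new length, or 0 for "nothing to do") is borrowed from the patch earlier in the thread, and the toy converter just upper-cases ASCII to stand in for a real filesystem-to-internal translation.

```c
#include <stddef.h>
#include <string.h>

#define NAME_MAX_BYTES 256

/* Both views of one directory entry (hypothetical, not a git struct). */
struct name_pair {
	char fs_name[NAME_MAX_BYTES];           /* bytes exactly as readdir() gave them */
	char internal_name[4 * NAME_MAX_BYTES]; /* converted form used for matching */
};

/*
 * Converter convention: return the converted length, or 0 when the
 * name needs no conversion (or does not fit in 'out').
 */
typedef size_t (*convert_fn)(const char *in, size_t len, char *out, size_t outsz);

/* Toy converter for demonstration only: 'a'..'z' becomes 'A'..'Z'. */
static size_t toy_upcase(const char *in, size_t len, char *out, size_t outsz)
{
	size_t i, changed = 0;

	if (len + 1 > outsz)
		return 0;
	for (i = 0; i < len; i++) {
		char c = in[i];
		if (c >= 'a' && c <= 'z') {
			c = (char)(c - 'a' + 'A');
			changed = 1;
		}
		out[i] = c;
	}
	out[len] = '\0';
	return changed ? len : 0;
}

/* Fill both names; returns 0 on success, -1 if d_name is too long. */
static int make_name_pair(struct name_pair *np, const char *d_name, convert_fn convert)
{
	size_t len = strlen(d_name);

	if (len >= sizeof(np->fs_name))
		return -1;
	memcpy(np->fs_name, d_name, len + 1);
	/* No converter, or converter says "unchanged": both views are identical. */
	if (!convert || !convert(d_name, len, np->internal_name, sizeof(np->internal_name)))
		memcpy(np->internal_name, d_name, len + 1);
	return 0;
}
```

The idea would be that lstat() and opendir() always get fs_name, while index lookups and pathspec matching use internal_name. Whether a real implementation would want fixed-size buffers like this is a separate question.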
* Re: Cross-Platform Version Control 2009-05-13 17:46 ` Linus Torvalds @ 2009-05-13 18:26 ` Martin Langhoff 2009-05-13 18:37 ` Linus Torvalds 0 siblings, 1 reply; 61+ messages in thread From: Martin Langhoff @ 2009-05-13 18:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, May 13, 2009 at 7:46 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > So I take it back. A readdir() wrapper is not a good idea. It gets us a > tiny bit of the way, but it would actually take us a step back from the > "real" solution. Do we need to take the real solution to the core of git? What I am wondering is whether we can keep this simple in git internals and catch problem filenames at git-add time. This would allow git to keep treating filenames as a bag of bytes, and it does a better thing for users. In cross platform projects, most users don't even know that there are problems, and even if they do, they don't know what the problems are. If git add can be told to warn & refuse to add a path with portability problems, then we educate our users, prevent them from committing filenames that will later cause trouble to others in their projects, etc. from-the-keep-it-simple-and-informative-dept, m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 18:26 ` Martin Langhoff @ 2009-05-13 18:37 ` Linus Torvalds 2009-05-13 21:04 ` Theodore Tso 2009-05-13 21:08 ` Daniel Barkalow 0 siblings, 2 replies; 61+ messages in thread From: Linus Torvalds @ 2009-05-13 18:37 UTC (permalink / raw) To: Martin Langhoff; +Cc: Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, 13 May 2009, Martin Langhoff wrote: > > Do we need to take the real solution to the core of git? Well, I suspect that if we really want to support it, then we'd better. > What I am wondering is whether we can keep this simple in git > internals and catch problem filenames at git-add time. I can almost guarantee that it will just cause more problems than it solves, and generate some nasty cases that just aren't solvable. Because it really isn't just "git add". It's every single thing that does a lstat() on a filename inside of git. Now, the simple OS X case is not a huge problem, since the lstat will succeed with the fixed-up filename too. But as mentioned, the OS X case is the thing that doesn't need a lot of infrastructure _anyway_ - I can almost guarantee that my posted patch (with the added setup.c stuff for get_pathspec()) is going to be _fewer_ lines than some wrapper logic. Note: in all of the above, I assume that people care more about just plain UTF characters (and the insane NFD form OS X uses) than about worrying about the _really_ subtle issues of case-independence. Those are a major pain, but they will need even more "internal" support, because there simply isn't any sane wrapping method. (You could wrap everything to force lower-casing of all filesystem ops or something, but that would not be acceptable to any sane environment. So in reality you need to accept mixed-case things, and then there is no way to know from the "outside" whether one external mixed-case thing matches some internal index mixed-case thing). Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 18:37 ` Linus Torvalds @ 2009-05-13 21:04 ` Theodore Tso 2009-05-13 21:20 ` Linus Torvalds 2009-05-13 21:08 ` Daniel Barkalow 1 sibling, 1 reply; 61+ messages in thread From: Theodore Tso @ 2009-05-13 21:04 UTC (permalink / raw) To: Linus Torvalds Cc: Martin Langhoff, Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, May 13, 2009 at 11:37:28AM -0700, Linus Torvalds wrote: > Note: in all of the above, I assume that people care more about just plain > UTF characters (and the insane NFD form OS X uses) than about worrying > about the _really_ subtle issues of case-independence. Those are a major > pain, but they will need even more "internal" support, because there > simply isn't any sane wrapping method. Stupid question --- if we get something that works for Windows and MacOS X, is there any reason why we need to solve the general problem of case-insensitive filesystems? It's really backwards compatibility with Legacy OS's that's most important, right? Are there any other systems other than Windows and Mac OS X which (a) perpetrate case insensitivity on application programmers, and (b) which current or future git users are likely to care about? - Ted ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 21:04 ` Theodore Tso @ 2009-05-13 21:20 ` Linus Torvalds 0 siblings, 0 replies; 61+ messages in thread From: Linus Torvalds @ 2009-05-13 21:20 UTC (permalink / raw) To: Theodore Tso Cc: Martin Langhoff, Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, 13 May 2009, Theodore Tso wrote: > > Stupid question --- if we get something that works for Windows and > MacOS X, is there any reason why we need to solve the general problem > of case-insensitive filesystems? Quite frankly, I don't think we're even very close to getting anything that works for Windows or OS X. Case-insensitivity is _hard_. The "easy" case is to just handle the OS X crazy pseudo-NFD format, and at least turn that into NFC (and perhaps add a config option to do latin1 and EUC-JP to utf-8 too). At that point, we at least handle regular utf-8 the same way. Doing the latin1/EUC-JP thing would actually to some degree be more interesting than the OS X NFD case, because that really does require two-way conversion, and we can "test" that even on sane filesystems (ie play at having a Latin1 filesystem). That said, I suspect there aren't that many people who care about latin1 filesystems. I dunno about EUC-JP (and variants - for all I know, shift-JIS and other cases may be the more common ones). Of course, if we do everything right, maybe the windows people would actually like us to keep the filesystem-native representation in UTF-16LE or whatever the crazy format is that Windows really uses deep down. My point being that all of these things happen even without the added worry about case. And in many ways, not worrying about case should probably be the first step. We do have some support for worrying about case, but trying to solve both things at the same time isn't going to be workable, I suspect.
Case insensitivity should never ever involve a _conversion_ (if it does, you get all kinds of crazy behavior), it's just purely a _comparison_ issue, so the two really are fundamentally different. Of course, the reason OS-X seems to be so messed up is exactly that the morons at Apple didn't understand the difference between conversion and comparison, and mixed them up. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
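As a small illustration of "comparison, not conversion": a case-insensitive match can be written so that neither name is ever rewritten, which means whatever bytes are stored (in the index or on disk) keep their original case. The function below is a sketch with a made-up name, names_match_icase(), and it only folds ASCII letters in the default C locale. Real case folding for non-ASCII names is a much larger problem and is not attempted here.

```c
#include <ctype.h>

/*
 * Compare two names ignoring ASCII case. Neither string is modified,
 * so the stored bytes keep their original case.
 * Returns 1 if the names match, 0 otherwise.
 */
static int names_match_icase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	/* Both strings must end at the same position. */
	return *a == *b;
}
```

With this, "Makefile" and "makefile" compare as the same name, but nothing is lower-cased on the way in or out.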
* Re: Cross-Platform Version Control 2009-05-13 18:37 ` Linus Torvalds 2009-05-13 21:04 ` Theodore Tso @ 2009-05-13 21:08 ` Daniel Barkalow 2009-05-13 21:29 ` Linus Torvalds 1 sibling, 1 reply; 61+ messages in thread From: Daniel Barkalow @ 2009-05-13 21:08 UTC (permalink / raw) To: Linus Torvalds Cc: Martin Langhoff, Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, 13 May 2009, Linus Torvalds wrote: > On Wed, 13 May 2009, Martin Langhoff wrote: > > > > Do we need to take the real solution to the core of git? > > Well, I suspect that if we really want to support it, then we'd better. > > > What I am wondering is whether we can keep this simple in git > > internals and catch problem filenames at git-add time. > > I can almost guarantee that it will just cause more problems than it > solves, and generate some nasty cases that just aren't solvable. > > Because it really isn't just "git add". It's every single thing that does > a lstat() on a filename inside of git. > > Now, the simple OS X case is not a huge problem, since the lstat will > succeed with the fixed-up filename too. I'm not seeing what the general case is, and how it could possibly behave. There's the "insensitive" behavior: if you create "foo" and look for "FOO", it's there, but readdir() reports "foo". There's the "converting" behavior: if you create "foo", readdir() reports "FOO", but lstat("foo") returns it. The obvious general case is: if you create "foo", readdir() reports "FOO", and lstat("foo") doesn't find a match. But if you create "foo" again... it doesn't find "foo", so it creates a new file, which it also calls "FOO", and the filesystem now has two files with identical names? It seems to me that the limits of minimally functional, non-inode-losing filesystems are: lstat() might take a filename and return the data for a non-byte-identical filename; open(name, O_CREAT|O_EXCL) might replace the given name with a non-byte-identical filename. 
But surely open(name) and lstat(name) (with the same name) must find the same file, even if readdir() would report it with a different name. And I assume that a filesystem that rejected any non-NFD filenames or any non-NFC filenames would be totally unusable, in that users will manage to get unnormalized filenames into programs and find that the filesystem just doesn't work. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 21:08 ` Daniel Barkalow @ 2009-05-13 21:29 ` Linus Torvalds 0 siblings, 0 replies; 61+ messages in thread From: Linus Torvalds @ 2009-05-13 21:29 UTC (permalink / raw) To: Daniel Barkalow Cc: Martin Langhoff, Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, 13 May 2009, Daniel Barkalow wrote: > > > > Now, the simple OS X case is not a huge problem, since the lstat will > > succeed with the fixed-up filename too. > > I'm not seeing what the general case is, and how it could possibly behave. Here's a simple example. Let's say that your company uses Latin1 internally for your filesystems, because your tools really aren't utf-8 ready. This is NOT AT ALL unnatural - it's how lots of people used to work with Linux over the years, and it's largely how people still use FAT, I suspect (except it's not latin1, it's some windows-specific 8-bits-per-character mapping). IOW, if you have a file called 'åäö', it literally is encoded as '\xe5\xe4\xf6' (if you wonder why I picked those three letters, it's because they are the regular extra letters in Swedish - Swedish has 29 letters in its alphabet, and those three letters really are letters in their own right, they are NOT 'a' and 'o' with some dots/rings on top). IOW, if you open such a file, you need to use those three bytes. Now, even if you happen to have an OS and use Latin1 on disk, you may realize that you'd like to interact with others that use UTF-8, and would want to have your git archive that you export use nice portable UTF-8. But you absolutely MUST NOT just do a conversion at "readdir()" time. If you do that, then your three-byte filename turns into a six-byte utf-8 sequence of '\xc3\xa5\xc3\xa4\xc3\xb6' and the thing is, now "lstat()" won't work on that sequence. So obviously you could always turn things _back_ for lstat(), but quite frankly, that's (a) insane (b) incompetent and (c) not even always well-defined. 
> There's the "insensitive" behavior: if you create "foo" and look for > "FOO", it's there, but readdir() reports "foo". > > There's the "converting" behavior: if you create "foo", readdir() reports > "FOO", but lstat("foo") returns it. Then there's the behaviour above: you want your git repository to have utf-8, but your filesystem doesn't convert anything at all, and all your regular tools (think editors etc) are all Latin1. Latin1 is going away, I hope, but I bet EUC-JP etc still exist. Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
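The byte sequences in the Latin1 example above can be reproduced with a few lines of C. This is a sketch only: latin1_to_utf8() is a hypothetical helper, not something in git, and it relies on the fact that Latin1 code points 0x80..0xFF map directly to the Unicode code points of the same value, each of which takes two bytes in UTF-8. Note that it returns 0 both for an empty input and for "output buffer too small"; a real interface would want to tell those apart.

```c
#include <stddef.h>

/*
 * Convert Latin1 bytes in (in,len) to UTF-8 in 'out'.
 * Returns the UTF-8 length (excluding the NUL), or 0 if 'out'
 * is too small. Hypothetical helper, not part of git.
 */
static size_t latin1_to_utf8(const char *in, size_t len, char *out, size_t outsz)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)in[i];

		if (c < 0x80) {
			/* 7-bit ASCII is copied unchanged; keep room for the NUL. */
			if (n + 1 >= outsz)
				return 0;
			out[n++] = (char)c;
		} else {
			/* 0x80..0xFF becomes a two-byte UTF-8 sequence. */
			if (n + 2 >= outsz)
				return 0;
			out[n++] = (char)(0xC0 | (c >> 6));
			out[n++] = (char)(0x80 | (c & 0x3F));
		}
	}
	if (n >= outsz)
		return 0;
	out[n] = '\0';
	return n;
}
```

Feeding it the three bytes '\xe5\xe4\xf6' yields the six bytes '\xc3\xa5\xc3\xa4\xc3\xb6', and on a Latin1 filesystem lstat() on that six-byte name would not find the file, which is the whole point of the example.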
* Re: Cross-Platform Version Control 2009-05-13 17:12 ` Linus Torvalds 2009-05-13 17:31 ` Andreas Ericsson 2009-05-13 17:46 ` Linus Torvalds @ 2009-05-13 20:57 ` Matthias Andree 2009-05-13 21:10 ` Linus Torvalds 2 siblings, 1 reply; 61+ messages in thread From: Matthias Andree @ 2009-05-13 20:57 UTC (permalink / raw) To: Linus Torvalds, Jeff King; +Cc: Shawn O. Pearce, Esko Luontola, git Am 13.05.2009, 19:12 Uhr, schrieb Linus Torvalds <torvalds@linux-foundation.org>: > Use <stringprep.h> and stringprep_utf8_nfkc_normalize() or something to > do the actual normalization if you find characters with the high bit > set. And since I know that the OS X filesystems are so buggy as to not > even do that whole NFD thing right, there is probably some OS-X specific > "use this for > filesystem names" conversion function. Sorry for interrupting, but NF_K_C? You don't want that (K for compatibility, rather than canonical, normalization) for anything except normalizing temporary variables inside strcasecmp(3) or similar. Probably not even that. The normalizations done are often irreversible and also surprising. You don't want to turn 2³.c into 23.c, do you? -- Matthias Andree ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 20:57 ` Matthias Andree @ 2009-05-13 21:10 ` Linus Torvalds 2009-05-13 21:30 ` Jay Soffian 2009-05-13 21:47 ` Matthias Andree 0 siblings, 2 replies; 61+ messages in thread From: Linus Torvalds @ 2009-05-13 21:10 UTC (permalink / raw) To: Matthias Andree; +Cc: Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, 13 May 2009, Matthias Andree wrote: > Am 13.05.2009, 19:12 Uhr, schrieb Linus Torvalds > <torvalds@linux-foundation.org>: > > > Use <stringprep.h> and stringprep_utf8_nfkc_normalize() or something to do > > the actual normalization if you find characters with the high bit set. And > > since I know that the OS X filesystems are so buggy as to not even do that > > whole NFD thing right, there is probably some OS-X specific "use this for > > filesystem names" conversion function. > > Sorry for interrupting, but NF_K_C? You don't want that (K for compatibility, > rather than canonical, normalization) for anything except normalizing > temporary variables inside strcasecmp(3) or similar. Probably not even that. > The normalizations done are often irreversible and also surprising. You don't > want to turn 2³.c into 23.c, do you? No, you're right. We want just plain NFC. I just googled for how some other projects handled this, and found the stringprep thing in a post about rsync, and didn't look any closer. But yes, you're absolutely right, stringprep is total crap, and nfkc is horrible. I have no idea of what library to use, though. For perl, there's Unicode::Normalize, but that's likely still subtly incorrect for the OS-X case due to the filesystem not using _strict_ NFD. I have this dim memory of somebody actually pointing to the documentation of exactly which characters OS X ends up decomposing. Maybe we could just do a git-specific inverse of that, knowing that NOBODY ELSE IN THE WHOLE UNIVERSE IS SO TERMINALLY STUPID AS TO DO THAT DECOMPOSITION, and thus the OS X case is the only one we need to care about? 
Linus ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 21:10 ` Linus Torvalds @ 2009-05-13 21:30 ` Jay Soffian 2009-05-13 21:47 ` Matthias Andree 1 sibling, 0 replies; 61+ messages in thread From: Jay Soffian @ 2009-05-13 21:30 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias Andree, Jeff King, Shawn O. Pearce, Esko Luontola, git On Wed, May 13, 2009 at 5:10 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > I have this dim memory of somebody actually pointing to the documentation > of exactly which characters OS X ends up decomposing. http://developer.apple.com/technotes/tn/tn1150.html#UnicodeSubtleties http://developer.apple.com/technotes/tn/tn1150table.html j. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-13 21:10 ` Linus Torvalds 2009-05-13 21:30 ` Jay Soffian @ 2009-05-13 21:47 ` Matthias Andree 1 sibling, 0 replies; 61+ messages in thread From: Matthias Andree @ 2009-05-13 21:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jeff King, Shawn O. Pearce, Esko Luontola, git Am 13.05.2009, 23:10 Uhr, schrieb Linus Torvalds <torvalds@linux-foundation.org>: > > > On Wed, 13 May 2009, Matthias Andree wrote: > >> Am 13.05.2009, 19:12 Uhr, schrieb Linus Torvalds >> <torvalds@linux-foundation.org>: >> >> > Use <stringprep.h> and stringprep_utf8_nfkc_normalize() or something >> to do >> > the actual normalization if you find characters with the high bit >> set. And >> > since I know that the OS X filesystems are so buggy as to not even do >> that >> > whole NFD thing right, there is probably some OS-X specific "use this >> for >> > filesystem names" conversion function. >> >> Sorry for interrupting, but NF_K_C? You don't want that (K for >> compatibility, >> rather than canonical, normalization) for anything except normalizing >> temporary variables inside strcasecmp(3) or similar. Probably not even >> that. >> The normalizations done are often irreversible and also surprising. You >> don't >> want to turn 2³.c into 23.c, do you? > > No, you're right. We want just plain NFC. I just googled for how some > other projects handled this, and found the stringprep thing in a post > about rsync, and didn't look any closer. > > But yes, you're absolutely right, stringprep is total crap, and nfkc is > horrible. Crap? It's just besides the purpose and some limited form of fuzzy match. Anyways... > I have no idea of what library to use, though. For perl, there's > Unicode::Normalize, but that's likely still subtly incorrect for the OS-X > case due to the filesystem not using _strict_ NFD. Perhaps ICU (ICU4C), from http://site.icu-project.org/ -- Matthias Andree ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 15:06 Cross-Platform Version Control Esko Luontola 2009-05-12 15:14 ` Shawn O. Pearce @ 2009-05-12 18:28 ` Dmitry Potapov 2009-05-12 18:40 ` Martin Langhoff 2009-05-14 13:48 ` Cross-Platform Version Control Peter Krefting 2 siblings, 1 reply; 61+ messages in thread From: Dmitry Potapov @ 2009-05-12 18:28 UTC (permalink / raw) To: Esko Luontola; +Cc: git On Tue, May 12, 2009 at 06:06:05PM +0300, Esko Luontola wrote: > A good start for making Git cross-platform, would be storing the text > encoding of every file name and commit message together with the commit. > Currently, because Git is oblivious to the encodings and just considers > them as a series of bytes, there is no way to make them cross-platform. 1. Git already stores the encoding for all commit messages that are not in UTF-8. 2. If you really want to be cross-platform portable, you should not use any characters in filenames outside of [A-Za-z0-9._-] (i.e. Portable Filename Character Set) http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap03.html#tag_03_276 Dmitry ^ permalink raw reply [flat|nested] 61+ messages in thread
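A check against the [A-Za-z0-9._-] set mentioned above is short enough to sketch. is_portable_filename() is a made-up name for illustration, it looks at a single path component rather than a full path, and it treats the empty string as not portable, which is an assumption on my part rather than something stated in the message.

```c
#include <string.h>

/* Return 1 if 'name' is non-empty and uses only [A-Za-z0-9._-]. */
static int is_portable_filename(const char *name)
{
	static const char allowed[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789._-";

	/* strspn() returns the length of the leading run of allowed bytes. */
	return *name != '\0' && name[strspn(name, allowed)] == '\0';
}
```

So "dir.c" and "pre-commit.sample" pass, while any name containing a space or a byte with the high bit set (such as a UTF-8 encoded "å") is rejected.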
* Re: Cross-Platform Version Control 2009-05-12 18:28 ` Dmitry Potapov @ 2009-05-12 18:40 ` Martin Langhoff 2009-05-12 18:55 ` Jakub Narebski 0 siblings, 1 reply; 61+ messages in thread From: Martin Langhoff @ 2009-05-12 18:40 UTC (permalink / raw) To: Dmitry Potapov; +Cc: Esko Luontola, git On Tue, May 12, 2009 at 8:28 PM, Dmitry Potapov <dpotapov@gmail.com> wrote: > 2. If you really want to be cross-platform portable, you should not use > any characters in filenames outside of [A-Za-z0-9._-] (i.e. Portable > Filename Character Set) > http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap03.html#tag_03_276 Would it make sense to have warnings at 'git add' time about - filenames outside of that charset (as the strictest mode, perhaps even default) - filenames that have a potential conflict wrt case-sensitivity - filenames that have potential conflict in the same tree due to utf-8 encoding vagaries MHO is that a strict "start your project portable from day one" mode is best as a default. But I'd be happy with any default, actually ;-) m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-12 18:40 ` Martin Langhoff @ 2009-05-12 18:55 ` Jakub Narebski 2009-05-12 21:43 ` [PATCH] Extend sample pre-commit hook to check for non ascii file/usernames Heiko Voigt 0 siblings, 1 reply; 61+ messages in thread From: Jakub Narebski @ 2009-05-12 18:55 UTC (permalink / raw) To: Martin Langhoff; +Cc: Dmitry Potapov, Esko Luontola, git Martin Langhoff <martin.langhoff@gmail.com> writes: > On Tue, May 12, 2009 at 8:28 PM, Dmitry Potapov <dpotapov@gmail.com> wrote: > > 2. If you really want to be cross-platform portable, you should not use > > any characters in filenames outside of [A-Za-z0-9._-] (i.e. Portable > > Filename Character Set) > > http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap03.html#tag_03_276 > > Would it make sense to have warnings at 'git add' time about > > - filenames outside of that charset (as the strictest mode, perhaps > even default) > - filenames that have a potential conflict wrt case-sensitivity > - filenames that have potential conflict in the same tree due to > utf-8 encoding vagaries > > MHO is that a strict "start your project portable from day one" mode > is best as a default. But I'd be happy with any default, actually ;-) Somebody asked for a pre-add hook in the past; it would be good place to put such check. But in meantime you can do it using pre-commit hook instead, isn't it? -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH] Extend sample pre-commit hook to check for non ascii file/usernames 2009-05-12 18:55 ` Jakub Narebski @ 2009-05-12 21:43 ` Heiko Voigt 2009-05-12 21:55 ` Jakub Narebski 0 siblings, 1 reply; 61+ messages in thread From: Heiko Voigt @ 2009-05-12 21:43 UTC (permalink / raw) To: Jakub Narebski Cc: Martin Langhoff, Dmitry Potapov, Esko Luontola, git, Junio C Hamano At the moment non-ascii encodings of file/usernames are not very well supported by git. This will most likely change in the future but to allow repositories to be portable among different file/operating systems this check is enabled by default. Signed-off-by: Heiko Voigt <heiko.voigt@mahr.de> --- On Tue, May 12, 2009 at 11:55:39AM -0700, Jakub Narebski wrote: > Somebody asked for a pre-add hook in the past; it would be good place > to put such check. But in meantime you can do it using pre-commit > hook instead, isn't it? I actually had this in my queue to be submitted... templates/hooks--pre-commit.sample | 33 +++++++++++++++++++++++++++++++++ 1 files changed, 33 insertions(+), 0 deletions(-) diff --git a/templates/hooks--pre-commit.sample b/templates/hooks--pre-commit.sample index 0e49279..83ff873 100755 --- a/templates/hooks--pre-commit.sample +++ b/templates/hooks--pre-commit.sample @@ -7,6 +7,39 @@ # # To enable this hook, rename this file to "pre-commit". +# If you want to allow non-ascii filenames or usernames set +# this variable to true. +allownonascii=$(git config hooks.allownonascii) + +function is_ascii () { + test -z "$(cat | sed -e "s/[\ -~]*//g")" + return $? +} + +if [ "$allownonascii" != "true" ] +then + # until git can handle non-ascii filenames gracefully + # prevent them to be added into the repository + if ! git diff --cached --name-only --diff-filter=A -z \ + | tr "\0" "\n" | is_ascii; then + echo "Non-ascii filenames are not allowed !" + echo "Please rename the file ..." + exit 1 + fi + + # non-ascii username issue a warning in git gui so tell the + # user to change it + if ! 
git config user.name | is_ascii; then + echo "Please only use ascii characters in your username!" + exit 1 + fi + + if ! git config user.email | is_ascii; then + echo "Please only use ascii characters in your email!" + exit 1 + fi +fi + if git-rev-parse --verify HEAD 2>/dev/null then against=HEAD -- 1.6.3 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH] Extend sample pre-commit hook to check for non ascii file/usernames 2009-05-12 21:43 ` [PATCH] Extend sample pre-commit hook to check for non ascii file/usernames Heiko Voigt @ 2009-05-12 21:55 ` Jakub Narebski 2009-05-14 17:59 ` [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames Heiko Voigt 0 siblings, 1 reply; 61+ messages in thread From: Jakub Narebski @ 2009-05-12 21:55 UTC (permalink / raw) To: Heiko Voigt Cc: Martin Langhoff, Dmitry Potapov, Esko Luontola, git, Junio C Hamano On Tue, 12 May 2009, Heiko Voigt wrote: > At the moment non-ascii encodings of file/usernames are not very well > supported by git. This will most likely change in the future but to > allow repositories to be portable among different file/operating systems > this check is enabled by default. > + # non-ascii username issue a warning in git gui so tell the > + # user to change it > + if ! git config user.name | is_ascii; then > + echo "Please only use ascii characters in your username!" > + exit 1 > + fi > + > + if ! git config user.email | is_ascii; then > + echo "Please only use ascii characters in your email!" > + exit 1 > + fi Actually 1.) there is no easy way to avoid non-ASCII names at least in user.name (I think they are not allowed in email), but 2.) there is no trouble with non-ASCII encoding of commits, as they have an 'encoding' header if it is not utf-8 (see *encoding* config variables). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames 2009-05-12 21:55 ` Jakub Narebski @ 2009-05-14 17:59 ` Heiko Voigt 2009-05-15 10:52 ` Martin Langhoff ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Heiko Voigt @ 2009-05-14 17:59 UTC (permalink / raw) To: Jakub Narebski Cc: Martin Langhoff, Dmitry Potapov, Esko Luontola, git, Junio C Hamano At the moment non-ascii encodings of filenames are not portably converted between different filesystems by git. This will most likely change in the future but to allow repositories to be portable among different file/operating systems this check is enabled by default. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> --- On Tue, May 12, 2009 at 11:55:59PM +0200, Jakub Narebski wrote: > On Tue, 12 May 2009, Heiko Voigt wrote: > > > At the moment non-ascii encodings of file/usernames are not very well > > supported by git. This will most likely change in the future but to > > allow repositories to be portable among different file/operating systems > > this check is enabled by default. > > > + # non-ascii username issue a warning in git gui so tell the > > + # user to change it > > + if ! git config user.name | is_ascii; then > > + echo "Please only use ascii characters in your username!" > > + exit 1 > > + fi > > + > > + if ! git config user.email | is_ascii; then > > + echo "Please only use ascii characters in your email!" > > + exit 1 > > + fi > > Actually 1.) there is no easy way to avoid non-ASCII names at least > in user.name (I think they are not allowed in email), but 2.) there > is no trouble with non-ASCII encoding of commits, as they have > 'encoding' header if it is not utf-8 (see *encoding* config variables). I tried it and indeed it seems to work now. This hook originated from a Windows installation where having non-ascii characters resulted in a strange warning from git gui each time you commit. So here is the corrected patch.
 templates/hooks--pre-commit.sample |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/templates/hooks--pre-commit.sample b/templates/hooks--pre-commit.sample
index 0e49279..3083735 100755
--- a/templates/hooks--pre-commit.sample
+++ b/templates/hooks--pre-commit.sample
@@ -7,6 +7,26 @@
 #
 # To enable this hook, rename this file to "pre-commit".

+# If you want to allow non-ascii filenames set this variable to true.
+allownonascii=$(git config hooks.allownonascii)
+
+function is_ascii () {
+	test -z "$(cat | sed -e "s/[\ -~]*//g")"
+	return $?
+}
+
+if [ "$allownonascii" != "true" ]
+then
+	# until git can handle non-ascii filenames gracefully
+	# prevent them to be added into the repository
+	if ! git diff --cached --name-only --diff-filter=A -z \
+	   | tr "\0" "\n" | is_ascii; then
+		echo "Non-ascii filenames are not allowed !"
+		echo "Please rename the file ..."
+		exit 1
+	fi
+fi
+
 if git-rev-parse --verify HEAD 2>/dev/null
 then
 	against=HEAD
--
1.6.3

^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames 2009-05-14 17:59 ` [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames Heiko Voigt @ 2009-05-15 10:52 ` Martin Langhoff 2009-05-18 9:37 ` Heiko Voigt 2009-06-20 12:14 ` [RFC PATCH] check for filenames that only differ in case to sample pre-commit hook Heiko Voigt 2009-05-15 14:57 ` [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames Jakub Narebski 2009-05-15 18:11 ` [PATCH v2] " Junio C Hamano 2 siblings, 2 replies; 61+ messages in thread From: Martin Langhoff @ 2009-05-15 10:52 UTC (permalink / raw) To: Heiko Voigt Cc: Jakub Narebski, Dmitry Potapov, Esko Luontola, git, Junio C Hamano On Thu, May 14, 2009 at 7:59 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote: > At the moment non-ascii encodings of filenames are not portably converted > between different filesystems by git. This will most likely change in the > future but to allow repositories to be portable among different file/operating > systems this check is enabled by default. Nice! - It'd be a good idea to add to the mix a check for filenames that are equivalent in case-insensitive FSs. - Should all of this be a general "portablefilenames" setting? cheers, m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames 2009-05-15 10:52 ` Martin Langhoff @ 2009-05-18 9:37 ` Heiko Voigt 2009-05-18 22:26 ` Jakub Narebski 2009-06-20 12:14 ` [RFC PATCH] check for filenames that only differ in case to sample pre-commit hook Heiko Voigt 1 sibling, 1 reply; 61+ messages in thread From: Heiko Voigt @ 2009-05-18 9:37 UTC (permalink / raw) To: Martin Langhoff Cc: Jakub Narebski, Dmitry Potapov, Esko Luontola, git, Junio C Hamano On Fri, May 15, 2009 at 12:52:41PM +0200, Martin Langhoff wrote: > On Thu, May 14, 2009 at 7:59 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote: > > At the moment non-ascii encodings of filenames are not portably converted > > between different filesystems by git. This will most likely change in the > > future but to allow repositories to be portable among different file/operating > > systems this check is enabled by default. > > Nice! > > - It'd be a good idea to add to the mix a check for filenames that > are equivalent in case-insensitive FSs. I agree, but that will be an extension in another patch. BTW, if anyone has a good idea how to efficiently do that kind of check in a hook I'd cook up a patch on top of this. > - Should all of this be a general "portablefilenames" setting? Well, if you can specify what general portable filenames would have as properties. Questions like: * What is the portable maximum path length? * How long may a filename be (DOS 8.3 ?) * Are windows keywords (PRN, ...) allowed? * ... So I think this should be on a per property basis providing sensible defaults to support the most standard case. cheers Heiko ^ permalink raw reply [flat|nested] 61+ messages in thread
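[Editorial sketch] One of the per-property checks listed above, the Windows keyword case, might be written as a separate filter. The message only mentions "PRN, ..."; the full list of device names used here (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9, with or without an extension) is an assumption filled in for illustration, and the function name `reserved_names` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): prints every path from
# standard input that has a component named like a reserved Windows
# device, e.g. "PRN" or "doc/aux.h". Matching ignores case.
reserved_names() {
	LC_ALL=C grep -i -E \
		'(^|/)(con|prn|aux|nul|com[1-9]|lpt[1-9])(\.[^/]*)?(/|$)'
}

# Example: only the second and fourth names are reported.
printf 'src/main.c\ndoc/aux.h\nconsole.c\nPRN\n' | reserved_names
```

Keeping each property in its own small filter fits the "per property basis" idea: a project could enable this check without also adopting a path-length or 8.3 limit.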
* Re: [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames 2009-05-18 9:37 ` Heiko Voigt @ 2009-05-18 22:26 ` Jakub Narebski 0 siblings, 0 replies; 61+ messages in thread From: Jakub Narebski @ 2009-05-18 22:26 UTC (permalink / raw) To: Heiko Voigt Cc: Martin Langhoff, Dmitry Potapov, Esko Luontola, git, Junio C Hamano On Mon, 18 May 2009, Heiko Voigt wrote: > On Fri, May 15, 2009 at 12:52:41PM +0200, Martin Langhoff wrote: > > - Should all of this be a general "portablefilenames" setting? > > Well, if you can specify what general portable filenames would have as > properties. "Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems" by David A. Wheeler http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 61+ messages in thread
* [RFC PATCH] check for filenames that only differ in case to sample pre-commit hook 2009-05-15 10:52 ` Martin Langhoff 2009-05-18 9:37 ` Heiko Voigt @ 2009-06-20 12:14 ` Heiko Voigt 1 sibling, 0 replies; 61+ messages in thread From: Heiko Voigt @ 2009-06-20 12:14 UTC (permalink / raw) To: Martin Langhoff Cc: Jakub Narebski, Dmitry Potapov, Esko Luontola, git, Junio C Hamano This helps cross-platform projects on the case-sensitive filename side of operating systems to use filenames that are nice for the case-insensitive side --- On Fri, May 15, 2009 at 12:52:41PM +0200, Martin Langhoff wrote: > On Thu, May 14, 2009 at 7:59 PM, Heiko Voigt <hvoigt@hvoigt.net> wrote: > > At the moment non-ascii encodings of filenames are not portably converted > > between different filesystems by git. This will most likely change in the > > future but to allow repositories to be portable among different file/operating > > systems this check is enabled by default. > - It'd be a good idea to add to the mix a check for filenames that > are equivalent in case-insensitive FSs. Totally untested. Just to get feedback if someone has ideas how this can be solved more efficiently. I suspect that processing all files will yield an unbearable performance degradation on large projects. Let me know what you think. The wording of the error message is not yet final. templates/hooks--pre-commit.sample | 21 +++++++++++++++++++++ 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/templates/hooks--pre-commit.sample b/templates/hooks--pre-commit.sample index b11ad6a..32d1809 100755 --- a/templates/hooks--pre-commit.sample +++ b/templates/hooks--pre-commit.sample @@ -9,6 +9,10 @@ # If you want to allow non-ascii filenames set this variable to true. allownonascii=$(git config hooks.allownonascii) +# If you want to allow filenames that only differ in case set this +# variable to true. 
NOTE: This can degrade performance on project with +# lots of files +allowcaseonly=$(git config hooks.allowcaseonly) # Cross platform projects tend to avoid non-ascii filenames; prevent # them from being added to the repository. We exploit the fact that the @@ -32,6 +36,23 @@ then exit 1 fi +# check for names that already exist but only differ in case +# which can be problematic on non-casesensitive filesystems +if [ "$allowcaseonly" != "true" ] && + test -z "$(git ls-files | LC_ALL=C tr -s [A-Z] [a-z] | uniq -d)" +then + echo "Error: Attempt to add file which already exists in different case" + echo + echo "If you know what you are doing you can disable this" + echo "check using:" + echo + echo " git config hooks.allowcaseonly true" + echo + exit 1 +fi + if git-rev-parse --verify HEAD >/dev/null 2>&1 then against=HEAD -- 1.6.3.2.203.g9a122 ^ permalink raw reply related [flat|nested] 61+ messages in thread
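[Editorial sketch] The case-collision check discussed above can be tried as a plain stdin filter, without a repository. It differs from the RFC patch in three ways, all of which are assumptions about the intent rather than the author's code: names are sorted after lowercasing so that `uniq -d` sees duplicates on adjacent lines, `tr` is used without `-s` (which would squeeze repeated letters together), and a hook built on it would reject the commit when the output is non-empty rather than when it is empty. The function name `case_collisions` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): reads one name per line and
# prints, in lower case, each name that occurs more than once when
# case is ignored.
case_collisions() {
	LC_ALL=C tr 'A-Z' 'a-z' |
	LC_ALL=C sort |
	uniq -d
}

# Example: README and Readme collide on a case-insensitive filesystem.
dups=$(printf 'README\nsrc/a.c\nReadme\n' | case_collisions)
if test -n "$dups"
then
	echo "Names that differ only in case: $dups"
fi
```

In a hook the input might come from `git ls-files`, the listing the RFC patch uses; as the message notes, processing every tracked file on each commit may be slow on large projects.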
* Re: [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames 2009-05-14 17:59 ` [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames Heiko Voigt 2009-05-15 10:52 ` Martin Langhoff @ 2009-05-15 14:57 ` Jakub Narebski 2009-05-18 9:50 ` [PATCH] " Heiko Voigt 2009-05-15 18:11 ` [PATCH v2] " Junio C Hamano 2 siblings, 1 reply; 61+ messages in thread From: Jakub Narebski @ 2009-05-15 14:57 UTC (permalink / raw) To: Heiko Voigt Cc: Martin Langhoff, Dmitry Potapov, Esko Luontola, git, Junio C Hamano <Insert standard Dscho disclaimer here...> ;-) In short: good idea, don't be discouraged by criticism... On Thu, 14 May 2009, Heiko Voigt wrote: > At the moment non-ascii encodings of filenames are not portably converted > between different filesystems by git. This will most likely change in the > future but to allow repositories to be portable among different file/operating > systems this check is enabled by default. By the way, you might consider choosing shorter line length for commits, something around 70-76 characters per line; otherwise it is harder to reply to without linewrapping. 80 characters that you used is, IMHO, absolute maximum, and it is good that you kept to it. > > Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> > --- > +# If you want to allow non-ascii filenames set this variable to true. > +allownonascii=$(git config hooks.allownonascii) > + > +function is_ascii () { > + test -z "$(cat | sed -e "s/[\ -~]*//g")" > + return $? > +} From CodingGuidelines for shell scripts: - We do not write the noiseword "function" in front of shell functions. (in short: do not use bash-specific features... unless, of course, you are modifying bash-completion script). Second, it would be nice to have comment about how to use this function (as it does not check file given by its argument, but rather its standard input). 
And perhaps also a comment that it works because ASCII printable characters begin with ' ' space (does it have to be escaped?) and end with '~' tilde[2]. Third, isn't it useless use of 'cat'[3]? And wouldn't it be better to use 'tr' to either delete printable characters and check for anything left (as you do; BTW. wouldn't "return test ..." be simpler?), or use 'tr' to count non portable characters? [1] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html [2] http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters [3] http://partmaps.org/era/unix/award.html#cat > + > +if [ "$allownonascii" != "true" ] > +then > + # until git can handle non-ascii filenames gracefully > + # prevent them to be added into the repository > + if ! git diff --cached --name-only --diff-filter=A -z \ > + | tr "\0" "\n" | is_ascii; then > + echo "Non-ascii filenames are not allowed !" > + echo "Please rename the file ..." > + exit 1 > + fi > +fi > + > if git-rev-parse --verify HEAD 2>/dev/null > then > against=HEAD > -- > 1.6.3 > > > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH] Extend sample pre-commit hook to check for non ascii filenames 2009-05-15 14:57 ` [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames Jakub Narebski @ 2009-05-18 9:50 ` Heiko Voigt 2009-05-18 10:40 ` Johannes Sixt ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Heiko Voigt @ 2009-05-18 9:50 UTC (permalink / raw) To: Jakub Narebski, Junio C Hamano Cc: Martin Langhoff, Dmitry Potapov, Esko Luontola, git At the moment non-ascii encodings of filenames are not portably converted between different filesystems by git. This will most likely change in the future but to allow repositories to be portable among different file/operating systems this check is enabled by default. Signed-off-by: Heiko <hvoigt@hvoigt.net> --- so here is a third version ... On Fri, May 15, 2009 at 04:57:45PM +0200, Jakub Narebski wrote: > On Thu, 14 May 2009, Heiko Voigt wrote: > > > At the moment non-ascii encodings of filenames are not portably converted > > between different filesystems by git. This will most likely change in the > > future but to allow repositories to be portable among different file/operating > > systems this check is enabled by default. > > By the way, you might consider choosing shorter line length for commits, > something around 70-76 characters per line; otherwise it is harder to > reply to without linewrapping. 80 characters that you used is, IMHO, > absolute maximum, and it is good that you kept to it. Yeah, I admit they were a little bit long. > > +function is_ascii () { > > + test -z "$(cat | sed -e "s/[\ -~]*//g")" > > + return $? > > +} > > From CodingGuidelines for shell scripts: > - We do not write the noiseword "function" in front of shell > functions. > > (in short: do not use bash-specific features... unless, of course, > you are modifying bash-completion script). Addressed. 
> Second, it would be nice to have comment about how to use this > function (as it does not check file given by its argument, but > rather its standard input). And perhaps also a comment that it > works because ASCII printable characters begin with ' ' space > (does it have to be escaped?) and end with '~' tilde[2]. Done. > > Third, isn't it useless use of 'cat'[3]? And wouldn't it be better > to use 'tr' to either delete printable characters and check for > anything left (as you do; BTW. wouldn't "return test ..." be simpler?), > or use 'tr' to count non portable characters? Yes indeed it was useless. I also switched from sed to tr. > > [1] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html > [2] http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters > [3] http://partmaps.org/era/unix/award.html#cat On Fri, May 15, 2009 at 11:11:12AM -0700, Junio C Hamano wrote: > Heiko Voigt <hvoigt@hvoigt.net> writes: > > +function is_ascii () { > > We do not say "#!/bin/bash" at the beginning (hopefully), so let's not say > "function " here. See above. > > + test -z "$(cat | sed -e "s/[\ -~]*//g")" > > Do you need "cat | "? Also above. > Does this script run under LC_ALL=C? Can an i18n'ized sed interfere with > what you are trying to do? I now explicitly set LC_ALL=C for the tr call which should now be robust against such cases. > > > + return $? > > Do you need this, or does the function return the result of the last > statement anyway? I wasn't aware of that. Removed the return. > > + echo "Non-ascii filenames are not allowed !" > > + echo "Please rename the file ..." > > Can we make this sound more like a _sample_ project policy? It's not like > we enforce that policy to other people's projects. I've polished this so we are now more user friendly as well. 
 templates/hooks--pre-commit.sample |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/templates/hooks--pre-commit.sample b/templates/hooks--pre-commit.sample
index 0e49279..91ab563 100755
--- a/templates/hooks--pre-commit.sample
+++ b/templates/hooks--pre-commit.sample
@@ -7,6 +7,38 @@
 #
 # To enable this hook, rename this file to "pre-commit".

+# If you want to allow non-ascii filenames set this variable to true.
+allownonascii=$(git config hooks.allownonascii)
+
+# is_ascii() Tests the string given given on standard input for
+# printable ascii conformance. We exploit the fact that the printable
+# range starts at the space character and ends with tilde.
+is_ascii() {
+	test -z "$(LC_ALL=C tr -d \ -~)"
+}
+
+if [ "$allownonascii" != "true" ]
+then
+	# until git can handle non-ascii filenames gracefully
+	# prevent them to be added into the repository
+	if ! git diff --cached --name-only --diff-filter=A -z \
+	   | tr "\0" "\n" | is_ascii; then
+		echo "Error: Preventing to add a non-ascii filename."
+		echo
+		echo "This can cause problems if you want to work together"
+		echo "with people on other platforms than you."
+		echo
+		echo "To be portable it is adviseable to rename the file ..."
+		echo
+		echo "If you know what you are doing you can disable this"
+		echo "check using:"
+		echo
+		echo " git config hooks.allownonascii true"
+		echo
+		exit 1
+	fi
+fi
+
 if git-rev-parse --verify HEAD 2>/dev/null
 then
 	against=HEAD
--
1.6.3

^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH] Extend sample pre-commit hook to check for non ascii filenames 2009-05-18 9:50 ` [PATCH] " Heiko Voigt @ 2009-05-18 10:40 ` Johannes Sixt 2009-05-18 11:50 ` Heiko Voigt 2009-05-19 20:01 ` [PATCH v4] " Heiko Voigt 2009-05-18 14:42 ` [PATCH] " Junio C Hamano 2009-05-18 20:35 ` Julian Phillips 2 siblings, 2 replies; 61+ messages in thread From: Johannes Sixt @ 2009-05-18 10:40 UTC (permalink / raw) To: Heiko Voigt Cc: Jakub Narebski, Junio C Hamano, Martin Langhoff, Dmitry Potapov, Esko Luontola, git Heiko Voigt schrieb: > +# is_ascii() Tests the string given given on standard input for > +# printable ascii conformance. We exploit the fact that the printable > +# range starts at the space character and ends with tilde. > +is_ascii() { > + test -z "$(LC_ALL=C tr -d \ -~)" > +} > + > +if [ "$allownonascii" != "true" ] > +then > + # until git can handle non-ascii filenames gracefully > + # prevent them to be added into the repository > + if ! git diff --cached --name-only --diff-filter=A -z \ > + | tr "\0" "\n" | is_ascii; then Will this not fail to add more than one file with allowed names? The \n is not removed in is_ascii(), and so the resulting string will not be empty. BTW, not all tr work well with NULs. See the commit message of e85fe4d8, for example. Otherwise, I would have suggested to convert the NUL to some allowed ASCII character, e.g. 'A'. BTW, you should really use '\0' and '\n' (single-quotes) to guarantee that the shell does not ignore the backslash. -- Hannes ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Extend sample pre-commit hook to check for non ascii filenames 2009-05-18 10:40 ` Johannes Sixt @ 2009-05-18 11:50 ` Heiko Voigt 2009-05-18 12:04 ` Johannes Sixt 2009-05-19 20:01 ` [PATCH v4] " Heiko Voigt 1 sibling, 1 reply; 61+ messages in thread From: Heiko Voigt @ 2009-05-18 11:50 UTC (permalink / raw) To: Johannes Sixt Cc: Jakub Narebski, Junio C Hamano, Martin Langhoff, Dmitry Potapov, Esko Luontola, git On Mon, May 18, 2009 at 12:40:09PM +0200, Johannes Sixt wrote: > Heiko Voigt schrieb: > > +# is_ascii() Tests the string given given on standard input for > > +# printable ascii conformance. We exploit the fact that the printable > > +# range starts at the space character and ends with tilde. > > +is_ascii() { > > + test -z "$(LC_ALL=C tr -d \ -~)" > > +} > > + > > +if [ "$allownonascii" != "true" ] > > +then > > + # until git can handle non-ascii filenames gracefully > > + # prevent them to be added into the repository > > + if ! git diff --cached --name-only --diff-filter=A -z \ > > + | tr "\0" "\n" | is_ascii; then > > Will this not fail to add more than one file with allowed names? The \n is > not removed in is_ascii(), and so the resulting string will not be empty. No currently it does not. At least on my system, but good point. > BTW, not all tr work well with NULs. See the commit message of e85fe4d8, > for example. Otherwise, I would have suggested to convert the NUL to some > allowed ASCII character, e.g. 'A'. BTW, you should really use '\0' and > '\n' (single-quotes) to guarantee that the shell does not ignore the > backslash. Are there any problems with '\0' and tr other than swallowing of it. In case not I would just change tr "\0" "\n" to tr -d '\0' That way there are no '\n's left over and it doesn't matter if tr swallows the '\0'. Waiting for further comments before sending the cleanup. cheers Heiko ^ permalink raw reply [flat|nested] 61+ messages in thread
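[Editorial sketch] A likely reason the check does not fail with several clean names, as reported above: shell command substitution removes all trailing newlines from the captured output, so when `tr` has deleted every printable byte and only the separating newlines remain, the resulting string is empty. The snippet below illustrates that behaviour on fixed input; the function name `strip_printable` is hypothetical and stands in for the `tr` call inside `is_ascii`.

```shell
#!/bin/sh
# Same filter as in the patch under discussion: delete every byte in
# the printable ASCII range, space through tilde.
strip_printable() {
	LC_ALL=C tr -d ' -~'
}

# Two clean names: only newlines survive tr, and the command
# substitution drops them, leaving an empty string.
clean=$(printf 'Makefile\nREADME\n' | strip_printable)

# A name with the UTF-8 bytes for an accented e (octal 303 251)
# leaves those bytes behind, so the string is not empty.
dirty=$(printf 'Makefile\ncaf\303\251.txt\n' | strip_printable)

test -z "$clean" && echo "clean names pass"
test -n "$dirty" && echo "non-ascii name detected"
```

Deleting the separator explicitly, as proposed in the replies, avoids relying on this detail.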
* Re: [PATCH] Extend sample pre-commit hook to check for non ascii filenames 2009-05-18 11:50 ` Heiko Voigt @ 2009-05-18 12:04 ` Johannes Sixt 0 siblings, 0 replies; 61+ messages in thread From: Johannes Sixt @ 2009-05-18 12:04 UTC (permalink / raw) To: Heiko Voigt Cc: Jakub Narebski, Junio C Hamano, Martin Langhoff, Dmitry Potapov, Esko Luontola, git Heiko Voigt schrieb: > Are there any problems with '\0' and tr other than swallowing of it. I can't tell. But the commits ae90e16..aab0abf are interesting to study w.r.t. portability. > In > case not I would just change > > tr "\0" "\n" > to > tr -d '\0' In which case I'd suggest that you call tr only once, in is_ascii(): tr -d '[ -~]\0' -- Hannes ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v4] Extend sample pre-commit hook to check for non ascii filenames 2009-05-18 10:40 ` Johannes Sixt 2009-05-18 11:50 ` Heiko Voigt @ 2009-05-19 20:01 ` Heiko Voigt 1 sibling, 0 replies; 61+ messages in thread From: Heiko Voigt @ 2009-05-19 20:01 UTC (permalink / raw) To: Johannes Sixt, Junio C Hamano, Julian Phillips Cc: Jakub Narebski, Martin Langhoff, Dmitry Potapov, Esko Luontola, git At the moment non-ascii encodings of filenames are not portably converted between different filesystems by git. This will most likely change in the future but to allow repositories to be portable among different file/operating systems this check is enabled by default. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> --- Thanks for all comments. I now hopefully have a satisfying patch. On Mon, May 18, 2009 at 12:40:09PM +0200, Johannes Sixt wrote: > Heiko Voigt schrieb: > > + if ! git diff --cached --name-only --diff-filter=A -z \ > > + | tr "\0" "\n" | is_ascii; then > > Will this not fail to add more than one file with allowed names? The \n is > not removed in is_ascii(), and so the resulting string will not be empty. > > BTW, not all tr work well with NULs. See the commit message of e85fe4d8, > for example. Otherwise, I would have suggested to convert the NUL to some > allowed ASCII character, e.g. 'A'. BTW, you should really use '\0' and > '\n' (single-quotes) to guarantee that the shell does not ignore the > backslash. I removed all \0 characters and hopefully use the correct platform independent syntax as described in the commits you sent. On Mon, May 18, 2009 at 02:04:08PM +0200, Johannes Sixt wrote: > Heiko Voigt schrieb: > > Are there any problems with '\0' and tr other than swallowing of it. > > I can't tell. But the commits ae90e16..aab0abf are interesting to study > w.r.t. portability. 
> > > In > > case not I would just change > > > > tr "\0" "\n" > > to > > tr -d '\0' > > In which case I'd suggest that you call tr only once, in is_ascii(): > > tr -d '[ -~]\0' After reading a little about the portability issues, this seems to be the right way and is now included. On Mon, May 18, 2009 at 07:42:31AM -0700, Junio C Hamano wrote: > Heiko Voigt <hvoigt@hvoigt.net> writes: > > > +if [ "$allownonascii" != "true" ] > > +then > > + # until git can handle non-ascii filenames gracefully > > + # prevent them to be added into the repository > > I think you can inline your is_ascii shell function in the pipeline below. > You made it a separate function and I agree that it has a very good > documentation value, but the mention of "non-ascii filenames" in this > comment here is enough clue to let anybody know what is going on. I agree. I thought it would probably be useful in other places but we just need it once so it's inlined now. > > Side note: I am not sure "Until ... can ... gracefully" is a good > description of the general problem. It probably is more neutral > to say "Cross platform projects tend to avoid non-ascii filenames; > prevent them from being added to the repository." Changed that. > > > + if ! git diff --cached --name-only --diff-filter=A -z \ > > + | tr "\0" "\n" | is_ascii; then > > A standard trick while writing a long pipeline in shell is to change line > after a pipe, like: > > cmd1 | > cmd2 | > cmd3 > > which allows you to lose the BS-before-LF sequence. Wasn't aware of that. Changed it accordingly. On Mon, May 18, 2009 at 09:35:19PM +0100, Julian Phillips wrote: > On Mon, 18 May 2009, Heiko Voigt wrote: >> + echo "Error: Preventing to add a non-ascii filename." > > This would read better as: > > + echo "Error: Attempt to add a non-ascii filename." > > (after all the prevention itself is a result of the error, not the cause > of it) That really sounds better. Thanks. 
 templates/hooks--pre-commit.sample |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/templates/hooks--pre-commit.sample b/templates/hooks--pre-commit.sample
index 0e49279..ad892a2 100755
--- a/templates/hooks--pre-commit.sample
+++ b/templates/hooks--pre-commit.sample
@@ -7,6 +7,31 @@
 #
 # To enable this hook, rename this file to "pre-commit".

+# If you want to allow non-ascii filenames set this variable to true.
+allownonascii=$(git config hooks.allownonascii)
+
+# Cross platform projects tend to avoid non-ascii filenames; prevent
+# them from being added to the repository. We exploit the fact that the
+# printable range starts at the space character and ends with tilde.
+if [ "$allownonascii" != "true" ] &&
+	test "$(git diff --cached --name-only --diff-filter=A -z |
+	  LC_ALL=C tr -d '[ -~]\0')"
+then
+	echo "Error: Attempt to add a non-ascii filename."
+	echo
+	echo "This can cause problems if you want to work together"
+	echo "with people on other platforms than you."
+	echo
+	echo "To be portable it is adviseable to rename the file ..."
+	echo
+	echo "If you know what you are doing you can disable this"
+	echo "check using:"
+	echo
+	echo " git config hooks.allownonascii true"
+	echo
+	exit 1
+fi
+
 if git-rev-parse --verify HEAD 2>/dev/null
 then
 	against=HEAD
--
1.6.3

^ permalink raw reply related [flat|nested] 61+ messages in thread
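[Editorial sketch] The filter at the heart of the v4 patch can be exercised on its own, without staging anything, by feeding it NUL-separated names the way `git diff -z` would. The wrapper name `has_nonascii_names` is hypothetical; the `tr` invocation is the one from the patch, where the brackets are themselves printable characters and so are harmlessly deleted as well.

```shell
#!/bin/sh
# Hypothetical wrapper around the patch's filter: succeeds when the
# NUL-separated names on standard input contain any byte outside the
# printable ASCII range.
has_nonascii_names() {
	test -n "$(LC_ALL=C tr -d '[ -~]\0')"
}

# Example: the first list is clean, the second contains the UTF-8
# bytes for an accented e (octal 303 251).
printf 'Makefile\0src/main.c\0' | has_nonascii_names ||
	echo "all names are printable ascii"
printf 'Makefile\0caf\303\251.txt\0' | has_nonascii_names &&
	echo "non-ascii filename found"
```

Because the NUL separators are removed by `tr` itself, nothing depends on how the shell treats trailing newlines or NUL bytes inside the command substitution.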
* Re: [PATCH] Extend sample pre-commit hook to check for non ascii filenames 2009-05-18 9:50 ` [PATCH] " Heiko Voigt 2009-05-18 10:40 ` Johannes Sixt @ 2009-05-18 14:42 ` Junio C Hamano 2009-05-18 20:35 ` Julian Phillips 2 siblings, 0 replies; 61+ messages in thread From: Junio C Hamano @ 2009-05-18 14:42 UTC (permalink / raw) To: Heiko Voigt Cc: Jakub Narebski, Junio C Hamano, Martin Langhoff, Dmitry Potapov, Esko Luontola, git Heiko Voigt <hvoigt@hvoigt.net> writes: > +if [ "$allownonascii" != "true" ] > +then > + # until git can handle non-ascii filenames gracefully > + # prevent them to be added into the repository I think you can inline your is_ascii shell function in the pipeline below. You made it a separate function and I agree that it has a very good documentation value, but the mention of "non-ascii filenames" in this comment here is enough clue to let anybody know what is going on. Side note: I am not sure "Until ... can ... gracefully" is a good description of the general problem. It probably is more neutral to say "Cross platform projects tend to avoid non-ascii filenames; prevent them from being added to the repository." > + if ! git diff --cached --name-only --diff-filter=A -z \ > + | tr "\0" "\n" | is_ascii; then A standard trick while writing a long pipeline in shell is to change line after a pipe, like: cmd1 | cmd2 | cmd3 which allows you to lose the BS-before-LF sequence. I think comments from J6t and others are valuable but clear enough that I wouldn't have to repeat them. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH] Extend sample pre-commit hook to check for non ascii filenames 2009-05-18 9:50 ` [PATCH] " Heiko Voigt 2009-05-18 10:40 ` Johannes Sixt 2009-05-18 14:42 ` [PATCH] " Junio C Hamano @ 2009-05-18 20:35 ` Julian Phillips 2 siblings, 0 replies; 61+ messages in thread From: Julian Phillips @ 2009-05-18 20:35 UTC (permalink / raw) To: Heiko Voigt Cc: Jakub Narebski, Junio C Hamano, Martin Langhoff, Dmitry Potapov, Esko Luontola, git On Mon, 18 May 2009, Heiko Voigt wrote: > +if [ "$allownonascii" != "true" ] > +then > + # until git can handle non-ascii filenames gracefully > + # prevent them to be added into the repository > + if ! git diff --cached --name-only --diff-filter=A -z \ > + | tr "\0" "\n" | is_ascii; then > + echo "Error: Preventing to add a non-ascii filename." This would read better as: + echo "Error: Attempt to add a non-ascii filename." (after all the prevention itself is a result of the error, not the cause of it) If you want to keep the preventing, then you need to at least correct the english: > + echo "Error: Preventing addition of a non-ascii filename." -- Julian --- QOTD: Money isn't everything, but at least it keeps the kids in touch. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames 2009-05-14 17:59 ` [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames Heiko Voigt 2009-05-15 10:52 ` Martin Langhoff 2009-05-15 14:57 ` [PATCH v2] Extend sample pre-commit hook to check for non ascii filenames Jakub Narebski @ 2009-05-15 18:11 ` Junio C Hamano 2 siblings, 0 replies; 61+ messages in thread From: Junio C Hamano @ 2009-05-15 18:11 UTC (permalink / raw) To: Heiko Voigt Cc: Jakub Narebski, Martin Langhoff, Dmitry Potapov, Esko Luontola, git, Junio C Hamano Heiko Voigt <hvoigt@hvoigt.net> writes: > diff --git a/templates/hooks--pre-commit.sample b/templates/hooks--pre-commit.sample > index 0e49279..3083735 100755 > --- a/templates/hooks--pre-commit.sample > +++ b/templates/hooks--pre-commit.sample > @@ -7,6 +7,26 @@ > # > # To enable this hook, rename this file to "pre-commit". > > +# If you want to allow non-ascii filenames set this variable to true. > +allownonascii=$(git config hooks.allownonascii) > + > +function is_ascii () { We do not say "#!/bin/bash" at the beginning (hopefully), so let's not say "function " here. > + test -z "$(cat | sed -e "s/[\ -~]*//g")" Do you need "cat | "? Does this script run under LC_ALL=C? Can an i18n'ized sed interfere with what you are trying to do? > + return $? Do you need this, or does the function return the result of the last statement anyway? > + echo "Non-ascii filenames are not allowed !" > + echo "Please rename the file ..." Can we make this sound more like a _sample_ project policy? It's not like we enforce that policy on other people's projects. > + exit 1 > + fi > +fi > + > if git-rev-parse --verify HEAD 2>/dev/null > then > against=HEAD > -- > 1.6.3 ^ permalink raw reply [flat|nested] 61+ messages in thread
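Taken together, the review points above suggest a function along the following lines. This is only a sketch of how they could be addressed, not the version that was eventually applied (the later patch drops the function and uses tr instead).

```shell
#!/bin/sh
# Succeeds when standard input holds only printable ASCII
# (space through tilde) and newlines.
is_ascii () {
    # POSIX function syntax: no "function" keyword.
    # sed reads standard input directly, so no "cat |" is needed.
    # LC_ALL=C makes the range mean bytes 0x20-0x7e in any locale.
    # The status of test, the last command, is the return status,
    # so no explicit "return $?" is needed.
    test -z "$(LC_ALL=C sed -e 's/[ -~]*//g')"
}

if printf 'README\nsrc/main.c\n' | is_ascii
then
    echo "ascii only"
fi
```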
* Re: Cross-Platform Version Control 2009-05-12 15:06 Cross-Platform Version Control Esko Luontola 2009-05-12 15:14 ` Shawn O. Pearce 2009-05-12 18:28 ` Dmitry Potapov @ 2009-05-14 13:48 ` Peter Krefting 2009-05-14 19:58 ` Esko Luontola 2 siblings, 1 reply; 61+ messages in thread From: Peter Krefting @ 2009-05-14 13:48 UTC (permalink / raw) To: Esko Luontola; +Cc: git Esko Luontola: > A good start for making Git cross-platform, would be storing the text > encoding of every file name and commit message together with the commit. Is it really necessary to store the encoding for every single file name, should it not be enough to just store encoding information for all file names at once (i.e., for the object that contains the list of file names and their associated blobs)? I did publish, as a request for comments, the beginnings of a patch that would change the Windows version of Git to expect file names to be UTF-8 encoded. There were some comments about it, especially that I could not just assume that UTF-8 was the right thing to assume. Perhaps if we added some meta-data, maybe using the same fall-back mechanism as for commit messages (i.e., assume UTF-8 unless otherwise specified), it would be easier to do. On Windows, the file APIs allow you to use Unicode (UTF-16) to specify file names, and the file systems will handle any necessary conversion to whatever byte sequences are used to store the file names. UTF-16 and UTF-8 are trivial to convert between, and Windows does contain APIs to convert between other character encodings and UTF-16. On Mac OS X, I believe the file system APIs assume you use some kind of normalized UTF-8. That should also be possible to create, possibly converting back and forth between different normalization forms, if necessary. On Linux and other Unixes we could just use iconv() to convert from the repository file name encoding to whatever the current locale has set up. The trick here is to handle file names outside the current encoding. 
Some kind of escaping mechanism will probably need to be introduced. The best way would be to define this in the Git core once and for all, and add support to it for all the platforms in the same go, instead of trying to hack around the issue whenever it pops up on the various platforms. My main use-case for Git on Windows has disappeared as my $dayjob went bankrupt, but I am happy to assist with whatever insight I may be able to bring. -- \\// Peter - http://www.softwolves.pp.se/ ^ permalink raw reply [flat|nested] 61+ messages in thread
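The conversion step Peter describes for Linux and other Unixes can be tried with the iconv command-line tool. This is a rough sketch and not something Git does: it uses the iconv utility rather than the iconv() C function, and the helper name convert_name and the choice of ISO-8859-1 as the local encoding are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: convert a file name ($3) from the repository
# encoding ($1) to the local encoding ($2).
convert_name () {
    printf '%s' "$3" | iconv -f "$1" -t "$2"
}

# UTF-8 bytes for "Espanol" spelled with n-tilde (U+00F1 is \303\261).
utf8_name=$(printf 'Espa\303\261ol')

# Dump the converted name as hex; in ISO-8859-1 the n-tilde is the
# single byte f1.
convert_name UTF-8 ISO-8859-1 "$utf8_name" | od -An -tx1
```

A name containing characters that the target encoding lacks cannot be converted this way, which is the case where an escaping mechanism would be needed.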
* Re: Cross-Platform Version Control 2009-05-14 13:48 ` Cross-Platform Version Control Peter Krefting @ 2009-05-14 19:58 ` Esko Luontola 2009-05-14 20:21 ` Andreas Ericsson ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Esko Luontola @ 2009-05-14 19:58 UTC (permalink / raw) To: Peter Krefting; +Cc: git Peter Krefting wrote on 14.5.2009 16:48: > Is it really necessary to store the encoding for every single file name, > should it not be enough to just store encoding information for all file > names at once (i.e., for the object that contains the list of file names > and their associated blobs)? What about if some disorganized project has people committing with many different encodings? Should we allow it, that a directory has the names of some files using one encoding, and the names of other files using another encoding? Or should we force the whole repository to use the same encoding? > The best way would be to define this in the Git core once and for all, > and add support to it for all the platforms in the same go, instead of > trying to hack around the issue whenever it pops up on the various > platforms. +1 -- Esko Luontola www.orfjackal.net ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-14 19:58 ` Esko Luontola @ 2009-05-14 20:21 ` Andreas Ericsson 2009-05-14 22:25 ` Johannes Schindelin 2009-05-15 11:18 ` Dmitry Potapov 2 siblings, 0 replies; 61+ messages in thread From: Andreas Ericsson @ 2009-05-14 20:21 UTC (permalink / raw) To: Esko Luontola; +Cc: Peter Krefting, git Esko Luontola wrote: > Peter Krefting wrote on 14.5.2009 16:48: >> Is it really necessary to store the encoding for every single file >> name, should it not be enough to just store encoding information for >> all file names at once (i.e., for the object that contains the list of >> file names and their associated blobs)? > > What about if some disorganized project has people committing with many > different encodings? Should we allow it, that a directory has the names > of some files using one encoding, and the names of other files using > another encoding? Or should we force the whole repository to use the > same encoding? > If encodings are on a per-tree basis, we could add a special mode-flag for it without breaking backwards compatibility (I think, anyways). Older gits just won't know how to handle it and will treat it as a byte-stream. >> The best way would be to define this in the Git core once and for all, >> and add support to it for all the platforms in the same go, instead of >> trying to hack around the issue whenever it pops up on the various >> platforms. > > +1 > There's still the problem that no one has stepped forward to do all that work yet, so apparently this isn't important enough for people to put their patches where their mouths are. Often when issues generate long discussions and no code, it's of high academic interest and of little real-world value. I believe the "little real-world value" here comes from the fact that cross-platform projects often enforce 7-bit ascii compatible filenames from the start, because they *know* they may run into problems on other filesystems otherwise.
Remember it's not only git that has to get things right. It's also build-systems and compilers that have to locate the correct files (the Makefile and the filesystem may use different encodings), so in the real world, people really do stay away from filenames with åäö or other non-ascii chars in them. It's fun to discuss, but I won't spend any time on it. Good luck to those who do though. I'd quite like to see if someone could pull it off without breaking backwards compatibility or impacting performance too much. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Register now for Nordic Meet on Nagios, June 3-4 in Stockholm http://nordicmeetonnagios.op5.org/ Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-14 19:58 ` Esko Luontola 2009-05-14 20:21 ` Andreas Ericsson @ 2009-05-14 22:25 ` Johannes Schindelin 2009-05-15 11:18 ` Dmitry Potapov 2 siblings, 0 replies; 61+ messages in thread From: Johannes Schindelin @ 2009-05-14 22:25 UTC (permalink / raw) To: Esko Luontola; +Cc: Peter Krefting, git Hi, On Thu, 14 May 2009, Esko Luontola wrote: > Peter Krefting wrote on 14.5.2009 16:48: > > > The best way would be to define this in the Git core once and for all, > > and add support to it for all the platforms in the same go, instead of > > trying to hack around the issue whenever it pops up on the various > > platforms. > > +1 You might be enthusiastic about this cunning idea. However, if it costs me performance on Linux, and all the benefits go to Windows users, then I will remove this "solution" from my personal Git tree _right away_, and I'd expect a lot of other people, too. I repeat this just once more: if you add complexity, you'll have to have a compelling reason to do so. If there is no benefit for Linux users, why should they bear the cost? But as Andreas remarked, I sincerely think that there has been enough talk about the issue. It's time to see some patches, or to stop the discussion. Ciao, Dscho ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-05-14 19:58 ` Esko Luontola 2009-05-14 20:21 ` Andreas Ericsson 2009-05-14 22:25 ` Johannes Schindelin @ 2009-05-15 11:18 ` Dmitry Potapov 2 siblings, 0 replies; 61+ messages in thread From: Dmitry Potapov @ 2009-05-15 11:18 UTC (permalink / raw) To: Esko Luontola; +Cc: Peter Krefting, git On Thu, May 14, 2009 at 10:58:17PM +0300, Esko Luontola wrote: > > What about if some disorganized project has people committing with many > different encodings? Should we allow it, that a directory has the names > of some files using one encoding, and the names of other files using > another encoding? Or should we force the whole repository to use the > same encoding? The whole repository should have the same encoding internally. Anything else will be too complex and too slow... Have you seen any file system where file names would be stored in different encodings? And Git does far more operations on file names than a file system does. So, it is clear to me that the whole repository should have a single encoding. Now, I don't think that you will find many open source projects that use non-ASCII in file names. Moreover, most Linux users either use UTF-8 already or will switch to it in the near future. Mac OS X uses UTF-8 (though there is a problem with decomposed characters, but Linus posted a possible solution). So, the only platform where non-ASCII characters may be interesting to Git users and that does not support UTF-8 is Windows. AFAIK, Cygwin 1.7 has UTF-8 support. So, it is mostly a problem for msysGit... Though adding support for legacy encodings can help to some degree, it means that every system call involving a file name will go through UTF-8 <-> LEGACY_ENC <-> UTF-16LE conversion. IMHO, having a legacy encoding involved is far from the best possible solution; but to avoid that, you need to change MSYS to be able to work with UTF-8. (I have never looked at MSYS myself, but I suspect it may not be easy).
Dmitry ^ permalink raw reply [flat|nested] 61+ messages in thread
* Eric Sink's blog - notes on git, dscms and a "whole product" approach @ 2009-04-27 8:55 Martin Langhoff 2009-04-28 11:24 ` Cross-Platform Version Control (was: Eric Sink's blog - notes on git, dscms and a "whole product" approach) Jakub Narebski 0 siblings, 1 reply; 61+ messages in thread From: Martin Langhoff @ 2009-04-27 8:55 UTC (permalink / raw) To: Git Mailing List Eric Sink has been working on the (commercial, proprietary) centralised SCM Vault for a while. He's written recently about his explorations around the new crop of DSCMs, and I think it's quite interesting. A quick search of the list archives makes me think it wasn't discussed before. The guy is knowledgeable, and writes quite witty posts -- naturally, there's plenty to disagree on, but I'd like to encourage readers not to nitpick or focus on where Eric is wrong. It is interesting to read where he thinks git and other DSCMs are missing the mark. Maybe he's right, maybe he's wrong, but damn he's interesting :-) So here's the blog - http://www.ericsink.com/ These are the best entry points http://www.ericsink.com/entries/quirky.html http://www.ericsink.com/entries/hg_denzel.html To be frank, I think he's wrong in some details (as he's admittedly only spent limited time with it) but right on the larger picture (large userbases want it integrated and foolproof, bugtracking needs to go distributed alongside the code, git is as powerful^Wdangerous as C). cheers, martin -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Cross-Platform Version Control (was: Eric Sink's blog - notes on git, dscms and a "whole product" approach) 2009-04-27 8:55 Eric Sink's blog - notes on git, dscms and a "whole product" approach Martin Langhoff @ 2009-04-28 11:24 ` Jakub Narebski 2009-04-29 6:55 ` Martin Langhoff 0 siblings, 1 reply; 61+ messages in thread From: Jakub Narebski @ 2009-04-28 11:24 UTC (permalink / raw) To: Martin Langhoff; +Cc: Git Mailing List Martin Langhoff <martin.langhoff@gmail.com> writes: > Eric Sink hs been working on the (commercial, proprietary) centralised > SCM Vault for a while. He's written recently about his explorations > around the new crop of DSCMs, and I think it's quite interesting. A > quick search of the list archives makes me thing it wasn't discussed > before. > > The guy is knowledgeable, and writes quite witty posts -- naturally, > there's plenty to disagree on, but I'd like to encourage readers not > to nitpick or focus on where Eric is wrong. It is interesting to read > where he thinks git and other DSCMs are missing the mark. > > Maybe he's right, maybe he's wrong, but damn he's interesting :-) > > So here's the blog - http://www.ericsink.com/ "Here's a blog"... and therefore my dilemma. Should I post my reply as a comment to this blog, or should I reply here on git mailing list? > These are the best entry points Because those two entries are quite different, I'll reply separately 1. "Ten Quirky Issues with Cross-Platform Version Control" > http://www.ericsink.com/entries/quirky.html which is generic comment about (mainly) using version control in heterogenic environment, where different machines have different filesystem limitations. I'll concentrate here on that issue. 2. "Mercurial, Subversion, and Wesley Snipes" > http://www.ericsink.com/entries/hg_denzel.html where, paraphrasing, Eric Sink says that he doesn't write about Mercurial and Subversion because they are perfect. Or at least not as controversial (and controversial means interesting). 
> > To be frank, I think he's wrong in some details (as he's admittedly > only spent limited time with it) but right on the larger-picture > (large userbases want it integrated and foolproof, bugtracking needs > to go distributed alongside the code, git is as powerful^Wdangerous as > C). Neither of mentioned above blog posts touches those issues, BTW... ---------------------------------------------------------------------- Ad 1. "Ten Quirky Issues with Cross-Platform Version Control" Actually those are two issues: troubles with different limitations of different filesystems, and different dealing with line endings in text files on different platforms. Line endings (issue 8.) is in theory and in practice (at least for Git) a non-issue. In theory you should use project's convention for end of line character in text files, and use smart editor that can deal (or can be configured to deal) with this issue correctly. In practice this is a matter of correctly setting up core.autocrlf (and in more complicated cases, where more complicated means for git very very rare, configuring which files are text and which are not). There are a few classes of troubles with filesystems (with filenames). 1. Different limitations on file names (e.g. pathname length), different special characters, different special filenames (if any). Those are issues 2. (special basename PRN on MS Windows), issue 3. (trailing dot, trailing whitespace), issue 4. (pathname and filename length limit), issue 6. (special characters, in this case colon being path element delimiter on MacOS, but it is also about special characters like colon, asterisk and question mark on MS Windows) and also issue 7. (name that begins with dash) in Eric Sink article. The answer is convention for filenames in a project. Simply DON'T use filenames which can cause problems. 
There is no way to simply solve this problem in version control system, although I think if you really, really, really need it you should be able to cobble something together using low-level git tools to have different name for filename in working directory from the one used in repository (and index). See also David A. Wheeler essay "Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems" http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html DON'T DO THAT. 2. "Case-insensitive" but "case-preserving" filesystems; the case where some different filenames are equivalent (like 'README' and 'readme' on case-insensitive filesystem), but are returned as you created them (so if you created 'README', you would get 'README' in directory listing, but filesystem would return that 'readme' exists too). This is issue 1. ('README' and 'readme' in the same directory) in Eric Sink article. The answer is like for previous issue: don't. Simply DO NOT create files with filenames which differ only in case (like unfortunate ct_conntrack.h and cn_CONNTRACK.h or similar in Linux kernel). But I think that even in case where such unfortunate incident (two filenames differing only in case) occur, you can deal with it in Git by using lower level tools (and editing only one of two such files at once). You would get spurious info about modified files in git-status, though... perhaps that could be improved using infrastructure created (IIRC) by Linus for dealing with 'insane' filesystems. DON'T DO THAT, SOLVABLE. 3. Non "Case-preserving" filesystems, where filename as sequence of bytes differ between what you created, and what you get from filesystem. An example here is MacOS X filesystem, which accepts filenames in NFC composed normalized form of Unicode, but stores them internally and returns them in NFD decomposed form. This is issue 9. (Español being "Espa\u00f1ol" in NFC, but "Espan\u0303ol" in NFD). 
In this case 'don't do this' might not be an acceptable answer. Perhaps you need non-ASCII characters in filenames. You cannot always use a filesystem, or specify a mount option, that makes this a non-problem. I remember that this issue was discussed extensively on git mailing list, but I don't remember what the conclusion was (besides agreeing that a filesystem that is not "*-preserving" is not a sane filesystem ;). In particular I do not remember if Git can deal with this issue sanely (I remember Linus adding infrastructure for that, but did it solve this problem...). PROBABLY SOLVED. 4. Filesystems which cannot store all SCM-sane metainfo, for example filesystems without support for symbolic links, or without support for executable permission (executable bit). This is an extension of issue 10. (which is limited to symbolic links) in Eric Sink article. In Git you have core.fileMode to ignore executable bit differences (you would need to use SCM tools and not filesystem tools to manipulate it), and core.symlinks to be able to checkout symlinks as plain text files (again using SCM tools to manipulate). SOLVED. There is also a mistaken implicit assumption that version control systems do (and should) preserve all metadata. 5. The issue of extra metadata that is not SCM-sane, and which different filesystems can or cannot store. Examples include full Unix permissions, Unix ownership (and the group a file belongs to), other permission-related metadata such as ACL, extra resources tied to file such as EA (extended attributes) for some Linux filesystems or the (in)famous resource fork in MacOS. This is issue 5. (resource fork on MacOS vs. xattrs on Linux) in Eric Sink article. This is not an issue for SCM: _source_ code management system to solve. Preserving extra metadata indiscriminately can cause problems, like e.g. full permissions and ownership. Therefore SCMs preserve only a limited, SCM-sane subset of metadata.
If you need to preserve extra metadata, you can use (in good SCMs) hooks for that, like e.g. etckeeper uses metastore (in Git). NOT A PROBLEM. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 61+ messages in thread
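The NFC/NFD case in item 3 can be made concrete by spelling out both forms as UTF-8 bytes. A small sketch, where the helper name byte_count is made up for the example and the two code point sequences are the ones quoted above:

```shell
#!/bin/sh
# The two spellings from item 3, written as UTF-8 byte sequences.
nfc=$(printf 'Espa\303\261ol')    # precomposed U+00F1
nfd=$(printf 'Espan\314\203ol')   # "n" followed by combining tilde U+0303

# Hypothetical helper: number of bytes in a string.
byte_count () {
    printf '%s' "$1" | wc -c | tr -d ' '
}

echo "NFC: $(byte_count "$nfc") bytes"
echo "NFD: $(byte_count "$nfd") bytes"

if [ "$nfc" = "$nfd" ]
then
    echo "same byte sequence"
else
    echo "different byte sequences"
fi
```

Both names look identical when displayed, yet they are 8 and 9 bytes long, so a tool that treats file names as plain byte sequences sees two different names.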
* Re: Cross-Platform Version Control (was: Eric Sink's blog - notes on git, dscms and a "whole product" approach) 2009-04-28 11:24 ` Cross-Platform Version Control (was: Eric Sink's blog - notes on git, dscms and a "whole product" approach) Jakub Narebski @ 2009-04-29 6:55 ` Martin Langhoff 2009-04-29 7:52 ` Cross-Platform Version Control Jakub Narebski 0 siblings, 1 reply; 61+ messages in thread From: Martin Langhoff @ 2009-04-29 6:55 UTC (permalink / raw) To: Jakub Narebski; +Cc: Git Mailing List On Tue, Apr 28, 2009 at 1:24 PM, Jakub Narebski <jnareb@gmail.com> wrote: > DON'T DO THAT. > DON'T DO THAT, SOLVABLE. As I mentioned, Eric is taking the perspective of offering a supported SCM to a large and diverse audience. As such, his notes are interesting not because he's right or he's wrong. We can be "right" and say "don't do that" if we shrink our audience so that it looks a lot like us. There, fixed. But something tells me that successful tools are -- by definition -- tools that grow past their creators use. So from Eric's perspective, it is worthwhile to work on all those issues, and get the right for the end user -- support things we don't like, offer foolproof catches and warnings that prevent the user from shooting their lovely toes off to mars, etc. His perspective is one of commercial licensing, but even if we aren't driven by the "each new user is a new dollar" bit, the long term hopes for git might also be to be widely used and to improve the version control life of many unsuspecting users. To get there, I suspect we have to understand more of Eric's perspective. that's my 2c. m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-04-29 6:55 ` Martin Langhoff @ 2009-04-29 7:52 ` Jakub Narebski 2009-04-29 8:25 ` Martin Langhoff 0 siblings, 1 reply; 61+ messages in thread From: Jakub Narebski @ 2009-04-29 7:52 UTC (permalink / raw) To: Martin Langhoff; +Cc: Git Mailing List On Wed, 29 April 2009, Martin Langhoff wrote: > On Tue, Apr 28, 2009 at 1:24 PM, Jakub Narebski <jnareb@gmail.com> > wrote: [I think you cut out a bit too much. Here I resurrected it] JN> 1. Different limitations on file names (e.g. pathname length), JN> different special characters, different special filenames JN> (if any). [...] JN> The answer is convention for filenames in a project. Simply JN> DON'T use filenames which can cause problems. [...] > > DON'T DO THAT. What could be proper solution to that, if you do not accept social rather than technical restriction? We can have pre-commit hook that checks for portability for filenames (which is deployment specific, and shouldn't be part of SCM perhaps with an exception of being example hook) but it wouldn't help dealing with non-portable filenames on filesystem that cannot represent them that are there. If I remember correctly Git for some time has layer which can translate between filenames in repository and filenames on filesystem, but I'm not sure if it is generic enough for it to be a solution to this problem, and currently there is no way to manipulate this mapping, I think. JN> 2. "Case-insensitive" but "case-preserving" filesystems. [...] JN> JN> The answer is like for previous issue: don't. Simply DO NOT JN> create files with filenames which differ only in case [...] > > DON'T DO THAT, SOLVABLE. By 'solvable' here I mean that you should be able to modify only one of clashing files at once (checkout 'README', modify, add to index, remove from filesystem, checkout 'readme', modify, etc.), and deal with annoyances in git-status output. It can be done in Git, with medium amount of hacking. 
I don't think any other SCM can do even this, and I cannot think of a better, automatic solution that would somehow deal with case-clashing. Note that all deals are off in case-insensitive and not preserving filesystem. By the way, wouldn't be a better solution to use sane filesystem, rather than complicating SCM? ;-) > > As I mentioned, Eric is taking the perspective of offering a supported > SCM to a large and diverse audience. As such, his notes are > interesting not because he's right or he's wrong. > > We can be "right" and say "don't do that" if we shrink our audience so > that it looks a lot like us. There, fixed. <quote source="Dune by Frank Herbert"> [...] the attitude of the knife — chopping off what's incomplete and saying: "Now it's complete because it's ended here." </quote> I could not resist posting this quote :-P > > But something tells me that successful tools are -- by definition -- > tools that grow past their creators use. > > So from Eric's perspective, it is worthwhile to work on all those > issues, and get the right for the end user -- support things we don't > like, offer foolproof catches and warnings that prevent the user from > shooting their lovely toes off to mars, etc. Warnings and catches I can accept; adding complications and corner cases for situations which can be trivially avoided with a bit of social engineering aka. project guidelines... not so much. I simply cannot see the situation where you _must_ have dangerously unportable file names (trailing dot, trailing whitespace) and case-clashing files... > > His perspective is one of commercial licensing, but even if we aren't > driven by the "each new user is a new dollar" bit, the long term hopes > for git might also be to be widely used and to improve the version > control life of many unsuspecting users. > > To get there, I suspect we have to understand more of Eric's > perspective. > > that's my 2c. 
By the way, I think that the article on cross-platform version control (version control in heterogenic environment) is quite good article. I don't quite like the "10 Issues"/"Top 10" way of writing, but the article examines different ways that heterogenic environment can trip SCM. In my opinion Git does quite good here, where it can, and where the issue is to be solved by SCM and not otherwise (extra metadata like resource fork). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: Cross-Platform Version Control 2009-04-29 7:52 ` Cross-Platform Version Control Jakub Narebski @ 2009-04-29 8:25 ` Martin Langhoff 0 siblings, 0 replies; 61+ messages in thread From: Martin Langhoff @ 2009-04-29 8:25 UTC (permalink / raw) To: Jakub Narebski; +Cc: Git Mailing List On Wed, Apr 29, 2009 at 9:52 AM, Jakub Narebski <jnareb@gmail.com> wrote: >> > DON'T DO THAT. > > What could be proper solution to that, if you do not accept social > rather than technical restriction? Let's say strong checks for case sensitivity clashes, leading/trailing dots, utf-8 encoding maladies, etc switched on by default. And note that to be user-friendly you want most of those checks at 'add' time. If we don't like a particular FS, or we think it is messing up our utf-8 filenames, say it up-front, at clone and checkout time. For example, if the checkout has files with interesting utf-8 names, it'd be reasonable to check for filename mangling. Some things are hard or impossible to prevent - the utf-8 encoding maladies of OSX for example. But it may be detectable on checkout. In short, play on the defensive, for the benefit of users who are not kernel developers. It will piss off kernel & git developers and slow some operations somewhat. It will piss off oldtimers like me. But I'll say git config --global core.trainingwheels no and life will be good. It may be - as Jeff King points out - a matter of a polished git porcelain. We've seen lots of porcelains, but no smooth user-targetted porcelain yet. cheers, m -- martin.langhoff@gmail.com martin@laptop.org -- School Server Architect - ask interesting questions - don't get distracted with shiny stuff - working code first - http://wiki.laptop.org/go/User:Martinlanghoff ^ permalink raw reply [flat|nested] 61+ messages in thread
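Two of the checks Martin lists, case clashes and trailing dots, could be sketched as shell filters like the ones below. The function names are hypothetical, the case folding only covers ASCII letters, and the trailing-character check looks only at the end of the whole path, not at each directory component.

```shell
#!/bin/sh
# Hypothetical helper: read file names, one per line, and print each
# lowercased name that occurs more than once, i.e. names that would
# collide on a case-insensitive filesystem.
case_clashes () {
    LC_ALL=C tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sort |
    LC_ALL=C uniq -d
}

# Hypothetical helper: print names that end in a dot or a space.
trailing_dot_or_space () {
    grep -e '[. ]$'
}

printf 'README\nreadme\nMakefile\n' | case_clashes
printf 'notes.\nmain.c\n' | trailing_dot_or_space
```

In a hook, the input could come from a command such as git ls-files, which is an assumption here and not something proposed in this message. Names containing newlines would need NUL-separated handling instead.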