git.vger.kernel.org archive mirror
* Pro Git book: concerning data lost due to ".gitignore"
@ 2021-06-04 23:12 grizlyk
  2021-06-05 20:39 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 4+ messages in thread
From: grizlyk @ 2021-06-04 23:12 UTC (permalink / raw)
  To: git

Good day.

1. Summary. 

The Pro Git book https://git-scm.com/book/en/v2 (and the git man pages as well) should explicitly warn that the ".gitignore" feature is dangerous and should be used with care.

Because of ".gitignore", some data files in a directory under git version control can be skipped during indexing and left out of the repo, _unexpectedly_ for the user.

The user calls "git add" and git answers "nothing to stage".

The problem grows if the project was obtained from the network with unknown ".gitignore" files, or if external tools (like the Visual Studio IDE) create and modify "./.gitignore" files in the background, in ways unknown to the user, during the project's lifetime.

In the text below we list the problems; the git book editors can decide how to write about them (especially in English).

2.
Ideally ".gitignore" would be disabled entirely: the user should provide a clean directory to be controlled by the VCS, and all dummy files should be moved out with generic OS tools.

But for various reasons ".gitignore" will continue to exist.

There are some obvious ways to reduce the danger of data loss due to ".gitignore":

2.1
Place files that do not need to be staged into subdirectories separate from the repo.

Ideally, for a directory tree like "a\b\c":
- all subdirectories under git control are staged ("git add src");
"\src\" + "a\b\c"

- all generated files are outside the src source directory (possibly with the same subdirectory tree);
"\var\" + "a\b\c"

In that case you can treat the ".gitignore" feature as a workaround for projects that cannot follow reliable repo rules (reliable here means that files are not omitted from staging by mistake).
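
The layout above can be sketched as a small path mapping. This is only an illustration: the function name `output_path`, the forward-slash paths and the ".o" suffix are assumptions made for the example, not something a build tool or git provides.

```python
from pathlib import PurePosixPath


def output_path(source, src_root="src", out_root="var"):
    """Map a source file under src_root to a generated file under out_root.

    The subdirectory tree ("a/b/c") is kept the same on both sides, so
    nothing generated ever lands inside the directory that gets staged.
    The ".o" suffix is only an example of a generated file type.
    """
    # relative_to() raises ValueError if the file is not under src_root.
    rel = PurePosixPath(source).relative_to(src_root)
    return str(PurePosixPath(out_root) / rel.with_suffix(".o"))


print(output_path("src/a/b/c/main.c"))
```

With such a mapping "git add src" never sees generated files, so no ignore pattern is needed for them.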

2.2
Provide a trivial ".gitignore" file.

- rename every foreign or autogenerated ".gitignore" that you do not know into something like ".old-gitignore.old";
- manually create a ".gitignore" file and put into it only the names that you know exactly;
- always call "git status --ignored".

2.3
Use "git status --ignored".

Always use the "--ignored" option of "git status", at least for the final commit of the day, but better for every commit.

Check the list of ignored files with "git status --ignored". You will see all 4 parts of the status list (instead of the 3 parts shown without the "--ignored" option):

Example:
$ git status --ignored
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)

Untracked files:
  (use "git add <file>..." to include in what will be committed)

Ignored files:
  (use "git add -f <file>..." to include in what will be committed)

If you follow the rule "separate the output files", the "Ignored files" list will be very short and will contain only names you know.
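
For scripted checks, the same four parts can be read from the machine-readable form of the command. The following is a minimal sketch, assuming the short format of "git status --porcelain --ignored" (two status characters, a space, then the path; "??" for untracked and "!!" for ignored). The helper names are made up for the example, and quoted or renamed paths are not handled.

```python
import subprocess


def split_status(porcelain):
    """Sort `git status --porcelain --ignored` lines into the four parts."""
    parts = {"staged": [], "unstaged": [], "untracked": [], "ignored": []}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # not a "XY path" line
        code, path = line[:2], line[3:]
        if code == "!!":
            parts["ignored"].append(path)
        elif code == "??":
            parts["untracked"].append(path)
        else:
            # First column: index (staged); second column: working tree.
            if code[0] != " ":
                parts["staged"].append(path)
            if code[1] != " ":
                parts["unstaged"].append(path)
    return parts


def current_status(repo="."):
    """Run git in `repo` and return the four parts (needs git installed)."""
    out = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain", "--ignored"],
        check=True, capture_output=True, text=True,
    ).stdout
    return split_status(out)
```

Calling current_status()["ignored"] before a commit gives the list to compare against the names you expect to be ignored.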

2.4
Visual tools that work with git should have a separate tab to display the "Ignored files" list.

This is not the user's job.

2.5
We realize that "Ignored files" were invented precisely so that they are skipped from the repo.

But when "git status" is run without "--ignored", it could be useful to warn users in the status list with a message like this:
"There are ignored files (%u<number of files>) not placed into repo.\n\t(use git status --ignored to view)"

instead of the full list that is shown when "--ignored" is used:
Ignored files:
  (use "git add -f <file>..." to include in what will be committed)

This is not the user's job.
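
Until git itself prints something like this, a wrapper script could approximate the proposed message. This is a rough sketch, not git behavior: the function name `ignored_hint` is hypothetical, the input is assumed to be the output of "git status --porcelain --ignored", and the number is a count of ignored entries (an ignored directory such as ".vs/" counts once), not of individual files.

```python
def ignored_hint(porcelain):
    """Build the proposed one-line warning from porcelain status output.

    Returns an empty string when nothing is ignored, so the caller can
    print the hint only when there is something to look at.
    """
    # In the short format, ignored entries are the lines starting with "!! ".
    count = sum(1 for line in porcelain.splitlines() if line.startswith("!! "))
    if count == 0:
        return ""
    return (
        f"There are ignored files ({count}) not placed into repo.\n"
        "\t(use git status --ignored to view)"
    )


print(ignored_hint("?? notes.txt\n!! .vs/\n!! 10\n"))
```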

3.
More explanations. 

In the book one can read: "2.2 Ignoring Files. Tips: The Linux kernel source repository has 206 .gitignore files". 

But in reality the kernel source is not a comprehensive example of ".gitignore" usage.

No tool exists by itself; it works "in terms of the task the tool implements". That is, there are "tasks" that the tool performs.

Like any VCS, git has a valuable "task" named "every commit backs up all files in the directory placed under version control". The task means "it is better to include extra files in the repo than to lose required files". If even one file is lost from staging, git fails the task.

For this task git works at the "cp" level and should be as trivial and reliable as "cp"; here we should not even try more tricks with git, because "enumerate files in a directory without errors" is very easy work that any software since CP/M times should implement.

4. Example. 

Here is a practical example of git usage with errors due to ".gitignore".

I have a source "." directory.

In the directory I have some files (in my case 297 files; git forced me to count the files in the directory, which I had never tried before using git) and several levels of subdirectories (in my case 9 or more levels, I am not sure).

I called "git add ." but I was not able to add all the files in the directory to the git repo: there were no error messages, only the answer "nothing to stage".

VS shows me that only 80 sacred files of the 297 total were blessed into the repo by git in conjunction with VS, without any error messages.

The first time (when the repo was created) VS did try to stage more files (all 297) into the git repo, but git did not accept the gift and answered "nothing to stage". Later VS did not offer more than 80 files to add.

I spent several hours reading git's large man pages and dozens of flat git options, but I was still not able to add the directory to the git repo.

Because "gnu in windows" has well-known troubles with the console and charsets (names can look like "f??df???.???"), the system was also under suspicion and was also checked (as a system that is unable to enumerate files in a directory).

4.1
Looking into the ".gitignore" file, created by several versions of VS, I found that some subdirectories of the repo were marked to be ignored.

I removed the names from the ".gitignore" file and completed the repo with the remaining files, about 217.

At last the git repo was working. Creepy.
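
Instead of reading every ".gitignore" by hand, git can be asked which pattern hides a given path with "git check-ignore -v". Below is a minimal sketch of reading that output, assuming the documented verbose line format "<source>:<linenum>:<pattern><TAB><pathname>". The helper names are made up for the example, and quoted pathnames (unusual characters) are not handled.

```python
import re
import subprocess

# <source>:<linenum>:<pattern><TAB><pathname>
_LINE = re.compile(
    r"^(?P<source>.*?):(?P<lineno>\d+):(?P<pattern>[^\t]*)\t(?P<path>.*)$"
)


def parse_check_ignore(output):
    """Return (source file, line number, pattern, path) for each match."""
    rules = []
    for line in output.splitlines():
        m = _LINE.match(line)
        if m:
            rules.append(
                (m["source"], int(m["lineno"]), m["pattern"], m["path"])
            )
    return rules


def why_ignored(paths, repo="."):
    """Ask git which ignore rule matches each path (needs git installed).

    git exits with status 1 when none of the paths is ignored, so the
    exit status is deliberately not treated as an error here.
    """
    proc = subprocess.run(
        ["git", "-C", repo, "check-ignore", "-v", "--", *paths],
        capture_output=True, text=True,
    )
    return parse_check_ignore(proc.stdout)
```

Each returned tuple names the ignore file and line to edit, which is what had to be found manually here.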

4.2
Several days later, when the git repo was (as I believed) recovered, I called "git status --ignored" by chance.

I got 
>git status --ignored

Ignored files:
  (use "git add -f <file>..." to include in what will be committed)
        .vs/
        10
        7
        .../state/#bak/v02/
        .../state/...reg_i.h

I found 2 more names ignored by VS and by git due to ".gitignore" files. Even creepier.

As you can see from this real example, the ".gitignore" feature has a real ability to produce a repo with lost files.

PS:
The Visual Studio related part should warn the user to manually check the ".gitattributes" file often and adjust (for example, comment out) the "* text=auto" record.

###############################################################################
# Set default behavior to automatically normalize line endings.
###############################################################################
#* text=auto

PPS:
I also just played dice (no luck) to report the issue; I am not sure it helps to stage files.

Best regards, 
Maksim.


* Re: Pro Git book: concerning data lost due to ".gitignore"
  2021-06-04 23:12 Pro Git book: concerning data lost due to ".gitignore" grizlyk
@ 2021-06-05 20:39 ` Ævar Arnfjörð Bjarmason
  2021-07-10  4:52   ` grizlyk
  0 siblings, 1 reply; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-05 20:39 UTC (permalink / raw)
  To: grizlyk; +Cc: git


On Sat, Jun 05 2021, grizlyk wrote:

> Good day.

Hi.

> 1. Summary. 
>
> The Pro Git book https://git-scm.com/book/en/v2 (and the git man pages as well) should explicitly warn that the ".gitignore" feature is dangerous and should be used with care.

This mailing list does not maintain that book. For any issues with it
see the issues/PR's at https://github.com/progit/progit2.

> Because of ".gitignore", some data files in a directory under git
> version control can be skipped during indexing and left out of the
> repo, _unexpectedly_ for the user.

I skimmed the rest of your mail. I think you might find the previous
discussion(s) about a "precious" attribute[1] and about adding something like
a backup log when we shred files due to gitignore[2] interesting and
relevant to much of what you point out.

I don't think the notion of moving to some general workflow of compiled
files being staged elsewhere than the source is something that's viable
as a general constraint for a VCS like git.

It's way too common of a pattern to e.g. have a *.o file made from a
corresponding *.[ch] file(s) in the same directory.

For those for whom the solution you suggested works I believe git
already does a good job of supporting it. You'd e.g. compile all your
assets outside of the repo via your build system, and just not have
anything in .gitignore.

I don't see how we could expect to smartly deal with having some
parallel tool like VS that's (supposedly, I'm just taking your summary
at face value) actively working against the wishes of its
users. Something like .git/info/exclude is useful, but only assuming
that your tooling isn't actively trying to subvert you.

1. https://lore.kernel.org/git/20190216114938.18843-1-pclouds@gmail.com/
2. https://lore.kernel.org/git/20181209104419.12639-1-pclouds@gmail.com/


* Re: Pro Git book: concerning data lost due to ".gitignore"
  2021-06-05 20:39 ` Ævar Arnfjörð Bjarmason
@ 2021-07-10  4:52   ` grizlyk
  2021-07-10  8:23     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 4+ messages in thread
From: grizlyk @ 2021-07-10  4:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

hi

> On Sat, Jun 05 2021
> It's way too common of a pattern to e.g. have a *.o file made from a
> corresponding *.[ch] file(s) in the same directory.

Such patterns were common in the old days (before VCSes were involved). To deal with temporary files (like .o), generic OS tools like "make remove_compiled" can help to clean the directory before staging. To keep derived persistent files (like the same .o), a separate directory can be used.

> git already does a good job of supporting it. 

Sure, the light message: "There are ignored files (%u<number of files>) not placed into repo.\n\t(use git status --ignored to view)"; would improve on that. Otherwise some files will sometimes not be placed into the repo, unexpectedly for the user.

> You'd e.g. compile all your 
> assets outside of the repo via your build system, and just not have
> anything in .gitignore.

Do you suggest copying the desired src files into a separate repo directory (the repo directory placed under VCS control) with generic OS tools (i.e. the cp command) and staging that separate directory?

If yes: first, that makes copies of all src files; second, we could lose some src files due to possibly wrong copy patterns (for the same reasons as with wrong .gitignore patterns).
So explicitly creating output files in a separate directory and implicitly including all files in the src directory in the repo is the only reliable way to keep commits without data loss (this way will sometimes include extra output files in the repo).

Explicitly creating output files in a separate directory is the responsibility of translators and makefiles.

Best regards,
Maksim.


* Re: Pro Git book: concerning data lost due to ".gitignore"
  2021-07-10  4:52   ` grizlyk
@ 2021-07-10  8:23     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 4+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10  8:23 UTC (permalink / raw)
  To: grizlyk; +Cc: git


On Sat, Jul 10 2021, grizlyk wrote:

> hi
>
>> On Sat, Jun 05 2021
>> It's way too common of a pattern to e.g. have a *.o file made from a
>> corresponding *.[ch] file(s) in the same directory.
>
> Such patterns were common in the old days (before VCSes were involved). To
> deal with temporary files (like .o), generic OS tools like "make
> remove_compiled" can help to clean the directory before staging. To keep
> derived persistent files (like the same .o), a separate directory can
> be used.

It's still a very common pattern, e.g. the project whose ML you're
posting to uses it, anecdotally most free software C or C-like projects
I look at / work on use it.

In any case, git as a project can't say "you should fix your code". This
VCS has to deal with the real world, people do use this pattern in the
wild, and we can't willy-nilly eat their data.

It's not a good approach to advocate a change in git behavior to say
"people should do X, not Y, to avoid this problem", when a cursory look
at real-world use reveals that "X" is in wide use, and unless you did
"Y" a proposed change in behavior would be detrimental to your use of
git.

What is more productive is to either find out how we can support both
without harming the other, or make new behavior opt-in, hence the
thread(s) I linked to about "precious" etc.

>> git already does a good job of supporting it. 
>
> Sure, the light message: "There are ignored files (%u<number of
> files>) not placed into repo.\n\t(use git status --ignored to view)";
> would improve on that. Otherwise some files will sometimes not be
> placed into the repo, unexpectedly for the user.
>
>> You'd e.g. compile all your 
>> assets outside of the repo via your build system, and just not have
>> anything in .gitignore.
>
> Do you suggest copying the desired src files into a separate repo directory
> (the repo directory placed under VCS control) with generic OS tools
> (i.e. the cp command) and staging that separate directory?

I'm not really being serious here. As should be clear from the linked
threads, I think the current behavior has sucky edge cases and does eat
people's data in some cases. That's bad; the problem is finding a way to
change it that doesn't cause badness for other use-cases.

I am saying that if your proposed "Y" solution is effectively "other
people should mostly/entirely rewrite their build systems to deal with a
new default I'm proposing", then in this case you'd also get approximately
what you would want if we keep the current behavior and you rewrite your
build system(s), no?

Anyway, maybe I misunderstood some of what you're saying...
