* A puzzle: reset --hard and hard links
@ 2022-01-19 20:37 Michael Herrmann
  2022-01-19 22:20 ` brian m. carlson
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Herrmann @ 2022-01-19 20:37 UTC (permalink / raw)
  To: git

Hi all,

It seems `git reset --hard` recreates files that have external hard
links. Is this intended?

The example below highlights the behavior. I have an unchanged Git
repository. When I create an (outside) hard link to a file in the
repository, then `git reset --hard` re-creates the file with a new
modification time and inode. This occurs on Debian 11 with Git 2.30.2
but not on Debian 10 with Git 2.20.1.

> git init
Initialized empty Git repository in .../git-test/.git/
> echo "test" > file.txt
> git add file.txt
> git commit -m "Import"
[main (root-commit) f73709f] Import
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt
> stat -c '%y' file.txt
2022-01-19 18:43:52.147781748 +0100
> ls -i file.txt
74458979 file.txt
> ln file.txt ../file.txt
> git reset --hard
HEAD is now at f73709f Import
> stat -c '%y' file.txt
2022-01-19 18:44:47.013167127 +0100
> ls -i file.txt
74458995 file.txt

Can this behavior be avoided?

Best,
Michael

* Re: A puzzle: reset --hard and hard links
  2022-01-19 20:37 A puzzle: reset --hard and hard links Michael Herrmann
@ 2022-01-19 22:20 ` brian m. carlson
  2022-01-19 22:37   ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: brian m. carlson @ 2022-01-19 22:20 UTC (permalink / raw)
  To: Michael Herrmann; +Cc: git

On 2022-01-19 at 20:37:48, Michael Herrmann wrote:
> Hi all,
> 
> It seems `git reset --hard` recreates files that have external hard
> links. Is this intended?
> 
> The example below highlights the behavior. I have an unchanged Git
> repository. When I create an (outside) hard link to a file in the
> repository, then `git reset --hard` re-creates the file with a new
> modification time and inode. This occurs on Debian 11 with Git 2.30.2
> but not on Debian 10 with Git 2.20.1
> 
> > git init
> Initialized empty Git repository in .../git-test/.git/
> > echo "test" > file.txt
> > git add file.txt
> > git commit -m "Import"
> [main (root-commit) f73709f] Import
>  1 file changed, 1 insertion(+)
>  create mode 100644 file.txt
> > stat -c '%y' file.txt
> 2022-01-19 18:43:52.147781748 +0100
> > ls -i file.txt
> 74458979 file.txt
> > ln file.txt ../file.txt
> > git reset --hard
> HEAD is now at f73709f Import
> > stat -c '%y' file.txt
> 2022-01-19 18:44:47.013167127 +0100
> > ls -i file.txt
> 74458995 file.txt
> 
> Can this behavior be avoided?

Git generally doesn't guarantee that it will preserve hard links in any
particular situation.  It can and does replace files rather than writing
over the existing ones, so this behavior is expected in at least some
circumstances.

Whether it happens in this particular case probably depends on what data
is in the index and whether it's considered stale, since if the file is
out of date, I believe a git reset --hard will replace it rather than
try to determine whether it's up to date.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

* Re: A puzzle: reset --hard and hard links
  2022-01-19 22:20 ` brian m. carlson
@ 2022-01-19 22:37   ` Junio C Hamano
  2022-01-20  8:59     ` Michael Herrmann
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-01-19 22:37 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Michael Herrmann, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Whether it happens in this particular case probably depends on what data
> is in the index and whether it's considered stale, since if the file is
> out of date, I believe a git reset --hard will replace it rather than
> try to determine whether it's up to date.

True, but I think it answers a different question, which was "to
replace or to overwrite".  I do not recall any codepath in git
proper (I do not know about third-party tools and scripts around the
fringe, though) that overwrites working tree files instead of
writing into new files and replacing them.

If there is a hardlink to outside the working tree of a tracked
path, and "git reset --hard [<committish>]" needs to modify the
contents of that tracked path because it does not match what is
recorded in the committish, I think it is the right thing to sever
the link and leave the path outside the working tree intact. If we
instead overwrote, we would damage "the other file" that shares the
hardlink, which is outside the working tree hence outside our
control.

Also, if two paths inside a working tree are made into hardlinks to
each other, whenever Git needs to update either of them, we would
sever the link, i.e. not overwrite but replace, and it is the right
thing to do, since Git tracks these two paths as two separate things
(i.e. the index has one entry for each of these paths).
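
The difference between overwriting and replacing can be seen without
Git at all. The following is a minimal sketch in plain shell, with
made-up file names; it only imitates the two strategies and makes no
claim about how Git implements them internally:

```shell
#!/bin/sh
# Sketch only: tracked.txt and outside.txt are hypothetical names.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

echo original > tracked.txt
ln tracked.txt outside.txt        # two names, one inode

# Overwrite in place: ">" truncates and rewrites the existing inode,
# so the change is visible through the other name as well.
echo overwritten > tracked.txt
after_overwrite=$(cat outside.txt)

# Replace: remove the name, then create a new file under it.
# outside.txt keeps the old inode and its contents.
rm tracked.txt
echo replaced > tracked.txt
after_replace=$(cat outside.txt)

echo "outside.txt after overwrite: $after_overwrite"
echo "outside.txt after replace:   $after_replace"
echo "tracked.txt now:             $(cat tracked.txt)"
```

The overwrite leaks into outside.txt, while the replace leaves
outside.txt alone at the price of severing the link.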

* Re: A puzzle: reset --hard and hard links
  2022-01-19 22:37   ` Junio C Hamano
@ 2022-01-20  8:59     ` Michael Herrmann
  2022-01-20 22:20       ` brian m. carlson
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Herrmann @ 2022-01-20  8:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git

Thank you for your replies. Is there a way to tell Git not to sever
hard links as highlighted by my example?

* Re: A puzzle: reset --hard and hard links
  2022-01-20  8:59     ` Michael Herrmann
@ 2022-01-20 22:20       ` brian m. carlson
  2022-01-21 12:50         ` Michael Herrmann
  0 siblings, 1 reply; 18+ messages in thread
From: brian m. carlson @ 2022-01-20 22:20 UTC (permalink / raw)
  To: Michael Herrmann; +Cc: Junio C Hamano, git

On 2022-01-20 at 08:59:02, Michael Herrmann wrote:
> Thank you for your replies. Is there a way to tell Git not to sever
> hard links as highlighted by my example?

No, there isn't.  If you need to deal with files that should be linked
and stored in a Git repository, your best bet is symlinks.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

* Re: A puzzle: reset --hard and hard links
  2022-01-20 22:20       ` brian m. carlson
@ 2022-01-21 12:50         ` Michael Herrmann
  2022-01-24 13:48           ` Michael Herrmann
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Herrmann @ 2022-01-21 12:50 UTC (permalink / raw)
  To: brian m. carlson, Michael Herrmann, Junio C Hamano, git

On Thu, 20 Jan 2022 at 19:21, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> No, there isn't.  If you need to deal with files that should be linked
> and stored in a Git repository, your best bet is symlinks.

Okay, thank you. I unfortunately do not have a choice over which kind
of link (soft or hard) is used. I will find a workaround.

* Re: A puzzle: reset --hard and hard links
  2022-01-21 12:50         ` Michael Herrmann
@ 2022-01-24 13:48           ` Michael Herrmann
  2022-01-24 18:07             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Herrmann @ 2022-01-24 13:48 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, git

I now believe this is a bug in Git because calling `git status` fixes
the problem.

1) Create a hard link to a file in a Git repo.
2) Call `git status`.
3) Call `git reset --hard`.

Without 2), Git severs the hard link. With 2), the hard link is preserved.

Reproduction steps ("Test script") and further info:
https://github.com/brave/brave-browser/issues/20316#issuecomment-1020054082

This occurs for me on Debian 10 & 11, as well as Git 2.30.2 and 2.20.1.
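
The three steps can be put into one self-contained script. This is a
sketch, assuming GNU stat and a scratch directory; the `try` function
and the file names are made up for illustration, and the inode numbers
it prints differ from run to run. The `sleep 1` rests on the assumption
that a default Git build compares timestamps at whole-second
granularity, so the `ln` has to land in a later second than the commit:

```shell
#!/bin/sh
# Sketch: does `git status` before `git reset --hard` decide whether the inode survives?
set -eu

try() {
    # $1 = "with-status" or "without-status"
    dir=$(mktemp -d)
    (
        cd "$dir"
        mkdir repo && cd repo
        git init -q
        echo test > file.txt
        git add file.txt
        git -c user.name=Example -c user.email=example@example.invalid \
            commit -q -m Import
        before=$(stat -c %i file.txt)
        sleep 1                      # let the link's ctime land in a later second
        ln file.txt ../file.txt      # hard link outside the repository
        if [ "$1" = with-status ]; then
            git status >/dev/null    # refreshes the cached stat data in the index
        fi
        git reset -q --hard
        # Print: case name, inode before, inode after
        echo "$1 $before $(stat -c %i file.txt)"
    )
}

try without-status
try with-status
```

If the second and third columns differ on a line, the file was
recreated in that case.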

* Re: A puzzle: reset --hard and hard links
  2022-01-24 13:48           ` Michael Herrmann
@ 2022-01-24 18:07             ` Junio C Hamano
  2022-01-24 18:16               ` Michael Herrmann
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-01-24 18:07 UTC (permalink / raw)
  To: Michael Herrmann; +Cc: brian m. carlson, git

Michael Herrmann <michael@herrmann.io> writes:

> I now believe this is a bug in Git because calling `git status` fixes
> the problem.
>
> 1) Create a hard link to a file in a Git repo.
> 2) Call `git status`.
> 3) Call `git reset --hard`.

It is merely because you helped Git to realize that there is no need
to change the contents of the file with "reset --hard".

With another step 1.5 "append a line to the file in question", git
should sever the link, I would think, as at that point, to revert
the contents of the file in question to its pristine state, it needs
to modify it.

* Re: A puzzle: reset --hard and hard links
  2022-01-24 18:07             ` Junio C Hamano
@ 2022-01-24 18:16               ` Michael Herrmann
  2022-01-24 21:19                 ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Herrmann @ 2022-01-24 18:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git

> It is merely because you helped Git to realize

I don't want to "help Git realize". I'm sorry but in my opinion `git
status` should not have any effects on other commands. I don't
understand how you can argue that calling `git status` is a valid fix
to "help Git".

> With another step 1.5 "append a line to the file in question", git
> should sever the link,

I don't want to sever the hard link. I want to avoid that it gets severed.

* Re: A puzzle: reset --hard and hard links
  2022-01-24 18:16               ` Michael Herrmann
@ 2022-01-24 21:19                 ` Junio C Hamano
  2022-01-24 21:50                   ` Michael Herrmann
  2022-01-24 22:18                   ` rsbecker
  0 siblings, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2022-01-24 21:19 UTC (permalink / raw)
  To: Michael Herrmann; +Cc: brian m. carlson, git

Michael Herrmann <michael@herrmann.io> writes:

>> It is merely because you helped Git to realize
>
> I don't want to "help Git realize". I'm sorry but in my opinion `git
> status` should not have any effects on other commands. I don't
> understand how you can argue that calling `git status` is a valid fix
> to "help Git".
>
>> With another step 1.5 "append a line to the file in question", git
>> should sever the link,
>
> I don't want to sever the hard link. I want to avoid that it gets severed.

Sorry, if that is the case, what you want is not a version control
system, and it is certainly not Git.  You want something else.

Think about this scenario.

    $ rm -fr one && git init one && cd one
    $ echo 0 >a; echo 0 >b; git add a b; git commit -m zero

We have two files, a and b, each of which has "0" in it.

    $ echo 1 >b; git add b; git commit -m one

Now they have "0" and "1" respectively.

    $ echo 2 >a; echo 2 >b; git commit -a -m two
    $ ln -f a b
    $ git diff
    $ git diff HEAD

Now they have "2".  Since they have identical contents, "diff" would
report no difference relative to the index or HEAD, even after we
manually break the working tree by making one of them a hardlink to
the other.

Now, what should this command do?

    $ git reset --hard HEAD^

What the user is asking is (1) to move the branch to point at the
previous commit, which had 0 and 1 in a and b respectively, and (2)
to make sure that the index and the working tree contents match what
is recorded in the commit.

So for Git to be a usefully correct version control system, it is
essential to make sure what it writes out would not affect any path
other than the one it is writing out.  When it writes "0" to "a", it
MUST break the hardlink from elsewhere that points at "a" before it
does so.  Otherwise, the "0" it writes into "a" will also be seen
elsewhere, which is not what the updated HEAD (i.e. commit "one")
wants to see.  The same goes for "b" when it is updated from "2" to
"1".

* Re: A puzzle: reset --hard and hard links
  2022-01-24 21:19                 ` Junio C Hamano
@ 2022-01-24 21:50                   ` Michael Herrmann
  2022-01-25  8:49                     ` Andreas Schwab
  2022-01-25 11:33                     ` Ævar Arnfjörð Bjarmason
  2022-01-24 22:18                   ` rsbecker
  1 sibling, 2 replies; 18+ messages in thread
From: Michael Herrmann @ 2022-01-24 21:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, git

Thank you for your explanations, Junio. This is the first part where I differ:

> $ ln -f a b

My hard link is outside the repo. In your example, it makes sense that
Git has to sever the hard link to be able to give the files different
contents. In my case and example, this complication is not present.
And it does not address the main point:

My working tree is clean. `git reset --hard HEAD` (not HEAD^ like you
had) should not do anything.

Finally, your (kind!) explanation does not give a reason why calling
`git status` should change the behavior that Git unnecessarily severs
the hard link.

My suspicion is that Git keeps a cache of the stat(...) result of
files. An additional hard link increases the .st_nlink count of this
struct. `git reset` compares the cached stat(...) values to the actual
ones and sees that one has changed. `git status` does the same but is
smart enough to realize that the additional hard link does not change
anything. It writes this to the cache. `git reset` should also be
smart!

* RE: A puzzle: reset --hard and hard links
  2022-01-24 21:19                 ` Junio C Hamano
  2022-01-24 21:50                   ` Michael Herrmann
@ 2022-01-24 22:18                   ` rsbecker
  1 sibling, 0 replies; 18+ messages in thread
From: rsbecker @ 2022-01-24 22:18 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Michael Herrmann'
  Cc: 'brian m. carlson', git

On January 24, 2022 4:19 PM, Junio wrote:
> Michael Herrmann <michael@herrmann.io> writes:
>
> >> It is merely because you helped Git to realize
> >
> > I don't want to "help Git realize". I'm sorry but in my opinion `git
> > status` should not have any effects on other commands. I don't
> > understand how you can argue that calling `git status` is a valid fix
> > to "help Git".
> >
> >> With another step 1.5 "append a line to the file in question", git
> >> should sever the link,
> >
> > I don't want to sever the hard link. I want to avoid that it gets
> > severed.
>
> Sorry, if that is the case, what you want is not a version control, and
> it is certainly not Git.  You want something else.
>
> Think about this scenario.
>
>     $ rm -fr one && git init one && cd one
>     $ echo 0 >a; echo 0 >b; git add a b; git commit -m zero
>
> We have two files, a and b, each of which has "0" in it.
>
>     $ echo 1 >b; git add b; git commit -m one
>
> Now they have "0" and "1" respectively.
>
>     $ echo 2 >a; echo 2 >b; git commit -a -m two
>     $ ln -f a b
>     $ git diff
>     $ git diff HEAD
>
> Now they have "2".  Since they have identical contents, "diff" would
> report no difference relative to the index or HEAD, even after we
> manually break the working tree by making one of them a hardlink to
> the other.
>
> Now, what should this command do?
>
>     $ git reset --hard HEAD^
>
> What the user is asking is (1) to move the branch to point at the
> previous commit, which had 0 and 1 in a and b respectively, and (2) to
> make sure that the index and the working tree contents match what is
> recorded in the commit.
>
> So for Git to be a usefully correct version control system, it is
> essential to make sure what it writes out would not affect any path
> other than the one it is writing out.  When it writes "0" to "a", it
> MUST break the hardlink from elsewhere that points at "a" before it
> does so.  Otherwise, the "0" it writes into "a" will also be seen
> elsewhere, which is not what the updated HEAD (i.e. commit "one")
> wants to see.  The same for "b" when it is updated from "2" to "1"
> when this happens.

I think there are more use cases here than are apparent but also some
serious questions about why one would do this.

In a Linux/POSIX environment, one can make a hard link to a file inside a
git repo, change the file using something like vim, and have git recognize
the change in git status. However, this only works on some platforms.
Hard links do not have 100% consistent semantics from one OS to another,
one file system to another, or even between editors and scripts. As Junio
pointed out, Git replaces files rather than overwriting them, and other
tools differ as well: shell redirection with > or >> writes into the
existing inode, so the link survives, while an editor that saves by writing
a new file and renaming it over the old one severs it.

Making git behave consistently across every possible situation is not only
impractical, it is likely impossible, and restricting git to only the
operations that behave the same everywhere might gut git badly.

If you are looking for doing external edits while keeping git notified, I
would suggest wrapping the file modification in a script that is aware of
hard links so that you get the results you want.

Alternatively, a soft link made externally to a physical git file location
might do what you want - assuming your platform supports soft links. Your
modification script/program/etc. would use the file directly in git instead
of the hard link inode, so git is happy. A git status would see the change
because the file really only exists in git. Other git operations, like
restore, switch, etc., would cause the physical file to be modified
correctly, and anything using the referencing soft link would see the
change. Note: I am not suggesting soft-linking from inside git to outside.

--Randall


* Re: A puzzle: reset --hard and hard links
  2022-01-24 21:50                   ` Michael Herrmann
@ 2022-01-25  8:49                     ` Andreas Schwab
  2022-01-25 11:33                     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 18+ messages in thread
From: Andreas Schwab @ 2022-01-25  8:49 UTC (permalink / raw)
  To: Michael Herrmann; +Cc: Junio C Hamano, brian m. carlson, git

On Jan 24 2022, Michael Herrmann wrote:

> My suspicion is that Git keeps a cache of the stat(...) result of
> files. An additional hard link increases the .st_nlink count of this
> struct. `git reset` compares the cached stat(...) values to the actual
> ones and sees that one has changed. `git status` does the same but is
> smart enough to realize that the additional hard link does not change
> anything. It writes this to the cache. `git reset` should also be
> smart!

See the core.trustctime config.
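
To see why ctime is the relevant field, one can look at what `ln` does
to a file's stat data. The sketch below assumes GNU stat and uses
made-up file names. As far as I know the index records ctime, mtime,
device, inode, mode, uid, gid and size for each entry, but not the link
count, so it is the ctime change rather than st_nlink that would make
an entry look stale:

```shell
#!/bin/sh
# Sketch: which stat fields change when a hard link is added (GNU stat assumed).
set -eu
tmp=$(mktemp -d)
cd "$tmp"

echo test > file.txt
# %h = link count, %Y = mtime, %Z = ctime (both in seconds since the epoch)
before=$(stat -c '%h %Y %Z' file.txt)

sleep 1                           # make a ctime change visible at one-second resolution
ln file.txt link.txt
after=$(stat -c '%h %Y %Z' file.txt)

echo "before: $before"
echo "after:  $after"
```

The link count and ctime move, the mtime does not.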

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

* Re: A puzzle: reset --hard and hard links
  2022-01-24 21:50                   ` Michael Herrmann
  2022-01-25  8:49                     ` Andreas Schwab
@ 2022-01-25 11:33                     ` Ævar Arnfjörð Bjarmason
  2022-01-25 13:29                       ` Andreas Schwab
  1 sibling, 1 reply; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-01-25 11:33 UTC (permalink / raw)
  To: Michael Herrmann; +Cc: Junio C Hamano, brian m. carlson, git


On Mon, Jan 24 2022, Michael Herrmann wrote:

> Thank you for your explanations Junio. This is the first part where I differ:
>
>> $ ln -f a b
>
> My hard link is outside the repo. In your example, it makes sense that
> Git has to sever the hard link to be able to give the files different
> contents. In my case and example, this complication is not present.
> And it does not address the main point:
>
> My working tree is clean. `git reset --hard HEAD` (not HEAD^ like you
> had) should not do anything.
>
> Finally, your (kind!) explanation does not give a reason why calling
> `git status` should change the behavior that Git unnecessarily severs
> the hard link.
>
> My suspicion is that Git keeps a cache of the stat(...) result of
> files. An additional hard link increases the .st_nlink count of this
> struct. `git reset` compares the cached stat(...) values to the actual
> ones and sees that one has changed. `git status` does the same but is
> smart enough to realize that the additional hard link does not change
> anything. It writes this to the cache. `git reset` should also be
> smart!

What you're observing is that we tweak the index when various commands
are run. Some of that is documented, and the rest we consider purely
implementation details. Whether we sever a hard link relationship is
definitely on the "implementation detail" side of that.

I.e. that you can observe a behavior difference here doesn't mean that
it's a bug, it means that you're poking at behavior that was never
supposed to work this way, or be stable.

That being said I don't see a reason why we shouldn't ever support
what you're requesting here in some way. E.g. when we spin up a
different 'git worktree' on the same storage we could optionally
hardlink to an existing checkout to save space.

This would be useful e.g. for spinning up a bunch of trees to run
compilations on, where much of the checkout tree will be duplicated.

And this probably won't match your use-case, but I wonder how far you
could get with the post-checkout hook, i.e. to have it run around after
a checkout and fix up things that aren't hard links to be hardlinked
appropriately.

I don't know of a tool to take two directories and hardlink things where
possible, but it wouldn't be hard to write. I thought rsync could, but
it appears just to support copying things as hardlinks, not "fixing"
files with the same content to be hardlinks after the fact (but maybe
I've just missed a way to operate it).
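
A minimal sketch of such a tool in shell might look like the following.
`relink_identical` is a hypothetical helper name, GNU stat is assumed,
both directories are assumed to be on the same file system, and
differences in mode or ownership are deliberately ignored:

```shell
#!/bin/sh
# Sketch: hard-link files in DST to same-named files in SRC when contents are identical.
relink_identical() {
    src=$1
    dst=$2
    # List regular files under SRC, skipping any .git directory.
    # File names containing newlines are not handled.
    ( cd "$src" && find . -name .git -prune -o -type f -print ) |
    while IFS= read -r rel; do
        [ -f "$dst/$rel" ] || continue
        # Leave symlinks in DST alone.
        if [ -L "$dst/$rel" ]; then
            continue
        fi
        # Already the same inode: nothing to do.
        if [ "$(stat -c '%d:%i' "$src/$rel")" = "$(stat -c '%d:%i' "$dst/$rel")" ]; then
            continue
        fi
        # Identical bytes: replace DST's copy with a link to SRC's file.
        if cmp -s "$src/$rel" "$dst/$rel"; then
            ln -f "$src/$rel" "$dst/$rel"
        fi
    done
}
```

Called as `relink_identical src gen`, it leaves files whose contents
differ untouched.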

* Re: A puzzle: reset --hard and hard links
  2022-01-25 11:33                     ` Ævar Arnfjörð Bjarmason
@ 2022-01-25 13:29                       ` Andreas Schwab
  2022-01-25 14:30                         ` Michael Herrmann
  0 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2022-01-25 13:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Michael Herrmann, Junio C Hamano, brian m. carlson, git

On Jan 25 2022, Ævar Arnfjörð Bjarmason wrote:

> I don't know of a tool to take two directories and hardlink things where
> possible, but it wouldn't be hard to write.

https://github.com/adrianlopezroche/fdupes

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

* Re: A puzzle: reset --hard and hard links
  2022-01-25 13:29                       ` Andreas Schwab
@ 2022-01-25 14:30                         ` Michael Herrmann
  2022-01-26  2:14                           ` brian m. carlson
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Herrmann @ 2022-01-25 14:30 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	brian m. carlson, git

Andreas Schwab wrote:

> See the core.trustctime config.

This sounds very promising. It also fixes the problem in my
preliminary tests. Are there known drawbacks to changing this setting
to false? I haven't yet noticed a performance impact.

Randall wrote:

> I think there are more use cases here than are apparent

The use case is Chromium's build process. It creates hard links from a
src/... to a gen/... directory. Some actions do `git reset --hard` in
src/. This updates the modification time because of the hard links -
even when there are no changes. That in turn leads to unnecessary
rebuilds. I have little control over the creation of the hard links.

Ævar, everything you wrote is very interesting and helpful. Thank you!

* Re: A puzzle: reset --hard and hard links
  2022-01-25 14:30                         ` Michael Herrmann
@ 2022-01-26  2:14                           ` brian m. carlson
  2022-01-26 18:46                             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: brian m. carlson @ 2022-01-26  2:14 UTC (permalink / raw)
  To: Michael Herrmann
  Cc: Andreas Schwab, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, git

On 2022-01-25 at 14:30:11, Michael Herrmann wrote:
> Andreas Schwab wrote:
> 
> > See the core.trustctime config.
> 
> This sounds very promising. It also fixes the problem in my
> preliminary tests. Are there known drawbacks to changing this setting
> to false? I haven't yet noticed a performance impact.

The index holds a bunch of information on files and checks that
information to see whether the file has changed.  If the stat
information changes, then the file is marked dirty and may be re-read.
The option above and core.checkStat control which information is
included.

When you do "git reset --hard" and there's no change to the file in the
index, Git happens not to rewrite it in the working tree.  This is an
implementation detail and isn't guaranteed, but that's why things happen
to be working here.

The downside to restricting what's in the index is that Git can miss
changes.  For example, with core.trustctime turned off, a program that
modifies a file without changing its size but resets it to have the same
mtime would probably result in Git missing those changes.  It shouldn't
result in a performance difference, but it could theoretically result in
a correctness difference.  You may decide the tradeoff is worth it,
however.
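
For reference, setting the option per repository could look like this.
It is a sketch against a throwaway repository created just for the
example; in practice the `git config` line would be run inside the
affected checkout:

```shell
#!/bin/sh
# Sketch: turn off ctime comparison in a throwaway repository.
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Ignore ctime differences between the index and the working tree.
git config core.trustctime false

# Broader alternative, left commented out: check only whole-second
# mtime (and ctime, if trusted) plus file size.
# git config core.checkStat minimal

trustctime=$(git config --get core.trustctime)
echo "core.trustctime = $trustctime"
```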

> Randall wrote:
> 
> > I think there are more use cases here than are apparent
> 
> The use case is Chromium's build process. It creates hard links from a
> src/... to a gen/... directory. Some actions do `git reset --hard` in
> src/. This updates the modification time because of the hard links -
> even when there are no changes. That in turn leads to unnecessary
> rebuilds. I have little control over the creation of the hard links.

Thanks, this is helpful context and it explains why you'd want this
behavior.  If you're involved with the project, it may be helpful to
point out to other project members that this occurs and suggest that the
scripts avoid running "git reset --hard".  For example, it may be easy
to avoid if "git status --porcelain" produces empty output.  I've heard
stories about Chromium's build times and I'm sure such an optimization
would be welcome.
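
Something along these lines is what I have in mind. It is only a
sketch: `reset_if_dirty` is a made-up helper name, and whether ignoring
untracked files is appropriate depends on what the build scripts
expect:

```shell
#!/bin/sh
# Sketch: only run `git reset --hard` when there is something to reset.
reset_if_dirty() {
    # --porcelain prints one line per changed path and nothing for a clean tree;
    # --untracked-files=no keeps untracked files from counting as changes.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        git reset --hard
    else
        echo "working tree clean, skipping reset"
    fi
}
```

A side effect is that "git status" runs first in either branch, which
also refreshes the index as described above.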
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

* Re: A puzzle: reset --hard and hard links
  2022-01-26  2:14                           ` brian m. carlson
@ 2022-01-26 18:46                             ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2022-01-26 18:46 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Michael Herrmann, Andreas Schwab,
	Ævar Arnfjörð Bjarmason, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Thanks, this is helpful context and it explains why you'd want this
> behavior.  If you're involved with the project, it may be helpful to
> point out to other project members that this occurs and suggest that the
> scripts avoid running "git reset --hard".  For example, it may be easy
> to avoid if "git status --porcelain" produces empty output.  I've heard
> stories about Chromium's build times and I'm sure such an optimization
> would be welcome.

I am not sure about that.  If the ONLY problem is that hardlinks to
UNMODIFIED paths are severed by "reset --hard" when it is not necessary
in order to ensure that HEAD and the working tree match in content
without clobbering anything unrelated, then adding an internal call
to refresh before "git reset --hard" would neatly solve it, and
there should not be a need for an end-user workaround like that.

But it does not change the fact that we try to avoid clobbering
anything unrelated to the path we are updating when we need to
update the contents of the working tree files, and the way we do so
is to call checkout_entry(), which does unlink() followed by
creat().  So even though you may be able to teach "git reset --hard"
to refrain from severing extra hardlinks when it does not have to,
it will do so when the contents of the path must be changed.

To be quite honest, I am not sure if the patch below is safe either.
I doubt that the lack of "update-index --refresh" in the "reset
--hard" command was a mistake; rather, I suspect that it was
deliberately omitted to avoid some problems, which I do not offhand
recall.

 builtin/reset.c | 1 +
 1 file changed, 1 insertion(+)

diff --git c/builtin/reset.c w/builtin/reset.c
index b97745ee94..8adc1be75b 100644
--- c/builtin/reset.c
+++ w/builtin/reset.c
@@ -83,6 +83,7 @@ static int reset_index(const char *ref, const struct object_id *oid, int reset_t
 	}
 
 	read_cache_unmerged();
+	refresh_cache(REFRESH_QUIET);
 
 	if (reset_type == KEEP) {
 		struct object_id head_oid;
