* git repository modified after migration
@ 2015-12-25 20:49 Yang Yu
  2016-01-11 15:04 ` Michael J Gruber
  0 siblings, 1 reply; 6+ messages in thread
From: Yang Yu @ 2015-12-25 20:49 UTC (permalink / raw)
  To: git

I migrated an 11G git repository (converted from svn) from a host with
Debian 8.2, reiserfs, git 2.1.4 to a host with Ubuntu 12.04.5 LTS,
xfs, git 2.6.4. After the migration, `git status` shows a good
number of files as modified.

I did the transfer with
1) `rsync -azP`; after noticing the modified files I ran `rsync -avH
--delete`, but it did not correct the problem
2) `tar zcf`, then on the destination downloaded the tar.gz (served by
nginx) with wget

Both had the same result. But the original repository was still clean.

I did some comparison between "modified" and original files
a) same hash (md5sum, shasum)
b) same permission (-rw-r--r-- 1 )
c) same encoding and line termination (UTF-8 Unicode (with BOM) text,
with CRLF line terminators)
d) no git attributes for either


On the destination host, I ran `git checkout` on each of those
modified files. After one pass I got fewer modified files. After repeating
`git checkout` on the remaining files a few more times, I finally got
a clean repository on the destination host.

What could have caused git to consider those files as modified? And
why were multiple `git checkout` runs on the same file necessary?

Thanks.


Yang


* Re: git repository modified after migration
  2015-12-25 20:49 git repository modified after migration Yang Yu
@ 2016-01-11 15:04 ` Michael J Gruber
  2016-01-11 17:48   ` Yang Yu
  2016-01-11 18:19   ` Junio C Hamano
  0 siblings, 2 replies; 6+ messages in thread
From: Michael J Gruber @ 2016-01-11 15:04 UTC (permalink / raw)
  To: Yang Yu, git

Yang Yu venit, vidit, dixit 25.12.2015 21:49:
> I migrated an 11G git repository (converted from svn) from a host with
> Debian 8.2, reiserfs, git 2.1.4 to a host with Ubuntu 12.04.5 LTS,
> xfs, git 2.6.4. After the migration, `git status` shows a good
> number of files as modified.
> 
> I did the transfer with
> 1) `rsync -azP`, after noticing the modified files I ran `rsync -avH
> --delete` but it did not correct the problem
> 2) tar zcf, then on the destination download the tar.gz (served by
> nginx) with wget
> 
> Both had the same result. But the original repository was still clean.
> 
> I did some comparison between "modified" and original files
> a) same hash (md5sum, shasum)
> b) same permission (-rw-r--r-- 1 )
> c) same encoding and line termination (UTF-8 Unicode (with BOM) text,
> with CRLF line terminators)
> d) no git attributes for either
> 
> 
> On the destination host, I ran `git checkout` on each of those
> modified files. After one pass I got fewer modified files. After repeating
> `git checkout` on the remaining files a few more times, I finally got
> a clean repository on the destination host.
> 
> What could have caused git to consider those files as modified? And
> why were multiple `git checkout` runs on the same file necessary?
> 
> Thanks.
> 
> 
> Yang

This happens whenever the "stat" information changes, e.g. due to
changed device numbering and such. "git reset --hard" would have been
the quickest way to reset the stat cache/index - after git diff, of
course ;)

Michael
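
To illustrate the "stat" information mentioned above: the index caches
per-file stat fields (size, timestamps, device, inode and so on) next to
the content hash. The following is a minimal sketch, not git's own code,
showing that a copy can be byte-identical (same hash, as in check a) of
the original report) and still differ in stat data. The helper names
`sha1_of` and `compare_copy` and the sample bytes are made up for this
illustration, and `shutil.copy2` only stands in for a metadata-preserving
copy such as `rsync -a`.

```python
import hashlib
import os
import shutil
import tempfile


def sha1_of(path):
    """Return the SHA-1 hex digest of a file's raw bytes."""
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read()).hexdigest()


def compare_copy(content=b"\xef\xbb\xbfhello\r\n"):
    """Copy a file and report which properties the copy shares with it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.txt")
        dst = os.path.join(tmp, "copy.txt")
        with open(src, "wb") as fh:
            fh.write(content)
        # copy2 keeps mode and mtime, but the copy is still a new file
        # with its own inode.
        shutil.copy2(src, dst)
        s, d = os.stat(src), os.stat(dst)
        return {
            "same_content": sha1_of(src) == sha1_of(dst),
            "same_size": s.st_size == d.st_size,
            "same_inode": s.st_ino == d.st_ino,
        }


result = compare_copy()
print(result)
```

The content and size match while the inode does not, which is the kind
of difference that makes cached stat data stale after a copy.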


* Re: git repository modified after migration
  2016-01-11 15:04 ` Michael J Gruber
@ 2016-01-11 17:48   ` Yang Yu
  2016-01-11 18:19   ` Junio C Hamano
  1 sibling, 0 replies; 6+ messages in thread
From: Yang Yu @ 2016-01-11 17:48 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: git

On Mon, Jan 11, 2016 at 9:04 AM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> This happens whenever the "stat" information changes, e.g. due to
> changed device numbering and such. "git reset --hard" would have been
> the quickest way to reset the stat cache/index - after git diff, of
> course ;)
>


Can you go into a bit more detail about this? Why does it only affect
certain files (always the same files show as modified) on every host I tried?
Thanks.


* Re: git repository modified after migration
  2016-01-11 15:04 ` Michael J Gruber
  2016-01-11 17:48   ` Yang Yu
@ 2016-01-11 18:19   ` Junio C Hamano
  2016-01-12  7:01     ` Michael J Gruber
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2016-01-11 18:19 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Yang Yu, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

> This happens whenever the "stat" information changes, e.g. due to
> changed device numbering and such. "git reset --hard" would have been
> the quickest way to reset the stat cache/index - after git diff, of
> course ;)

That does not quite explain why 'git status' reported modified files
in the first place.  It would have refreshed the cached stat info in
the index as the first thing to do.  "git status" I think is the
recommended way these days ("update-index --refresh" for us old
timers) to nondestructively correct the cached stat information
discrepancy caused by "cp -R".

If you need to resort to "reset --hard", then there is something
else going on.
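
The refresh behaviour described above can be sketched roughly as
follows. This is a simplified model under stated assumptions, not git's
implementation: `Entry`, `snapshot`, `refresh` and `demo` are
hypothetical names, only three stat fields are tracked, and a plain
SHA-1 of the file bytes stands in for git's object hash. The point is
that a stat mismatch alone only triggers a content check; if the content
still matches, the cached stat data is updated and the path stays clean.

```python
import hashlib
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical, simplified stand-in for one index entry."""
    digest: str
    size: int
    mtime_ns: int
    ino: int


def digest_of(path):
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read()).hexdigest()


def snapshot(path):
    st = os.stat(path)
    return Entry(digest_of(path), st.st_size, st.st_mtime_ns, st.st_ino)


def refresh(entry, path):
    """Return (status, possibly updated entry) for path."""
    st = os.stat(path)
    current = (st.st_size, st.st_mtime_ns, st.st_ino)
    if (entry.size, entry.mtime_ns, entry.ino) == current:
        return "clean (stat match)", entry
    # Stat data differs, so fall back to comparing content.
    actual = digest_of(path)
    if actual == entry.digest:
        return "clean (stat refreshed)", Entry(actual, *current)
    return "modified", entry


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a.txt")
        dst = os.path.join(tmp, "b.txt")
        with open(src, "wb") as fh:
            fh.write(b"hello\r\n")
        entry = snapshot(src)
        shutil.copy(src, dst)  # same bytes, new inode
        first, entry = refresh(entry, dst)
        second, entry = refresh(entry, dst)
        with open(dst, "wb") as fh:
            fh.write(b"hello world\r\n")  # different size and content
        third, entry = refresh(entry, dst)
        return [first, second, third]


statuses = demo()
print(statuses)
```

In this model the copied file is reported clean on the first pass, which
is why a file that stays "modified" after a refresh points at something
other than stale stat data.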


* Re: git repository modified after migration
  2016-01-11 18:19   ` Junio C Hamano
@ 2016-01-12  7:01     ` Michael J Gruber
  2016-01-12 18:07       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Michael J Gruber @ 2016-01-12  7:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Yang Yu, git

Junio C Hamano venit, vidit, dixit 11.01.2016 19:19:
> Michael J Gruber <git@drmicha.warpmail.net> writes:
> 
>> This happens whenever the "stat" information changes, e.g. due to
>> changed device numbering and such. "git reset --hard" would have been
>> the quickest way to reset the stat cache/index - after git diff, of
>> course ;)
> 
> That does not quite explain why 'git status' reported modified files
> in the first place.  It would have refreshed the cached stat info in
> the index as the first thing to do.  "git status" I think is the
> recommended way these days ("update-index --refresh" for us old
> timers) to nondestructively correct the cached stat information
> discrepancy caused by "cp -R".
> 
> If you need to resort to "reset --hard", then there is something
> else going on.

Back then, when I had the same problem with git repos on removable file
systems (if I remember correctly) git status did not correct that
information. It may be different now.

Michael


* Re: git repository modified after migration
  2016-01-12  7:01     ` Michael J Gruber
@ 2016-01-12 18:07       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2016-01-12 18:07 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Yang Yu, git

Michael J Gruber <git@drmicha.warpmail.net> writes:

> Junio C Hamano venit, vidit, dixit 11.01.2016 19:19:
>> Michael J Gruber <git@drmicha.warpmail.net> writes:
>> 
>>> This happens whenever the "stat" information changes, e.g. due to
>>> changed device numbering and such. "git reset --hard" would have been
>>> the quickest way to reset the stat cache/index - after git diff, of
>>> course ;)
>> 
>> That does not quite explain why 'git status' reported modified files
>> in the first place.  It would have refreshed the cached stat info in
>> the index as the first thing to do.  "git status" I think is the
>> recommended way these days ("update-index --refresh" for us old
>> timers) to nondestructively correct the cached stat information
>> discrepancy caused by "cp -R".
>> 
>> If you need to resort to "reset --hard", then there is something
>> else going on.
>
> Back then, when I had the same problem with git repos on removable file
> systems (if I remember correctly) git status did not correct that
> information. It may be different now.

I do not recall we did anything specific to help removable devices,
so if the report is coming from a filesystem on a removable device
we may be seeing the same symptom.

I somehow doubt that is the case, though.

A not-entirely-implausible theory is that the index in the original
repository somehow marks modified entries as clean, fooling "status"
that is run in the original repository into reporting that nothing
changed.  Because "cp -R" into a different location forces the
content-level checking, "status" run in the copy notices that they
are indeed different and tells the true story.  If that is the case,
it would be interesting to see how the index got into that state.
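
The theory above can be modelled in a few lines. This is only a sketch
of the hypothesis, not a diagnosis of the reported repository and not
git's code: the `entry` dictionary, `stat_key`, `status` and `demo` are
hypothetical, and the entry is deliberately built in an inconsistent
state (a hash of one set of bytes paired with stat data of a file holding
different bytes).

```python
import hashlib
import os
import shutil
import tempfile


def stat_key(path):
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ino)


def status(entry, path):
    """Trust cached stat data; hash the file only when it differs."""
    if entry["stat"] == stat_key(path):
        return "clean"
    with open(path, "rb") as fh:
        actual = hashlib.sha1(fh.read()).hexdigest()
    return "clean" if actual == entry["digest"] else "modified"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        copy = os.path.join(tmp, "copy.txt")
        with open(original, "wb") as fh:
            fh.write(b"working tree bytes\r\n")
        # Inconsistent entry: the digest belongs to other ("committed")
        # bytes, but the stat data matches the current file.
        entry = {
            "digest": hashlib.sha1(b"committed bytes\n").hexdigest(),
            "stat": stat_key(original),
        }
        shutil.copy2(original, copy)  # same bytes, different inode
        return status(entry, original), status(entry, copy)


in_original, in_copy = demo()
print(in_original, in_copy)
```

In the original location the stat match short-circuits the check and the
file looks clean; in the copy the stat mismatch forces a content
comparison, and the difference surfaces.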

