git.vger.kernel.org archive mirror
* Multiple pushurls, different 'objects\info\packs'
From: Mikhail Strelnikov @ 2020-04-21 18:07 UTC (permalink / raw)
  To: git

Hi,

I have a repo with two pushurls configured like this:


C:\folder\1>git init --bare
Initialized empty Git repository in C:/folder/1/

C:\folder\2>git init --bare
Initialized empty Git repository in C:/folder/2/

C:\folder\w>git init
Initialized empty Git repository in C:/folder/w/.git/

C:\folder\w>git add work.txt

C:\folder\w>git commit -m "Initial commit"
[master (root-commit) 1b314f3] Initial commit
1 file changed, 1 insertion(+)
create mode 100644 work.txt

C:\folder\w>git remote add origin C:\folder\1

C:\folder\w>git remote set-url origin --push --add C:\folder\1

C:\folder\w>git remote set-url origin --push --add C:\folder\2

C:\folder\w>git push --set-upstream origin master


I would expect those two folders (C:\folder\1 and C:\folder\2) to
contain exactly the same bytes. And they did for quite some time. But
now there is a difference in 'objects\info\packs' (and some of
objects\pack\pack-*.idx/pack are also different).

(all the commits are the same in both and all my data is also the same
and 'fast-export --all' yields the same result)

I'd like to know what might have caused this nondeterminism and if
there is something to do to prevent that.

(my current version of git is 2.25.0.windows.1, but I guess this
divergence happened several versions back)

Thanks.


* Re: Multiple pushurls, different 'objects\info\packs'
From: brian m. carlson @ 2020-04-22  3:14 UTC (permalink / raw)
  To: Mikhail Strelnikov; +Cc: git

On 2020-04-21 at 18:07:42, Mikhail Strelnikov wrote:
> Hi,
> 
> I have a repo with two pushurls configured like this:
> 
> 
> C:\folder\1>git init --bare
> Initialized empty Git repository in C:/folder/1/
> 
> C:\folder\2>git init --bare
> Initialized empty Git repository in C:/folder/2/
> 
> C:\folder\w>git init
> Initialized empty Git repository in C:/folder/w/.git/
> 
> C:\folder\w>git add work.txt
> 
> C:\folder\w>git commit -m "Initial commit"
> [master (root-commit) 1b314f3] Initial commit
> 1 file changed, 1 insertion(+)
> create mode 100644 work.txt
> 
> C:\folder\w>git remote add origin C:\folder\1
> 
> C:\folder\w>git remote set-url origin --push --add C:\folder\1
> 
> C:\folder\w>git remote set-url origin --push --add C:\folder\2
> 
> C:\folder\w>git push --set-upstream origin master
> 
> 
> I would expect those two folders (C:\folder\1 and C:\folder\2) to
> contain exactly the same bytes. And they did for quite some time. But
> now there is a difference in 'objects\info\packs' (and some of
> objects\pack\pack-*.idx/pack are also different).
> 
> (all the commits are the same in both and all my data is also the same
> and 'fast-export --all' yields the same result)
> 
> I'd like to know what might have caused this nondeterminism and if
> there is something to do to prevent that.

You can get nondeterminism because the push to each repository happens
independently and delta compression is multithreaded.  You can therefore
compute different packs on push and get different packs in the result.

You could try to avoid it by disabling threading for pushes, but that
has to be done on each client that pushes to them.  In general, this is
not worth worrying about as long as the data is intact (that is, it
passes git fsck) and the refs are identical.  It is also not especially
easy to avoid, since determinism of pack files is not considered a goal
of Git.
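
A minimal sketch of the threading workaround described above, assuming the
pushing clone sits at a path you pass in (the helper name and its argument
are hypothetical, for illustration only). `pack.threads` is the setting that
controls how many threads the delta search uses. Setting it to 1 removes one
source of variation; because pack determinism is not a goal of Git, it
should not be read as a guarantee of byte-identical packs.

```shell
# Hypothetical helper: force single-threaded delta compression in one clone.
# "$1" is the path to the repository that performs the push (an assumption).
set_single_threaded_packing() {
    git -C "$1" config pack.threads 1
}

# One-off alternative that leaves the stored configuration untouched:
#   git -c pack.threads=1 push origin master
```

The setting is per client, so every clone that pushes to the replicas would
need it.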

If you tell us a little more about why you want bit-for-bit identical
replicas, we may be able to help you achieve your goals.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: Multiple pushurls, different 'objects\info\packs'
From: Mikhail Strelnikov @ 2020-04-22 17:45 UTC (permalink / raw)
  To: brian m. carlson, Mikhail Strelnikov, git

Thanks Brian,

> delta compression is multithreaded

It is great news that this is expected behavior and that I'm not dealing
with some sort of obscure data corruption. :)
Now I see there is `git config --global pack.threads 1` (I'm doing
everything locally, so this might work for me).

> reason for wanting bit-for-bit identical

My plan was to have 3 pushurls and shoot one of them if it is not
bit-for-bit identical to the other two. But you are right, all I need
is `git fsck`.
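
A rough sketch of that check, comparing replicas by integrity and refs
rather than by bytes. The helper name `replicas_match` and its two path
arguments are hypothetical; it only assumes that `git fsck` exits non-zero
on errors and that `git for-each-ref` lists refs sorted by name.

```shell
# Hypothetical helper: succeed only if both repositories pass fsck and
# hold exactly the same refs pointing at the same objects.
# "$1" and "$2" are paths to the two (bare) replicas.
replicas_match() {
    git -C "$1" fsck --no-progress >/dev/null 2>&1 || return 1
    git -C "$2" fsck --no-progress >/dev/null 2>&1 || return 1
    a=$(git -C "$1" for-each-ref --format='%(refname) %(objectname)')
    b=$(git -C "$2" for-each-ref --format='%(refname) %(objectname)')
    [ "$a" = "$b" ]
}
```

With three pushurls, the same helper could be run pairwise to find the
replica that disagrees with the other two.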

