git.vger.kernel.org archive mirror
* Fetching a lot of repos
@ 2020-06-07 20:04 Soni "They/Them" L.
  2020-06-07 21:03 ` brian m. carlson
  0 siblings, 1 reply; 5+ messages in thread
From: Soni "They/Them" L. @ 2020-06-07 20:04 UTC (permalink / raw)
  To: git

For... reasons, I need to fetch a lot of repos, and fetching them one by 
one is *extremely* slow, often taking upwards of 30 minutes.

So I decided to try something different. My first attempt was a complete 
failure:

-----

[soniex2@soniex-pc multigit]$ git fetch https://soniex2.autistic.space/git-repos/ganarchy.git +HEAD:repo_a &
git fetch https://github.com/ganarchy/GAnarchy +HEAD:repo_b &
git fetch https://cybre.tech/SoniEx2/ganarchy +HEAD:repo_c &
git fetch https://soniex2.autistic.space/git-repos/abdl.git +HEAD:repo_d &
git fetch https://cybre.tech/SoniEx2/rust.hexchat.hexchat-plugin +HEAD:repo_e
[1] 2236
[2] 2237
[3] 2238
[4] 2239
remote: Enumerating objects: 87, done.
remote: Total 87 (delta 0), reused 0 (delta 0), pack-reused 87
Unpacking objects: 100% (87/87), 36.06 KiB | 225.00 KiB/s, done.
 From https://github.com/ganarchy/GAnarchy
  * [new ref]                    -> repo_b
 From https://cybre.tech/SoniEx2/ganarchy
  * [new ref]                    -> repo_c
warning: no common commits
remote: Counting objects: 113, done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 113 (delta 48), reused 88 (delta 38)
Receiving objects: 100% (113/113), 30.07 KiB | 138.00 KiB/s, done.
Resolving deltas: 100% (48/48), done.
 From https://cybre.tech/SoniEx2/rust.hexchat.hexchat-plugin
  * [new ref]                    -> repo_e
[2]   Done                    git fetch https://github.com/ganarchy/GAnarchy +HEAD:repo_b
[3]-  Done                    git fetch https://cybre.tech/SoniEx2/ganarchy +HEAD:repo_c
[soniex2@soniex-pc multigit]$ error: unable to write file .git/objects/d2/5baa9c0a78b0007a34a569b774d983b905f0b5: No such file or directory
error: unable to write file .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66: No such file or directory
error: unable to write sha1 filename .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66
error: Unable to find baf9414a35a2f48ed1b22644fd4522272fb4bc66 under https://soniex2.autistic.space/git-repos/abdl.git
Fetching objects: 12, done.
Cannot obtain needed blob baf9414a35a2f48ed1b22644fd4522272fb4bc66
while processing commit 3f9f66712aaa071bd3bb32c46e1e4dc1fed13378.
error: fetch failed.
Fetching objects: 78, done.
 From https://soniex2.autistic.space/git-repos/ganarchy
  * [new ref]                    -> repo_a

-----

So I figured, "okay this is a git gc issue", and started over (rm -rf 
.git, git init) with the GC turned off (git config --local gc.auto 0), 
then re-ran that long command to run 5 git fetches at the same time. At 
first, it seemed to work fine, but then...

-----

$ git gc --aggressive
Enumerating objects: 365, done.
error: object file .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66 is empty
Counting objects: 100% (365/365), done.
Delta compression using up to 4 threads
Compressing objects: 100% (357/357), done.
error: object file .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66 is empty
fatal: loose object baf9414a35a2f48ed1b22644fd4522272fb4bc66 (stored in .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66) is corrupt
fatal: failed to run repack

-----

Well that didn't work did it... I'm not sure what to do about this, but 
I kinda need it to work, and it's currently not working. How can I make 
it work?


* Re: Fetching a lot of repos
  2020-06-07 20:04 Fetching a lot of repos Soni "They/Them" L.
@ 2020-06-07 21:03 ` brian m. carlson
  2020-06-08 16:03   ` Soni "They/Them" L.
  2020-06-08 17:14   ` Stefan Moch
  0 siblings, 2 replies; 5+ messages in thread
From: brian m. carlson @ 2020-06-07 21:03 UTC (permalink / raw)
  To: Soni "They/Them" L.; +Cc: git

On 2020-06-07 at 20:04:03, Soni "They/Them" L. wrote:
> [soniex2@soniex-pc multigit]$ error: unable to write file
> .git/objects/d2/5baa9c0a78b0007a34a569b774d983b905f0b5: No such file or
> directory
> error: unable to write file
> .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66: No such file or
> directory
> error: unable to write sha1 filename
> .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66
> error: Unable to find baf9414a35a2f48ed1b22644fd4522272fb4bc66 under
> https://soniex2.autistic.space/git-repos/abdl.git
> Fetching objects: 12, done.
> Cannot obtain needed blob baf9414a35a2f48ed1b22644fd4522272fb4bc66
> while processing commit 3f9f66712aaa071bd3bb32c46e1e4dc1fed13378.
> error: fetch failed.
> Fetching objects: 78, done.
> From https://soniex2.autistic.space/git-repos/ganarchy
>  * [new ref]                    -> repo_a
> 
> -----

So when Git needs to write a loose object, it will create the sharded
directory if it doesn't exist.  If it removes all of the objects in that
directory, it will remove the directory, which is likely what you're
seeing here.

In general, I wouldn't recommend fetching in parallel, but if you want
to do it anyway, I'd suggest setting `receive.unpackLimit` to 1.  That
will result in you keeping the packs you've fetched instead of exploding
them into loose objects, which will help this case.  It may not help
enough to solve the problem, though.

> So I figured, "okay this is a git gc issue", and started over (rm -rf .git,
> git init) and turned off the GC (git config --local gc.auto 0, and that long
> command to run 5 git fetch at the same time). At first, it seemed to work
> fine, but then...
> 
> -----
> 
> $ git gc --aggressive
> Enumerating objects: 365, done.
> error: object file .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66 is
> empty
> Counting objects: 100% (365/365), done.
> Delta compression using up to 4 threads
> Compressing objects: 100% (357/357), done.
> error: object file .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66 is
> empty
> fatal: loose object baf9414a35a2f48ed1b22644fd4522272fb4bc66 (stored in
> .git/objects/ba/f9414a35a2f48ed1b22644fd4522272fb4bc66) is corrupt
> fatal: failed to run repack
> 
> -----

It looks like this particular object is corrupt.  If you fetch with
packs this should go away, but you'll need to find which repo it's from,
clone it (without receive.unpackLimit set), and replace it.  Then run git
fsck to see if you have any more objects that are a problem.  Anything
that says "dangling" can be ignored, but other issues can be a problem.
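
A minimal sketch of that check, assuming a POSIX shell and a Git new enough to support `git -C`. `check_repo` is a hypothetical helper name, not a Git command. `--no-dangling` hides the "dangling" entries that can be ignored, and `git fsck` exits non-zero when it finds errors.

```shell
# check_repo is a hypothetical helper, not part of Git.
# It runs `git fsck` on the repository at $1 and reports whether errors were found.
check_repo() {
    # --no-dangling suppresses the harmless "dangling ..." lines.
    if git -C "$1" fsck --no-dangling --no-progress; then
        echo "ok: $1"
    else
        echo "problems found in: $1" >&2
        return 1
    fi
}
```

For example, `check_repo . || echo "some objects need to be re-fetched"` could run after a batch of fetches finishes.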
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: Fetching a lot of repos
  2020-06-07 21:03 ` brian m. carlson
@ 2020-06-08 16:03   ` Soni "They/Them" L.
  2020-06-08 22:36     ` brian m. carlson
  2020-06-08 17:14   ` Stefan Moch
  1 sibling, 1 reply; 5+ messages in thread
From: Soni "They/Them" L. @ 2020-06-08 16:03 UTC (permalink / raw)
  To: brian m. carlson, git



On 2020-06-07 6:03 p.m., brian m. carlson wrote:
> So when Git needs to write a loose object, it will create the sharded
> directory if it doesn't exist.  If it removes all of the objects in that
> directory, it will remove the directory, which is likely what you're
> seeing here.
>
> In general, I wouldn't recommend fetching in parallel, but if you want
> to do it anyway, I'd suggest setting `receive.unpackLimit` to 1.  That
> will result in you keeping the packs you've fetched instead of exploding
> them into loose objects, which will help this case.  It may not help
> enough to solve the problem, though.

That didn't work either, unfortunately. It was worth a shot tho, thanks.
>
> It looks like this particular object is corrupt.  If you fetch with
> packs this should go away, but you'll need to find which repo it's from,
> clone it (without receive.unpackLimit set), and replace it.  Then run git
> fsck to see if you have any more objects that are a problem.  Anything
> that says "dangling" can be ignored, but other issues can be a problem.

So I can do parallel fetches but I'd have to check that everything is 
good and re-fetch what got broken? That kinda defeats the purpose of 
parallel fetch. On the other hand, the repos with no shared commits 
don't seem to cause an issue, but the ones with shared commits do. 
However, I have no reliable way of telling which repos have shared 
commits, as one can merge unrelated branches.

Another idea I had was to make local clones using symlinks (IIRC there 
was a way to do that, but I can't remember how to do it), do parallel 
remote fetches into separate local repos, then do quick local fetches 
sequentially instead. It's probably the safest way to do it.
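
A rough sketch of that staged approach, assuming a POSIX shell. `fetch_staged`, the staging directory layout, and the ref names `staged` and `repo_N` are hypothetical choices for illustration. It uses plain bare repositories rather than symlinked clones, so each parallel fetch writes to its own object store and only the sequential local fetches touch the main repository.

```shell
# fetch_staged MAIN_REPO STAGING_DIR URL...
# Hypothetical helper, not a Git command. Local-path URLs must be absolute,
# because each fetch runs from inside a different repository.
fetch_staged() {
    main=$(cd "$1" && pwd) || return 1
    mkdir -p "$2" || return 1
    staging=$(cd "$2" && pwd) || return 1
    shift 2

    # Phase 1: one bare staging repo per URL, all fetched in parallel.
    n=0
    for url in "$@"; do
        n=$((n + 1))
        repo="$staging/repo_$n.git"
        [ -d "$repo" ] || git init --quiet --bare "$repo"
        git -C "$repo" fetch --quiet "$url" "+HEAD:refs/heads/staged" &
    done
    wait    # waits for every background fetch; failures surface in phase 2

    # Phase 2: fetch each staging repo into the main repo, one at a time.
    status=0
    n=0
    for url in "$@"; do
        n=$((n + 1))
        repo="$staging/repo_$n.git"
        git -C "$main" fetch --quiet "$repo" \
            "+refs/heads/staged:refs/heads/repo_$n" || status=1
    done
    return $status
}
```

It could be called as `fetch_staged . /tmp/staging https://github.com/ganarchy/GAnarchy https://cybre.tech/SoniEx2/ganarchy`. The staging repositories cost extra disk space, which is the trade-off for keeping concurrent writers out of a single object store.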

Relatedly, but not relevant for my use-case: Is it safe to assume that 
currently git breaks if multiple ppl push into the same repo at the same 
time, and they have not-yet-upstreamed shared commits, in particular if 
they're pushing to different branches?


* Re: Fetching a lot of repos
  2020-06-07 21:03 ` brian m. carlson
  2020-06-08 16:03   ` Soni "They/Them" L.
@ 2020-06-08 17:14   ` Stefan Moch
  1 sibling, 0 replies; 5+ messages in thread
From: Stefan Moch @ 2020-06-08 17:14 UTC (permalink / raw)
  To: brian m. carlson, Soni "They/Them" L., git

* brian m. carlson wrote:
> In general, I wouldn't recommend fetching in parallel, but if you want
> to do it anyway, I'd suggest setting `receive.unpackLimit` to 1.  That
> will result in you keeping the packs you've fetched instead of exploding
> them into loose objects, which will help this case.  It may not help
> enough to solve the problem, though.

For fetch it should be `fetch.unpackLimit` (or
`transfer.unpackLimit`), as `receive.unpackLimit` configures this
limit for receive-pack.
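
As a small sketch of that setting, assuming a POSIX shell (`keep_fetched_packs` is a hypothetical helper name): with the limit at 1, any fetch that receives at least one object keeps the received pack instead of unpacking it into loose objects. The Git documentation describes this limit as applying to objects fetched over the Git native transfer.

```shell
# keep_fetched_packs is a hypothetical helper name, not a Git command.
# It sets fetch.unpackLimit=1 in the repository at $1, so fetched packs are
# stored as packs rather than unpacked into loose object files.
keep_fetched_packs() {
    git -C "$1" config --local fetch.unpackLimit 1
    # transfer.unpackLimit would set the same limit for both fetch and
    # receive-pack when the more specific keys are unset:
    # git -C "$1" config --local transfer.unpackLimit 1
}
```

Running `keep_fetched_packs .` before starting the fetches, then `git config --get fetch.unpackLimit`, should confirm the value.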



* Re: Fetching a lot of repos
  2020-06-08 16:03   ` Soni "They/Them" L.
@ 2020-06-08 22:36     ` brian m. carlson
  0 siblings, 0 replies; 5+ messages in thread
From: brian m. carlson @ 2020-06-08 22:36 UTC (permalink / raw)
  To: Soni "They/Them" L.; +Cc: git

On 2020-06-08 at 16:03:37, Soni "They/Them" L. wrote:
> Relatedly, but not relevant for my use-case: Is it safe to assume that
> currently git breaks if multiple ppl push into the same repo at the same
> time, and they have not-yet-upstreamed shared commits, in particular if
> they're pushing to different branches?

No, if they're pushing to different branches, it's not a problem on the
server side.  The server side should be reasonably robust about this.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
