git.vger.kernel.org archive mirror
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Bryan Turner <bturner@atlassian.com>
Cc: Git Users <git@vger.kernel.org>
Subject: Re: git push over HTTP; long delay with no progress, then hang?
Date: Sat, 16 May 2020 08:37:58 +0200
Message-ID: <20200516063758.GC5925@szeder.dev>
In-Reply-To: <CAGyf7-GQSPcdheKKiZPBpfGxAj_xu4oWdwRq_esSbuqLu5P08g@mail.gmail.com>

On Fri, May 15, 2020 at 09:09:27PM -0700, Bryan Turner wrote:
> When running a huge "git push" via protocol v0/v1 over HTTP

By "huge push", do you mean a lot of refs?

> (repository is ~10GB, with ~104,000 refs), I observe that:
> * Git makes an initial connection for a ref advertisement. This
> completes almost instantly because the repository is empty
> * "git push" then sits in absolute silence for ~10 minutes

I ran into this a few years ago; I remember waiting for 57 minutes ;)

> The process chain looks like:
> git push <URL>
>     git-remote-http <URL> <URL>
>         git send-pack --stateless-rpc --helper-status --thin --progress <URL> --stdin
> 
> The "git send-pack" process runs at 100% usage for a single CPU core
> for this entire duration. Does anyone have any insight into what Git
> might be doing during this long delay?

Refspec matching is, if I recall correctly,

  O(nr of refspecs * (nr of local refs + nr of remote refs))

with remote.c:count_refspec_match() responsible for the "nr of local +
remote refs" part and remote.c:match_explicit_refs() for the "nr of
refspecs" part.
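
To make that complexity concrete, here is a toy Python model of the
naive matching.  It is a sketch under simplifying assumptions, not
Git's code: the function name naive_match_explicit is hypothetical,
refspecs are plain 'src:dst' strings without globs, and a match is an
exact name comparison rather than the abbreviation rules Git applies.
The scans deliberately never stop early, since counting every match is
what lets the matcher detect ambiguous refspecs.

```python
from typing import List, Tuple


def naive_match_explicit(
    refspecs: List[str], local_refs: List[str], remote_refs: List[str]
) -> Tuple[List[Tuple[str, str, int, int]], int]:
    """Toy model of explicit refspec matching (hypothetical helper).

    For every 'src:dst' refspec, scan *all* local refs for src and *all*
    remote refs for dst, counting hits so ambiguity could be detected.
    Returns (src, dst, local_hits, remote_hits) per refspec plus the
    total number of name comparisons performed.
    """
    comparisons = 0
    results = []
    for spec in refspecs:                 # factor: nr of refspecs
        src, _, dst = spec.partition(":")
        local_hits = 0
        for ref in local_refs:            # term: nr of local refs
            comparisons += 1
            if ref == src:
                local_hits += 1
        remote_hits = 0
        for ref in remote_refs:           # term: nr of remote refs
            comparisons += 1
            if ref == dst:
                remote_hits += 1
        results.append((src, dst, local_hits, remote_hits))
    return results, comparisons


if __name__ == "__main__":
    local = [f"refs/heads/b{i}" for i in range(1000)]
    remote = local[:500]
    specs = [f"{r}:{r}" for r in local[:100]]
    _, n = naive_match_explicit(specs, local, remote)
    print(n)  # 100 * (1000 + 500) = 150000
```

Doubling the number of refspecs doubles the work, and so does doubling
the number of refs on both sides; doing both at once quadruples it.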

This is particularly bad for the http/https protocols, because 'git push'
expands your refspecs into fully qualified refspecs and passes them to
'git send-pack', which then performs refspec matching _again_.  So if
you have a single refspec with globbing, then 'git push' can still do
its refspec matching fairly quickly, even if there are a lot of local
and remote refs and that single globbing refspec happens to match a lot
of them.  But then 'git send-pack' receives one fully qualified refspec
for each matched ref, so its refspec matching has a whole lot to do,
spins the CPU like crazy, and there you are writing a bug report on
Friday evening.

This is less of an issue with other protocols, because they perform
refspec matching only once, but of course all protocols suffer if you
pass a lot of refspecs to 'git push' or 'git send-pack'.

> Whatever it is, is it perhaps
> something Git should actually print some sort of status for? (I've
> reproduced this long silence with both Git 2.20.1 and the new Git
> 2.27.0-rc0.)

An immediate band-aid might be to teach 'git push' to pass on the
original refspecs to 'git send-pack', as this would reduce the
complexity of that second refspec matching.  This, of course,
wouldn't help if someone scripted around 'git push' and invoked it
with a lot of refspecs, or fed a lot of refspecs directly to the
stdin of 'git send-pack'.

Alternatively, we could teach 'git send-pack' a new option, e.g.
'--only-fully-qualified-refspecs', and teach 'git push' to use it, so
that 'git send-pack' doesn't have to perform that second refspec
matching; it would only have to verify that the refspecs it got are
indeed all fully qualified.
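
A minimal sketch of what the cheaper path could look like under that
proposed (not existing) option.  It assumes "fully qualified" means both
sides start with 'refs/' and contain no glob, treats one leading '+' as
the force flag, and does not handle deletions (':dst').  The functions
is_fully_qualified and match_fully_qualified are hypothetical; with the
refs held in hash maps, each refspec costs a syntax check plus two O(1)
lookups instead of two full scans.

```python
from typing import Dict, List, Optional, Tuple


def _strip_force(spec: str) -> str:
    """Drop a single leading '+' (force flag), if present."""
    return spec[1:] if spec.startswith("+") else spec


def is_fully_qualified(spec: str) -> bool:
    """True if 'src:dst' names full refs on both sides and has no glob
    (assumed definition; deletions like ':dst' are rejected)."""
    src, sep, dst = _strip_force(spec).partition(":")
    return (sep == ":" and "*" not in src and "*" not in dst
            and src.startswith("refs/") and dst.startswith("refs/"))


def match_fully_qualified(
    refspecs: List[str], local_refs: Dict[str, str], remote_refs: Dict[str, str]
) -> List[Tuple[str, str, Optional[str]]]:
    """Resolve verified refspecs with dict lookups instead of list scans.

    local_refs/remote_refs map ref names to object ids (toy values).
    Returns (local_oid, dst, remote_oid_or_None) per refspec.
    """
    bad = [s for s in refspecs if not is_fully_qualified(s)]
    if bad:  # verification only: O(nr of refspecs)
        raise ValueError(f"not fully qualified: {bad}")
    out = []
    for spec in refspecs:
        src, _, dst = _strip_force(spec).partition(":")
        if src not in local_refs:
            raise KeyError(f"src refspec {src} does not match any local ref")
        # None means the remote does not have dst yet (ref creation).
        out.append((local_refs[src], dst, remote_refs.get(dst)))
    return out
```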

Or build the remote ref index earlier and sort the refspecs and local
refs, so we could match the lhs of fully qualified refspecs to local
refs in one go while looking up their rhs in the remote ref index,
resulting in O((nr of refspecs + nr of local refs) * log(nr of remote
refs)) complexity.  Dunno, it was a long time ago when I last thought
about this.
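
One way to sketch that idea, assuming fully qualified 'src:dst'
refspecs and using a sorted Python list searched with bisect as a
stand-in for the remote ref index (sorted_match is a hypothetical name,
not a Git function).  The refspecs (by lhs) and the local refs are
sorted and merge-walked once, and each rhs is binary-searched in the
remote index.  Counting the sorts, the total is O(n log n) rather than
the quadratic cost of the nested scans.

```python
from bisect import bisect_left
from typing import List, Tuple


def sorted_match(
    refspecs: List[str], local_refs: List[str], remote_refs: List[str]
) -> Tuple[List[Tuple[str, str, bool]], List[str]]:
    """Match fully qualified 'src:dst' refspecs without nested scans.

    Returns ([(src, dst, dst_exists_on_remote)], [unmatched srcs]).
    """
    remote_index = sorted(remote_refs)        # O(R log R), built once
    pairs = sorted((s.partition(":")[0], s.partition(":")[2]) for s in refspecs)
    local_sorted = sorted(local_refs)
    matched, unmatched = [], []
    i = 0
    for src, dst in pairs:
        # Both sequences are sorted, so the local cursor only moves forward:
        # the whole walk is O(nr of refspecs + nr of local refs).
        while i < len(local_sorted) and local_sorted[i] < src:
            i += 1
        if i < len(local_sorted) and local_sorted[i] == src:
            j = bisect_left(remote_index, dst)  # O(log R) lookup of the rhs
            exists = j < len(remote_index) and remote_index[j] == dst
            matched.append((src, dst, exists))
        else:
            unmatched.append(src)  # Git reports an error for such a refspec
    return matched, unmatched


if __name__ == "__main__":
    local = ["refs/heads/b", "refs/heads/a", "refs/heads/c"]
    remote = ["refs/heads/a"]
    specs = ["refs/heads/c:refs/heads/c", "refs/heads/a:refs/heads/a",
             "refs/heads/zz:refs/heads/zz"]
    print(sorted_match(specs, local, remote))
```

Two refspecs with the same lhs both match, because the cursor does not
advance past an equal local ref.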

All this assumes that if there are a lot of refspecs, then they are
fully qualified.  I'd assume that if there are so many refspecs as to
cause trouble, then they were generated programmatically, and I'd
(naively? :) assume that if something generates refspecs, then it's
careful and generates fully qualified refspecs.  Anyway, all bets are
off if there are a lot of non-fully-qualified refspecs...


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200516063758.GC5925@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=bturner@atlassian.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).