git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Derrick Stolee <derrickstolee@github.com>
Cc: 程洋 <chengyang@xiaomi.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	何浩 <hehao@xiaomi.com>, "Xin7 Ma 马鑫" <maxin7@xiaomi.com>,
	石奉兵 <shifengbing@xiaomi.com>, 凡军辉 <fanjunhui@xiaomi.com>,
	王汉基 <wanghanji@xiaomi.com>
Subject: Re: [External Mail]Re: Partial-clone cause big performance impact on server
Date: Thu, 18 Aug 2022 01:49:48 -0400
Message-ID: <Yv3S/J/ecYi0slQA@coredump.intra.peff.net>
In-Reply-To: <fc71e91b-7a66-5653-d723-e4df17bf2a9c@github.com>

On Wed, Aug 17, 2022 at 09:41:10AM -0400, Derrick Stolee wrote:

> On 8/17/2022 6:22 AM, 程洋 wrote:
> > But I still think the protocol should tell the server which ref
> > the blob is reachable from,
> > because it would be really hard to implement any kind of ACL
> 
> I think this idea has merit on its face, but it wouldn't really solve the
> problem since the reachability query would still need to be done, just
> from a smaller set of references at first. If we were able to say "this
> blob can be found at path X at commit Y" then the server could do a
> commit-reachability query and a path traversal, which should be a lot
> faster.
> 
> However, it would be extremely difficult to plumb into the partial clone
> machinery. At the point where Git realizes it is missing a promisor
> object, that code is very generic and removed from any kind of walk from a
> reference. That is further complicated by the fact that the walk is
> probably from a local reference, which can be entirely different from the
> remote reference.

Agreed. The client often doesn't know the context of what it's asking
for in the first place. Sometimes it's not carried through the code, but
we also have commands that might not be invoked with a commit in the
first place! It's valid to run "git read-tree <tree>", and we should be
able to fault in blobs from that tree as needed.

I also think that this kind of "is the blob reachable" query is
mostly expensive if you don't have reachability bitmaps at all. If you
do, then the cost to ask "is this object reachable" is the same for a
commit or a blob. If the server has a bitmap of all objects reachable
for each branch ACL (even if it has to do some small bit of fill-in
walking to bring it up to date), then querying for any object type the
client asks for is still just a bit lookup.
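As a rough illustration of the bit-lookup point, here is a toy Python model. It is a sketch under stated assumptions, not Git's implementation. Each object gets a fixed bit position. Each branch's reachable set is a Python int standing in for Git's EWAH-compressed bitmap. A slightly stale bitmap is brought up to date with a short fill-in walk. All names (`ToyBitmapIndex`, `fill_in`, `reachable_from`) and the oids are hypothetical.

```python
from typing import Dict, List

# Hypothetical object graph: each oid maps to the oids it references directly.
# Commits point to parents and a root tree, trees point to blobs and
# subtrees, blobs point to nothing.
Graph = Dict[str, List[str]]


class ToyBitmapIndex:
    """Toy reachability index: every object gets a fixed bit position, and
    the set of objects reachable from a ref is a Python int used as a bitset.
    This is not Git's on-disk bitmap format or API."""

    def __init__(self, graph: Graph) -> None:
        self.graph = graph
        # One bit per object (Git uses pack order; sorted oids suffice here).
        self.pos = {oid: i for i, oid in enumerate(sorted(graph))}
        self.ref_bitmaps: Dict[str, int] = {}

    def bit(self, oid: str) -> int:
        return 1 << self.pos[oid]

    def fill_in(self, tip: str, base: int = 0) -> int:
        """Extend `base` (assumed closed under reachability, e.g. a slightly
        stale bitmap) with everything reachable from `tip`. The walk stops at
        objects already set, so only new objects get added."""
        result = base
        stack = [tip]
        while stack:
            oid = stack.pop()
            b = self.bit(oid)
            if result & b:
                continue  # already set; its references are covered too
            result |= b
            stack.extend(self.graph[oid])
        return result

    def set_ref(self, ref: str, tip: str, base: int = 0) -> None:
        self.ref_bitmaps[ref] = self.fill_in(tip, base)

    def reachable_from(self, oid: str, ref: str) -> bool:
        # A single bit test, whether oid names a commit, a tree, or a blob.
        return bool(self.ref_bitmaps[ref] & self.bit(oid))


# Usage with made-up oids: "main" has a bitmap from when it pointed at c1,
# then gains commit c2; "secret" is a branch with unrelated history.
graph: Graph = {
    "c1": ["t1"], "t1": ["b1"], "b1": [],
    "c2": ["c1", "t2"], "t2": ["b1", "b2"], "b2": [],
    "c3": ["t3"], "t3": ["b3"], "b3": [],
}
idx = ToyBitmapIndex(graph)
stale = idx.fill_in("c1")               # bitmap stored for the old tip
idx.set_ref("main", "c2", base=stale)   # fill-in adds only c2, t2 and b2
idx.set_ref("secret", "c3")
print(idx.reachable_from("b2", "main"))    # True
print(idx.reachable_from("b3", "main"))    # False
print(idx.reachable_from("b3", "secret"))  # True
```

The query cost is the same for the blob `b2` as for the commit `c1`. Only the fill-in step walks the graph, and it stops as soon as it reaches objects the stale bitmap already covers.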

Not knowing a lot about gerrit or jgit, it's not clear to me if there
are configuration knobs that could be tweaked on the server side to make
these requests more efficient.

> One possible hurdle is the fact that this branch-level security is a
> feature of Gerrit, not a feature of Git itself. Optimizing Git for a
> special case that Git itself does not support is less valuable to the
> Git project.

We don't have branch-level security per se, but I do think that
everything is there in Git to do fast "is this object reachable from
these branches" queries. If you're making a lot of those queries it
might influence your decision of which bitmaps to generate, but the
bitmap concept itself should be sufficient.
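Here is one way to act on the "which bitmaps to generate" point, sketched under assumptions: the names are hypothetical, Python ints serve as bitsets, and this is not anything Git or Gerrit provides. Suppose many requests share the same set of readable branches. The server can OR those branches' bitmaps together once per ACL and cache the result. Checking a batch of "want" oids then costs one bit test per oid.

```python
from typing import Dict, FrozenSet, Iterable, List


class AclReachability:
    """Hypothetical server-side helper. `pos` maps each oid to its bit
    position and `branch_bitmaps` maps each branch to the bitset (a Python
    int) of objects reachable from it. The OR over an ACL's branch set is
    computed once and cached, so each "want" check is a shift and a mask."""

    def __init__(self, pos: Dict[str, int], branch_bitmaps: Dict[str, int]) -> None:
        self.pos = pos
        self.branch_bitmaps = branch_bitmaps
        self._combined: Dict[FrozenSet[str], int] = {}

    def allowed_bitmap(self, branches: Iterable[str]) -> int:
        key = frozenset(branches)  # branch order does not matter
        if key not in self._combined:
            combined = 0
            for name in key:
                combined |= self.branch_bitmaps[name]
            self._combined[key] = combined
        return self._combined[key]

    def denied_wants(self, wants: List[str], branches: Iterable[str]) -> List[str]:
        """Return the wanted oids NOT reachable from any of `branches`.
        Oids without a bit position are denied."""
        allowed = self.allowed_bitmap(branches)
        return [
            oid for oid in wants
            if oid not in self.pos or not (allowed >> self.pos[oid]) & 1
        ]


# Usage with made-up data: bits 0-1 are reachable from "main",
# bits 2-3 only from "release".
pos = {"c1": 0, "b1": 1, "c2": 2, "b2": 3}
acl = AclReachability(pos, {"main": 0b0011, "release": 0b1100})
print(acl.denied_wants(["b1", "b2"], ["main"]))             # ['b2']
print(acl.denied_wants(["b1", "b2"], ["main", "release"]))  # []
```

A real server would also have to invalidate the cached combinations whenever a branch's bitmap changes. It would also need to fill in any objects added since the bitmaps were written.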

-Peff


Thread overview: 40+ messages
2022-08-11  8:09 Partial-clone cause big performance impact on server 程洋
2022-08-11 17:22 ` Jonathan Tan
2022-08-13  7:55   ` Reply: [External Mail]Re: " 程洋
2022-08-13 11:41     ` 程洋
2022-08-15  5:16     ` ZheNing Hu
2022-08-15 13:15       ` 程洋
2022-08-12 12:21 ` Derrick Stolee
2022-08-14  6:48 ` Jeff King
2022-08-15 13:18   ` Derrick Stolee
2022-08-15 14:50     ` [External Mail]Re: " 程洋
2022-08-17 10:22     ` 程洋
2022-08-17 13:41       ` Derrick Stolee
2022-08-18  5:49         ` Jeff King [this message]
2022-09-01  6:53   ` 程洋
2022-09-01 16:19     ` Jeff King
2022-09-05 11:17       ` 程洋
2022-09-06 18:38         ` Jeff King
2022-09-06 22:58           ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Jeff King
2022-09-06 23:01             ` [PATCH 1/3] parse_object(): allow skipping hash check Jeff King
2022-09-07 14:15               ` Derrick Stolee
2022-09-07 20:44                 ` Jeff King
2022-09-06 23:05             ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 14:36               ` Derrick Stolee
2022-09-07 14:45                 ` Derrick Stolee
2022-09-07 20:50                   ` Jeff King
2022-09-07 19:26               ` Junio C Hamano
2022-09-07 20:36                 ` Jeff King
2022-09-07 20:48                   ` [BUG] t1800: Fails for error text comparison rsbecker
2022-09-07 21:55                     ` Junio C Hamano
2022-09-07 22:23                       ` rsbecker
2022-09-07 21:02                   ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 22:07                     ` Junio C Hamano
2022-09-08  5:04                       ` Jeff King
2022-09-08 16:41                         ` Junio C Hamano
2022-09-06 23:06             ` [PATCH 3/3] parse_object(): check commit-graph when skip_hash set Jeff King
2022-09-07 14:46               ` Derrick Stolee
2022-09-07 19:31               ` Junio C Hamano
2022-09-08 10:39                 ` [External Mail]Re: " 程洋
2022-09-08 18:42                   ` Jeff King
2022-09-07 14:48             ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Derrick Stolee
