Partial-clone cause big performance impact on server

* Partial-clone cause big performance impact on server
@ 2022-08-11  8:09 程洋
  2022-08-11 17:22 ` Jonathan Tan
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: 程洋 @ 2022-08-11  8:09 UTC
  To: git
  Cc: 何浩, Xin7 Ma 马鑫,
	石奉兵, 凡军辉,
	王汉基

Hi.
     We observed large disk space savings with partial clone, and we now require all of our users (2000+) to clone repositories with partial clone (filter=blob:none).
     However, at busy times fetching becomes extremely slow for users. Here is what we did and what we found.

    1. We asked all users to fetch with filter=blob:none. The result is remarkable: our download size per user decreased from 460G to 180G.
    2. But at busy times, everyone's fetches become slow. (During idle hours it takes 5 minutes to clone a big repository, but cloning the same repository takes more than 1 hour during busy hours.)
    3. With GIT_TRACE_PACKET=1, we found that on big repositories (200K+ refs, 6M+ objects) git sends 40k "want" lines.
    4. We then traced our server (Gerrit with JGit) and found that it is counting objects. We checked those 40k objects, and most of them are blobs rather than commits (which means they are not in the bitmap).
    5. We believe that is the root cause of our problem: git sends too many "want SHA1" lines for objects that are not in the bitmap, which makes the server count objects frequently, which in turn slows the server down.
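
For anyone who wants to reproduce the count in step 3, here is a minimal sketch. It assumes the client trace was captured with GIT_TRACE_PACKET=1 and saved to a file, and that client-sent lines look roughly like "packet:        fetch> want <hex-oid>". The function name count_wants and the regular expression are illustrative only and are not part of git.

```python
import re

# Assumed shape of a client-side GIT_TRACE_PACKET line:
#   "... packet:        fetch> want <hex-oid> [capabilities]"
# The ">" marks a packet written by the client; "<" lines are ignored.
WANT_RE = re.compile(r"packet:\s+\S+> want ([0-9a-f]{40,64})\b")


def count_wants(lines):
    """Return (number of want lines, set of distinct object ids) sent by the client."""
    total = 0
    oids = set()
    for line in lines:
        match = WANT_RE.search(line)
        if match:
            total += 1
            oids.add(match.group(1))
    return total, oids
```

Calling count_wants on the open trace file returns the total number of want lines and the distinct object ids. To see how many of those ids are blobs rather than commits, the ids could then be piped to `git cat-file --batch-check` in a full (non-partial) copy of the repository.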

What we want is to download only what we need to check out a specific commit. But if one commit contains a very large number of objects (40k+ in our case), counting takes more time than downloading.
Is it possible to have git send only a "want" for the commit, rather than sending every object's SHA1 one by one?
