* [Question] clone performance
From: randall.s.becker @ 2019-08-24  1:59 UTC
To: git

Hi All,

I'm trying to answer a question for a customer on clone performance. They
are doing at least 2-3 clones a day of repositories with about 2500 files
and 10 GB of content. This is stressing the file system. I have tried to
convince them that their process is not reasonable and that they should
stick with existing clones, using branch checkout rather than re-cloning
for each feature branch. Sadly, I have not been successful - not for lack
of trying.

Is there any way to improve raw clone performance in a situation like
this, where status really doesn't matter, because the clone's life span
is under 48 hours?

TIA,
Randall
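For clones this short-lived, one generic lever is transferring less in the
first place. A hedged sketch using a throwaway local repository as a
stand-in for the real remote (paths are illustrative; shallow clones
require server-side support):

```shell
set -e
rm -rf /tmp/perf-origin /tmp/perf-clone
git init -q /tmp/perf-origin
cd /tmp/perf-origin
for i in 1 2 3; do
  echo "rev $i" > file.txt
  git add file.txt
  git -c user.email=a@b -c user.name=demo commit -qm "rev $i"
done
# --depth 1 over the file:// transport transfers only the tip commit;
# --single-branch is implied by --depth, skipping refs that aren't needed
git clone -q --depth 1 "file:///tmp/perf-origin" /tmp/perf-clone
git -C /tmp/perf-clone rev-list --count HEAD   # → 1
```

For very large blobs, `--filter=blob:none` (partial clone, git >= 2.19,
with server support enabled) defers blob transfer further, though checkout
will still fetch and write whatever the working tree needs, so it helps
less when the full 10 GB tree is checked out anyway.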
* Re: [Question] clone performance
From: Bryan Turner @ 2019-08-24 21:00 UTC
To: randall.s.becker; +Cc: Git Users

On Fri, Aug 23, 2019 at 6:59 PM <randall.s.becker@rogers.com> wrote:
>
> I'm trying to answer a question for a customer on clone performance. They
> are doing at least 2-3 clones a day, of repositories with about 2500 files
> and 10Gb of content. This is stressing the file system.

Can you go into a bit more detail about what "stress" means? Using too
much disk space? Too many IOPS reading/packing? Since you specifically
called out the filesystem, does that mean the CPU/memory usage is
acceptable?

Depending on how well-packed the repository is, Git will reuse a lot of
the existing pack (and a "perfectly" packed repository can achieve
complete reuse, with no "Compressing objects" phase at all). Delta
islands [1] can help increase reuse and reduce the need for on-the-fly
compression, if the repository includes a lot of refs that aren't
generally cloned.

Another relatively recent addition is uploadpack.packObjectsHook [2],
which can simplify caching of packfiles so they can be reused on
subsequent requests. Whether or not this will be beneficial is likely to
be influenced by how many times the exact same commits are cloned and how
much extra disk space is available for storing cached packs.

Not sure if any of this is helpful, but I hope it will be!
Bryan

[1] https://git-scm.com/docs/git-pack-objects#_delta_islands
[2] https://git-scm.com/docs/git-config#Documentation/git-config.txt-uploadpackpackObjectsHook
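As a concrete sketch of the delta-island suggestion: the settings live in
the server-side repository's config, applied at the next full repack. The
ref namespaces below are illustrative and would need to match the hosting
setup. (uploadpack.packObjectsHook is deliberately not shown per-repo: per
the git-config docs it is honored only from protected config such as the
system or global file, as a safety measure.)

```shell
set -e
rm -rf /tmp/server.git
git init -q --bare /tmp/server.git
cd /tmp/server.git
# Partition delta compression by ref namespace so that a plain clone of
# refs/heads/* can reuse existing on-disk deltas verbatim
git config pack.useDeltaIslands true
git config --add pack.island 'refs/heads/'
git config --add pack.island 'refs/tags/'
# A full repack applies the island partitioning to the on-disk pack
git repack -adq
```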
* RE: [Question] clone performance
From: randall.s.becker @ 2019-08-26 14:16 UTC
To: 'Bryan Turner'; +Cc: 'Git Users'

On August 24, 2019 5:00 PM, Bryan Turner wrote:
> Can you go into a bit more detail about what "stress" means? Using too
> much disk space? Too many IOPS reading/packing? Since you specifically
> called out the filesystem, does that mean the CPU/memory usage is
> acceptable?

The upstream is BitBucket, which does a gc frequently. I'm not sure any
of this is related to the pack structure. Git is spending most of its
time writing the large number of large files into the working directory -
it is stressing mostly the disk, with a bit on the CPU (neither is
acceptable to the customer). I am really unsure there is any way to make
things better. The core issue is that the customer insists on doing a
clone for every feature branch instead of using pull/checkout. I have
been unable to change their mind - to this point, anyway. We are going to
be setting up a detailed performance analysis that may lead to some data
the git team can use.

Regards,
Randall
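For what it's worth, the "one working tree per feature branch" workflow
the customer wants can be had without re-cloning via git worktree, which
shares a single object store across checkouts. A sketch against a
throwaway repository (paths and branch name are illustrative):

```shell
set -e
rm -rf /tmp/main-clone /tmp/wt-feature
git init -q /tmp/main-clone            # stand-in for one long-lived clone
cd /tmp/main-clone
echo hello > file.txt
git add file.txt
git -c user.email=a@b -c user.name=demo commit -qm init
# One extra working tree per feature branch: shares the .git object
# store, so the only new disk cost is the checked-out files themselves
git worktree add -q /tmp/wt-feature -b feature/demo
# When the branch is done (within the 48-hour window):
#   git worktree remove /tmp/wt-feature
```

This keeps the "fresh directory per branch" feel while skipping both the
network transfer and the duplicated object store of a full clone.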
* Re: [Question] clone performance
From: Jeff King @ 2019-08-26 18:21 UTC
To: randall.s.becker; +Cc: 'Bryan Turner', 'Git Users'

On Mon, Aug 26, 2019 at 10:16:48AM -0400, randall.s.becker@rogers.com wrote:
> The upstream is BitBucket, which does a gc frequently. [...] Git is
> spending most of its time writing the large number of large files into
> the working directory - it is stressing mostly the disk, with a bit on
> the CPU (neither is acceptable to the customer).

Yeah, at the point of checkout there's basically no impact from anything
the server is doing or has done (technically it could make things worse
for you by returning a pack with absurdly long delta chains or
something, but that would be CPU and not disk stress).

I doubt there's much to optimize in Git here. It's literally just
writing files to disk as quickly as it can, and it sounds like disk
performance is your bottleneck.

-Peff
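For the detailed performance analysis mentioned earlier in the thread,
Git's built-in performance trace will show where a clone's wall-clock
time actually goes. A sketch against a throwaway local repository (for a
real measurement, point the clone at the actual remote instead):

```shell
set -e
rm -rf /tmp/trace-origin /tmp/trace-clone /tmp/trace.log
git init -q /tmp/trace-origin
cd /tmp/trace-origin
echo data > f.txt
git add f.txt
git -c user.email=a@b -c user.name=demo commit -qm init
# GIT_TRACE_PERFORMANCE=1 prints per-command timings to stderr
GIT_TRACE_PERFORMANCE=1 git clone -q /tmp/trace-origin /tmp/trace-clone \
    2> /tmp/trace.log
grep performance /tmp/trace.log | head -n 3
```

If the dominant timing is in the checkout phase rather than the transfer,
that would confirm the disk-write bottleneck described above.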
* Re: [Question] clone performance
From: Elijah Newren @ 2019-08-26 19:27 UTC
To: Jeff King; +Cc: randall.s.becker, Bryan Turner, Git Users

On Mon, Aug 26, 2019 at 12:04 PM Jeff King <peff@peff.net> wrote:
> I doubt there's much to optimize in Git here. It's literally just
> writing files to disk as quickly as it can, and it sounds like disk
> performance is your bottleneck.

Well, if it's just checkout, Stolee's sparse-checkout series he just
posted may be of interest to them... once it's polished up and included
in git, of course.
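That series later shipped as the git sparse-checkout command (git 2.25
and newer). A sketch of how it shrinks what checkout has to write, using
a throwaway repository (directory names are illustrative):

```shell
set -e
rm -rf /tmp/sc-origin /tmp/sc-clone
git init -q /tmp/sc-origin
cd /tmp/sc-origin
mkdir -p needed unneeded
echo keep > needed/a.txt
echo skip > unneeded/b.txt
git add .
git -c user.email=a@b -c user.name=demo commit -qm init
git clone -q /tmp/sc-origin /tmp/sc-clone
cd /tmp/sc-clone
# Restrict the working tree to the listed directories (git >= 2.25);
# unneeded/ leaves the worktree but remains in the object store
git sparse-checkout set needed
ls
```

If each feature branch only touches part of the 10 GB tree, this would
cut the per-clone disk writes roughly in proportion to what is excluded.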