From mboxrd@z Thu Jan 1 00:00:00 1970 From: Bill Lear Subject: Re: 16 gig, 350,000 file repository Date: Mon, 22 Feb 2010 16:20:08 -0600 Message-ID: <19331.792.84192.53690@blake.zopyra.com> References: <20100211234753.22574.48799.reportbug@gibbs.hungrycats.org> <20100214011812.GA2175@dpotapov.dyndns.org> <7vljer1gyg.fsf_-_@alter.siamese.dyndns.org> <201002181114.19984.trast@student.ethz.ch> <7vtytee7ff.fsf@alter.siamese.dyndns.org> <19325.40682.729141.973125@blake.zopyra.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: git@vger.kernel.org To: Nicolas Pitre X-From: git-owner@vger.kernel.org Mon Feb 22 23:20:26 2010 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Njgdd-00046c-Jc for gcvg-git-2@lo.gmane.org; Mon, 22 Feb 2010 23:20:25 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754694Ab0BVWUV (ORCPT ); Mon, 22 Feb 2010 17:20:21 -0500 Received: from 75-27-130-60.lightspeed.austtx.sbcglobal.net ([75.27.130.60]:59454 "HELO blake.zopyra.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with SMTP id S1754554Ab0BVWUU (ORCPT ); Mon, 22 Feb 2010 17:20:20 -0500 Received: (qmail 12007 invoked by uid 500); 22 Feb 2010 22:20:18 -0000 In-Reply-To: X-Mailer: VM 8.1.0-beta under 23.1.1 (i386-redhat-linux-gnu) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thursday, February 18, 2010 at 15:58:42 (-0500) Nicolas Pitre writes: >On Thu, 18 Feb 2010, Bill Lear wrote: > >> I'm starting a new, large project and would like a quick bit of advice. >> >> Bringing in a set of test cases and other files from a ClearCase >> repository resulted in a 350,000 file git repo of about 16 gigabytes. >> >> The time to clone over a fast network was about 250 minutes. I could >> not verify if the repo had been packed properly, etc. > >I'd start from there. If you didn't do a 'git gc --aggressive' after >the import then it is quite likely that your repo isn't well packed. > >Of course you'll need a big machine to repack this. But that should be >needed only once. Ok, well they have a "big machine", but not big enough. It's running out of memory on the gc. I believe they have a fair amount of memory: % free total used free shared buffers cached Mem: 16629680 16051444 578236 0 28332 14385948 -/+ buffers/cache: 1637164 14992516 Swap: 8289500 1704 8287796 and they are using git 1.6.6. Assuming we can figure out how to gc this puppy (is there any way on a machine without 64 gigabytes?), there is still a question that remains: how to organize a project that has a very large amount of test cases (and test data) that we might not want to pull across the wire each time. Instead of shallow clone, as sort of slicing clone operation? We thought of using submodules. That is, code (say) goes in a separate repo 'src' and functional tests go in another, called 'ftests'. Then, we add 'ftests' as a submodule to 'src'. Great. However, we need to be able to branch 'src' and 'ftests' together. Example: I am working on a new feature in a branch "GLX-473_incremental_compression". I would like to be able to create the branch in both the 'src' repo and the 'ftests' repo at the same time, make changes, commit, and push to that branch for both. When developers check out the repo, they move to that branch, but do NOT want the cloned ftests. However, the QA team wants both the source and the tests that I have checked in and pushed. Is there an easy way to support this? Bill