From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCH v4 00/32] Lockfile correctness and refactoring Date: Wed, 10 Sep 2014 15:11:59 -0400 Message-ID: <20140910191159.GA23655@peff.net> References: <1409989846-22401-1-git-send-email-mhagger@alum.mit.edu> <20140910081358.GB16413@peff.net> <20140910103014.GA15461@peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: Duy Nguyen , Michael Haggerty , Johannes Sixt , Torsten =?utf-8?Q?B=C3=B6gershausen?= , Git Mailing List To: Junio C Hamano X-From: git-owner@vger.kernel.org Wed Sep 10 21:12:14 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1XRnJI-0005z7-Kc for gcvg-git-2@plane.gmane.org; Wed, 10 Sep 2014 21:12:08 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751972AbaIJTME (ORCPT ); Wed, 10 Sep 2014 15:12:04 -0400 Received: from cloud.peff.net ([50.56.180.127]:46641 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751064AbaIJTMD (ORCPT ); Wed, 10 Sep 2014 15:12:03 -0400 Received: (qmail 8778 invoked by uid 102); 10 Sep 2014 19:12:02 -0000 Received: from Unknown (HELO peff.net) (10.0.1.1) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Wed, 10 Sep 2014 14:12:02 -0500 Received: (qmail 19127 invoked by uid 107); 10 Sep 2014 19:12:21 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Wed, 10 Sep 2014 15:12:21 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Wed, 10 Sep 2014 15:11:59 -0400 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Wed, Sep 10, 2014 at 09:51:03AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Yes, we don't let normal fetchers see these repos. They're only for > > holding shared objects and the ref tips to keep them reachable. > > Are these individual refs have relations to the real world after > they are created? To ask it another way, let's say that a branch in > a repository, which is using this as a shared object store, caused > one of these refs to be created; now the origin repository rewinds > or deletes that branch---do you do anything to the ref in the shared > object store at that point? Yes, we fetch from them before doing any maintenance in the shared repository (like running repack). That's how objects migrate into the shared repository, as well. > I am wondering if it makes sense to maintain a single ref that > reaches all the commits in this shared object store repository, > instead of keeping these millions of refs. When you need to make > more objects kept and reachable, create an octopus with the current > tip and tips of all these refs that causes you to wish making these > "more objects kept and reachable". Obviously that won't work well > if the reason why your current scheme uses refs is because you > adjust individual refs to prune some objects---hence the first > question in this message. Exactly. You could do this if you threw away and re-made the octopus after each fetch (and then threw away the individual branches that went into it). For that matter, if all you really want are the tips for reachability, you can basically run "for-each-ref | sort -u"; most of these refs are tags that are duplicated between each fork. However, having the individual tips does make some things easier. If I only keep unique tips and I drop a tip from fork A, I would then need to check every other fork to see if any other fork has the same tip. OTOH, that means visiting N packed-refs files, each with (let's say) 3000 refs. As opposed to dealing with a packed-refs file with N*3000 refs. So it's really not that different. We also use the individual ref tips for packing. They factor into the bitmap selection, and we have some patches (which I've been meaning to upstream for a while now) to make delta selections in the shared-object repository that will have a high chance of reuse in clones of individual forks. And it's useful to query them for various reasons (e.g., "who is referencing this object?"). There a lot of different ways to do it, and the giant refs file is a pain (not to mention writing objects to disk in the forks, and then migrating them separately to the shared storage). But doing it this way means that the forks and the shared-object repository are all real first-class git repositories. We follow the usual object reachability guarantees, and it's safe to run any stock git command on them. I am leaning towards a system where the shared-object repository is a pseudo-repository, forks actually write directly into the shared object store (probably with a symlinked objects/ directory), and implementing a ref backend that generates a virtual mapping on the fly (e.g., all refs in "../foo.git" appear as "refs/remotes/foo/*" in the pseudo-repository. I'm watching the refs-backend work anxiously. :) -Peff