From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261722AbVBSQm0 (ORCPT ); Sat, 19 Feb 2005 11:42:26 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261725AbVBSQm0 (ORCPT ); Sat, 19 Feb 2005 11:42:26 -0500 Received: from ppp-217-133-42-200.cust-adsl.tiscali.it ([217.133.42.200]:5433 "EHLO opteron.random") by vger.kernel.org with ESMTP id S261722AbVBSQmP (ORCPT ); Sat, 19 Feb 2005 11:42:15 -0500 Date: Sat, 19 Feb 2005 17:42:13 +0100 From: Andrea Arcangeli To: Patrick McFarland Cc: linux-kernel@veger.kernel.org, Erik =?iso-8859-1?Q?B=E5gfors?= , Tupshin Harper , darcs-users@darcs.net, lm@bitmover.com, linux-kernel@vger.kernel.org Subject: Re: [darcs-users] Re: [BK] upgrade will be needed Message-ID: <20050219164213.GB7247@opteron.random> References: <20050214020802.GA3047@bitmover.com> <845b6e8705021803533ba8cc34@mail.gmail.com> <20050218125057.GE2071@opteron.random> <200502190410.31960.pmcfarland@downeast.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200502190410.31960.pmcfarland@downeast.net> X-AA-GPG-Key: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43 X-Cpushare-GPG-Key: 1024D/4D11C21C 5F99 3C8B 5142 EB62 26C3 2325 8989 B72A 4D11 C21C X-Cpushare-SSL-SHA1-Cert: 3812 CD76 E482 94AF 020C 0FFA E1FF 559D 9B4F A59B X-Cpushare-SSL-MD5-Cert: EDA5 F2DA 1D32 7560 5E07 6C91 BFFC B885 User-Agent: Mutt/1.5.6i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote: > In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By > using it's super magical method, it represents how code is written and how it > changes (patch theory at its best). You can clearly see the direction code is > going, where it came from, and how it relates to other patches. I don't know anything about darcs, I was only talking about arch. I failed to compile darcs after trying for a while, so I give it up, I'll try again eventually. But anyway the only thing I care about is that you import all dozen thousands changesets of the 2.5 kernel into it, and you show it's manageable with <1G of ram and that the backup size is not very far from the 75M of CVS. I read in the webpage of the darcs kernel repository that they had to add RAM serveral times to avoid running out of memory. They needed more than 1G IIRC, and that was enough for me to lose interest into it. You're right I blamed the functional approach and so I felt it was going to be a mess to fix the ram utilization, but as someone else pointed out, perhaps it's darcs to blame and not haskell. I don't know. To me backup size matters too and for example I'm quite happy the fsfs backend of SVN generates very small backups compared to bsddb. > Sure, you can do this with RCS/SCCS style versioning, but whats the point? It > is inefficient, and backwards. It is saved in a compact format, and I don't think it'll run slower since if it's in cache it'll be fast and if it's I/O dominaed the more compact it is, the faster it will be, having a compact size both for the repository and for the backup, is more important to me. In theory one could write a backup tool that extracts the thing and rewrite a special backup-backend that is as space efficient as CVS and that can compress as well as CVS, but this won't help the working copy. > Thats all up to how the versioning system is written. Darcs developers are > working in a checkpoint system to allow you to just grab the newest stuff, This is already available with arch. Infact I suggested myself how to improve it with hardlinks so that a checkout will take a few seconds no matter the size of the tree. > and automatically grab anything else you need, instead of just grabbing > everything. In the case of the darcs linux repo, no one wants to download 600 > megs or so of changes. If you use arch/darcs as a patch-download tool, then that's fine as you say and you can already do that with arch (that in this part seems already a lot more advanced and it's written in C btw). Most people just checking out the kernel with arch (or darcs) would never need to download 600Megs of changes, but I need to download them all. The major reason a versioning system is useful to me is to track all changesets that touched a single file since the start of 2.5 to the head. So I can't get away by downloading the last dozen patches and caching the previous tree (which is perfectly doable with arch for ages and with hardlinks as well). > It may not even be space efficient. Code ultimately is just code, and changes > ultimately are changes. RCS isn't magical, and its far from it. Infact, the The way RCS stores the stuff compresses great. In that is "magical". I guess SCCS is the same. fsfs isn't bad either though, and infact I'd never use bsddb and I'd only use fsfs with SVN. > The darcs repo which has the entire history since at least the start of 2.4 > (iirc anyways) to *now* is around 600 to 700. > My suggestion is to convert _all_ dozen thousand changesets to darcs, and then > compare the size with CVS. And no, darcs doesn't eat that much memory for the What is the above 600/700 number then? I thought that was the conversion of all dozen thousand changesets of linux-2.5 CVS to darcs. > amount of work its doing. (And yes, they are working on that). I'll stay tuned. To me the only argument for not using a "magic" format like CVS or SCCS that is space efficient and compresses efficiently, is if you claim it's going to be a lot slower at checkouts (but infact applying some dozen thousand patchsets to run a checkout is going to be slower than CVS/SCCS). I know it's so much simpler to keep each patchset in a different file like arch is already doing, but that's not the best on-disk format IMHO. Note that some year ago I had the opposite idea, i.e. at some point I got convinced it was so much better to keep each patch separated from each other like you're advocating above, until I figured out how big the thing grows and how little space efficient it is, and how much I/O it forces me to do, how much disk it wastes in the backup and how slow it is as well to checkout dozen thousand patchsets. For smaller projects without dozen thousand changesets, the patch per file looks fine instead. For big projects IMHO being space efficient is much more important.