* Kernel SCM saga.. @ 2005-04-06 15:42 Linus Torvalds 2005-04-06 16:00 ` Greg KH ` (10 more replies) 0 siblings, 11 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-06 15:42 UTC (permalink / raw)
To: Kernel Mailing List

Ok, as a number of people are already aware (and in some cases have been aware over the last several weeks), we've been trying to work out a conflict over BK usage over the last month or two (and it feels like longer ;). That hasn't been working out, and as a result, the kernel team is looking at alternatives.

[ And apparently this just hit slashdot too, so by now _everybody_ knows ]

It's not like my choice of BK has been entirely conflict-free ("No, really? Do tell! Oh, you mean the gigabytes upon gigabytes of flames we had?"), so in some sense this was inevitable, but I sure had hoped that it would have happened only once there was a reasonable open-source alternative. As it is, we'll have to scramble for a while.

Btw, don't blame BitMover, even if that's probably going to be a very common reaction. Larry in particular really did try to make things work out, but it got to the point where I decided that I don't want to be in the position of trying to hold two pieces together that would need as much glue as it seemed to require.

We've been using BK for three years, and in fact, the biggest problem right now is that a number of people have gotten very very picky about their tools after having used the best. Me included, but in fact the people that got helped most by BitKeeper usage were often the people _around_ me who had a much easier time merging with my tree and sending their trees to me.

Of course, there's also probably a ton of people who just used BK as a nicer (and much faster) "anonymous CVS" client. We'll get that sorted out, but the immediate problem is that I'm spending most of my time trying to see what the best way to co-operate is.

NOTE! BitKeeper isn't going away per se.
Right now, the only real thing that has happened is that I've decided to not use BK mainly because I need to figure out the alternatives, and rather than continuing "things as normal", I decided to bite the bullet and just see what life without BK looks like. So far it's a gray and bleak world ;)

So don't take this to mean anything more than it is. I'm going to be effectively off-line for a week (think of it as a normal "Linus went on a vacation" event) and I'm just asking that people who continue to maintain BK trees at least try to also make sure that they can send me the result as (individual) patches, since I'll eventually have to merge some other way.

That "individual patches" is one of the keywords, btw. One thing that BK has been extremely good at, and that a lot of people have come to like even when they didn't use BK, is how we've been maintaining a much finer-granularity view of changes. That isn't going to go away.

In fact, one impact BK has had is to very fundamentally make us (and me in particular) change how we do things. That ranges from the fine-grained changeset tracking to just how I ended up trusting submaintainers with much bigger things, and not having to work on a patch-by-patch basis any more. So the three years with BK are definitely not wasted: I'm convinced it caused us to do things in better ways, and one of the things I'm looking at is to make sure that those things continue to work.

So I just wanted to say that I'm personally very happy with BK, and with Larry. It didn't work out, but it sure as hell made a big difference to kernel development. And we'll work out the temporary problem of having to figure out a set of tools to allow us to continue to do the things that BK allowed us to do.

Let the flames begin.

Linus

PS. Don't bother telling me about subversion. If you must, start reading up on "monotone". That seems to be the most viable alternative, but don't pester the developers so much that they don't get any work done.
They are already aware of my problems ;) ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds @ 2005-04-06 16:00 ` Greg KH 2005-04-07 16:40 ` Rik van Riel 2005-04-06 16:09 ` Daniel Phillips ` (9 subsequent siblings) 10 siblings, 1 reply; 201+ messages in thread From: Greg KH @ 2005-04-06 16:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, Apr 06, 2005 at 08:42:08AM -0700, Linus Torvalds wrote: > > So I just wanted to say that I'm personally very happy with BK, and with > Larry. It didn't work out, but it sure as hell made a big difference to > kernel development. And we'll work out the temporary problem of having to > figure out a set of tools to allow us to continue to do the things that BK > allowed us to do. I'd also like to publicly say that BK has helped out immensely in the past few years with kernel development, and has been one of the main reasons we have been able to keep up such a high patch rate over such a long period of time. Larry, and his team, have been nothing but great in dealing with all of the crap that we have been flinging at him due to the very odd demands such a large project as the kernel has caused. And I definitely owe him a beer the next time I see him. thanks, greg k-h ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 16:00 ` Greg KH @ 2005-04-07 16:40 ` Rik van Riel 2005-04-08 0:53 ` Jesse Barnes 0 siblings, 1 reply; 201+ messages in thread From: Rik van Riel @ 2005-04-07 16:40 UTC (permalink / raw) To: Greg KH; +Cc: Linus Torvalds, Kernel Mailing List On Wed, 6 Apr 2005, Greg KH wrote: > the very odd demands such a large project as the kernel has caused. And > I definitely owe him a beer the next time I see him. Seconded. Besides, now that the code won't be on bkbits any more, it's safe to get Larry drunk ;) Larry, thanks for the help you have given us by making bitkeeper available for all these years. -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 16:40 ` Rik van Riel @ 2005-04-08 0:53 ` Jesse Barnes 0 siblings, 0 replies; 201+ messages in thread From: Jesse Barnes @ 2005-04-08 0:53 UTC (permalink / raw) To: Rik van Riel; +Cc: Greg KH, Linus Torvalds, Kernel Mailing List On Thursday, April 7, 2005 9:40 am, Rik van Riel wrote: > Larry, thanks for the help you have given us by making > bitkeeper available for all these years. A big thank you from me too, I've really enjoyed using BK and I think it's made me much more productive than I would have been otherwise. I don't envy you having to put up with the frequent flamefests... Jesse ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds 2005-04-06 16:00 ` Greg KH @ 2005-04-06 16:09 ` Daniel Phillips 2005-04-06 19:07 ` Jon Smirl ` (8 subsequent siblings) 10 siblings, 0 replies; 201+ messages in thread From: Daniel Phillips @ 2005-04-06 16:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wednesday 06 April 2005 11:42, Linus Torvalds wrote: > it got to the point where I decided that I don't want to be in > the position of trying to hold two pieces together that would need as much > glue as it seemed to require. Hi Linus, Well I'm really pleased to hear that you won't be drinking this koolaid any more. This is a really uplifting development for me, thanks. Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds 2005-04-06 16:00 ` Greg KH 2005-04-06 16:09 ` Daniel Phillips @ 2005-04-06 19:07 ` Jon Smirl 2005-04-06 19:24 ` Matan Peled 2005-04-06 19:39 ` Paul P Komkoff Jr ` (7 subsequent siblings) 10 siblings, 1 reply; 201+ messages in thread
From: Jon Smirl @ 2005-04-06 19:07 UTC (permalink / raw)
To: Linus Torvalds, Larry McVoy; +Cc: Kernel Mailing List

On Apr 6, 2005 11:42 AM, Linus Torvalds <torvalds@osdl.org> wrote:
> So I just wanted to say that I'm personally very happy with BK, and with
> Larry. It didn't work out, but it sure as hell made a big difference to
> kernel development. And we'll work out the temporary problem of having to
> figure out a set of tools to allow us to continue to do the things that BK
> allowed us to do.

Larry has stated several times that most of his revenue comes from Windows. Has OSDL approached BitMover about simply buying out the source rights for the Linux version? From my experience in the industry, a fair price would probably be around $2M, but that should be within OSDL's capabilities. OSDL could then GPL the code and quiet the critics.

-- Jon Smirl jonsmirl@gmail.com

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 19:07 ` Jon Smirl @ 2005-04-06 19:24 ` Matan Peled 2005-04-06 19:49 ` Jon Smirl 0 siblings, 1 reply; 201+ messages in thread
From: Matan Peled @ 2005-04-06 19:24 UTC (permalink / raw)
To: Jon Smirl; +Cc: Linus Torvalds, Larry McVoy, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

Jon Smirl wrote:
> OSDL could then GPL the code and quiet the critics.

And also cause said GPL'ed code to be immediately ported over to Windows. I don't think BitMover could ever agree to that.

-- [Name ] :: [Matan I. Peled ] [Location ] :: [Israel ] [Public Key] :: [0xD6F42CA5 ] [Keyserver ] :: [keyserver.kjsl.com] encrypted/signed plain text preferred

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 19:24 ` Matan Peled @ 2005-04-06 19:49 ` Jon Smirl 2005-04-06 20:34 ` Hua Zhong 2005-04-07 1:31 ` Christoph Lameter 0 siblings, 2 replies; 201+ messages in thread
From: Jon Smirl @ 2005-04-06 19:49 UTC (permalink / raw)
To: chaosite; +Cc: Linus Torvalds, Larry McVoy, Kernel Mailing List

On Apr 6, 2005 3:24 PM, Matan Peled <chaosite@gmail.com> wrote:
> Jon Smirl wrote:
> > OSDL could then GPL the code and quiet the critics.
>
> And also cause said GPL'ed code to be immediately ported over to Windows.
> I don't think BitMover could ever agree to that.

Windows BitKeeper licenses are not that expensive; wouldn't you rather keep your source in a licensed, supported version? Who is going to do this backport, then support it and track new releases?

Why do people pay for RHEL when they can get it for free? They want support and a guarantee that their data won't be lost. Even with a GPL'd Linux BitKeeper I'll bet half of the existing Linux paying customers will continue to use a paid version.

There is a large difference in the behavior of corporations with huge source bases and college students with no money. The corporations will pay to have someone responsible for ensuring that the product works.

-- Jon Smirl jonsmirl@gmail.com

^ permalink raw reply [flat|nested] 201+ messages in thread
* RE: Kernel SCM saga.. 2005-04-06 19:49 ` Jon Smirl @ 2005-04-06 20:34 ` Hua Zhong 2005-04-07 1:31 ` Christoph Lameter 1 sibling, 0 replies; 201+ messages in thread From: Hua Zhong @ 2005-04-06 20:34 UTC (permalink / raw) To: 'Jon Smirl', chaosite Cc: 'Linus Torvalds', 'Larry McVoy', 'Kernel Mailing List' > Even with a GPL'd Linux Bitkeeper I'll bet half of the existing Linux > paying customers will continue to use a paid version. By what? How much do you plan to put down to pay Larry in case you lose your bet? Hua ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 19:49 ` Jon Smirl 2005-04-06 20:34 ` Hua Zhong @ 2005-04-07 1:31 ` Christoph Lameter 1 sibling, 0 replies; 201+ messages in thread From: Christoph Lameter @ 2005-04-07 1:31 UTC (permalink / raw) To: Jon Smirl; +Cc: Kernel Mailing List On Wed, 6 Apr 2005, Jon Smirl wrote: > There is a large difference in the behavior of corporations with huge > source bases and college students with no money. The corporations will > pay to have someone responsible for ensuring that the product works. Or they will merge with some other entity on the whim of some executive and the corporation then decides to kill the product for good without releasing the source leaving you out in the cold. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (2 preceding siblings ...) 2005-04-06 19:07 ` Jon Smirl @ 2005-04-06 19:39 ` Paul P Komkoff Jr 2005-04-07 1:40 ` Martin Pool 2005-04-07 6:36 ` bert hubert 2005-04-06 23:22 ` Jon Masters ` (6 subsequent siblings) 10 siblings, 2 replies; 201+ messages in thread
From: Paul P Komkoff Jr @ 2005-04-06 19:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List

Replying to Linus Torvalds:
> Ok,
> as a number of people are already aware (and in some cases have been

Actually, I'm very disappointed that things have gone in such a counter-productive way. All along, I was against Larry's opponents, but in the end they turned out to be right. That's a pity. To quote Vin Diesel's character Riddick, "there's no such word as friend", or something.

Anyway, it seems that folks at Canonical were aware of this, and here's the result of that awareness: http://bazaar-ng.org/

This needs some testing though, along with the really hard part - transferring all the history, nonlinear ... I don't know how anyone can do this by 1 Jul 2005, sorry :(

> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)

Monotone is good, but I don't really know the limits of sqlite3 wrt the kernel case. And again, what do we need to do to retain history ...

-- Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key This message represents the official view of the voices in my head

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 19:39 ` Paul P Komkoff Jr @ 2005-04-07 1:40 ` Martin Pool 2005-04-07 1:47 ` Jeff Garzik 2005-04-07 3:35 ` Daniel Phillips 2005-04-07 6:36 ` bert hubert 1 sibling, 2 replies; 201+ messages in thread From: Martin Pool @ 2005-04-07 1:40 UTC (permalink / raw) To: linux-kernel On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote: > http://bazaar-ng.org/ I'd like bazaar-ng to be considered too. It is not ready for adoption yet, but I am working (more than) full time on it and hope to have it be usable in a couple of months. bazaar-ng is trying to integrate a lot of the work done in other systems to make something that is simple to use but also fast and powerful enough to handle large projects. The operations that are already done are pretty fast: ~60s to import a kernel tree, ~10s to import a new revision from a patch. Please check it out and do pester me with any suggestions about whatever you think it needs to suit your work. -- Martin ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 1:40 ` Martin Pool @ 2005-04-07 1:47 ` Jeff Garzik 2005-04-07 2:26 ` Martin Pool 2005-04-07 7:53 ` Zwane Mwaikambo 2005-04-07 3:35 ` Daniel Phillips 1 sibling, 2 replies; 201+ messages in thread From: Jeff Garzik @ 2005-04-07 1:47 UTC (permalink / raw) To: Martin Pool; +Cc: linux-kernel On Thu, Apr 07, 2005 at 11:40:23AM +1000, Martin Pool wrote: > On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote: > > > http://bazaar-ng.org/ > > I'd like bazaar-ng to be considered too. It is not ready for adoption > yet, but I am working (more than) full time on it and hope to have it > be usable in a couple of months. > > bazaar-ng is trying to integrate a lot of the work done in other systems > to make something that is simple to use but also fast and powerful enough > to handle large projects. > > The operations that are already done are pretty fast: ~60s to import a > kernel tree, ~10s to import a new revision from a patch. By "importing", are you saying that importing all 60,000+ changesets of the current kernel tree took only 60 seconds? Jeff ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 1:47 ` Jeff Garzik @ 2005-04-07 2:26 ` Martin Pool 2005-04-07 2:32 ` David Lang 1 sibling, 1 reply; 201+ messages in thread
From: Martin Pool @ 2005-04-07 2:26 UTC (permalink / raw)
To: linux-kernel

On Wed, 06 Apr 2005 21:47:27 -0400, Jeff Garzik wrote:
>> The operations that are already done are pretty fast: ~60s to import a
>> kernel tree, ~10s to import a new revision from a patch.
>
> By "importing", are you saying that importing all 60,000+ changesets of
> the current kernel tree took only 60 seconds?

Now that would be impressive. No, I mean this:

% bzcat ../linux.pkg/patch-2.5.14.bz2 | patch -p1
% time bzr add -v .
(find any new non-ignored files; deleted files automatically noticed)
6.06s user 0.41s system 89% cpu 7.248 total
% time bzr commit -v -m 'import 2.5.14'
7.71s user 0.71s system 65% cpu 12.893 total

(OK, a bit slower in this case but it wasn't all in core.) This is only v0.0.3, but I think the interface simplicity and speed compares well.

I haven't tested importing all 60,000+ changesets of the current bk tree, partly because I don't *have* all those changesets. (Larry said previously that someone (not me) tried to pull all of them using bkclient, and he considered this abuse and blacklisted them.)

I have been testing pulling in release and rc patches, and it scales to that level. It probably could not handle 60,000 changesets yet, but there is a plan to get there. In the interim, although it cannot handle the whole history forever it can handle large trees with moderate numbers of commits -- perhaps as many as you might deal with in developing a feature over the course of a few months.

The most sensible place to try out bzr, if people want to, is as a way to keep your own revisions before mailing a patch to linus or the subsystem maintainer.

-- Martin

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 2:26 ` Martin Pool @ 2005-04-07 2:32 ` David Lang 2005-04-07 5:38 ` Martin Pool 2005-04-07 8:14 ` Magnus Damm 0 siblings, 2 replies; 201+ messages in thread
From: David Lang @ 2005-04-07 2:32 UTC (permalink / raw)
To: Martin Pool; +Cc: linux-kernel

On Thu, 7 Apr 2005, Martin Pool wrote:
> I haven't tested importing all 60,000+ changesets of the current bk tree,
> partly because I don't *have* all those changesets. (Larry said
> previously that someone (not me) tried to pull all of them using bkclient,
> and he considered this abuse and blacklisted them.)

Pull the patches from the BK2CVS server. Yes, some patches are combined, but it will get you in the ballpark.

David Lang

> I have been testing pulling in release and rc patches, and it scales to
> that level. It probably could not handle 60,000 changesets yet, but there
> is a plan to get there. In the interim, although it cannot handle the
> whole history forever it can handle large trees with moderate numbers of
> commits -- perhaps as many as you might deal with in developing a feature
> over a course of a few months.
>
> The most sensible place to try out bzr, if people want to, is as a way to
> keep your own revisions before mailing a patch to linus or the subsystem
> maintainer.
>
> -- Martin

-- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 2:32 ` David Lang @ 2005-04-07 5:38 ` Martin Pool 2005-04-07 23:27 ` Linus Torvalds 0 siblings, 1 reply; 201+ messages in thread
From: Martin Pool @ 2005-04-07 5:38 UTC (permalink / raw)
To: linux-kernel, David Lang

[-- Attachment #1: Type: text/plain, Size: 1459 bytes --]

On Wed, 2005-04-06 at 19:32 -0700, David Lang wrote:
> On Thu, 7 Apr 2005, Martin Pool wrote:
>
> > I haven't tested importing all 60,000+ changesets of the current bk tree,
> > partly because I don't *have* all those changesets. (Larry said
> > previously that someone (not me) tried to pull all of them using bkclient,
> > and he considered this abuse and blacklisted them.)
>
> pull the patches from the BK2CVS server. yes some patches are combined,
> but it will get you in the ballpark.

OK, I just tried that. I know there are scripts to resynthesize changesets from the CVS info but I skipped that for now and just pulled each day's work into a separate bzr revision. It's up to the end of March and still running.

Importing the first snapshot (2004-01-01) took 41.77s user, 1:23.79 total. Each subsequent day takes about 10s user, 30s elapsed to commit into bzr. The speeds are comparable to CVS or a bit faster, and may be faster than other distributed systems. (This on a laptop with a 5400rpm disk.) Pulling out a complete copy of the tree as it was on a previous date takes about 14s user, 60s elapsed.

I don't want to get too distracted by benchmarks now because there are more urgent things to do and anyhow there is still lots of scope for optimization. I wouldn't be at all surprised if those times could be more than halved. I just wanted to show it is in (I hope) the right ballpark.

-- Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 5:38 ` Martin Pool @ 2005-04-07 23:27 ` Linus Torvalds 2005-04-08 5:56 ` Martin Pool 0 siblings, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-07 23:27 UTC (permalink / raw) To: Martin Pool; +Cc: linux-kernel, David Lang On Thu, 7 Apr 2005, Martin Pool wrote: > > Importing the first snapshot (2004-01-01) took 41.77s user, 1:23.79 > total. Each subsequent day takes about 10s user, 30s elapsed to commit > into bzr. The speeds are comparable to CVS or a bit faster, and may be > faster than other distributed systems. (This on a laptop with a 5400rpm > disk.) Pulling out a complete copy of the tree as it was on a previous > date takes about 14 user, 60s elapsed. If you have an exportable tree, can you just make it pseudo-public, tell me where to get a buildable system that works well enough, point me to some documentation, and maybe I can get some feel for it? Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 23:27 ` Linus Torvalds @ 2005-04-08 5:56 ` Martin Pool 2005-04-08 6:41 ` Linus Torvalds 0 siblings, 1 reply; 201+ messages in thread
From: Martin Pool @ 2005-04-08 5:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, David Lang

[-- Attachment #1: Type: text/plain, Size: 2102 bytes --]

On Thu, 2005-04-07 at 16:27 -0700, Linus Torvalds wrote:
> On Thu, 7 Apr 2005, Martin Pool wrote:
> >
> > Importing the first snapshot (2004-01-01) took 41.77s user, 1:23.79
> > total. Each subsequent day takes about 10s user, 30s elapsed to commit
> > into bzr. The speeds are comparable to CVS or a bit faster, and may be
> > faster than other distributed systems. (This on a laptop with a 5400rpm
> > disk.) Pulling out a complete copy of the tree as it was on a previous
> > date takes about 14s user, 60s elapsed.
>
> If you have an exportable tree, can you just make it pseudo-public, tell
> me where to get a buildable system that works well enough, point me to
> some documentation, and maybe I can get some feel for it?

Hi,

There is a "stable" release here:

  http://www.bazaar-ng.org/pkg/bzr-0.0.3.tgz

All you should need to do is unpack that and symlink bzr onto your path. You can get the current bzr development tree, stored in itself, by rsync:

  rsync -av ozlabs.org::mbp/bzr/dev ~/bzr.dev

Inside that directory you can run 'bzr info', 'bzr status --all', 'bzr unknowns', 'bzr log', 'bzr ignored'. Repeated rsyncs will bring you up to date with what I've done -- and will of course overwrite any local changes.

If someone was going to do development on this then the method would typically be to have two copies of the tree, one tracking my version and another for your own work -- much as with bk. In your own tree, you can do 'bzr add', 'bzr remove', 'bzr diff', 'bzr commit'. At the moment all you can do is diff against the previous revision, or manually diff the two trees, or use quilt, so it is just an archival system, not a full SCM system.
In the near future there will be some code to extract the differences as changesets to be mailed off. I have done a rough-as-guts import from bkcvs into this, and I can advertise that when it's on a server that can handle the load. At a glance this looks very similar to git -- I can go into the differences and why I did them the other way if you want. -- Martin [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 5:56 ` Martin Pool @ 2005-04-08 6:41 ` Linus Torvalds 2005-04-08 8:38 ` Andrea Arcangeli 2005-04-08 16:46 ` Kernel SCM saga Catalin Marinas 0 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 6:41 UTC (permalink / raw)
To: Martin Pool; +Cc: linux-kernel, David Lang

On Fri, 8 Apr 2005, Martin Pool wrote:
> You can get the current bzr development tree, stored in itself, by
> rsync:

I was thinking more of an exportable kernel tree in addition to the tool. The reason I mention that is just that I know several SCM's bog down under load horribly, so it actually matters what the size of the tree is. And I'm absolutely _not_ asking you for the 60,000 changesets that are in the BK tree, I'd be perfectly happy with a 2.6.12-rc2-based one for testing.

I know I can import things myself, but the reason I ask is because I've got several SCM's I should check out _and_ I've been spending the last two days writing my own fallback system so that I don't get screwed if nothing out there works right now.

Which is why I'd love to hear from people who have actually used various SCM's with the kernel. There's bound to be people who have already tried. I've gotten a lot of email of the kind "I love XYZ, you should try it out", but so far I've not seen anybody say "I've tracked the kernel with XYZ, and it does ..."

So, this is definitely not a "Martin Pool should do this" kind of issue: I'd like many people to test out many alternatives, to get a feel for where they are especially for a project the size of the kernel..

Linus

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 6:41 ` Linus Torvalds @ 2005-04-08 8:38 ` Andrea Arcangeli 2005-04-08 23:38 ` Daniel Phillips 2005-04-09 0:12 ` Linus Torvalds 1 sibling, 2 replies; 201+ messages in thread
From: Andrea Arcangeli @ 2005-04-08 8:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Martin Pool, linux-kernel, David Lang

On Thu, Apr 07, 2005 at 11:41:29PM -0700, Linus Torvalds wrote:
> I know I can import things myself, but the reason I ask is because I've
> got several SCM's I should check out _and_ I've been spending the last two
> days writing my own fallback system so that I don't get screwed if nothing
> out there works right now.

I tend to like bzr too (and I tend to like too many things ;), but even if an export of the data were available, it seems still too early in development to be able to help you this week, and it seems to lack any form of network export too.

> I'd like many people to test out many alternatives, to get a feel for
> where they are especially for a project the size of the kernel..

The huge number of changesets is the crucial point; there are good distributed SCMs already, but they are apparently not efficient enough at handling 60k changesets. We'd need a regenerated coherent copy of BKCVS to pipe into those SCMs to evaluate how well they scale.

OTOH if your git project already allows storing the data in there, that looks nice ;). I don't yet fully understand how the algorithms of the trees are meant to work (I only understand the backing store well, and I tend to prefer a DBMS over a tree of dirs with hashes). So I've no idea how it can plug in well as a SCM replacement or how you want to use it. It seems a kind of fully lockless thing where you can merge from one tree to the other without locks and where you can make quick diffs. It looks similar to a diff -ur of two hardlinked trees, except this one can save a lot of bandwidth to copy with rsync (i.e.
hardlinks become worthless after using rsync over the network, but hashes do not). Clearly the DBMS couldn't use the rsync binary to distribute the objects, but a network protocol could do the same thing rsync does. Thanks.

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 8:38 ` Andrea Arcangeli @ 2005-04-08 23:38 ` Daniel Phillips 2005-04-09 2:54 ` Andrea Arcangeli 2005-04-09 0:12 ` Linus Torvalds 1 sibling, 1 reply; 201+ messages in thread From: Daniel Phillips @ 2005-04-08 23:38 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang On Friday 08 April 2005 04:38, Andrea Arcangeli wrote: > On Thu, Apr 07, 2005 at 11:41:29PM -0700, Linus Torvalds wrote: > The huge number of changesets is the crucial point, there are good > distributed SCM already but they are apparently not efficient enough at > handling 60k changesets. > > We'd need a regenerated coherent copy of BKCVS to pipe into those SCM to > evaluate how well they scale. > > OTOH if your git project already allows storing the data in there, > that looks nice ;). Hi Andrea, For the immediate future, all we need is something than can _losslessly_ capture the new metadata that's being generated. That buys time to bring one of the promising open source candidates up to full speed. By the way, which one are you working on? :-) Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 23:38 ` Daniel Phillips @ 2005-04-09 2:54 ` Andrea Arcangeli 0 siblings, 0 replies; 201+ messages in thread From: Andrea Arcangeli @ 2005-04-09 2:54 UTC (permalink / raw) To: Daniel Phillips; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang On Fri, Apr 08, 2005 at 07:38:30PM -0400, Daniel Phillips wrote: > For the immediate future, all we need is something than can _losslessly_ > capture the new metadata that's being generated. That buys time to bring one > of the promising open source candidates up to full speed. Agreed. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 8:38 ` Andrea Arcangeli 2005-04-08 23:38 ` Daniel Phillips @ 2005-04-09 0:12 ` Linus Torvalds 2005-04-09 2:27 ` Andrea Arcangeli 2005-04-09 16:33 ` Roman Zippel 1 sibling, 2 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-09 0:12 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Martin Pool, linux-kernel, David Lang On Fri, 8 Apr 2005, Andrea Arcangeli wrote: > > We'd need a regenerated coherent copy of BKCVS to pipe into those SCM to > evaluate how well they scale. Yes, that makes most sense, I believe. Especially as BKCVS does the linearization that makes other SCM's _able_ to take the data in the first place. Few enough SCM's really understand the BK merge model, although the distributed ones obviously have to do something similar. > OTOH if your git project already allows storing the data in there, > that looks nice ;). I can express the data, and I did a sparse .git archive to prove the concept. It doesn't even try to save BK-specific details, but as far as I can tell, my git-conversion did capture all the basic things (ie not just the actual source tree, but hopefully all the "who did what" parts too). Of course, my git visualization tools are so horribly crappy that it is hard to make sure ;) Also, I suspect that BKCVS actually bothers to get more details out of a BK tree than I cared about. People have pestered Larry about it, so BKCVS exports a lot of the nitty-gritty (per-file comments etc) that just doesn't actually _matter_, but people whine about. Me, I don't care. My sparse-conversion just took the important parts. > I don't yet fully understand how the algorithms of the trees are meant > to work Well, things like actually merging two git trees is not even something git tries to do. It leaves that to somebody else - you can see what the relationship is, and you can see all the data, but as far as I'm concerned, git is really a "filesystem". 
It's a way of expressing revisions, but it's not a way of creating them. > It looks similar to a diff -ur of two hardlinked trees Yes. You could really think of it that way. It's not really about hardlinking, but the fact that objects are named by their content does mean that two objects (regardless of their type) can be seen as "hardlinked" whenever their contents match. But the more interesting part is the hierarchical virtual format it has, ie it is not only hardlinked, but it also has the three different levels of "views" into those hardlinked objects ("blob", "tree", "revision"). So even though the hash tree looks flat in the _physical_ filesystem, it definitely isn't flat in its own virtual world. It's just flattened to fit in a normal filesystem ;) [ There's also a fourth level view in "trust", but that one hasn't been implemented yet since I think it might as well be done at a higher level. ] Btw, the sha1 file format isn't actually designed for "rsync", since rsync is really a hell of a lot more capable than my format needs. The format is really designed for something like an offline http grabber, in that you can just grab files purely by filename (and verify that you got them right by running sha1sum on the resulting local copy). So think "wget". Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
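The naming scheme Linus describes, where every object is addressed by the SHA-1 of its contents, can be sketched in a few lines. This is a toy in-memory illustration, not git's actual on-disk format (real git also prefixes a type header before hashing and compresses the stored blob):

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: an object's name is the SHA-1 of
    its bytes, so identical contents collapse into a single object,
    which is the "hardlinked" behaviour described above."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha1(data).hexdigest()
        self.objects[name] = data  # storing the same content twice is a no-op
        return name

    def get(self, name: str) -> bytes:
        data = self.objects[name]
        # verification works exactly like the sha1sum-after-wget idea:
        assert hashlib.sha1(data).hexdigest() == name
        return data

store = ObjectStore()
a = store.put(b"int main(void) { return 0; }\n")
b = store.put(b"int main(void) { return 0; }\n")  # same bytes, same name
assert a == b and len(store.objects) == 1
```

This is also why a dumb transport suffices: the name alone is enough to fetch and verify each object, file by file.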
* Re: Kernel SCM saga.. 2005-04-09 0:12 ` Linus Torvalds @ 2005-04-09 2:27 ` Andrea Arcangeli 2005-04-09 2:32 ` David Lang ` (3 more replies) 2005-04-09 16:33 ` Roman Zippel 1 sibling, 4 replies; 201+ messages in thread From: Andrea Arcangeli @ 2005-04-09 2:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: Martin Pool, linux-kernel, David Lang On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote: > really designed for something like a offline http grabber, in that you can > just grab files purely by filename (and verify that you got them right by > running sha1sum on the resulting local copy). So think "wget". I'm not entirely convinced wget is going to be an efficient way to synchronize and fetch your tree, its simplicity is great though. It's a tradeoff between optimizing and re-using existing tools (like webservers). Perhaps that's why you were compressing the stuff too? It sounds better not to compress the stuff on-disk, and to synchronize with an rsync-like protocol (rsync server would make it) that handles the compression in the network protocol itself, and in turn that can apply compression to a large blob (i.e. the diff between the trees), and not to the single tiny files. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 2:27 ` Andrea Arcangeli @ 2005-04-09 2:32 ` David Lang 2005-04-09 3:08 ` Brian Gerst ` (2 subsequent siblings) 3 siblings, 0 replies; 201+ messages in thread From: David Lang @ 2005-04-09 2:32 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel On Sat, 9 Apr 2005, Andrea Arcangeli wrote: > On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote: >> really designed for something like a offline http grabber, in that you can >> just grab files purely by filename (and verify that you got them right by >> running sha1sum on the resulting local copy). So think "wget". > > I'm not entirely convinced wget is going to be an efficient way to > synchronize and fetch your tree, its simplicitly is great though. It's a > tradeoff between optimzing and re-using existing tools (like webservers). > Perhaps that's why you were compressing the stuff too? It sounds better > not to compress the stuff on-disk, and to synchronize with a rsync-like > protocol (rsync server would make it) that handles the compression in > the network protocol itself, and in turn that can apply compression to a > large blob (i.e. the diff between the trees), and not to the single tiny > files. note that many webservers will compress the data for you on the fly as well, so there's even less need to have it pre-compressed David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 2:27 ` Andrea Arcangeli 2005-04-09 2:32 ` David Lang @ 2005-04-09 3:08 ` Brian Gerst 2005-04-09 3:15 ` Andrea Arcangeli 2005-04-09 5:45 ` Linus Torvalds 2005-04-10 17:55 ` Matthias Andree 3 siblings, 1 reply; 201+ messages in thread From: Brian Gerst @ 2005-04-09 3:08 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang Andrea Arcangeli wrote: > On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote: > >>really designed for something like a offline http grabber, in that you can >>just grab files purely by filename (and verify that you got them right by >>running sha1sum on the resulting local copy). So think "wget". > > > I'm not entirely convinced wget is going to be an efficient way to > synchronize and fetch your tree, its simplicitly is great though. It's a > tradeoff between optimzing and re-using existing tools (like webservers). > Perhaps that's why you were compressing the stuff too? It sounds better > not to compress the stuff on-disk, and to synchronize with a rsync-like > protocol (rsync server would make it) that handles the compression in > the network protocol itself, and in turn that can apply compression to a > large blob (i.e. the diff between the trees), and not to the single tiny > files. It's my understanding that the files don't change. Only new ones are created for each revision. -- Brian Gerst ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 3:08 ` Brian Gerst @ 2005-04-09 3:15 ` Andrea Arcangeli 0 siblings, 0 replies; 201+ messages in thread From: Andrea Arcangeli @ 2005-04-09 3:15 UTC (permalink / raw) To: Brian Gerst; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang On Fri, Apr 08, 2005 at 11:08:58PM -0400, Brian Gerst wrote: > It's my understanding that the files don't change. Only new ones are > created for each revision. I said diff between the trees, not diff between files ;). When you fetch the new changes with rsync, it'll compress better, and in turn be faster (assuming we're network bound, and I am with 1mbit and a 2.5GHz cpu), if it's rsync applying gzip to the big "combined diff between trees" instead of us compressing every single small file on disk, which won't compress any further inside rsync. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 2:27 ` Andrea Arcangeli 2005-04-09 2:32 ` David Lang 2005-04-09 3:08 ` Brian Gerst @ 2005-04-09 5:45 ` Linus Torvalds 2005-04-09 22:55 ` David S. Miller 2005-04-10 17:55 ` Matthias Andree 3 siblings, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-09 5:45 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Martin Pool, linux-kernel, David Lang On Sat, 9 Apr 2005, Andrea Arcangeli wrote: > > I'm not entirely convinced wget is going to be an efficient way to > synchronize and fetch your tree I don't think it's efficient per se, but I think it's important that people can just "pass the files along". Ie it's a huge benefit if any everyday mirror script (whether rsync, wget, homebrew or whatever) will just automatically do the right thing. > Perhaps that's why you were compressing the stuff too? It sounds better > not to compress the stuff on-disk I much prefer to waste some CPU time to save disk cache. Especially since the compression is "free" if you do it early on (ie it's done only once, since the files are stable). Also, if the difference is a 1.5GB kernel repository or a 3GB kernel repository, I know which one I'll pick ;) Also, I don't want people editing repository files by hand. Sure, the sha1 catches it, but still... I'd rather force the low-level ops to use the proper helper routines. Which is why it's a raw zlib compressed blob, not a gzipped file. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
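The distinction Linus draws between a raw zlib stream and a gzipped file shows up in the very first bytes of the output: gzip writes a well-known magic number that everyday tools recognize, while a bare zlib stream does not, which quietly discourages hand-editing. A sketch of the idea (the helper names here are hypothetical, not git's actual routines):

```python
import gzip
import hashlib
import zlib

payload = b"hello, repository\n"

raw = zlib.compress(payload)  # raw zlib stream, starts with 0x78
gz = gzip.compress(payload)   # gzip stream, starts with magic 0x1f 0x8b

assert raw[0] == 0x78         # zlib header byte; gunzip won't touch this
assert gz[:2] == b"\x1f\x8b"  # gzip magic; any everyday tool opens this

# round trip through hypothetical low-level helper routines
def write_object(data: bytes) -> tuple[str, bytes]:
    """Name the object by its sha1 and store it zlib-compressed."""
    return hashlib.sha1(data).hexdigest(), zlib.compress(data)

def read_object(name: str, blob: bytes) -> bytes:
    """Decompress and let the sha1 catch any hand-edited corruption."""
    data = zlib.decompress(blob)
    assert hashlib.sha1(data).hexdigest() == name
    return data

name, blob = write_object(payload)
assert read_object(name, blob) == payload
```

Forcing access through such helpers is exactly the point: the unfamiliar on-disk format keeps casual editors out, and the hash check catches anyone who gets in anyway.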
* Re: Kernel SCM saga.. 2005-04-09 5:45 ` Linus Torvalds @ 2005-04-09 22:55 ` David S. Miller 2005-04-09 23:13 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 201+ messages in thread From: David S. Miller @ 2005-04-09 22:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: andrea, mbp, linux-kernel, dlang On Fri, 8 Apr 2005 22:45:18 -0700 (PDT) Linus Torvalds <torvalds@osdl.org> wrote: > Also, I don't want people editing repostitory files by hand. Sure, the > sha1 catches it, but still... I'd rather force the low-level ops to use > the proper helper routines. Which is why it's a raw zlib compressed blob, > not a gzipped file. I understand the arguments for compression, but I hate it for one simple reason: recovery is more difficult when you corrupt some file in your repository. It's happened to me more than once and I did lose data. Without compression, I might be able to recover if something causes a block of zeros to be written to the middle of some repository file. With compression, you pretty much just lose. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 22:55 ` David S. Miller @ 2005-04-09 23:13 ` Linus Torvalds 2005-04-10 0:14 ` Chris Wedgwood 2005-04-10 0:22 ` Paul Jackson 2005-04-10 11:33 ` Ingo Molnar 2 siblings, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-09 23:13 UTC (permalink / raw) To: David S. Miller; +Cc: andrea, mbp, linux-kernel, dlang On Sat, 9 Apr 2005, David S. Miller wrote: > > I understand the arguments for compression, but I hate it for one > simple reason: recovery is more difficult when you corrupt some > file in your repository. Trust me, the way git does things, you'll have so much redundancy that you'll have to really _work_ at losing data. That's the good news. The bad news is that this is obviously why it does eat a lot of disk. Since it saves full-file commits, you're going to have a lot of (compressed) full files around. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 23:13 ` Linus Torvalds @ 2005-04-10 0:14 ` Chris Wedgwood 2005-04-10 1:56 ` Paul Jackson 0 siblings, 1 reply; 201+ messages in thread From: Chris Wedgwood @ 2005-04-10 0:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: David S. Miller, andrea, mbp, linux-kernel, dlang On Sat, Apr 09, 2005 at 04:13:51PM -0700, Linus Torvalds wrote: > > I understand the arguments for compression, but I hate it for one > > simple reason: recovery is more difficult when you corrupt some > > file in your repository. I've had this too. Magic binary blobs are horrible here for data loss, which is why I'm not keen on subversion. > Trust me, the way git does things, you'll have so much redundancy > that you'll have to really _work_ at losing data. It's not clear to me that compression should be *required* though. Shouldn't we be able to turn this off in some cases? > The bad news is that this is obviously why it does eat a lot of > disk. Disk is cheap, but sadly page-cache is not :-( > Since it saves full-file commits, you're going to have a lot of > (compressed) full files around. How many is a lot? Are we talking 100k, 1m, 10m? ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 0:14 ` Chris Wedgwood @ 2005-04-10 1:56 ` Paul Jackson 2005-04-10 12:03 ` Ingo Molnar 0 siblings, 1 reply; 201+ messages in thread From: Paul Jackson @ 2005-04-10 1:56 UTC (permalink / raw) To: Chris Wedgwood; +Cc: torvalds, davem, andrea, mbp, linux-kernel, dlang Chris wrote: > How many is alot? Are we talking 100k, 1m, 10m? I pulled some numbers out of my bk tree for Linux. I have 16817 source files. They average 12.2 bitkeeper changes per file (counting the number of changes visible from doing 'bk sccslog' on each of the 16817 files). These 16817 files consume: 224 MBytes uncompressed and 95 MBytes compressed (using zlib's minigzip, on a 4 KB page reiserfs.) Since each change will get its own copy of the file, multiplying these two sizes (224 and 95) by 12.2 changes per file means the disk cost would be: 2.73 GByte uncompressed, or 1.16 GBytes compressed. I was pleasantly surprised at the degree of compression, shrinking files to 42% of their original size. I expected that, since the classic rule of thumb to archive before compressing wasn't being followed (nor should it be) and we were compressing lots of little files, we would save fewer disk blocks than this. Of course, since, as Linus reminds us, it's disk buffers in memory, not blocks on disk, that are precious, it's more like we will save 224 - 95 == 129 MBytes of RAM to hold one entire tree. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
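Paul's totals follow directly from his measurements; a quick check (assuming, as he does, that every change stores a full compressed copy and that the per-file average applies uniformly):

```python
# Paul's measured inputs
files = 16817             # source files in the tree
avg_changes = 12.2        # average bitkeeper changes per file
uncompressed_mb = 224     # one full tree, uncompressed
compressed_mb = 95        # one full tree, zlib-compressed

# every change keeps a full copy of its file, so scale by the average
total_unc = uncompressed_mb * avg_changes  # about 2733 MB
total_cmp = compressed_mb * avg_changes    # about 1159 MB

assert round(total_unc / 1000, 2) == 2.73  # "2.73 GByte uncompressed"
assert round(total_cmp / 1000, 2) == 1.16  # "1.16 GBytes compressed"
assert round(compressed_mb / uncompressed_mb * 100) == 42  # "42% of original"
assert uncompressed_mb - compressed_mb == 129  # MBytes of RAM saved per tree
```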
* Re: Kernel SCM saga.. 2005-04-10 1:56 ` Paul Jackson @ 2005-04-10 12:03 ` Ingo Molnar 2005-04-10 17:38 ` Paul Jackson 0 siblings, 1 reply; 201+ messages in thread From: Ingo Molnar @ 2005-04-10 12:03 UTC (permalink / raw) To: Paul Jackson Cc: Chris Wedgwood, torvalds, davem, andrea, mbp, linux-kernel, dlang * Paul Jackson <pj@engr.sgi.com> wrote: > These 16817 files consume: > > 224 MBytes uncompressed and > 95 MBytes compressed > > (using zlib's minigzip, on a 4 KB page reiserfs.) That's a 42.4% compressed size. Using a (much) more CPU-intense compression method (bzip -9), the compressed size is down to 45 MBytes. (a ratio of 20.2%) Using default 'gzip' I get 57 MB compressed. > Since each change will get its own copy of the file, multiplying these > two sizes (224 and 95) by 12.2 changes per file means the disk cost > would be: > > 2.73 GByte uncompressed, or > 1.16 GBytes compressed. With bzip2 -9 it would be 551 MBytes. It might well be practical on faster CPUs: a full tree (224 MBytes, 45 MBytes compressed) decompresses in 24 seconds on a 3.4GHz P4 - single CPU. (and with dual core likely becoming the standard, we might as well divide that by two) With default gzip it's 3.3 seconds though, and that still compresses it down to 57 MB. Ingo ^ permalink raw reply [flat|nested] 201+ messages in thread
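The tradeoff Ingo is measuring, bzip2 squeezing harder while zlib/gzip runs far faster, is easy to reproduce with the two libraries; the corpus below is a small repetitive stand-in, so the exact ratios will differ from his kernel-tree numbers:

```python
import bz2
import zlib

# stand-in corpus: repetitive C-like text, nothing like a full kernel tree
sample = b"static int foo(struct device *dev) { return dev->id; }\n" * 4000

zdata = zlib.compress(sample, 6)  # roughly "default gzip" effort
bdata = bz2.compress(sample, 9)   # much more CPU, usually a smaller result

assert zlib.decompress(zdata) == sample
assert bz2.decompress(bdata) == sample
# both shrink repetitive text dramatically; relative sizes vary by input
assert len(zdata) < len(sample) and len(bdata) < len(sample)
```

Timing the two calls on a real tree tarball is the experiment Ingo ran; the decompress side is where zlib's order-of-magnitude speed advantage matters day to day.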
* Re: Kernel SCM saga.. 2005-04-10 12:03 ` Ingo Molnar @ 2005-04-10 17:38 ` Paul Jackson 2005-04-10 17:46 ` Ingo Molnar 0 siblings, 1 reply; 201+ messages in thread From: Paul Jackson @ 2005-04-10 17:38 UTC (permalink / raw) To: Ingo Molnar; +Cc: cw, torvalds, davem, andrea, mbp, linux-kernel, dlang Ingo wrote: > With default gzip it's 3.3 seconds though, > and that still compresses it down to 57 MB. Interesting. I'm surprised how much a bunch of separate, modest-sized files can be compressed. I'm unclear what matters most here. Space on disk certainly isn't much of an issue. Even with Andrew Morton on our side, we still can't grow the kernel as fast as the disk drive manufacturers can grow disk sizes. Main memory size of the compressed history matters to Linus and his top 20 lieutenants doing full kernel source patching as a primary mission if they can't fit the source _history_ in main memory. But those people are running 1 GByte or more of RAM - so whether it is 95, 57 or 45 MBytes, it fits fine. The rest of us are mostly concerned with whether a kernel build fits in memory. Looking at an arch i386 kernel build tree I have at hand, I see the following disk usage: 102 MBytes - BitKeeper/* 287 MBytes - */SCCS/* (outside of already counted BitKeeper/*) 232 MBytes - checked out source files 94 MBytes - ELF and other build byproducts --- 715 MBytes - Total Converting from bk to git, I guess this becomes: 97 MBytes - git (zlib) 232 MBytes - checked out source files 94 MBytes - ELF and other build byproducts --- 423 MBytes - Total Size matters when it's a two to one difference, but when we are down to a 10% to 15% difference in the Total, it's presentation that matters. The above numbers tell me that this is not a pure size issue for local disk or memory usage. What does matter, as far as I can see: 1) Linus explicitly stated he wanted "a raw zlib compressed blob, not a gzipped file", to encourage everyone to use the git tools to access this data.
He did not "want people editing repository files by hand." I'm not sure what he gains here - it did annoy me for a couple of hours before I decided fixing my supper was more important. 2) The time to compress will be noticed by users as a delay when checking in changes (I'm guessing zlib compresses relatively faster). 3) The time to copy compressed data over the internet will be noticed by users when upgrading kernel versions (gzip can compress smaller). 4) Decompress times are smaller so don't matter as much. 5) Zlib has a nice library, and is patent free. I don't know about gzip. 6) As you note, zlib has rsync-friendly, recovery-friendly Z_PARTIAL_FLUSH. I don't know about gzip. My guess is that Linus finds (2) and (3) to balance each other, and that (1) decides the point, in favor of zlib. Well, that or a simpler hypothesis, that he found the nice library (5) convenient, and (1) sealed the deal, with the other tradeoffs passing through his subconscious faster than he bothered to verbalize them. You (Ingo) seem in your second message to be encouraging further consideration of gzip, for its improved compression. How will that matter to us, day to day? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 17:38 ` Paul Jackson @ 2005-04-10 17:46 ` Ingo Molnar 2005-04-10 17:56 ` Paul Jackson 0 siblings, 1 reply; 201+ messages in thread From: Ingo Molnar @ 2005-04-10 17:46 UTC (permalink / raw) To: Paul Jackson; +Cc: cw, torvalds, davem, andrea, mbp, linux-kernel, dlang * Paul Jackson <pj@engr.sgi.com> wrote: > Ingo wrote: > > With default gzip it's 3.3 seconds though, > > and that still compresses it down to 57 MB. > > Interesting. I'm surprised how much a bunch of separate, modest sized > files can be compressed. Sorry, what I measured was in essence the tarball, i.e. not the compression of every file separately. I should have been clear about that ... Ingo ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 17:46 ` Ingo Molnar @ 2005-04-10 17:56 ` Paul Jackson 0 siblings, 0 replies; 201+ messages in thread From: Paul Jackson @ 2005-04-10 17:56 UTC (permalink / raw) To: Ingo Molnar; +Cc: cw, torvalds, davem, andrea, mbp, linux-kernel, dlang Ingo wrote: > not the compression of every file separately. ok -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 22:55 ` David S. Miller 2005-04-09 23:13 ` Linus Torvalds @ 2005-04-10 0:22 ` Paul Jackson 2005-04-10 11:33 ` Ingo Molnar 2 siblings, 0 replies; 201+ messages in thread From: Paul Jackson @ 2005-04-10 0:22 UTC (permalink / raw) To: David S. Miller; +Cc: torvalds, andrea, mbp, linux-kernel, dlang David wrote: > recovery is more difficult when you corrupt some > file in your repository. Agreed. I too have recovered RCS and SCCS files by hand editing. Linus wrote: > I don't want people editing repostitory files by hand. Tyrant! ;) From Wikipedia: A tyrant is a usurper of rightful power. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 22:55 ` David S. Miller 2005-04-09 23:13 ` Linus Torvalds 2005-04-10 0:22 ` Paul Jackson @ 2005-04-10 11:33 ` Ingo Molnar 2 siblings, 0 replies; 201+ messages in thread From: Ingo Molnar @ 2005-04-10 11:33 UTC (permalink / raw) To: David S. Miller Cc: Linus Torvalds, andrea, mbp, linux-kernel, dlang, Paul Jackson * David S. Miller <davem@davemloft.net> wrote: > On Fri, 8 Apr 2005 22:45:18 -0700 (PDT) > Linus Torvalds <torvalds@osdl.org> wrote: > > > Also, I don't want people editing repostitory files by hand. Sure, the > > sha1 catches it, but still... I'd rather force the low-level ops to use > > the proper helper routines. Which is why it's a raw zlib compressed blob, > > not a gzipped file. > > I understand the arguments for compression, but I hate it for one > simple reason: recovery is more difficult when you corrupt some > file in your repository. > > It's happened to me more than once and I did lose data. > > Without compression, I might be able to recover if something > causes a block of zeros to be written to the middle of some > repository file. With compression, you pretty much just lose. that depends on how you compress. You are perfectly right that with default zlib compression, where you start the compression stream and stop it at the end of the file, recovery in case of damage is very hard for the portion that comes _after_ the damaged section. You'd have to reconstruct the compression state which is akin to breaking a key. But with zlib you can 'flush' the compression state every couple of blocks and basically get the same recovery properties, at some very minimal extra space cost (because when you flush out compression state you get some extra padding bytes). Flushing has another advantage as well: a small delta (even if it increases/decreases the file size!) 
in the middle of a larger file will still be compressed to the same output both before and after the change area (modulo flush block size), which rsync can pick up just fine. (IIRC that is one of the reasons why Debian, when compressing .deb's, does zlib-flushes every couple of blocks, so that rsync/apt-get can pick up partial .deb's as well.) The zlib option is, I think, Z_PARTIAL_FLUSH; I'm using it in Tux to do chunks of compression. The flushing cost is max 12 bytes or so, so if it's done every 4K we cap the cost at 0.2%. So flushing is both rsync-friendly and recovery-friendly. (Recovery isn't as simple as with plaintext, as you have to find the next 'block' and the block length will be inevitably variable. But it should be pretty predictable, and tools might even exist.) Ingo ^ permalink raw reply [flat|nested] 201+ messages in thread
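Ingo's flush-every-few-blocks scheme can be sketched with zlib's Python bindings. Z_FULL_FLUSH is used below because it also resets the compressor's dictionary, which makes each segment independently decodable and byte-identical across streams that share a tail; the Z_PARTIAL_FLUSH mode Ingo names pads slightly less but keeps history across the boundary:

```python
import zlib

CHUNK = 4096

def compress_chunks(chunks):
    """Compress each chunk, fully flushing after it so segment
    boundaries are byte-aligned and each segment decodes on its own."""
    c = zlib.compressobj()
    return [c.compress(chunk) + c.flush(zlib.Z_FULL_FLUSH) for chunk in chunks]

data = [b"A" * CHUNK, b"B" * CHUNK, b"C" * CHUNK]
segs = compress_chunks(data)

# recovery-friendly: damage to later segments leaves segment 1 readable
d = zlib.decompressobj()
assert d.decompress(segs[0]) == data[0]

# rsync-friendly: after a full flush the dictionary is reset, so an
# unchanged later chunk compresses to identical bytes in both streams
other = compress_chunks([b"X" * CHUNK, b"B" * CHUNK, b"C" * CHUNK])
assert segs[1] == other[1] and segs[2] == other[2]
```

The few bytes of padding per flush are the "max 12 bytes or so" cost Ingo mentions; spread over 4K chunks it is well under one percent.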
* Re: Kernel SCM saga.. 2005-04-09 2:27 ` Andrea Arcangeli ` (2 preceding siblings ...) 2005-04-09 5:45 ` Linus Torvalds @ 2005-04-10 17:55 ` Matthias Andree 3 siblings, 0 replies; 201+ messages in thread From: Matthias Andree @ 2005-04-10 17:55 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang Andrea Arcangeli wrote on 2005-04-09: > On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote: > > really designed for something like a offline http grabber, in that you can > > just grab files purely by filename (and verify that you got them right by > > running sha1sum on the resulting local copy). So think "wget". > > I'm not entirely convinced wget is going to be an efficient way to > synchronize and fetch your tree, its simplicitly is great though. It's a wget is probably a VERY UNWISE choice: <http://www.derkeiler.com/Mailing-Lists/securityfocus/bugtraq/2004-12/0106.html> -- Matthias Andree ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 0:12 ` Linus Torvalds 2005-04-09 2:27 ` Andrea Arcangeli @ 2005-04-09 16:33 ` Roman Zippel 2005-04-09 23:31 ` Tupshin Harper 2005-04-10 17:24 ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr 1 sibling, 2 replies; 201+ messages in thread From: Roman Zippel @ 2005-04-09 16:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrea Arcangeli, Martin Pool, linux-kernel, David Lang Hi, On Fri, 8 Apr 2005, Linus Torvalds wrote: > Also, I suspect that BKCVS actually bothers to get more details out of a > BK tree than I cared about. People have pestered Larry about it, so BKCVS > exports a lot of the nitty-gritty (per-file comments etc) that just > doesn't actually _matter_, but people whine about. Me, I don't care. My > sparse-conversion just took the important parts. As soon as you want to synchronize and merge two trees, you will know why this information does matter. (/me looks closer at the sparse-conversion...) It seems you exported the complete parent information and this is exactly the "nitty-gritty" I was "whining" about and which is not available via bkcvs or bkweb and it's the most crucial information to make the bk data useful outside of bk. Larry was previously very clear that he considers this proprietary bk meta data and anyone attempting to export this information is in violation of the free bk licence, so you indeed just took the important parts and this is/was explicitly verboten for normal bk users. bye, Roman ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 16:33 ` Roman Zippel @ 2005-04-09 23:31 ` Tupshin Harper 2005-04-10 17:24 ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr 1 sibling, 0 replies; 201+ messages in thread From: Tupshin Harper @ 2005-04-09 23:31 UTC (permalink / raw) To: linux-kernel Cc: Roman Zippel, Linus Torvalds, Andrea Arcangeli, Martin Pool, David Lang Roman Zippel wrote: >It seems you exported the complete parent information and this is exactly >the "nitty-gritty" I was "whining" about and which is not available via >bkcvs or bkweb and it's the most crucial information to make the bk data >useful outside of bk. Larry was previously very clear about this that he >considers this proprietary bk meta data and anyone attempting to export >this information is in violation with the free bk licence, so you indeed >just took the important parts and this is/was explicitly verboten for >normal bk users. > > Yes, this is exactly the information that would be necessary to create a general interop tool between bk and darcs|arch|monotone, and is the fundamental objection I and others have had to open source projects using BK. Is Bitmover willing to grant a special dispensation to allow a lossless conversion of the linux history to another format? -Tupshin ^ permalink raw reply [flat|nested] 201+ messages in thread
* Code snippet to reconstruct ancestry graph from bk repo 2005-04-09 16:33 ` Roman Zippel 2005-04-09 23:31 ` Tupshin Harper @ 2005-04-10 17:24 ` Paul P Komkoff Jr 2005-04-10 18:19 ` Roman Zippel 1 sibling, 1 reply; 201+ messages in thread From: Paul P Komkoff Jr @ 2005-04-10 17:24 UTC (permalink / raw) To: Roman Zippel Cc: Linus Torvalds, Andrea Arcangeli, Martin Pool, linux-kernel, David Lang Replying to Roman Zippel: > the "nitty-gritty" I was "whining" about and which is not available via > bkcvs or bkweb and it's the most crucial information to make the bk data > useful outside of bk. Larry was previously very clear about this that he > considers this proprietary bk meta data and anyone attempting to export > this information is in violation with the free bk licence, so you indeed > just took the important parts and this is/was explicitly verboten for > normal bk users. (borrowed from Tommi Virtanen) Code snippet to reconstruct ancestry graph from bk repo: bk changes -end':I: $if(:PARENT:){:PARENT:$if(:MPARENT:){ :MPARENT:}} $unless(:PARENT:){-}' |tac format is: newrev parent1 [parent2] parent2 present if merge occurs. -- Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key This message represents the official view of the voices in my head ^ permalink raw reply [flat|nested] 201+ messages in thread
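The output format of the snippet above (`newrev parent1 [parent2]`, oldest first after `tac`) is straightforward to load into an in-memory DAG; a sketch, assuming the `-` marker for a parentless root revision as in the snippet (the sample revision numbers are made up):

```python
def parse_ancestry(lines):
    """Build rev -> [parents] from 'newrev parent1 [parent2]' lines.
    A lone '-' in the parent field marks a revision with no parent."""
    graph = {}
    for line in lines:
        rev, *parents = line.split()
        graph[rev] = [p for p in parents if p != "-"]
    return graph

sample = [
    "1.1 -",            # root changeset, no parent
    "1.2 1.1",          # ordinary child
    "1.3 1.2 1.1.1.1",  # merge: second parent present
]
g = parse_ancestry(sample)
assert g["1.1"] == []
assert g["1.3"] == ["1.2", "1.1.1.1"]

# merges are exactly the revisions with two parents
merges = [rev for rev, parents in g.items() if len(parents) == 2]
assert merges == ["1.3"]
```

With the graph in hand, the merge topology that BKCVS flattens away can be walked or re-exported into another SCM's history format.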
* Re: Code snippet to reconstruct ancestry graph from bk repo 2005-04-10 17:24 ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr @ 2005-04-10 18:19 ` Roman Zippel 0 siblings, 0 replies; 201+ messages in thread From: Roman Zippel @ 2005-04-10 18:19 UTC (permalink / raw) To: Paul P Komkoff Jr Cc: Linus Torvalds, Andrea Arcangeli, Martin Pool, linux-kernel, David Lang Hi, On Sun, 10 Apr 2005, Paul P Komkoff Jr wrote: > (borrowed from Tommi Virtanen) > > Code snippet to reconstruct ancestry graph from bk repo: > bk changes -end':I: $if(:PARENT:){:PARENT:$if(:MPARENT:){ :MPARENT:}} $unless(:PARENT:){-}' |tac > > format is: > newrev parent1 [parent2] > parent2 present if merge occurs. I know that this is possible and Larry's response would have been something like this: http://www.ussg.iu.edu/hypermail/linux/kernel/0502.1/0248.html bye, Roman ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 6:41 ` Linus Torvalds 2005-04-08 8:38 ` Andrea Arcangeli @ 2005-04-08 16:46 ` Catalin Marinas 1 sibling, 0 replies; 201+ messages in thread From: Catalin Marinas @ 2005-04-08 16:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: Martin Pool, linux-kernel, David Lang Linus Torvalds <torvalds@osdl.org> wrote: > Which is why I'd love to hear from people who have actually used various > SCM's with the kernel. There's bound to be people who have already > tried. I (successfully) tried GNU Arch with the Linux kernel. I mirrored all the BKCVS changesets since Linux 2.6.9 (5300+ changesets) using this script: http://wiki.gnuarch.org/BKCVS_20to_20Arch_20Script_20for_20Linux_20Kernel My repository size is 1.1GB but this is because the script I use creates a snapshot (i.e. a full tarball) of every main and -rc release. For each individual changeset, an arch repository has a patch-xxx directory with a compressed tarball containing the patch, a log file and a checksum file. GNU Arch may have some annoying things (file naming, long commands, harder to get started, imposed version naming) and I won't try to advocate them but, for me, it looked like the best (free) option available regarding both features and speed. Being changeset oriented also has some advantages from my point of view. Being distributed means that you can create a branch on your local repository from a tree stored on a (read-only) remote repository (hosted on an ftp/http server). I can't compare it with BK since I haven't used it. 
The way I use it: - a main repository tracking all the changes to the bk-head, linux--main--2.6 (for those that never read/heard about arch, a tree name has the form "name--branch--version") - my main branch from the mainline tree, linux-arm--main--2.6, that was integrating my patches and was periodically merging the latest changes in linux--main--2.6 - different linux-arm--platformX--2.6 or linux-arm--deviceX--2.6 trees that were eventually merged into the linux-arm--main--2.6 tree The main merge algorithm is called star-merge and does a three-way merge between the local tree, the remote one and the common ancestor of these. Cherry picking is also supported for those that like it (I found it very useful if, for example, I fix a general bug in a branch that should be integrated in the main tree but the branch is not yet ready for inclusion). All the standard commands like commit, diff, status etc. are supported by arch. A useful command is "missing" which shows what changes are present in a tree and not in the current one. It is handy to see a summary of the remote changes before doing a merge (and faster than a full diff). It also supports file/directory renaming. To speed things up, arch uses a revision library with a directory for every revision, the files being hard-linked between revisions to save space. You can also hard-link the working tree to the revision library (which speeds up the tree diff operation) but you need to make sure that your editor renames the original file before saving a copy. Having snapshots might take space but they are useful both for getting a revision fast and for creating a revision in the library. A diff command usually takes around 1 min (on a P4 at 2.5GHz with IDE drives) if the current revision is in the library. The tree diff is the main time consuming operation when committing small changes.
If the revision is not in the library, it will try to create it by hard-linking with a previous one and applying the corresponding patches (later versions, I think, can reverse-apply patches from newer revisions). The merge operation might take some time (minutes, even 10-20 minutes for 1000+ changesets) depending on the number of changesets and whether the revisions are already in the revision library. You can specify a three-way merge that places conflict markers in the file (like diff3 or cvs) or a two-way merge which is equivalent to applying a patch (if you prefer a two-way merge, the "replay" command is actually the fastest; it takes ~2 seconds to apply a small changeset and doesn't need to go to the revision library). Once a merge operation completes, you need to fix the conflicts and commit the changes. All the logs are preserved but the newly merged individual changes are seen as a single commit in the local tree. In the way I use it (with a linux--main--2.6 tree similar to bk-head) I think arch would get slow with time as changesets accumulate. The way its developers advise it to be used is to work, for example, on a linux--main--2.6.12 tree while preparing this release and, once it is ready, seal it (commit --seal). Further commits need to have a --fix option and should mainly be bug fixes. At this point you can branch linux--main--2.6.13 and start working on it. This new tree can easily merge the bug fixes applied to the previous version. Arch developers also recommend using a new repository every year, especially if there are many changesets. A problem I found is that, even though the library revisions are hard-linked, they still take a lot of space and should be cleaned periodically (a cron script that checks the last access to them is available). By default, arch also complains (and exits) about unknown files in the working tree. Its developer(s) believe that compilation should be done in a different directory. 
I did find this a problem since I use the same tree to compile for several platforms. Anyway, it can be configured to ignore them, based on regexps. I also tried monotone and darcs (since these two, unlike svn, can do proper merging and preserve the merge history) but arch was by far the fastest (CVS/RCS are hard to beat on speed). Unfortunately, I can't make my repository public because of IT desk issues, but let me know if you'd like me to benchmark different operations (or if you'd like a simple list of commands to create your own). Hope you find this useful. -- Catalin ^ permalink raw reply [flat|nested] 201+ messages in thread
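[Editorial aside: the revision-library hard-linking Catalin describes can be sketched in a few lines of Python. This is an illustration of the technique only — the directory and file names are made up, not arch's real on-disk layout.]

```python
import os
import tempfile

# Sketch of the space-saving trick described above: an arch-style revision
# library hard-links files that did not change between revisions, so two
# revision directories share a single copy of the data on disk.
lib = tempfile.mkdtemp()
rev1 = os.path.join(lib, "patch-1")
rev2 = os.path.join(lib, "patch-2")
os.makedirs(rev1)
os.makedirs(rev2)

with open(os.path.join(rev1, "main.c"), "w") as f:
    f.write("int main(void) { return 0; }\n")

# The file is unchanged in the next revision: hard-link it instead of copying.
os.link(os.path.join(rev1, "main.c"), os.path.join(rev2, "main.c"))

st1 = os.stat(os.path.join(rev1, "main.c"))
st2 = os.stat(os.path.join(rev2, "main.c"))
print(st1.st_nlink)                # 2: one inode, two directory entries
print(os.path.samestat(st1, st2))  # True: both revisions share the data blocks
```

This is also why the text warns that your editor must rename the original file before saving a copy: writing through the link in place would silently modify every revision sharing that inode.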
* Re: Kernel SCM saga.. 2005-04-07 2:32 ` David Lang 2005-04-07 5:38 ` Martin Pool @ 2005-04-07 8:14 ` Magnus Damm 1 sibling, 0 replies; 201+ messages in thread From: Magnus Damm @ 2005-04-07 8:14 UTC (permalink / raw) To: David Lang; +Cc: Martin Pool, linux-kernel On Apr 7, 2005 4:32 AM, David Lang <dlang@digitalinsight.com> wrote: > On Thu, 7 Apr 2005, Martin Pool wrote: > > > I haven't tested importing all 60,000+ changesets of the current bk tree, > > partly because I don't *have* all those changesets. (Larry said > > previously that someone (not me) tried to pull all of them using bkclient, > > and he considered this abuse and blacklisted them.) > > pull the patches from the BK2CVS server. yes some patches are combined, > but it will get you in the ballpark. While at it, is there any ongoing effort to convert/export the kernel BK repository to some well known format like broken out patches and a series file? I think keeping the complete repository public in a well known format is important regardless of SCM taste. / magnus ^ permalink raw reply [flat|nested] 201+ messages in thread
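[Editorial aside: the "broken out patches and a series file" format Magnus mentions is simple enough to show concretely. A minimal sketch of a series-file parser follows; the inline-comment and per-patch-option handling here are assumptions modelled on quilt, not a reference implementation.]

```python
# A quilt-style "series" file is an ordered list of patch file names,
# applied top to bottom. Blank lines and '#' comments are ignored here,
# and anything after the file name (e.g. a strip-level option) is dropped.
def read_series(text):
    patches = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patches.append(line.split()[0])
    return patches

series = """\
# bug fixes first
oom-fix-1.patch
oom-fix-2.patch

drm-update.patch -p1
"""
print(read_series(series))
# ['oom-fix-1.patch', 'oom-fix-2.patch', 'drm-update.patch']
```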
* Re: Kernel SCM saga.. 2005-04-07 1:47 ` Jeff Garzik 2005-04-07 2:26 ` Martin Pool @ 2005-04-07 7:53 ` Zwane Mwaikambo 1 sibling, 0 replies; 201+ messages in thread From: Zwane Mwaikambo @ 2005-04-07 7:53 UTC (permalink / raw) To: Jeff Garzik; +Cc: Martin Pool, linux-kernel On Wed, 6 Apr 2005, Jeff Garzik wrote: > On Thu, Apr 07, 2005 at 11:40:23AM +1000, Martin Pool wrote: > > On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote: > > > > > http://bazaar-ng.org/ > > > > I'd like bazaar-ng to be considered too. It is not ready for adoption > > yet, but I am working (more than) full time on it and hope to have it > > be usable in a couple of months. > > > > bazaar-ng is trying to integrate a lot of the work done in other systems > > to make something that is simple to use but also fast and powerful enough > > to handle large projects. > > > > The operations that are already done are pretty fast: ~60s to import a > > kernel tree, ~10s to import a new revision from a patch. > > By "importing", are you saying that importing all 60,000+ changesets of > the current kernel tree took only 60 seconds? Probably `cvs import` equivalent. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 1:40 ` Martin Pool 2005-04-07 1:47 ` Jeff Garzik @ 2005-04-07 3:35 ` Daniel Phillips 2005-04-07 15:08 ` Daniel Phillips 1 sibling, 1 reply; 201+ messages in thread From: Daniel Phillips @ 2005-04-07 3:35 UTC (permalink / raw) To: Martin Pool; +Cc: linux-kernel On Wednesday 06 April 2005 21:40, Martin Pool wrote: > On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote: > > http://bazaar-ng.org/ > > I'd like bazaar-ng to be considered too. It is not ready for adoption > yet, but I am working (more than) full time on it and hope to have it > be usable in a couple of months. > > bazaar-ng is trying to integrate a lot of the work done in other systems > to make something that is simple to use but also fast and powerful enough > to handle large projects. > > The operations that are already done are pretty fast: ~60s to import a > kernel tree, ~10s to import a new revision from a patch. Hi Martin, When I tried it, it took 13 seconds to 'bzr add' the 2.6.11.3 tree on a relatively slow machine. Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 3:35 ` Daniel Phillips @ 2005-04-07 15:08 ` Daniel Phillips 0 siblings, 0 replies; 201+ messages in thread From: Daniel Phillips @ 2005-04-07 15:08 UTC (permalink / raw) To: Martin Pool; +Cc: linux-kernel On Wednesday 06 April 2005 23:35, Daniel Phillips wrote: > When I tried it, it took 13 seconds to 'bzr add' the 2.6.11.3 tree on a > relatively slow machine. Oh, and 135 seconds to commit, so 148 seconds overall. Versus 87 seconds to bunzip the tree in the first place. So far, you are in the ballpark. Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 19:39 ` Paul P Komkoff Jr 2005-04-07 1:40 ` Martin Pool @ 2005-04-07 6:36 ` bert hubert 1 sibling, 0 replies; 201+ messages in thread From: bert hubert @ 2005-04-07 6:36 UTC (permalink / raw) To: Kernel Mailing List On Wed, Apr 06, 2005 at 11:39:11PM +0400, Paul P Komkoff Jr wrote: > Monotone is good, but I don't really know limits of sqlite3 wrt kernel > case. And again, what we need to do to retain history ... I wouldn't fret over that :-) The big issue I have with sqlite3 is that it interacts horribly with ext3, resulting in dismal journalled write performance compared to ext2. I do not know if this is a sqlite3 or an ext3 problem though. -- http://www.PowerDNS.com Open source, database driven DNS Software http://netherlabs.nl Open and Closed source services ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (3 preceding siblings ...) 2005-04-06 19:39 ` Paul P Komkoff Jr @ 2005-04-06 23:22 ` Jon Masters 2005-04-07 6:51 ` Paul Mackerras ` (5 subsequent siblings) 10 siblings, 0 replies; 201+ messages in thread From: Jon Masters @ 2005-04-06 23:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Apr 6, 2005 4:42 PM, Linus Torvalds <torvalds@osdl.org> wrote: > as a number of people are already aware (and in some > cases have been aware over the last several weeks), we've > been trying to work out a conflict over BK usage over the last > month or two (and it feels like longer ;). That hasn't been > working out, and as a result, the kernel team is looking at > alternatives. What about the 64K changeset limitation in current releases? Did I miss something (like the fixes promised) or is there going to be another interim release before the end of support? Jon. P.S. Apologies if this already got addressed. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (4 preceding siblings ...) 2005-04-06 23:22 ` Jon Masters @ 2005-04-07 6:51 ` Paul Mackerras 2005-04-07 7:48 ` Arjan van de Ven 2005-04-07 15:10 ` Linus Torvalds 2005-04-07 7:18 ` David Woodhouse ` (4 subsequent siblings) 10 siblings, 2 replies; 201+ messages in thread From: Paul Mackerras @ 2005-04-07 6:51 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List Linus, > That "individual patches" is one of the keywords, btw. One thing that BK > has been extremely good at, and that a lot of people have come to like > even when they didn't use BK, is how we've been maintaining a much finer- > granularity view of changes. That isn't going to go away. Are you happy with processing patches + descriptions, one per mail? Do you have it automated to the point where processing emailed patches involves little more overhead than doing a bk pull? If so, then your mailbox (or patch queue) becomes a natural serialization point for the changes, and the need for a tool that can handle a complex graph of changes is much reduced. From my point of view, the benefits that flowed from your using BK were: * Visibility into what you had accepted and committed to your repository * Lower latency of patches going into your repository * Much reduced rate of patches being dropped Those things are what have enabled us PPC developers to move away from having our own trees (with all the synchronization problems that entailed) and work directly with your tree. I don't see that it is the distinctive features of BK (such as the ability to do merges between peer repositories) that are directly responsible for producing those benefits, so I have hope that things can work just as well with some other system. Paul. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 6:51 ` Paul Mackerras @ 2005-04-07 7:48 ` Arjan van de Ven 2005-04-07 15:10 ` Linus Torvalds 1 sibling, 0 replies; 201+ messages in thread From: Arjan van de Ven @ 2005-04-07 7:48 UTC (permalink / raw) To: Paul Mackerras; +Cc: Linus Torvalds, Kernel Mailing List On Thu, 2005-04-07 at 16:51 +1000, Paul Mackerras wrote: > Linus, > > > That "individual patches" is one of the keywords, btw. One thing that BK > > has been extremely good at, and that a lot of people have come to like > > even when they didn't use BK, is how we've been maintaining a much finer- > > granularity view of changes. That isn't going to go away. > > Are you happy with processing patches + descriptions, one per mail? > Do you have it automated to the point where processing emailed patches > involves little more overhead than doing a bk pull? If so, then your > mailbox (or patch queue) becomes a natural serialization point for the > changes, and the need for a tool that can handle a complex graph of > changes is much reduced. alternatively you could send an mbox with your series in... that has a natural sequence in it ;) ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 6:51 ` Paul Mackerras 2005-04-07 7:48 ` Arjan van de Ven @ 2005-04-07 15:10 ` Linus Torvalds 2005-04-07 17:00 ` Daniel Phillips 2005-04-07 23:21 ` Dave Airlie 1 sibling, 2 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-07 15:10 UTC (permalink / raw) To: Paul Mackerras; +Cc: Kernel Mailing List On Thu, 7 Apr 2005, Paul Mackerras wrote: > > Are you happy with processing patches + descriptions, one per mail? Yes. That's going to be my interim, I was just hoping that with 2.6.12-rc2 out the door, and us in a "calming down" period, I could afford to not even do that for a while. The real problem with the email thing is that it ends up piling up: what BK did in this respect was that anything that piled up in a BK repository ended up still being there, and a single "bk pull" got it anyway - so if somebody got ignored because I was busy with something else, it didn't add any overhead. The queue didn't get "congested". And that's a big thing. It comes from the "Linus pulls" model where people just told me that they were ready, instead of the "everybody pushes to Linus" model, where the destination gets congested at times. So I do not want the "send Linus email patches" (whether mboxes or a single patch per email) to be a very long-term strategy. We can handle it for a while (in particular, I'm counting on it working up to the real release of 2.6.12, since we _should_ be in the calm period for the next month anyway), but it doesn't work in the long run. > Do you have it automated to the point where processing emailed patches > involves little more overhead than doing a bk pull? It's more overhead, but not a lot. Especially nice numbered sequences like Andrew sends (where I don't have to manually try to get the dependencies right by trying to figure them out and hope I'm right, but instead just sort by Subject: line) is not a lot of overhead. 
I can process a hundred emails almost as easily as one, as long as I trust the maintainer (which, when it's used as a BK replacement, I obviously do). However, the SCM's I've looked at make this hard. One of the things (the main thing, in fact) I've been working at is to make that process really _efficient_. If it takes half a minute to apply a patch and remember the changeset boundary etc (and quite frankly, that's _fast_ for most SCM's around for a project the size of Linux), then a series of 250 emails (which is not unheard of at all when I sync with Andrew, for example) takes two hours. If one of the patches in the middle doesn't apply, things are bad bad bad. Now, BK wasn't a speed demon either (actually, compared to everything else, BK _is_ a speed demon, often by one or two orders of magnitude), and took about 10-15 seconds per email when I merged with Andrew. HOWEVER, with BK that wasn't as big of an issue, since the BK<->BK merges were so easy, so I never had the slow email merges with any of the other main developers. So a patch-application-based SCM "merger" actually would need to be _faster_ than BK is. Which is really really really hard. So I'm writing some scripts to try to track things a whole lot faster. Initial indications are that I should be able to do it almost as quickly as I can just apply the patch, but quite frankly, I'm at most half done, and if I hit a snag maybe that's not true at all. Anyway, the reason I can do it quickly is that my scripts will _not_ be an SCM, they'll be a very specific "log Linus' state" kind of thing. That will make the linear patch merge a lot more time-efficient, and thus possible. (If a patch apply takes three seconds, even a big series of patches is not a problem: if I get notified within a minute or two that it failed half-way, that's fine, I can then just fix it up manually. 
That's why latency is critical - if I'd have to do things effectively "offline", I'd by definition not be able to fix it up when problems happen). > If so, then your mailbox (or patch queue) becomes a natural > serialization point for the changes, and the need for a tool that can > handle a complex graph of changes is much reduced. Yes. In the short term. See above why I think the congestion issue will really mean that we want to have parallel merging in the not _too_ distant future. NOTE! I detest the centralized SCM model, but if push comes to shove, and we just _can't_ get a reasonable parallel merge thing going in the short timeframe (ie month or two), I'll use something like SVN on a trusted site with just a few committers, and at least try to distribute the merging out over a few people rather than making _me_ be the throttle. The reason I don't really want to do that is once we start doing it that way, I suspect we'll have a _really_ hard time stopping. I think it's a broken model. So I'd much rather try to have some pain in the short run and get a better model running, but I just wanted to let people know that I'm pragmatic enough that I realize that we may not have much choice. > * Visibility into what you had accepted and committed to your > repository > * Lower latency of patches going into your repository > * Much reduced rate of patches being dropped Yes. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
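[Editorial aside: the throughput argument in this message is plain arithmetic; a quick sanity check of the figures quoted above — a 250-patch series at ~30 seconds versus ~3 seconds per patch:]

```python
# Back-of-envelope latency math from the message above, using the
# numbers quoted there (250 patches; per-patch apply cost in seconds).
def series_minutes(n_patches, seconds_each):
    return n_patches * seconds_each / 60

print(series_minutes(250, 30))  # 125.0 -- roughly the "two hours" quoted
print(series_minutes(250, 3))   # 12.5  -- short enough to fix a failure interactively
```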
* Re: Kernel SCM saga.. 2005-04-07 15:10 ` Linus Torvalds @ 2005-04-07 17:00 ` Daniel Phillips 2005-04-07 17:38 ` Linus Torvalds 2005-04-07 19:56 ` Sam Ravnborg 2005-04-07 23:21 ` Dave Airlie 1 sibling, 2 replies; 201+ messages in thread From: Daniel Phillips @ 2005-04-07 17:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Mackerras, Kernel Mailing List On Thursday 07 April 2005 11:10, Linus Torvalds wrote: > On Thu, 7 Apr 2005, Paul Mackerras wrote: > > Do you have it automated to the point where processing emailed patches > > involves little more overhead than doing a bk pull? > > It's more overhead, but not a lot. Especially nice numbered sequences like > Andrew sends (where I don't have to manually try to get the dependencies > right by trying to figure them out and hope I'm right, but instead just > sort by Subject: line)... Hi Linus, In that case, a nice refinement is to put the sequence number at the end of the subject line so patch sequences don't interleave: Subject: [PATCH] Unbork OOM Killer (1 of 3) Subject: [PATCH] Unbork OOM Killer (2 of 3) Subject: [PATCH] Unbork OOM Killer (3 of 3) Subject: [PATCH] Unbork OOM Killer (v2, 1 of 3) Subject: [PATCH] Unbork OOM Killer (v2, 2 of 3) ... Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:00 ` Daniel Phillips @ 2005-04-07 17:38 ` Linus Torvalds 2005-04-07 17:47 ` Chris Wedgwood ` (3 more replies) 2005-04-07 19:56 ` Sam Ravnborg 1 sibling, 4 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-07 17:38 UTC (permalink / raw) To: Daniel Phillips; +Cc: Paul Mackerras, Kernel Mailing List On Thu, 7 Apr 2005, Daniel Phillips wrote: > > In that case, a nice refinement is to put the sequence number at the end of > the subject line so patch sequences don't interleave: No. That makes it unsortable, and also much harder to pick out which part of the subject line is the explanation, and which part is just metadata for me. So my preference is _overwhelmingly_ for the format that Andrew uses (which is partly explained by the fact that I am used to it, but also by the fact that I've asked for Andrew to make trivial changes to match my usage). That canonical format is: Subject: [PATCH 001/123] [<area>:] <explanation> together with the first line of the body being a From: Original Author <origa@email.com> followed by an empty line and then the body of the explanation. After the body of the explanation comes the "Signed-off-by:" lines, and then a simple "---" line, and below that comes the diffstat of the patch and then the patch itself. That's the "canonical email format", and it's that because my normal scripts (in BK/tools, but now I'm working on making them more generic) take input that way. It's very easy to sort the emails alphabetically by subject line - pretty much any email reader will support that - since, because the sequence number is zero-padded, the numerical and alphabetic sort is the same. 
If you send several sequences, you either send a simple explaining email before the second sequence (hey, it's not like I'm a machine - I can use my brains too, and in particular if the final number of patches in each sequence is different, even if the sequences got re-ordered and are overlapping, I can still just extract one from the other by selecting for "/123] " in the subject line), or you modify the Subject: line subtly to still sort uniquely and alphabetically in-order, ie the subject lines for the second series might be Subject: [PATCHv2 001/207] x86: fix eflags tracking ... All very unambiguous, and my scripts already remove everything inside the brackets and will just replace it with "[PATCH]" in the final version. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
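[Editorial aside: the zero-padding point can be demonstrated directly. The subject lines below are invented for illustration:]

```python
# Why the zero-padding in "[PATCH 001/123]" matters: it makes the plain
# alphabetic sort any mail reader can do identical to the numeric order.
padded = [
    "[PATCH 010/123] x86: fix eflags tracking",
    "[PATCH 002/123] mm: oom-killer tweak",
    "[PATCH 001/123] core: whitespace cleanup",
]
print(sorted(padded))  # 001, 002, 010 -- numeric order

# Without padding, "10" sorts before "2" alphabetically:
unpadded = ["[PATCH 10/123] a", "[PATCH 2/123] b"]
print(sorted(unpadded))  # ['[PATCH 10/123] a', '[PATCH 2/123] b']
```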
* Re: Kernel SCM saga.. 2005-04-07 17:38 ` Linus Torvalds @ 2005-04-07 17:47 ` Chris Wedgwood 2005-04-07 18:06 ` Magnus Damm ` (2 subsequent siblings) 3 siblings, 0 replies; 201+ messages in thread From: Chris Wedgwood @ 2005-04-07 17:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Phillips, Paul Mackerras, Kernel Mailing List On Thu, Apr 07, 2005 at 10:38:06AM -0700, Linus Torvalds wrote: > So my prefernce is _overwhelmingly_ for the format that Andrew uses > (which is partly explained by the fact that I am used to it, but > also by the fact that I've asked for Andrew to make trivial changes > to match my usage). > > That canonical format is: > > Subject: [PATCH 001/123] [<area>:] <explanation> > > together with the first line of the body being a > > From: Original Author <origa@email.com> > > followed by an empty line and then the body of the explanation. Having a script to check people get this right before sending it via email would be a nice thing to put into scripts/ or probably Documentation/ perhaps? Does such a thing already exist? ^ permalink raw reply [flat|nested] 201+ messages in thread
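[Editorial aside: no answer to Chris's question appears in this thread. As a sketch of what such a checker might test — the regex below is an assumption distilled from the format Linus describes, not an existing script in the kernel tree:]

```python
import re

# Hypothetical subject-line check for the canonical format above:
# "[PATCH NNN/MMM] <area>: <explanation>", where the "<area>:" prefix
# is optional and the sequence numbers are zero-padded to three digits.
SUBJECT_RE = re.compile(r"^\[PATCH \d{3}/\d{3}\] (?:[\w/.-]+: )?\S")

def looks_canonical(subject):
    return SUBJECT_RE.match(subject) is not None

print(looks_canonical("[PATCH 001/123] x86: fix eflags tracking"))  # True
print(looks_canonical("[PATCH] Unbork OOM Killer (1 of 3)"))        # False
```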
* Re: Kernel SCM saga.. 2005-04-07 17:38 ` Linus Torvalds 2005-04-07 17:47 ` Chris Wedgwood @ 2005-04-07 18:06 ` Magnus Damm 2005-04-07 18:36 ` Daniel Phillips 2005-04-08 3:35 ` Jeff Garzik 3 siblings, 0 replies; 201+ messages in thread From: Magnus Damm @ 2005-04-07 18:06 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Phillips, Paul Mackerras, Kernel Mailing List On Apr 7, 2005 7:38 PM, Linus Torvalds <torvalds@osdl.org> wrote: > So my prefernce is _overwhelmingly_ for the format that Andrew uses (which > is partly explained by the fact that I am used to it, but also by the fact > that I've asked for Andrew to make trivial changes to match my usage). > > That canonical format is: > > Subject: [PATCH 001/123] [<area>:] <explanation> > > together with the first line of the body being a > > From: Original Author <origa@email.com> > > followed by an empty line and then the body of the explanation. > > After the body of the explanation comes the "Signed-off-by:" lines, and > then a simple "---" line, and below that comes the diffstat of the patch > and then the patch itself. While specifying things, wouldn't it be useful to have a line containing tags that specifies if the patch contains new features, a bug fix or a high-priority security fix? Then that information could be used to find patches for the sucker-tree. / magnus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:38 ` Linus Torvalds 2005-04-07 17:47 ` Chris Wedgwood 2005-04-07 18:06 ` Magnus Damm @ 2005-04-07 18:36 ` Daniel Phillips 2005-04-08 3:35 ` Jeff Garzik 3 siblings, 0 replies; 201+ messages in thread From: Daniel Phillips @ 2005-04-07 18:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Mackerras, Kernel Mailing List On Thursday 07 April 2005 13:38, Linus Torvalds wrote: > On Thu, 7 Apr 2005, Daniel Phillips wrote: > > In that case, a nice refinement is to put the sequence number at the end > > of the subject line so patch sequences don't interleave: > > No. That makes it unsortable, and also much harder to pick out which part > of the subject line is the explanation, and which part is just metadata > for me. Well, my list in the parent post _was_ sorted by subject. But that is a quibble, the important point is that you just officially defined the canonical format, which everybody should stick to for now: > That canonical format is: > > Subject: [PATCH 001/123] [<area>:] <explanation> > > together with the first line of the body being a > > From: Original Author <origa@email.com> > > followed by an empty line and then the body of the explanation. > > After the body of the explanation comes the "Signed-off-by:" lines, and > then a simple "---" line, and below that comes the diffstat of the patch > and then the patch itself. > > That's the "canonical email format", and it's that because my normal > scripts (in BK/tools, but now I'm working on making them more generic) > take input that way. It's very easy to sort the emails alphabetically by > subject line - pretty much any email reader will support that - since > because the sequence number is zero-padded, the numerical and alphabetic > sort is the same. 
> > If you send several sequences, you either send a simple explaining email > before the second sequence (hey, it's not like I'm a machine - I can use > my brains too, and in particular if the final number of patches in each > sequence is different, even if the sequences got re-ordered and are > overlapping, I can still just extract one from the other by selecting for > "/123] " in the subject line), or you modify the Subject: line subtly to > still sort uniquely and alphabetically in-order, ie the subject lines for > the second series might be > > Subject: [PATCHv2 001/207] x86: fix eflags tracking > ... > > All very unambiguous, and my scripts already remove everything inside the > brackets and will just replace it with "[PATCH]" in the final version. Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:38 ` Linus Torvalds ` (2 preceding siblings ...) 2005-04-07 18:36 ` Daniel Phillips @ 2005-04-08 3:35 ` Jeff Garzik 3 siblings, 0 replies; 201+ messages in thread From: Jeff Garzik @ 2005-04-08 3:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Phillips, Paul Mackerras, Kernel Mailing List Linus Torvalds wrote: > > On Thu, 7 Apr 2005, Daniel Phillips wrote: > >>In that case, a nice refinement is to put the sequence number at the end of >>the subject line so patch sequences don't interleave: > > > No. That makes it unsortable, and also much harder to pick out which part > of the subject line is the explanation, and which part is just metadata > for me. > > So my preference is _overwhelmingly_ for the format that Andrew uses (which > is partly explained by the fact that I am used to it, but also by the fact > that I've asked for Andrew to make trivial changes to match my usage). > > That canonical format is: > > Subject: [PATCH 001/123] [<area>:] <explanation> > > together with the first line of the body being a > > From: Original Author <origa@email.com> Nod. For future reference, people can refer to http://linux.yyz.us/patch-format.html and/or http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt Jeff ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:00 ` Daniel Phillips 2005-04-07 17:38 ` Linus Torvalds @ 2005-04-07 19:56 ` Sam Ravnborg 1 sibling, 0 replies; 201+ messages in thread From: Sam Ravnborg @ 2005-04-07 19:56 UTC (permalink / raw) To: Daniel Phillips; +Cc: Linus Torvalds, Paul Mackerras, Kernel Mailing List On Thu, Apr 07, 2005 at 01:00:51PM -0400, Daniel Phillips wrote: > On Thursday 07 April 2005 11:10, Linus Torvalds wrote: > > On Thu, 7 Apr 2005, Paul Mackerras wrote: > > > Do you have it automated to the point where processing emailed patches > > > involves little more overhead than doing a bk pull? > > > > It's more overhead, but not a lot. Especially nice numbered sequences like > > Andrew sends (where I don't have to manually try to get the dependencies > > right by trying to figure them out and hope I'm right, but instead just > > sort by Subject: line)... > > Hi Linus, > > In that case, a nice refinement is to put the sequence number at the end of > the subject line so patch sequences don't interleave: > > Subject: [PATCH] Unbork OOM Killer (1 of 3) > Subject: [PATCH] Unbork OOM Killer (2 of 3) > Subject: [PATCH] Unbork OOM Killer (3 of 3) > Subject: [PATCH] Unbork OOM Killer (v2, 1 of 3) > Subject: [PATCH] Unbork OOM Killer (v2, 2 of 3) This breaks the rule of a descriptive subject for each patch. Consider 30 subjects all telling you "Subject: PCI updates [001/030]". That is not good. Sam ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 15:10 ` Linus Torvalds 2005-04-07 17:00 ` Daniel Phillips @ 2005-04-07 23:21 ` Dave Airlie 1 sibling, 0 replies; 201+ messages in thread From: Dave Airlie @ 2005-04-07 23:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Mackerras, Kernel Mailing List > > Are you happy with processing patches + descriptions, one per mail? > > Yes. That's going to be my interim, I was just hoping that with 2.6.12-rc2 > out the door, and us in a "calming down" period, I could afford to not > even do that for a while. > > The real problem with the email thing is that it ends up piling up: what > BK did in this respect was that anything that piled up in a BK repository > ended up still being there, and a single "bk pull" got it anyway - so if > somebody got ignored because I was busy with something else, it didn't add > any overhead. The queue didn't get "congested". > > And that's a big thing. It comes from the "Linus pulls" model where people > just told me that they were ready, instead of the "everybody pushes to > Linus" model, where the destination gets congested at times. Something I think we'll miss in the long run is bkbits.net; being able to just push all patches for Linus to a tree and then forget about that tree until Linus pulled from it was invaluable.. the fact that this tree was online the whole time and you didn't queue up huge mails for Linus's INBOX to be missed, meant a lot to me compared to pre-bk workings.. Maybe now that kernel.org has been 'pimped out' we could set some sort of system up where maintainers can drop a big load of patchsets or even one big patch into some sort of public area and say this is my diffs for Linus for his next pull and let Linus pull it at his leisure... some kinda rsync'y type thing comes to mind ... 
so I can mail Linus and say hey Linus please grab rsync://pimpedout.kernel.org/airlied/drm-linus and you grab everything in there and I get notified perhaps or just a log like the bkbits stats page, and Andrew can grab the patchsets the same as he does for bk-drm now ... and I can have airlied/drm-2.6 where I can queue stuff for -mm then just re-generate the patches for drm-linus later on.. Dave. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (5 preceding siblings ...) 2005-04-07 6:51 ` Paul Mackerras @ 2005-04-07 7:18 ` David Woodhouse 2005-04-07 8:50 ` Andrew Morton ` (2 more replies) 2005-04-07 7:44 ` Jan Hudec ` (3 subsequent siblings) 10 siblings, 3 replies; 201+ messages in thread From: David Woodhouse @ 2005-04-07 7:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, 2005-04-06 at 08:42 -0700, Linus Torvalds wrote: > PS. Don't bother telling me about subversion. If you must, start reading > up on "monotone". That seems to be the most viable alternative, but don't > pester the developers so much that they don't get any work done. They are > already aware of my problems ;) One feature I'd want to see in a replacement version control system is the ability to _re-order_ patches, and to cherry-pick patches from my tree to be sent onwards. The lack of that capability is the main reason I always hated BitKeeper. -- dwmw2 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..

From: Andrew Morton @ 2005-04-07 8:50 UTC (permalink / raw)
To: David Woodhouse; +Cc: torvalds, linux-kernel

David Woodhouse <dwmw2@infradead.org> wrote:
>
> One feature I'd want to see in a replacement version control system is
> the ability to _re-order_ patches, and to cherry-pick patches from my
> tree to be sent onwards.

You just described quilt & patch-scripts.

The problem with those is letting other people get access to it. I guess
that could be fixed with a bit of scripting and rsyncing.

(I don't do that for -mm because -mm basically doesn't work for 99% of the
time. It takes 4-5 hours to get a release out assuming that nothing's
busted, and usually something is.)
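The quilt model Andrew is pointing at — an ordered "series" file of patches applied on top of a tree — can be sketched with nothing but plain diff(1) and patch(1). This is a minimal illustration of the idea, not quilt itself; all file and patch names here are invented:

```shell
#!/bin/sh
# Sketch of the quilt idea: patches live in patches/, and patches/series
# lists the order in which to apply ("push") them onto the tree.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p a b patches

# Build two stacked patches the way a series would store them.
printf 'hello\n'           > a/file.txt
printf 'hello\nworld\n'    > b/file.txt
diff -u a/file.txt b/file.txt > patches/add-world.patch || true
cp b/file.txt a/file.txt
printf 'hello\nworld\n!\n' > b/file.txt
diff -u a/file.txt b/file.txt > patches/add-bang.patch  || true

# The tree we actually maintain, plus the ordered series file.
printf 'hello\n' > file.txt
printf 'add-world.patch\nadd-bang.patch\n' > patches/series

# "Pushing" the whole stack is just applying the series in order.
while read -r p; do
    patch -p1 < "patches/$p"
done < patches/series
```

Popping the stack is the same loop run backwards with `patch -R`, and reordering — the feature David asked for — is just editing patches/series between a pop and a push.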
* Re: Kernel SCM saga..

From: Paul Mackerras @ 2005-04-07 9:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Woodhouse, torvalds, linux-kernel

Andrew Morton writes:
> The problem with those is letting other people get access to it. I guess
> that could be fixed with a bit of scripting and rsyncing.

Yes.

> (I don't do that for -mm because -mm basically doesn't work for 99% of the
> time. It takes 4-5 hours to get a release out assuming that nothing's
> busted, and usually something is.)

With -mm we get those nice little automatic emails saying you've put the
patch into -mm, which removes one of the main reasons for wanting to be
able to get an up-to-date image of your tree. The other reason, of course,
is to be able to see if a patch I'm about to send conflicts with something
you have already taken, and rebase it if necessary.

Paul.
* Re: Kernel SCM saga..

From: Andrew Morton @ 2005-04-07 9:46 UTC (permalink / raw)
To: Paul Mackerras; +Cc: dwmw2, torvalds, linux-kernel

Paul Mackerras <paulus@samba.org> wrote:
>
> With -mm we get those nice little automatic emails saying you've put
> the patch into -mm, which removes one of the main reasons for wanting
> to be able to get an up-to-date image of your tree.

Should have done that ages ago..

> The other reason, of course, is to be able to see if a patch I'm about
> to send conflicts with something you have already taken, and rebase it
> if necessary.

<hack, hack>

How's this?

    This is a note to let you know that I've just added the patch titled

        ppc32: Fix AGP and sleep again

    to the -mm tree. Its filename is

        ppc32-fix-agp-and-sleep-again.patch

    Patches currently in -mm which might be from yourself are

        add-suspend-method-to-cpufreq-core.patch
        ppc32-fix-cpufreq-problems.patch
        ppc32-fix-agp-and-sleep-again.patch
        ppc32-fix-errata-for-some-g3-cpus.patch
        ppc64-fix-semantics-of-__ioremap.patch
        ppc64-improve-mapping-of-vdso.patch
        ppc64-detect-altivec-via-firmware-on-unknown-cpus.patch
        ppc64-remove-bogus-f50-hack-in-promc.patch
* Re: Kernel SCM saga..

From: Paul Mackerras @ 2005-04-07 11:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: dwmw2, torvalds, linux-kernel

Andrew Morton writes:
> > The other reason, of course, is to be able to see if a patch I'm about
> > to send conflicts with something you have already taken, and rebase it
> > if necessary.
>
> <hack, hack>
>
> How's this?

Nice; but in fact I meant that I want to be able to see if a patch of mine
conflicts with one from somebody else.

Paul.
* Re: Kernel SCM saga..

From: Geert Uytterhoeven @ 2005-04-07 10:41 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Andrew Morton, David Woodhouse, Linus Torvalds, Linux Kernel Development

On Thu, 7 Apr 2005, Paul Mackerras wrote:
> Andrew Morton writes:
> > The problem with those is letting other people get access to it. I guess
> > that could be fixed with a bit of scripting and rsyncing.
>
> Yes.

Me too ;-)

> > (I don't do that for -mm because -mm basically doesn't work for 99% of the
> > time. It takes 4-5 hours to get a release out assuming that nothing's
> > busted, and usually something is.)
>
> With -mm we get those nice little automatic emails saying you've put
> the patch into -mm, which removes one of the main reasons for wanting
> to be able to get an up-to-date image of your tree. The other reason,

FYI, for Linus' BK tree, procmail told me whenever it encountered a patch
on the commits list that was signed off by me.

> of course, is to be able to see if a patch I'm about to send conflicts
> with something you have already taken, and rebase it if necessary.

And yet another reason: to monitor whether files/subsystems I'm interested
in are changed.

Summarized: I'd be happy with a mailing list that would send out all
patches (incl. full comment headers, cfr. bk-commit) that Linus commits.
An added bonus would be that people would really be able to reconstruct
the full tree from the mails, unlike with bk-commits (due to `strange'
csets caused by merges). Just make sure there are strictly monotone
sequence numbers in the individual mails.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
-- Linus Torvalds
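Geert's procmail trick might have looked something like the following recipe — this is a hypothetical reconstruction, not his actual setup; the mailbox name is invented and the address is taken from his signature:

```procmail
# Deliver commits-list mail whose body carries my Signed-off-by line
# into a separate mailbox.  ":0 B:" = match on the body, use a lockfile.
:0 B:
* ^Signed-off-by:.*geert@linux-m68k\.org
bk-commits-signed-off-by-me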
* Re: Kernel SCM saga..

From: David Woodhouse @ 2005-04-07 9:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: torvalds, linux-kernel

On Thu, 2005-04-07 at 01:50 -0700, Andrew Morton wrote:
> (I don't do that for -mm because -mm basically doesn't work for 99% of
> the time. It takes 4-5 hours to get a release out assuming that
> nothing's busted, and usually something is.)

On the subject of -mm: are you going to keep doing the BK imports to that
for the time being, or would it be better to leave the BK trees alone now
and send you individual patches?

For that matter, will there be a brief amnesty after 2.6.12 where Linus
will use BK to pull those trees which were waiting for that, or will we
all need to export from BK manually?

--
dwmw2
* Re: Kernel SCM saga..

From: Andrew Morton @ 2005-04-07 9:49 UTC (permalink / raw)
To: David Woodhouse; +Cc: torvalds, linux-kernel

David Woodhouse <dwmw2@infradead.org> wrote:
>
> On the subject of -mm: are you going to keep doing the BK imports to
> that for the time being, or would it be better to leave the BK trees
> alone now and send you individual patches?

I really don't know - I'll continue to pull the bk trees for a while,
until we work out what the new (probably interim) regime looks like.

> For that matter, will there be a brief amnesty after 2.6.12 where Linus
> will use BK to pull those trees which were waiting for that, or will we
> all need to export from BK manually?

I think Linus has stopped using bk already.
* Re: Kernel SCM saga..

From: Russell King @ 2005-04-07 9:55 UTC (permalink / raw)
To: David Woodhouse; +Cc: Andrew Morton, torvalds, linux-kernel

On Thu, Apr 07, 2005 at 10:25:18AM +0100, David Woodhouse wrote:
> On the subject of -mm: are you going to keep doing the BK imports to
> that for the time being, or would it be better to leave the BK trees
> alone now and send you individual patches?
>
> For that matter, will there be a brief amnesty after 2.6.12 where Linus
> will use BK to pull those trees which were waiting for that, or will we
> all need to export from BK manually?

Linus indicated (maybe privately) that the end of his BK usage would be
immediately after the -rc2 release. I'm taking that to mean "no more BK
usage from Linus, period."

Thinking about it a bit, if you're asking Linus to pull your tree, Linus
would then have to extract the individual change sets as patches to put
into his new-fangled patch management system. Is that a reasonable
expectation?

However, it's ultimately up to Linus to decide. 8)

--
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core
* Re: Kernel SCM saga..

From: David Woodhouse @ 2005-04-07 10:11 UTC (permalink / raw)
To: Russell King; +Cc: Andrew Morton, torvalds, linux-kernel

On Thu, 2005-04-07 at 10:55 +0100, Russell King wrote:
> Thinking about it a bit, if you're asking Linus to pull your tree,
> Linus would then have to extract the individual change sets as patches
> to put into his new-fangled patch management system. Is that a
> reasonable expectation?

I don't know if it's a reasonable expectation; that's why I'm asking.

I could live with having to export everything to patches; it's not so
hard. It's just that if the export to whatever ends up replacing BK can be
done in a way (or at a time) which allows the existing forest of BK trees
to be pulled from one last time, that may save a fair amount of work all
round, so I thought it was worth mentioning.

--
dwmw2
* Re: Kernel SCM saga..

From: David Vrabel @ 2005-04-07 9:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: David Woodhouse, linux-kernel

Andrew Morton wrote:
> David Woodhouse <dwmw2@infradead.org> wrote:
>
>> One feature I'd want to see in a replacement version control system is
>> the ability to _re-order_ patches, and to cherry-pick patches from my
>> tree to be sent onwards.
>
> You just described quilt & patch-scripts.
>
> The problem with those is letting other people get access to it. I guess
> that could be fixed with a bit of scripting and rsyncing.

Where I work we've been using quilt for a while now and storing the
patch-set in CVS. To limit the number of potential stuff-ups due to two
people working on the same patch at the same time (the chance that CVS's
merge will get it right is zero), we use CVS's locking feature to ensure
that only one person can edit/update a patch or the series file at any one
time.

It seems to work quite well (though admittedly there are only two
developers working on the patch-set and it currently contains a mere 61
patches).

We also have a few scripts to ensure we always do the correct locking.
The main ones are:

qec     -- edit a file either as part of the top 'working' patch or as an
           existing patch. It does the quilt push which I always forget to
           do otherwise.
qrefr   -- like quilt refresh, only it locks the patch first.
qimport -- like quilt import, only it locks the series file first.

You can grab a tarball of these (and other, less interesting ones) from

    http://www.davidvrabel.org.uk/quilt-n-cvs-scripts-1.tar.gz

Note that I'm providing this purely on an as-is basis in case anyone is
interested. And I've just realized I can't remember exactly how to set up
the CVS repository of the patch-set. I think you need to do a `cvs watch
on' when it's checked out.

David Vrabel
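The "lock, then edit" idea behind wrappers like qrefr/qimport can be sketched without CVS at all: a mkdir-based lock (atomic on POSIX filesystems) stands in for CVS's per-file edit lock. This is a generic illustration, not the actual scripts from the tarball; the function and file names are invented:

```shell
#!/bin/sh
# Sketch of an advisory-lock wrapper: run a command only while holding
# an exclusive lock on a file (e.g. the quilt series file).
with_lock() {
    lockdir="$1.lock"       # mkdir is atomic: only one caller can win
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "busy: $lockdir is held by someone else" >&2
        return 1
    fi
    "$@"                    # run the command while holding the lock
    rc=$?
    rmdir "$lockdir"        # always release the lock afterwards
    return $rc
}
```

Typical use would be `with_lock patches/series vi patches/series` — a second developer invoking the wrapper while the lock directory exists simply gets refused instead of silently clobbering the series.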
* Re: Kernel SCM saga..

From: Sergei Organov @ 2005-04-07 9:24 UTC (permalink / raw)
To: David Woodhouse; +Cc: Kernel Mailing List

David Woodhouse <dwmw2@infradead.org> writes:
> On Wed, 2005-04-06 at 08:42 -0700, Linus Torvalds wrote:
> > PS. Don't bother telling me about subversion. If you must, start reading
> > up on "monotone". That seems to be the most viable alternative, but don't
> > pester the developers so much that they don't get any work done. They are
> > already aware of my problems ;)
>
> One feature I'd want to see in a replacement version control system is
> the ability to _re-order_ patches, and to cherry-pick patches from my
> tree to be sent onwards. The lack of that capability is the main reason
> I always hated BitKeeper.

darcs? <http://www.abridgegame.org/darcs/>
* Re: Kernel SCM saga..

From: Matthias Andree @ 2005-04-07 10:30 UTC (permalink / raw)
To: Kernel Mailing List

On Thu, 07 Apr 2005, Sergei Organov wrote:
> David Woodhouse <dwmw2@infradead.org> writes:
> > One feature I'd want to see in a replacement version control system is
> > the ability to _re-order_ patches, and to cherry-pick patches from my
> > tree to be sent onwards. The lack of that capability is the main reason
> > I always hated BitKeeper.
>
> darcs? <http://www.abridgegame.org/darcs/>

Close. Some things:

1. It's rather slow and quite CPU-consuming, and certainly I/O-consuming
   at times. To try it out, I keep leafnode-2 in a darcs repo: a mere
   20,000 lines in 140 files, with 1,436 changes so far, on a RAID-1 with
   two 7200/min disk drives, with an Athlon XP 2500+ with 512 MB RAM. The
   repo has 1,700 files in 11.5 MB, the source itself 189 files in 1.8 MB.

   Example: darcs annotate nntpd.c takes 23 s. (2,660 lines, 60 kByte)

   The maintainer himself states that there's still optimization required.

2. It has an impressive set of dependencies around the Glasgow Haskell
   Compiler. I don't personally have issues with that, but I can already
   hear the moaning and bitching.

3. darcs is written in Haskell. This is not a problem either, but I'd
   think there are fewer people who can hack Haskell than people who can
   hack C, C++, Java, Python or similar.

It is still better than BitKeeper from the hacking POV, as the code is
available and under an acceptable license. Getting darcs up to the task
would probably require some polishing, and should probably be discussed
with the darcs maintainer before making this decision.

Don't get me wrong, darcs looks promising, but I'm not convinced it's
ready for the linux kernel yet.

--
Matthias Andree
* Re: Kernel SCM saga..

From: Andrew Walrond @ 2005-04-07 10:54 UTC (permalink / raw)
To: Kernel Mailing List

I recently switched from bk to darcs (actually looked into it after the
author mentioned on LKML that he had imported the kernel tree). Very
impressed so far, but as you say,

> 1. It's rather slow and quite CPU-consuming, and certainly I/O-consuming

I expect something as large as the kernel tree would cause problems in
this respect.

> 2. It has an impressive set of dependencies around the Glasgow Haskell
>    Compiler. I don't personally have issues with that, but I can already
>    hear the moaning and bitching.

:) I try to build everything from the original source, but in this case I
couldn't. GHC needs GHC + some GHC addons in order to compile itself...

> 3. darcs is written in Haskell. This is not a problem either, but I'd
>    think there are fewer people who can hack Haskell than people who
>    can hack C, C++, Java, Python or similar.

True, though as you say, not a show-stopper.

From a functionality standpoint, darcs seems very similar to monotone,
with a couple of minor trade-offs in either direction. I wonder if Linus
would mind publishing his feature requests to the monotone developers, so
that other projects, like darcs, would know what needs working on.

Andrew Walrond
* Re: Kernel SCM saga..

From: David Roundy @ 2005-04-09 16:17 UTC (permalink / raw)
To: Kernel Mailing List

On Thu, Apr 07, 2005 at 12:30:18PM +0200, Matthias Andree wrote:
> On Thu, 07 Apr 2005, Sergei Organov wrote:
> > darcs? <http://www.abridgegame.org/darcs/>
>
> Close. Some things:
>
> 1. It's rather slow and quite CPU-consuming, and certainly I/O-consuming
>    at times. To try it out, I keep leafnode-2 in a darcs repo: a mere
>    20,000 lines in 140 files, with 1,436 changes so far, on a RAID-1 with
>    two 7200/min disk drives, with an Athlon XP 2500+ with 512 MB RAM. The
>    repo has 1,700 files in 11.5 MB, the source itself 189 files in 1.8 MB.
>
>    Example: darcs annotate nntpd.c takes 23 s. (2,660 lines, 60 kByte)
>
>    The maintainer himself states that there's still optimization required.

Indeed, there's still a lot of optimization to be done. I've recently made
some improvements which will reduce the memory use (and speed things up)
for a few of the worst-performing commands. No improvement to the initial
record, but on the plus side, that's only done once. But I was able to cut
the memory used checking out a kernel repository down to 500 MB. (Which,
sadly enough, is a major improvement.) You would do much better if you
recorded the initial state one directory at a time, since it's the size of
the largest changeset that determines the memory use on checkout, but
that's ugly.

> Getting darcs up to the task would probably require some polishing, and
> should probably be discussed with the darcs maintainer before making
> this decision.
>
> Don't get me wrong, darcs looks promising, but I'm not convinced it's
> ready for the linux kernel yet.

Indeed, I do believe that darcs has a way to go before it'll perform
acceptably on the kernel. On the other hand, tar seems to perform
unacceptably slowly on the kernel, so I'm not sure how slow is too slow.
Definitely input from interested kernel developers on which commands are
too slow would be welcome.

--
David Roundy
http://www.darcs.net
* Re: Kernel SCM saga..

From: Giuseppe Bilotta @ 2005-04-10 9:24 UTC (permalink / raw)
To: linux-kernel

On Sat, 9 Apr 2005 12:17:58 -0400, David Roundy wrote:
> I've recently made some improvements which will reduce the memory use

Does this include a check for redundancy? ;)

--
Giuseppe "Oblomov" Bilotta

Hic manebimus optime
* Re: Kernel SCM saga..

From: David Roundy @ 2005-04-10 13:51 UTC (permalink / raw)
To: linux-kernel

On Sun, Apr 10, 2005 at 11:24:07AM +0200, Giuseppe Bilotta wrote:
> On Sat, 9 Apr 2005 12:17:58 -0400, David Roundy wrote:
> > I've recently made some improvements which will reduce the memory use
>
> Does this include a check for redundancy? ;)

Yeah, the only catch is that if the redundancy checks fail, we now may
leave the repository in an inconsistent, but repairable, state. (Only a
cache of the pristine tree is affected.) The recent improvements mostly
came by increasing the laziness of a few operations, which meant we don't
need to store the entire parsed tree (or parsed patch) in memory for
certain operations.

--
David Roundy
http://www.darcs.net
* Re: Kernel SCM saga..

From: Linus Torvalds @ 2005-04-07 15:32 UTC (permalink / raw)
To: David Woodhouse; +Cc: Kernel Mailing List

On Thu, 7 Apr 2005, David Woodhouse wrote:
>
> One feature I'd want to see in a replacement version control system is
> the ability to _re-order_ patches, and to cherry-pick patches from my
> tree to be sent onwards. The lack of that capability is the main reason
> I always hated BitKeeper.

I really disliked that in BitKeeper too originally. I argued with Larry
about it, but Larry (correctly, I believe) argued that efficient and
reliable distribution really requires the concept of "history is
immutable". It makes replication much easier when you know that the known
subset _never_ shrinks or changes - you only add on top of it.

And that implies no cherry-picking.

Also, there's actually a second reason why I've decided that
cherry-picking is wrong, and it's non-technical.

The thing is, cherry-picking very much implies that the people "up" the
foodchain end up editing the work of the people "below" them. The whole
reason you want cherry-picking is that you want to fix up somebody else's
mistakes, ie something you disagree with.

That sounds like an obviously good thing, right? Yes it does. The problem
is, it actually results in the wrong dynamics and psychology in the
system.

First off, it makes the implicit assumption that there is an "up" and
"down" in the food-chain, and I think that's wrong. It's increasingly a
"network" in the kernel. I'm less and less "the top", as much as a "fairly
central" person. And that is how it should be. I used to think of kernel
development as a hierarchy, but I long since switched to thinking about it
as a fairly arbitrary network.

The other thing it does is that it implicitly puts the burden of quality
control on the upper-level maintainer ("I'll pick the good things out of
your tree"), while _not_ being able to cherry-pick means that there is
pressure in both directions to keep the tree clean. And that is IMPORTANT.

I realize that not cherry-picking means that people who want to merge
upstream (or sideways or anything) are now forced to do extra work in
trying to keep their tree free of random crap. And that's a HUGELY
IMPORTANT THING! It means that the pressure to keep the tree clean flows
in all directions, and takes pressure off the "central" point. In other
words, it distributes the pain of maintenance.

In other words, somebody who can't keep his act together, and creates
crappy trees because he has random pieces of crud in them, quite
automatically gets actively shunned by others. AND THAT IS GOOD! I've
pushed back on some BK users to clean up their trees, to the point where
we've had a number of "let's just re-do that" over the years. That's
WONDERFUL. People are irritated at first, but I've seen what the end
result is, and the end result is a much better maintainer.

Some people actually end up doing the cleanup in different ways. For
example, Jeff Garzik kept many separate trees, and had a special merge
thing. Others just keep a messy tree for development, and when they are
happy, they throw the messy tree away and re-create a cleaner one.

Either is fine - the point is, different people like to work in different
ways, and that's fine, but making _everybody_ work at being clean means
that there is no train wreck down the line when somebody is forced to try
to figure out what to cherry-pick.

So I've actually changed from "I want to cherry-pick" to "cherry-picking
between maintainers is the wrong workflow". Now, as part of cleaning up,
people may end up exporting the "ugly tree" as patches and re-importing it
into the clean tree as the fixed, clean series of patches, and that's
"cherry-picking", but it's not between developers.

NOTE! The "no cherry-picking" model absolutely also requires a model of
"throw-away development trees". The two go together. BK did both, and an
SCM that does one but not the other would be horribly broken.

(This is my only real conceptual gripe with "monotone". I like the model,
but they make it much harder than it should be to have throw-away trees,
due to the fact that they seem to be working on the assumption of "one
database per developer" rather than "one database per tree". You don't
have to follow that model, but it seems to be what the setup is geared
for, and together with their "branches" it means that I think a monotone
database easily gets very cruddy. The other problem with monotone is just
performance right now, but that's hopefully not _too_ fundamental.)

Linus
* Re: Kernel SCM saga..

From: Daniel Phillips @ 2005-04-07 17:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Woodhouse, Kernel Mailing List

On Thursday 07 April 2005 11:32, Linus Torvalds wrote:
> On Thu, 7 Apr 2005, David Woodhouse wrote:
> > One feature I'd want to see in a replacement version control system is
> > the ability to _re-order_ patches, and to cherry-pick patches from my
> > tree to be sent onwards. The lack of that capability is the main reason
> > I always hated BitKeeper.
>
> I really disliked that in BitKeeper too originally. I argued with Larry
> about it, but Larry (correctly, I believe) argued that efficient and
> reliable distribution really requires the concept of "history is
> immutable". It makes replication much easier when you know that the known
> subset _never_ shrinks or changes - you only add on top of it.

However, it would be easy to allow reordering before "publishing" a
revision, which would preserve immutability for all published revisions
while allowing the patch _author_ the flexibility of
reordering/splitting/joining patches when creating them. In other words, a
virtuous marriage of the BK model with Andrew's quilt.

Regards,

Daniel
* Re: Kernel SCM saga..

From: Al Viro @ 2005-04-07 17:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Woodhouse, Kernel Mailing List

On Thu, Apr 07, 2005 at 08:32:04AM -0700, Linus Torvalds wrote:
> Also, there's actually a second reason why I've decided that
> cherry-picking is wrong, and it's non-technical.
>
> The thing is, cherry-picking very much implies that the people "up" the
> foodchain end up editing the work of the people "below" them. The whole
> reason you want cherry-picking is that you want to fix up somebody else's
> mistakes, ie something you disagree with.

No. There's another reason - when you are cherry-picking and reordering
*your* *own* *patches*. That's what I had been unable to explain to Larry,
and that's what made BK unusable for me.

As for the immutable history... Ever had to read or grade students'
homework?

* The dumbest kind: "here's an answer <expression>, whaddya mean 'where's
  the solution'?"
* The next one: "here's how I've solved the problem: <pages of text
  documenting the attempts, with many 'oops, there had been a mistake,
  here's how we fix it'>".
* What you really want to see: a series of steps leading to the answer,
  with a clean logical structure that lets you understand what's being
  done and verify correctness.

The first corresponds to "here's a half-meg of patch, it fixes
everything". The second is chronological history (aka "this came from our
CVS, all bugs are fixed by now, including those introduced in the middle
of it; see CVS history for details"). The third is a decent patch series.

And to get from "here's how I came up with the solution" to "here's a
clean way to reach the solution" you _have_ to reorder. There's also
"here are misc notes from today, here are misc notes from yesterday,
etc.", and to get that into sane shape you will need to split, reorder and
probably collapse several into a combined delta (possibly getting an empty
delta as the result, if later ones negate the prior).

The point being, both the history and, well, the publishable result can be
expressed as series of small steps, but they are not the same thing. So
far, all I've seen in the area (and that includes BK) is heavily biased
towards the history part, and attempts to use this stuff for manipulating
patch series turn into fighting the tool. I'd *love* to see something that
can handle both - preferably with the history of reordering, etc. being
kept. IOW, not just a tree of changesets but a lattice - with multiple
paths leading to the same node. So far I've seen nothing of that kind ;-/
* Re: Kernel SCM saga.. 2005-04-07 17:10 ` Al Viro @ 2005-04-07 17:47 ` Linus Torvalds 2005-04-07 18:04 ` Jörn Engel 2005-04-08 3:41 ` Jeff Garzik 2005-04-07 17:52 ` Bartlomiej Zolnierkiewicz 2005-04-07 17:54 ` Daniel Phillips 2 siblings, 2 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-07 17:47 UTC (permalink / raw) To: Al Viro; +Cc: David Woodhouse, Kernel Mailing List On Thu, 7 Apr 2005, Al Viro wrote: > > No. There's another reason - when you are cherry-picking and reordering > *your* *own* *patches*. Yes. I agree. There should be some support for cherry-picking in between a temporary throw-away tree and a "cleaned-up-tree". However, it should be something you really do need to think about, and in most cases it really does boil down to "export as patch, re-import from patch". Especially since you potentially want to edit things in between anyway when you cherry-pick. (I do that myself: If I have been a messy boy, and committed mixed-up things as one commit, I export it as a patch, and then I split the patch by hand into two or more pieces - sometimes by just editing the patch directly, but sometimes with a combination of by applying it, and editing the result, and then re-exporting it as the new version). And in the cases where this happens, you in fact often have unrelated changes to the _same_file_, so you really do end up having that middle step. In other words, this cherry-picking can generally be scripted and done "outside" the SCM (you can trivially have a script that takes a revision from one tree and applies it to the other). I don't believe that the SCM needs to support it in any fundamentally inherent manner. After all, why should it, when it really boilds down to (cd old-tree ; scm export-as-patch-plus-comments) | (cd new-tree ; scm import-patch-plus-comments) where the "patch-plus-comments" part is just basically an extended patch (including rename information etc, not just the comments). 
Btw, this method of cherry-picking again requires two _separate_ active trees at the same time. BK is great at that, and really, that's what distributed SCM's should be all about anyway. It's not just distributed between different machines, it's literally distributed even on the same machine, and it's actively _used_ that way. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
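The export/import pipe above can be modelled in a few lines. A minimal Python sketch of the round trip, assuming a made-up "---" separator between the comments and the diff (no real SCM's extended-patch format is implied):

```python
# Model of "patch-plus-comments": export bundles the commit comments
# and the diff into one stream; import splits them apart again. The
# separator and function names are assumptions for illustration only.
SEP = "\n---\n"

def export_patch(comments: str, diff: str) -> str:
    """Bundle commit comments and diff into one extended-patch stream."""
    return comments + SEP + diff

def import_patch(stream: str) -> tuple:
    """Split an extended-patch stream back into (comments, diff)."""
    comments, diff = stream.split(SEP, 1)
    return comments, diff

msg = "Fix off-by-one in foo()"
diff = "--- a/foo.c\n+++ b/foo.c\n@@ -1 +1 @@\n"
assert import_patch(export_patch(msg, diff)) == (msg, diff)
```

A real tool would carry rename information and permissions in the same stream, but the plumbing stays this simple: two processes connected by a pipe, one per tree.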
* Re: Kernel SCM saga.. 2005-04-07 17:47 ` Linus Torvalds @ 2005-04-07 18:04 ` Jörn Engel 2005-04-07 18:27 ` Daniel Phillips 2005-04-07 20:54 ` Arjan van de Ven 2005-04-08 3:41 ` Jeff Garzik 1 sibling, 2 replies; 201+ messages in thread From: Jörn Engel @ 2005-04-07 18:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, David Woodhouse, Kernel Mailing List On Thu, 7 April 2005 10:47:18 -0700, Linus Torvalds wrote: > On Thu, 7 Apr 2005, Al Viro wrote: > > > > No. There's another reason - when you are cherry-picking and reordering > > *your* *own* *patches*. > > Yes. I agree. There should be some support for cherry-picking in between a > temporary throw-away tree and a "cleaned-up-tree". However, it should be > something you really do need to think about, and in most cases it really > does boil down to "export as patch, re-import from patch". Especially > since you potentially want to edit things in between anyway when you > cherry-pick. For reordering, using patcher, you can simply edit the sequence file and move lines around. Nice and simple interface. There is no checking involved, though. If you move dependent patches, you end up with a mess and either throw it all away or seriously scratch your head. So a serious SCM might do something like this: $ cp series new_series $ vi new_series $ SCM --reorder new_series # essentially "mv new_series series", if no checks fail Merging patches isn't that hard either. Splitting them would remain manual, as you described it. > Btw, this method of cherry-picking again requires two _separate_ active > trees at the same time. BK is great at that, and really, that's what > distributed SCM's should be all about anyway. It's not just distributed > between different machines, it's literally distributed even on the same > machine, and it's actively _used_ that way. Amen! Jörn -- He who knows that enough is enough will always have enough. -- Lao Tsu ^ permalink raw reply [flat|nested] 201+ messages in thread
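The checked "SCM --reorder" step Jörn outlines might look like the following sketch. The only check implemented here is that the edited series is a permutation of the old one (no patch silently dropped or duplicated); the hard part, dependency checking, is marked but left out, and the file names are placeholders:

```python
# Sketch of a checked reorder: accept the edited series file only if it
# contains exactly the same patches as the original, then make it the
# new series (the "mv new_series series" step).
def reorder(series, new_series):
    """Validate an edited patch series against the original ordering."""
    if sorted(series) != sorted(new_series):
        raise ValueError("new series drops or duplicates patches")
    # A real tool would also dry-run-apply the reordered patches here,
    # to catch moved *dependent* patches -- the case Joern warns about.
    return list(new_series)

old = ["01-core.patch", "02-driver.patch", "03-doc.patch"]
print(reorder(old, ["02-driver.patch", "01-core.patch", "03-doc.patch"]))
```

Even this trivial permutation check would already catch the most common editing accident; the dry-run apply is what separates "simple interface" from "serious SCM".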
* Re: Kernel SCM saga.. 2005-04-07 18:04 ` Jörn Engel @ 2005-04-07 18:27 ` Daniel Phillips 2005-04-07 20:54 ` Arjan van de Ven 1 sibling, 0 replies; 201+ messages in thread From: Daniel Phillips @ 2005-04-07 18:27 UTC (permalink / raw) To: Jörn Engel Cc: Linus Torvalds, Al Viro, David Woodhouse, Kernel Mailing List On Thursday 07 April 2005 14:04, Jörn Engel wrote: > On Thu, 7 April 2005 10:47:18 -0700, Linus Torvalds wrote: >> ... There should be some support for cherry-picking in between > > a temporary throw-away tree and a "cleaned-up-tree". However, it should > > be something you really do need to think about, and in most cases it > > really does boil down to "export as patch, re-import from patch". > > Especially since you potentially want to edit things in between anyway > > when you cherry-pick. > > For reordering, using patcher, you can simply edit the sequence file > and move lines around. Nice and simple interface. > > There is no checking involved, though. If you move dependent patches, > you end up with a mess and either throw it all away or seriously > scratch your head. So a serious SCM might do something like this: > > $ cp series new_series > $ vi new_series > $ SCM --reorder new_series > # essentially "mv new_series series", if no checks fail > > Merging patches isn't that hard either. Splitting them would remain > manual, as you described it. Well, it's clear that adding cherry-picking, patch reordering, splitting and merging (two patches into one) is not even hard; it's just a matter of making it convenient by _building it into the tool_. Now, can we just pick a tool and do it, please? :-) Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 18:04 ` Jörn Engel 2005-04-07 18:27 ` Daniel Phillips @ 2005-04-07 20:54 ` Arjan van de Ven 1 sibling, 0 replies; 201+ messages in thread From: Arjan van de Ven @ 2005-04-07 20:54 UTC (permalink / raw) To: Jörn Engel Cc: Linus Torvalds, Al Viro, David Woodhouse, Kernel Mailing List On Thu, 2005-04-07 at 20:04 +0200, Jörn Engel wrote: > On Thu, 7 April 2005 10:47:18 -0700, Linus Torvalds wrote: > > On Thu, 7 Apr 2005, Al Viro wrote: > > > > > > No. There's another reason - when you are cherry-picking and reordering > > > *your* *own* *patches*. > > > > Yes. I agree. There should be some support for cherry-picking in between a > > temporary throw-away tree and a "cleaned-up-tree". However, it should be > > something you really do need to think about, and in most cases it really > > does boil down to "export as patch, re-import from patch". Especially > > since you potentially want to edit things in between anyway when you > > cherry-pick. > > For reordering, using patcher, you can simply edit the sequence file > and move lines around. Nice and simple interface. > > There is no checking involved, though. If you move dependent patches, > you end up with a mess and either throw it all away or seriously > scratch your head. So a serious SCM might do something like this: Just FYI, patchutils has a tool that can "flip" the order of two patches even if they patch the same line of code in the files... with it you can do a "bubble sort" to move stuff about safely... ^ permalink raw reply [flat|nested] 201+ messages in thread
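Arjan's "bubble sort" idea can be sketched directly: given a primitive that swaps two *adjacent* patches (the patchutils tool he refers to is presumably flipdiff, which rewrites both patches so the pair applies in the opposite order), any target ordering is reachable by repeated adjacent flips. In this Python sketch the flip is modelled as a plain list swap; a real driver would shell out to the flip tool for each swap:

```python
# Bubble a patch series into a desired target order using only
# adjacent swaps -- the operation an adjacent-flip tool provides.
def bubble_to(series, target):
    """Reorder `series` in place into `target` via adjacent swaps.
    Returns the number of flips performed."""
    assert sorted(series) == sorted(target), "must be a permutation"
    flips = 0
    for i, want in enumerate(target):
        j = series.index(want, i)
        while j > i:                 # bubble `want` leftwards, one flip at a time
            series[j - 1], series[j] = series[j], series[j - 1]
            j -= 1
            flips += 1
    return flips

s = ["a.patch", "b.patch", "c.patch"]
n = bubble_to(s, ["c.patch", "a.patch", "b.patch"])
print(s, n)  # -> ['c.patch', 'a.patch', 'b.patch'] 2
```

The "safely" in Arjan's mail is the key property: because each step is a single adjacent flip that the tool knows how to rewrite, the series remains applicable after every step.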
* Re: Kernel SCM saga.. 2005-04-07 17:47 ` Linus Torvalds 2005-04-07 18:04 ` Jörn Engel @ 2005-04-08 3:41 ` Jeff Garzik 1 sibling, 0 replies; 201+ messages in thread From: Jeff Garzik @ 2005-04-08 3:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: Al Viro, David Woodhouse, Kernel Mailing List Linus Torvalds wrote: > In other words, this cherry-picking can generally be scripted and done > "outside" the SCM (you can trivially have a script that takes a revision > from one tree and applies it to the other). I don't believe that the SCM > needs to support it in any fundamentally inherent manner. After all, why > should it, when it really boilds down to > > (cd old-tree ; scm export-as-patch-plus-comments) | > (cd new-tree ; scm import-patch-plus-comments) > > where the "patch-plus-comments" part is just basically an extended patch > (including rename information etc, not just the comments). Not that it matters anymore, but that's precisely what the script Documentation/BK-usage/cpcset did, for BitKeeper. Jeff ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:10 ` Al Viro 2005-04-07 17:47 ` Linus Torvalds @ 2005-04-07 17:52 ` Bartlomiej Zolnierkiewicz 2005-04-07 17:54 ` Daniel Phillips 2 siblings, 0 replies; 201+ messages in thread From: Bartlomiej Zolnierkiewicz @ 2005-04-07 17:52 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List On Apr 7, 2005 7:10 PM, Al Viro <viro@parcelfarce.linux.theplanet.co.uk> wrote: > On Thu, Apr 07, 2005 at 08:32:04AM -0700, Linus Torvalds wrote: > > Also, there's actually a second reason why I've decided that cherry- > > picking is wrong, and it's non-technical. > > > > The thing is, cherry-picking very much implies that the people "up" the > > foodchain end up editing the work of the people "below" them. The whole > > reason you want cherry-picking is that you want to fix up somebody elses > > mistakes, ie something you disagree with. > > No. There's another reason - when you are cherry-picking and reordering > *your* *own* *patches*. That's what I had been unable to explain to > Larry and that's what made BK unusable for me. Yep, I missed this in BK a lot. There is another situation in which cherry-picking is very useful: even if you have a clean tree it still may contain bugfixes mixed with unrelated cleanups and sometimes you want to only apply bugfixes. Bartlomiej ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:10 ` Al Viro 2005-04-07 17:47 ` Linus Torvalds 2005-04-07 17:52 ` Bartlomiej Zolnierkiewicz @ 2005-04-07 17:54 ` Daniel Phillips 2005-04-07 18:13 ` Dmitry Yusupov 2005-04-08 17:24 ` Jon Masters 2 siblings, 2 replies; 201+ messages in thread From: Daniel Phillips @ 2005-04-07 17:54 UTC (permalink / raw) To: Al Viro; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List On Thursday 07 April 2005 13:10, Al Viro wrote: > The point being, both history and well, publishable result can be expressed > as series of small steps, but they are not the same thing. So far all I've > seen in the area (and that includes BK) is heavily biased towards history > part and attempts to use this stuff for manipulating patch series turn into > fighting the tool. > > I'd *love* to see something that can handle both - preferably with > history of reordering, etc. being kept. IOW, not just a tree of changesets > but a lattice - with multiple paths leading to the same node. So far > I've seen nothing of that kind ;-/ Which is a perfect demonstration of why the scm tool has to be free/open source. We should never have had to plead with BitMover to extend BK in a direction like that, but instead, just get the source and make it do it, like any other open source project. Three years ago, there was no fully working open source distributed scm code base to use as a starting point, so extending BK would have been the only easy alternative. But since then the situation has changed. There are now several working code bases to provide a good starting point: Monotone, Arch, SVK, Bazaar-ng and others. Sure, there are quibbles about all of those, but right now is not the time for quibbling, because a functional replacement for BK is needed in roughly two months, capable of losslessly importing the kernel version graph. It only has to support a subset of BK functionality, e.g., pulling and cloning. 
It is ok to be a little slow so long as it is not pathetically slow. The purpose of the interim solution is just to get the patch flow process back online. The key is the _lossless_ part. So long as the interim solution imports the metadata losslessly, we have the flexibility to switch to a better solution later, on short notice and without much pain. So I propose that everybody who is interested, pick one of the above projects and join it, to help get it to the point of being able to losslessly import the version graph. Given the importance, I think that _all_ viable alternatives need to be worked on in parallel, so that two months from now we have several viable options. Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:54 ` Daniel Phillips @ 2005-04-07 18:13 ` Dmitry Yusupov 2005-04-07 18:29 ` Daniel Phillips 2005-04-08 17:24 ` Jon Masters 1 sibling, 1 reply; 201+ messages in thread From: Dmitry Yusupov @ 2005-04-07 18:13 UTC (permalink / raw) To: Daniel Phillips Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List On Thu, 2005-04-07 at 13:54 -0400, Daniel Phillips wrote: > Three years ago, there was no fully working open source distributed scm code > base to use as a starting point, so extending BK would have been the only > easy alternative. But since then the situation has changed. There are now > several working code bases to provide a good starting point: Monotone, Arch, > SVK, Bazaar-ng and others. Right. For example, SVK is a pretty mature project and very close to a 1.0 release now. And it supports all kinds of merges, including Cherry-Picking Mergeback: http://svk.elixus.org/?MergeFeatures Dmitry ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 18:13 ` Dmitry Yusupov @ 2005-04-07 18:29 ` Daniel Phillips 2005-04-10 22:33 ` Troy Benjegerdes 0 siblings, 1 reply; 201+ messages in thread From: Daniel Phillips @ 2005-04-07 18:29 UTC (permalink / raw) To: Dmitry Yusupov Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List On Thursday 07 April 2005 14:13, Dmitry Yusupov wrote: > On Thu, 2005-04-07 at 13:54 -0400, Daniel Phillips wrote: > > Three years ago, there was no fully working open source distributed scm > > code base to use as a starting point, so extending BK would have been the > > only easy alternative. But since then the situation has changed. There > > are now several working code bases to provide a good starting point: > > Monotone, Arch, SVK, Bazaar-ng and others. > > Right. For example, SVK is pretty mature project and very close to 1.0 > release now. And it supports all kind of merges including Cherry-Picking > Mergeback: > > http://svk.elixus.org/?MergeFeatures So for an interim way to get the patch flow back online, SVK is ready to try _now_, and we only need a way to import the version graph? (true/false) Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 18:29 ` Daniel Phillips @ 2005-04-10 22:33 ` Troy Benjegerdes 2005-04-11 0:00 ` Christian Parpart 0 siblings, 1 reply; 201+ messages in thread From: Troy Benjegerdes @ 2005-04-10 22:33 UTC (permalink / raw) To: Daniel Phillips Cc: Dmitry Yusupov, Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List On Thu, Apr 07, 2005 at 02:29:24PM -0400, Daniel Phillips wrote: > On Thursday 07 April 2005 14:13, Dmitry Yusupov wrote: > > On Thu, 2005-04-07 at 13:54 -0400, Daniel Phillips wrote: > > > Three years ago, there was no fully working open source distributed scm > > > code base to use as a starting point, so extending BK would have been the > > > only easy alternative. But since then the situation has changed. There > > > are now several working code bases to provide a good starting point: > > > Monotone, Arch, SVK, Bazaar-ng and others. > > > > Right. For example, SVK is pretty mature project and very close to 1.0 > > release now. And it supports all kind of merges including Cherry-Picking > > Mergeback: > > > > http://svk.elixus.org/?MergeFeatures > > So for an interim way to get the patch flow back online, SVK is ready to try > _now_, and we only need a way to import the version graph? (true/false) Well, I followed some of the instructions to mirror the kernel tree on svn.clkao.org/linux/cvs, and although it took around 12 hours to import 28232 versions, I seem to have a mirror of it on my own subversion server now. I think the svn.clkao.org mirror was taken from bkcvs... the last log message I see is "Rev 28232 - torvalds - 2005-04-04 09:08:33" I have no idea what's missing. What is everyone's favorite web frontend to subversion? I've got websvn (debian package) on there now, and it's a bit sluggish, but it seems to work. I hope to have time this week or next to actually make this machine publicly accessible. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 22:33 ` Troy Benjegerdes @ 2005-04-11 0:00 ` Christian Parpart 0 siblings, 0 replies; 201+ messages in thread From: Christian Parpart @ 2005-04-11 0:00 UTC (permalink / raw) To: Troy Benjegerdes Cc: Daniel Phillips, Dmitry Yusupov, Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List On Monday 11 April 2005 12:33 am, you wrote: [......] > Well, I followed some of the instructions to mirror the kernel tree on > svn.clkao.org/linux/cvs, and although it took around 12 hours to import > 28232 versions, I seem to have a mirror of it on my own subversion > server now. I think the svn.clkao.org mirror was taken from bkcvs... the > last log message I see is "Rev 28232 - torvalds - 2005-04-04 09:08:33" I'd love to see svk as a real choice for you guys, but I don't mind as long as I get a door open using svn/svk ;);) > I have no idea what's missing. What is everyone's favorite web frontend > to subversion? Check out ViewCVS at: http://viewcvs.sourceforge.net/ This seems widely used (not just by me ^o^). Regards, Christian Parpart. -- Netiquette: http://www.ietf.org/rfc/rfc1855.txt 01:55:08 up 18 days, 15:01, 2 users, load average: 0.27, 0.39, 0.36 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 17:54 ` Daniel Phillips 2005-04-07 18:13 ` Dmitry Yusupov @ 2005-04-08 17:24 ` Jon Masters 2005-04-08 22:05 ` Daniel Phillips 1 sibling, 1 reply; 201+ messages in thread From: Jon Masters @ 2005-04-08 17:24 UTC (permalink / raw) To: Daniel Phillips Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List On Apr 7, 2005 6:54 PM, Daniel Phillips <phillips@istop.com> wrote: > So I propose that everybody who is interested, pick one of the above projects > and join it, to help get it to the point of being able to losslessly import > the version graph. Given the importance, I think that _all_ viable > alternatives need to be worked on in parallel, so that two months from now we > have several viable options. What about BitKeeper licensing constraints on such involvement? Jon. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:24 ` Jon Masters @ 2005-04-08 22:05 ` Daniel Phillips 0 siblings, 0 replies; 201+ messages in thread From: Daniel Phillips @ 2005-04-08 22:05 UTC (permalink / raw) To: jonathan; +Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List On Friday 08 April 2005 13:24, Jon Masters wrote: > On Apr 7, 2005 6:54 PM, Daniel Phillips <phillips@istop.com> wrote: > > So I propose that everybody who is interested, pick one of the above > > projects and join it, to help get it to the point of being able to > > losslessly import the version graph. Given the importance, I think that > > _all_ viable alternatives need to be worked on in parallel, so that two > > months from now we have several viable options. > > What about BitKeeper licensing constraints on such involvement? They don't apply to me, for one. Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 15:32 ` Linus Torvalds 2005-04-07 17:09 ` Daniel Phillips 2005-04-07 17:10 ` Al Viro @ 2005-04-08 22:52 ` Roman Zippel 2005-04-08 23:46 ` Tupshin Harper 2005-04-09 16:52 ` Eric D. Mudama 2 siblings, 2 replies; 201+ messages in thread From: Roman Zippel @ 2005-04-08 22:52 UTC (permalink / raw) To: Linus Torvalds; +Cc: David Woodhouse, Kernel Mailing List Hi, On Thu, 7 Apr 2005, Linus Torvalds wrote: > I really disliked that in BitKeeper too originally. I argued with Larry > about it, but Larry (correctly, I believe) argued that efficient and > reliable distribution really requires the concept of "history is > immutable". It makes replication much easier when you know that the known > subset _never_ shrinks or changes - you only add on top of it. The problem is you pay a price for this. There must be a reason developers were adding another GB of memory just to run BK. Preserving the complete merge history does indeed make repeated merges simpler, but it builds up complex metadata, which has to be managed forever. I doubt that this is really an advantage in the long term. I expect that we would be better off serializing changesets in the main repository. For example bk does something like this: A1 -> A2 -> A3 -> BM \-> B1 -> B2 --^ and instead of creating the merge changeset, one could merge them like this: A1 -> A2 -> A3 -> B1 -> B2 This results in a simpler repository, which is more scalable and which is easier for users to work with (e.g. binary bug search). The disadvantage is that it will cause more minor conflicts when changes are pulled back into the original tree, but these should be easily resolvable most of the time. I'm not saying with this that the bk model is bad, but I think it's a problem if it's the only model applied to everything. > The thing is, cherry-picking very much implies that the people "up" the > foodchain end up editing the work of the people "below" them. 
The whole > reason you want cherry-picking is that you want to fix up somebody else's > mistakes, i.e. something you disagree with. > > That sounds like an obviously good thing, right? Yes it does. > > The problem is, it actually results in the wrong dynamics and psychology > in the system. First off, it makes the implicit assumption that there is > an "up" and "down" in the food-chain, and I think that's wrong. These dynamics do exist and our tools should be able to represent them. For example, when people post patches, they get reviewed and often need more changes, and bk doesn't really help them to redo the patches. Bk helped you to offload the cherry-picking process to other people, so that you only had to do cherry-collecting very efficiently. Another prime example of cherry-picking is Andrew's mm tree: he picks a number of patches which are ready for merging and forwards them to you. Our current basic development model (at least until a few days ago) looks something like this: linux-mm -> linux-bk -> linux-stable Ideally most changes would get into the tree via linux-mm and, depending on various conditions (e.g. urgency, review state), would get into the stable tree. In practice linux-mm is more an aggregation of patches which need testing, and since most bk users were developing against linux-bk, it got a lot less testing and a lot of problems are only caught at the next stage. Changes from the stable tree would even flow in the opposite direction. Bk supports certain aspects of the kernel development process very well, but due to its closed nature it was practically impossible to really integrate it fully into this process (at least for anyone outside BM). In the short term we probably are in for a tough ride and we take whatever works best for you, but in the long term we need to think about how SCM fits into our kernel development model, which includes development, review, testing and releasing of kernel changes. 
This is more than just pulling and merging kernel trees. I'm aiming at a tool that can also support Andrew's work, so that he can also better offload some of this work (and take a break sometimes :) ). Unfortunately every existing tool I know of is lacking in its own way, so we still have some way to go... bye, Roman ^ permalink raw reply [flat|nested] 201+ messages in thread
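Roman's serialization proposal amounts to linearizing the changeset DAG instead of recording a merge node. A minimal Python sketch, using Kahn's topological sort and always taking the smallest ready node, which here acts as a stand-in for "prefer the main line" (node names follow his A1/B1 example; a real tool would use a smarter tie-break):

```python
import heapq

def linearize(parents, nodes):
    """Topologically sort a changeset DAG into one linear sequence.
    `parents` maps each changeset to the set of changesets it builds on."""
    children = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
            indeg[child] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        n = heapq.heappop(ready)     # smallest name first: 'A*' before 'B*'
        order.append(n)
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                heapq.heappush(ready, c)
    return order

# Roman's example: A1 -> A2 -> A3 on the main line, B1 -> B2 branched off A1.
parents = {"A2": {"A1"}, "A3": {"A2"}, "B1": {"A1"}, "B2": {"B1"}}
nodes = ["A1", "A2", "A3", "B1", "B2"]
print(linearize(parents, nodes))  # -> ['A1', 'A2', 'A3', 'B1', 'B2']
```

The merge changeset BM disappears entirely: B1 now sits on top of A3 rather than A1, which is exactly where the extra minor conflicts he mentions would surface.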
* Re: Kernel SCM saga.. 2005-04-08 22:52 ` Roman Zippel @ 2005-04-08 23:46 ` Tupshin Harper 2005-04-09 1:00 ` Roman Zippel 2005-04-09 16:52 ` Eric D. Mudama 1 sibling, 1 reply; 201+ messages in thread From: Tupshin Harper @ 2005-04-08 23:46 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List Roman Zippel wrote: >Preserving the complete merge history does indeed make repeated merges >simpler, but it builds up complex meta data, which has to be managed >forever. I doubt that this is really an advantage in the long term. I >expect that we were better off serializing changesets in the main >repository. For example bk does something like this: > > A1 -> A2 -> A3 -> BM > \-> B1 -> B2 --^ > >and instead of creating the merge changeset, one could merge them like >this: > > A1 -> A2 -> A3 -> B1 -> B2 > >This results in a simpler repository, which is more scalable and which >is easier for users to work with (e.g. binary bug search). >The disadvantage would be it will cause more minor conflicts, when changes >are pulled back into the original tree, but which should be easily >resolvable most of the time. > Both darcs and arch (and arch's siblings) have ways of maintaining the complete history but speeding up operations. Arch uses revision libraries: http://www.gnu.org/software/gnu-arch/tutorial/revision-libraries.html though I'm not all that up on arch, so I'll just leave it at that. Darcs uses "darcs optimize --checkpoint" http://darcs.net/manual/node7.html#SECTION00764000000000000000 which "allows for users to retrieve a working repository with limited history with a savings of disk space and bandwidth." In darcs's case, you can pull a partial repository by doing "darcs get --partial", in which case you only grab the state at the point that the repository was optimized and subsequent patches, and all operations only need to work against the set of patches since that optimize. 
Note that I'm not promoting darcs for kernel usage, because of its speed (or the lack thereof), but I am curious why Linus would consider monotone, given its speed issues, and not consider darcs. -Tupshin ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 23:46 ` Tupshin Harper @ 2005-04-09 1:00 ` Roman Zippel 2005-04-09 1:23 ` Tupshin Harper 0 siblings, 1 reply; 201+ messages in thread From: Roman Zippel @ 2005-04-09 1:00 UTC (permalink / raw) To: Tupshin Harper; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List Hi, On Fri, 8 Apr 2005, Tupshin Harper wrote: > > A1 -> A2 -> A3 -> B1 -> B2 > > > > This results in a simpler repository, which is more scalable and which is > > easier for users to work with (e.g. binary bug search). > > The disadvantage would be it will cause more minor conflicts, when changes > > are pulled back into the original tree, but which should be easily > > resolvable most of the time. > > > Both darcs and arch (and arch's siblings) have ways of maintaining the > complete history but speeding up operations. Please show me how you would do a binary search with arch. I don't really like the arch model, it's far too restrictive and it's jumping through hoops to get to an acceptable speed. What I expect from a SCM is that it maintains both a version index of the directory structure and a version index of the individual files. Arch makes it especially painful to extract this data quickly. For the common cases it throws disk space at the problem and does a lot of caching, but there are still enough problems (e.g. annotate), which require scanning of lots of tarballs. bye, Roman ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 1:00 ` Roman Zippel @ 2005-04-09 1:23 ` Tupshin Harper 0 siblings, 0 replies; 201+ messages in thread From: Tupshin Harper @ 2005-04-09 1:23 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List Roman Zippel wrote: > > >Please show me how you would do a binary search with arch. > >I don't really like the arch model, it's far too restrictive and it's >jumping through hoops to get to an acceptable speed. >What I expect from a SCM is that it maintains both a version index of the >directory structure and a version index of the individual files. Arch >makes it especially painful to extract this data quickly. For the common >cases it throws disk space at the problem and does a lot of caching, but >there are still enough problems (e.g. annotate), which require scanning of >lots of tarballs. > >bye, Roman > > I'm not going to defend or attack arch since I haven't used it enough. I will say that darcs largely does suffer from the same problem that you describe since its fundamental unit of storage is individual patches (though it avoids the tarball issue). This is why David Roundy has indicated his intention of eventually having a per-file cache: http://kerneltrap.org/mailarchive/1/message/24317/flat You could then make the argument that if you have a per-file representation of the history, why do you also need/want a per-patch representation as the canonical format, but that's been argued plenty on both the darcs and arch mailing lists and probably isn't worth going into here. -Tupshin ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 22:52 ` Roman Zippel 2005-04-08 23:46 ` Tupshin Harper @ 2005-04-09 16:52 ` Eric D. Mudama 2005-04-09 17:40 ` Roman Zippel 1 sibling, 1 reply; 201+ messages in thread From: Eric D. Mudama @ 2005-04-09 16:52 UTC (permalink / raw) To: Roman Zippel; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List On Apr 8, 2005 4:52 PM, Roman Zippel <zippel@linux-m68k.org> wrote: > The problem is you pay a price for this. There must be a reason developers > were adding another GB of memory just to run BK. > Preserving the complete merge history does indeed make repeated merges > simpler, but it builds up complex meta data, which has to be managed > forever. I doubt that this is really an advantage in the long term. I > expect that we were better off serializing changesets in the main > repository. For example bk does something like this: > > A1 -> A2 -> A3 -> BM > \-> B1 -> B2 --^ > > and instead of creating the merge changeset, one could merge them like > this: > > A1 -> A2 -> A3 -> B1 -> B2 > > This results in a simpler repository, which is more scalable and which > is easier for users to work with (e.g. binary bug search). > The disadvantage would be it will cause more minor conflicts, when changes > are pulled back into the original tree, but which should be easily > resolvable most of the time. The kicker is that B1 was developed based on A1, so any test results were based on B1 being a single changeset delta away from A1. If the resulting 'BM' fails testing, and you've converted into the linear model above where B2 has failed, you lose the ability to isolate B1's changes and where they came from, to revalidate the developer's results. With bugs and fixes that can be validated in a few hours, this may not be a problem, but when chasing a bug that takes days or weeks to manifest, that a developer swears they fixed, one has to be able to reproduce their exact test environment. 
I believe that flattening the change graph makes history reproduction impossible; alternatively, you impose on each developer the burden of testing the merge result of B1 + A1..3 before submission, but in doing so the test time may require additional test periods etc., and with sufficient velocity the process might never close. This is the problem CVS has if you don't create micro branches for every single modification. --eric ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 16:52 ` Eric D. Mudama @ 2005-04-09 17:40 ` Roman Zippel 2005-04-09 18:56 ` Ray Lee 0 siblings, 1 reply; 201+ messages in thread From: Roman Zippel @ 2005-04-09 17:40 UTC (permalink / raw) To: Eric D. Mudama; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List Hi, On Sat, 9 Apr 2005, Eric D. Mudama wrote: > > For example bk does something like this: > > > > A1 -> A2 -> A3 -> BM > > \-> B1 -> B2 --^ > > > > and instead of creating the merge changeset, one could merge them like > > this: > > > > A1 -> A2 -> A3 -> B1 -> B2 > > > > This results in a simpler repository, which is more scalable and which > > is easier for users to work with (e.g. binary bug search). > > The disadvantage would be it will cause more minor conflicts, when changes > > are pulled back into the original tree, but which should be easily > > resolvable most of the time. > > The kicker comes that B1 was developed based on A1, so any test > results were based on B1 being a single changeset delta away from A1. > If the resulting 'BM' fails testing, and you've converted into the > linear model above where B2 has failed, you lose the ability to > isolate B1's changes and where they came from, to revalidate the > developer's results. What good does it do if you can revalidate the original B1? The important point is that the end result works and if it only fails in the merged version you have a big problem. The serialized version gives you the chance to test whether it fails in B1 or B2. > I believe that flattening the change graph makes history reproduction > impossible, or alternately, you are imposing on each developer to test > the merge results at B1 + A1..3 before submission, but in doing so, > the test time may require additional test periods etc and with > sufficient velocity, might never close. The merge result has to be tested either way, so I'm not exactly sure, what you're trying to say. 
bye, Roman ^ permalink raw reply [flat|nested] 201+ messages in thread
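[ Editorial note: Roman's "binary bug search" advantage of a serialized history can be made concrete. With a linear chain such as A1 -> A2 -> A3 -> B1 -> B2, the revision that introduced a failure can be found in O(log n) build-and-test cycles. The sketch below is illustrative only; `first_bad` and `is_good` are hypothetical stand-ins for a revision list and a test harness, not tools anyone in this thread actually has. ]

```python
# Binary bug search over a linear (serialized) history.
# Assumes history[0] is known good and history[-1] is known bad.
def first_bad(history, is_good):
    """Return the first revision whose tree fails the test."""
    lo, hi = 0, len(history) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(history[mid]):
            lo = mid + 1    # breakage was introduced after `mid`
        else:
            hi = mid        # breakage is at `mid` or earlier
    return history[lo]

# The thread's example, serialized, with B1 as the pretend culprit:
revs = ["A1", "A2", "A3", "B1", "B2"]
print(first_bad(revs, lambda r: r not in {"B1", "B2"}))  # -> B1
```

With the merge-changeset shape (A3 and B2 joined by BM) there is no such total order to bisect over, which is exactly the trade-off being argued here.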
* Re: Kernel SCM saga.. 2005-04-09 17:40 ` Roman Zippel @ 2005-04-09 18:56 ` Ray Lee 0 siblings, 0 replies; 201+ messages in thread From: Ray Lee @ 2005-04-09 18:56 UTC (permalink / raw) To: Roman Zippel Cc: Kernel Mailing List, David Woodhouse, Linus Torvalds, Eric D. Mudama On Sat, 2005-04-09 at 19:40 +0200, Roman Zippel wrote: > On Sat, 9 Apr 2005, Eric D. Mudama wrote: > > > For example bk does something like this: > > > > > > A1 -> A2 -> A3 -> BM > > > \-> B1 -> B2 --^ > > > > > > and instead of creating the merge changeset, one could merge them like > > > this: > > > > > > A1 -> A2 -> A3 -> B1 -> B2 > > I believe that flattening the change graph makes history reproduction > > impossible, or alternately, you are imposing on each developer to test > > the merge results at B1 + A1..3 before submission, but in doing so, > > the test time may require additional test periods etc and with > > sufficient velocity, might never close. > > The merge result has to be tested either way, so I'm not exactly sure > what you're trying to say. The kernel changes. A lot. And often. With that in mind, if (for example) A2 and A3 are simple changes that are quick to test and B1 is large, or complex, or requires hours (days, weeks) of testing to validate, then a maintainer's decision can legitimately be to rebase a tree (say, -mm) upon the B1 line of development, and toss the A2 branch back to those developers with a "Sorry it didn't work out, something here causes Unhappiness with B1, can you track down the problem and try again?" Ray ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (6 preceding siblings ...) 2005-04-07 7:18 ` David Woodhouse @ 2005-04-07 7:44 ` Jan Hudec 2005-04-08 6:14 ` Matthias Urlichs 2005-04-09 1:01 ` Marcin Dalecki 2005-04-07 10:56 ` Andrew Walrond ` (2 subsequent siblings) 10 siblings, 2 replies; 201+ messages in thread From: Jan Hudec @ 2005-04-07 7:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 1435 bytes --] On Wed, Apr 06, 2005 at 08:42:08 -0700, Linus Torvalds wrote: > PS. Don't bother telling me about subversion. If you must, start reading > up on "monotone". That seems to be the most viable alternative, but don't > pester the developers so much that they don't get any work done. They are > already aware of my problems ;) I have looked at most systems currently available. I would suggest the following for a closer look: 1) GNU Arch/Bazaar. They use the same archive format, simple, have the concepts right. It may need some scripts or add ons. When Bazaar-NG is ready, it will be able to read the GNU Arch/Bazaar archives so switching should be easy. 2) SVK. True, it is built on Subversion, but it adds all the distributed features necessary. It keeps a mirror of the repository locally (but can mirror only some branches) -- BitKeeper did that too. It just hit 1.0beta1, but development is progressing rapidly. There was recently a post on their mailing list about the ability to track changeset dependencies. I have looked at Monotone too, of course, but I did not find any way of doing cherry-picking (i.e. skipping some changes and pulling others) in it, and I feel it will need more rework of the meta-data before that is possible. As for the sqlite backend, I'd not consider that a problem. 
------------------------------------------------------------------------------- Jan 'Bulb' Hudec <bulb@ucw.cz> [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 7:44 ` Jan Hudec @ 2005-04-08 6:14 ` Matthias Urlichs 2005-04-09 1:01 ` Marcin Dalecki 1 sibling, 0 replies; 201+ messages in thread From: Matthias Urlichs @ 2005-04-08 6:14 UTC (permalink / raw) To: linux-kernel Hi, Jan Hudec wrote on Thu, 07 Apr 2005 09:44:08 +0200: > 1) GNU Arch/Bazaar. They use the same archive format, simple, have the > concepts right. It may need some scripts or add ons. When Bazaar-NG is > ready, it will be able to read the GNU Arch/Bazaar archives so > switching should be easy. Plus Bazaar has multiple implementations (C and Python). Plus arch can trivially export single patches. Plus ... well, you get the idea. ;-) Linus: Care to share your SCM feature requirement list? -- Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-07 7:44 ` Jan Hudec 2005-04-08 6:14 ` Matthias Urlichs @ 2005-04-09 1:01 ` Marcin Dalecki 2005-04-09 8:32 ` Jan Hudec 2005-04-11 2:26 ` Miles Bader 1 sibling, 2 replies; 201+ messages in thread From: Marcin Dalecki @ 2005-04-09 1:01 UTC (permalink / raw) To: Jan Hudec; +Cc: Linus Torvalds, Kernel Mailing List On 2005-04-07, at 09:44, Jan Hudec wrote: > > I have looked at most systems currently available. I would suggest > the following for a closer look: > > 1) GNU Arch/Bazaar. They use the same archive format, simple, have the > concepts right. It may need some scripts or add ons. When Bazaar-NG > is ready, it will be able to read the GNU Arch/Bazaar archives so > switching should be easy. Arch isn't a sound example of software design. Quite contrary to the random notes posted by its author, the following issues struck me when I evaluated it: The application (tla) claims to have "intuitive" command names. However, I didn't see that as given. Most of them were difficult to remember and appeared to be just infantile. I stopped looking further after I saw: tla my-id instead of: tla user-id or even tla set id ... tla make-archive instead of tla init tla my-default-archive john@dole.com--2005-VersionPatrol No more "My Computer" please... Repository addressing requires you to use informally defined, elaborate, and typing-error-prone conventions: mkdir ~/{archives} tla make-archive john@dole.com--20005-VersionPatrol ~/{archives}/2005-VersionPatrol You notice the requirement for two commands to accomplish a single task already well denoted by the second command? There is more of the same in quite a few places when you try to use it. You notice the triple zero it didn't catch? As an added bonus, it relies for its operation on whatever applications happen to be installed on the host in question under the names patch and diff, as well as a few others. Better don't waste your time with looking at Arch. 
Stick with patches you maintain by hand combined with some scripts containing a list of apply commands and you should be still more productive than when using Arch. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 1:01 ` Marcin Dalecki @ 2005-04-09 8:32 ` Jan Hudec 2005-04-11 2:26 ` Miles Bader 1 sibling, 0 replies; 201+ messages in thread From: Jan Hudec @ 2005-04-09 8:32 UTC (permalink / raw) To: Marcin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 3284 bytes --] On Sat, Apr 09, 2005 at 03:01:29 +0200, Marcin Dalecki wrote: > > On 2005-04-07, at 09:44, Jan Hudec wrote: > > > >I have looked at most systems currently available. I would suggest > >the following for a closer look: > > > >1) GNU Arch/Bazaar. They use the same archive format, simple, have the > > concepts right. It may need some scripts or add ons. When Bazaar-NG > > is ready, it will be able to read the GNU Arch/Bazaar archives so > > switching should be easy. > > Arch isn't a sound example of software design. Quite contrary to the I actually _do_ agree with you. I like Arch, but its user interface certainly is broken and some parts of it sure need some redesign. > random notes posted by its author, the following issues struck me > when I evaluated it: > > The application (tla) claims to have "intuitive" command names. However, > I didn't see that as given. Most of them were difficult to remember > and appeared to be just infantile. I stopped looking further after I > saw: > > tla my-id instead of: tla user-id or even tla set id ... > > tla make-archive instead of tla init In this case, tla init would be a lot *worse*, because there are two different things to initialize -- the archive and the tree. But init-archive would be a little better, for consistency. > tla my-default-archive john@dole.com--2005-VersionPatrol This one is kinda broken. Even in concept it is. > No more "My Computer" please... > > Repository addressing requires you to use informally defined, > elaborate, and typing-error-prone conventions: > > mkdir ~/{archives} *NO*. Using this name is STRONGLY recommended *AGAINST*. 
Tom once used it in an example or in some of his archives and people started doing it, but it's a complete bogosity and it is not required anywhere. > tla make-archive john@dole.com--20005-VersionPatrol > ~/{archives}/2005-VersionPatrol > > You notice the requirement for two commands to accomplish a single task > already well denoted by the second command? There is more of the same > in quite a few places when you try to use it. You notice the triple > zero it didn't catch? I sure do. But the folks writing Bazaar are gradually fixing these. There are a lot of them and it's not that long since they started, so they have not fixed all of them yet, but I think they eventually will. > As an added bonus, it relies for its operation on whatever applications > happen to be installed on the host in question under the names patch > and diff, as well as a few others. No. The build process actually checks that the diff and patch applications really are GNU Diff and GNU Patch in a sufficiently recent version. That was not always the case, but it is now. > Better don't waste your time with looking at Arch. Stick with patches > you maintain by hand combined with some scripts containing a list of > apply commands > and you should be still more productive than when using Arch. I don't agree with you. Using Arch is more productive (e.g. because it does merges), but certainly one could do a lot better than Arch does. ------------------------------------------------------------------------------- Jan 'Bulb' Hudec <bulb@ucw.cz> [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 1:01 ` Marcin Dalecki 2005-04-09 8:32 ` Jan Hudec @ 2005-04-11 2:26 ` Miles Bader 2005-04-11 2:56 ` Marcin Dalecki 1 sibling, 1 reply; 201+ messages in thread From: Miles Bader @ 2005-04-11 2:26 UTC (permalink / raw) To: Marcin Dalecki; +Cc: Jan Hudec, Linus Torvalds, Kernel Mailing List Marcin Dalecki <martin@dalecki.de> writes: > Better don't waste your time with looking at Arch. Stick with patches > you maintain by hand combined with some scripts containing a list of > apply commands and you should be still more productive than when using > Arch. Arch has its problems, but please lay off the uninformed flamebait (the "issues" you complain about are so utterly minor as to be laughable). -Miles -- Ich bin ein Virus. Mach' mit und kopiere mich in Deine .signature. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-11 2:26 ` Miles Bader @ 2005-04-11 2:56 ` Marcin Dalecki 2005-04-11 6:36 ` Jan Hudec 0 siblings, 1 reply; 201+ messages in thread From: Marcin Dalecki @ 2005-04-11 2:56 UTC (permalink / raw) To: Miles Bader; +Cc: Linus Torvalds, Jan Hudec, Kernel Mailing List On 2005-04-11, at 04:26, Miles Bader wrote: > Marcin Dalecki <martin@dalecki.de> writes: >> Better don't waste your time with looking at Arch. Stick with patches >> you maintain by hand combined with some scripts containing a list of >> apply commands and you should be still more productive than when using >> Arch. > > Arch has its problems, but please lay off the uninformed flamebait (the > "issues" you complain about are so utterly minor as to be laughable). I wish you a lot of laughter after replying to an already three-day-old message, which was my final word on Arch. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-11 2:56 ` Marcin Dalecki @ 2005-04-11 6:36 ` Jan Hudec 0 siblings, 0 replies; 201+ messages in thread From: Jan Hudec @ 2005-04-11 6:36 UTC (permalink / raw) To: Marcin Dalecki; +Cc: Miles Bader, Linus Torvalds, Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 1616 bytes --] On Mon, Apr 11, 2005 at 04:56:06 +0200, Marcin Dalecki wrote: > > On 2005-04-11, at 04:26, Miles Bader wrote: > > >Marcin Dalecki <martin@dalecki.de> writes: > >>Better don't waste your time with looking at Arch. Stick with patches > >>you maintain by hand combined with some scripts containing a list of > >>apply commands and you should be still more productive than when using > >>Arch. > > > >Arch has its problems, but please lay off the uninformed flamebait (the > >"issues" you complain about are so utterly minor as to be laughable). > > I wish you a lot of laughter after replying to an already three-day-old > message, > which was my final word on Arch. Marcin Dalecki <martin@dalecki.de> complained: > Arch isn't a sound example of software design. Quite contrary to the > random notes posted by its author, the following issues struck me > when I evaluated it: > [...] I didn't comment on this the first time, but I see I should have. *NONE* of the issues you complained about were issues of *DESIGN*. They were all issues of *ENGINEERING*. *ENGINEERING* issues can be fixed. One of the issues does not even exist any longer (the diff/patch one -- it now checks they are the right ones -- and in all other respects it is *exactly* the same as depending on a library). But what really matters here is the concept. Arch has a simple concept that works well. Others have different concepts that work well, or almost as well, too (Darcs, Monotone). 
------------------------------------------------------------------------------- Jan 'Bulb' Hudec <bulb@ucw.cz> [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (7 preceding siblings ...) 2005-04-07 7:44 ` Jan Hudec @ 2005-04-07 10:56 ` Andrew Walrond 2005-04-08 0:57 ` Ian Wienand 2005-04-08 4:13 ` Chris Wedgwood 10 siblings, 0 replies; 201+ messages in thread From: Andrew Walrond @ 2005-04-07 10:56 UTC (permalink / raw) To: linux-kernel On Wednesday 06 April 2005 16:42, Linus Torvalds wrote: > > PS. Don't bother telling me about subversion. If you must, start reading > up on "monotone". That seems to be the most viable alternative, but don't > pester the developers so much that they don't get any work done. They are > already aware of my problems ;) Care to share your monotone wishlist? Andrew Walrond ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (8 preceding siblings ...) 2005-04-07 10:56 ` Andrew Walrond @ 2005-04-08 0:57 ` Ian Wienand 2005-04-08 4:13 ` Chris Wedgwood 10 siblings, 0 replies; 201+ messages in thread From: Ian Wienand @ 2005-04-08 0:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 594 bytes --] On Wed, Apr 06, 2005 at 08:42:08AM -0700, Linus Torvalds wrote: > If you must, start reading up on "monotone". One slightly annoying thing is that monotone doesn't appear to have a web interface. I used to use the bk one a lot when tracking down bugs, because it was really fast to have a web browser window open and click through the revisions of a file reading checkin comments, etc. Does anyone know if one is being worked on? bazaar-ng at least mention this is important in their design docs and arch has one in development too. -i ianw@gelato.unsw.edu.au http://www.gelato.unsw.edu.au [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-06 15:42 Kernel SCM saga Linus Torvalds ` (9 preceding siblings ...) 2005-04-08 0:57 ` Ian Wienand @ 2005-04-08 4:13 ` Chris Wedgwood 2005-04-08 4:42 ` Linus Torvalds 2005-04-08 11:42 ` Kernel SCM saga Catalin Marinas 10 siblings, 2 replies; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 4:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, Apr 06, 2005 at 08:42:08AM -0700, Linus Torvalds wrote: > PS. Don't bother telling me about subversion. If you must, start reading > up on "monotone". That seems to be the most viable alternative, but don't > pester the developers so much that they don't get any work done. They are > already aware of my problems ;) I'm playing with monotone right now. Superficially it looks like it has tons of gee-whiz neato stuff... however, it's *agonizingly* slow. I mean glacial. A heavily sedated sloth with no legs is probably faster. Using monotone to pull itself took over 2 hours of wall time and 71 minutes of CPU time. Arguably brand-new CPUs are probably about 2x the speed of what I have now and there might have been networking funnies --- but that's still 35 minutes to get ~40MB of data. The kernel is ten times larger, so does that mean a clean pull of the kernel is looking at (71/2*10) ~ 355 minutes, or 6 hours, of CPU time? ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 4:13 ` Chris Wedgwood @ 2005-04-08 4:42 ` Linus Torvalds 2005-04-08 5:04 ` Chris Wedgwood ` (5 more replies) 2005-04-08 11:42 ` Kernel SCM saga Catalin Marinas 1 sibling, 6 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 4:42 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Kernel Mailing List On Thu, 7 Apr 2005, Chris Wedgwood wrote: > > I'm playing with monotone right now. Superficially it looks like it > has tons of gee-whiz neato stuff... however, it's *agonizingly* slow. > I mean glacial. A heavily sedated sloth with no legs is probably > faster. Yes. The silly thing is, at least in my local tests it doesn't actually seem to be _doing_ anything while it's slow (there are no system calls except for a few memory allocations and de-allocations). It seems to have some exponential function on the number of pathnames involved etc. I'm hoping they can fix it, though. The basic notions do not sound wrong. In the meantime (and because monotone really _is_ that slow), here's a quick challenge for you, and any crazy hacker out there: if you want to play with something _really_ nasty (but also very _very_ fast), take a look at kernel.org:/pub/linux/kernel/people/torvalds/. First one to send me the changelog tree of sparse-git (and a tool to commit and push/pull further changes) gets a gold star, and an honorable mention. I've put a hell of a lot of clues in there (*). I've worked on it (and little else) for the last two days. Time for somebody else to tell me I'm crazy. Linus (*) It should be easier than it sounds. The database is designed so that you can do the equivalent of a nonmerging (ie pure superset) push/pull with just plain rsync, so replication really should be that easy (if somewhat bandwidth-intensive due to the whole-file format). Never mind merging. It's not an SCM, it's a distribution and archival mechanism. I bet you could make a reasonable SCM on top of it, though. 
Another way of looking at it is to say that it's really a content- addressable filesystem, used to track directory trees. ^ permalink raw reply [flat|nested] 201+ messages in thread
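[ Editorial note: the "content-addressable filesystem" Linus describes can be sketched in a few lines: objects are named by a cryptographic hash of their contents, so they are immutable, duplicates coalesce, and a pure-superset pull really is just an rsync of the files the other side has and you don't. The SHA-1 naming, two-level directory fanout, and zlib compression below are assumptions for illustration, not a description of the actual on-disk format in that directory. ]

```python
import hashlib
import os
import zlib

OBJ_DIR = ".objects"

def put(data: bytes) -> str:
    """Store a blob under the hash of its contents; return its name."""
    name = hashlib.sha1(data).hexdigest()
    path = os.path.join(OBJ_DIR, name[:2], name[2:])   # fan out: ab/cdef...
    if not os.path.exists(path):                       # same content == same name
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return name

def get(name: str) -> bytes:
    """Fetch a blob back by name; the name doubles as a checksum."""
    path = os.path.join(OBJ_DIR, name[:2], name[2:])
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    assert hashlib.sha1(data).hexdigest() == name      # detect corruption
    return data

blob = put(b"hello, world\n")
assert get(blob) == b"hello, world\n"
```

Since objects never change once written, replicating between two such stores is just taking the union of their object sets -- which is why plain rsync suffices for a non-merging push/pull.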
* Re: Kernel SCM saga.. 2005-04-08 4:42 ` Linus Torvalds @ 2005-04-08 5:04 ` Chris Wedgwood 2005-04-08 5:14 ` H. Peter Anvin 2005-04-08 7:14 ` Andrea Arcangeli ` (4 subsequent siblings) 5 siblings, 1 reply; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 5:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote: > Yes. The silly thing is, at least in my local tests it doesn't > actually seem to be _doing_ anything while it's slow (there are no > system calls except for a few memory allocations and > de-allocations). It seems to have some exponential function on the > number of pathnames involved etc. I see lots of brk calls changing the heap size, up, down, up, down, over and over. This smells a bit like c++ new/delete behavior to me. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 5:04 ` Chris Wedgwood @ 2005-04-08 5:14 ` H. Peter Anvin 2005-04-08 7:05 ` Rogan Dawes 0 siblings, 1 reply; 201+ messages in thread From: H. Peter Anvin @ 2005-04-08 5:14 UTC (permalink / raw) To: linux-kernel Followup to: <20050408050458.GB8720@taniwha.stupidest.org> By author: Chris Wedgwood <cw@f00f.org> In newsgroup: linux.dev.kernel > > On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote: > > > Yes. The silly thing is, at least in my local tests it doesn't > > actually seem to be _doing_ anything while it's slow (there are no > > system calls except for a few memory allocations and > > de-allocations). It seems to have some exponential function on the > > number of pathnames involved etc. > > I see lots of brk calls changing the heap size, up, down, up, down, > over and over. > > This smells a bit like c++ new/delete behavior to me. > Hmmm... can glibc be clued in to do some hysteresis on the memory allocation? -hpa ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 5:14 ` H. Peter Anvin @ 2005-04-08 7:05 ` Rogan Dawes 2005-04-08 7:21 ` Daniel Phillips 0 siblings, 1 reply; 201+ messages in thread From: Rogan Dawes @ 2005-04-08 7:05 UTC (permalink / raw) To: H. Peter Anvin, cw, linux-kernel H. Peter Anvin wrote: > Followup to: <20050408050458.GB8720@taniwha.stupidest.org> > By author: Chris Wedgwood <cw@f00f.org> > In newsgroup: linux.dev.kernel > >>On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote: >> >> >>>Yes. The silly thing is, at least in my local tests it doesn't >>>actually seem to be _doing_ anything while it's slow (there are no >>>system calls except for a few memory allocations and >>>de-allocations). It seems to have some exponential function on the >>>number of pathnames involved etc. >> >>I see lots of brk calls changing the heap size, up, down, up, down, >>over and over. >> >>This smells a bit like c++ new/delete behavior to me. >> > > > Hmmm... can glibc be clued in to do some hysteresis on the memory > allocation? > > -hpa Take a look at http://www.linuxshowcase.org/2001/full_papers/ezolt/ezolt_html/ Abstract GNU libc's default setting for malloc can cause a significant performance penalty for applications that use it extensively, such as Compaq's high performance extended math library, CXML. The default malloc tuning can cause a significant number of minor page faults, and result in application performance of only half of the true potential. This paper describes how to remove the performance penalty using environmental variables and the method used to discover the cause of the malloc performance penalty. Regards, Rogan ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 7:05 ` Rogan Dawes @ 2005-04-08 7:21 ` Daniel Phillips 2005-04-08 7:49 ` H. Peter Anvin 0 siblings, 1 reply; 201+ messages in thread From: Daniel Phillips @ 2005-04-08 7:21 UTC (permalink / raw) To: Rogan Dawes; +Cc: H. Peter Anvin, cw, linux-kernel On Friday 08 April 2005 03:05, Rogan Dawes wrote: > Take a look at > http://www.linuxshowcase.org/2001/full_papers/ezolt/ezolt_html/ > > Abstract > > GNU libc's default setting for malloc can cause a significant > performance penalty for applications that use it extensively, such as > Compaq's high performance extended math library, CXML. The default > malloc tuning can cause a significant number of minor page faults, and > result in application performance of only half of the true potential. This does not smell like an n*2 suckage, more like n^something suckage. Finding the elephant under the rug should not be hard. Profile? Regards, Daniel ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 7:21 ` Daniel Phillips @ 2005-04-08 7:49 ` H. Peter Anvin 0 siblings, 0 replies; 201+ messages in thread From: H. Peter Anvin @ 2005-04-08 7:49 UTC (permalink / raw) To: Daniel Phillips; +Cc: Rogan Dawes, cw, linux-kernel Daniel Phillips wrote: > On Friday 08 April 2005 03:05, Rogan Dawes wrote: > >>Take a look at >>http://www.linuxshowcase.org/2001/full_papers/ezolt/ezolt_html/ >> >>Abstract >> >>GNU libc's default setting for malloc can cause a significant >>performance penalty for applications that use it extensively, such as >>Compaq's high performance extended math library, CXML. The default >>malloc tuning can cause a significant number of minor page faults, and >>result in application performance of only half of the true potential. > > > This does not smell like an n*2 suckage, more like n^something suckage. > Finding the elephant under the rug should not be hard. Profile? > Lack of hysteresis can do that, with large swaths of memory constantly being claimed and returned to the system. One way to implement hysteresis would be based on a decaying peak-based threshold; unfortunately for optimal performance that requires the C runtime to have a notion of time, and in extreme cases even be able to do asynchronous deallocation, but in reality one can probably assume that the rate of malloc/free is roughly constant over time. -hpa ^ permalink raw reply [flat|nested] 201+ messages in thread
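[ Editorial note: glibc already exposes tunables that give malloc a crude form of this hysteresis: raising the trim threshold keeps freed memory in the heap instead of handing it back through brk() right away, which is precisely the up/down churn observed in the strace output above. A sketch via ctypes follows; it is glibc-on-Linux only, the constants are mallopt()'s parameter numbers from <malloc.h>, and whether this actually rescues monotone's runtime is untested speculation. ]

```python
import ctypes

# mallopt() parameter numbers, as defined in glibc's <malloc.h>
M_TRIM_THRESHOLD = -1
M_MMAP_THRESHOLD = -3

libc = ctypes.CDLL("libc.so.6")

# Keep up to 64 MiB of freed heap around instead of trimming eagerly;
# mallopt() returns 1 on success.
assert libc.mallopt(M_TRIM_THRESHOLD, 64 * 1024 * 1024) == 1
```

The same knob is reachable without touching the code at all through the MALLOC_TRIM_THRESHOLD_ environment variable, which is essentially the approach the paper linked earlier in the thread takes.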
* Re: Kernel SCM saga.. 2005-04-08 4:42 ` Linus Torvalds @ 2005-04-08 7:14 ` Andrea Arcangeli 2005-04-08 12:02 ` Matthias Andree 2005-04-08 14:26 ` Linus Torvalds 2005-04-08 7:17 ` ross ` (3 subsequent siblings) 5 siblings, 2 replies; 201+ messages in thread From: Andrea Arcangeli @ 2005-04-08 7:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote: > play with something _really_ nasty (but also very _very_ fast), take a > look at kernel.org:/pub/linux/kernel/people/torvalds/. Why not use SQL as the backend instead of the tree of directories? That solves userland journaling too (one still has to be careful to know the read-committed semantics of SQL, which is not obvious stuff, but 99% of common cases like this one just work safely and automatically, since all inserts/deletes/updates are atomic). You can keep the design of your db exactly the same and even the command line of your script the same, except you won't have to deal with the implementation of it anymore, and the end result may run even faster with proper btrees, and you won't have scalability issues if the directory of hashes fills up, and it'll get userland journaling, live backups, runtime analysis of your queries with genetic algorithms (pgsql 8 seems to have it), etc. I seem to recall there's a way to do delayed commits too, so you won't be synchronous, but you'll still have journaling. You clearly don't care about synchronous writes; all you care about is that the commit is either committed completely or not committed at all (i.e. not a half-write of the patch that leaves your db corrupt). 
Example:

CREATE TABLE patches (
	patch BIGSERIAL PRIMARY KEY,

	commiter_name VARCHAR(32) NOT NULL CHECK(commiter_name != ''),
	commiter_email VARCHAR(32) NOT NULL CHECK(commiter_email != ''),

	md5 CHAR(32) NOT NULL CHECK(md5 != ''),
	len INTEGER NOT NULL CHECK(len > 0),
	UNIQUE(md5, len),

	payload BYTEA NOT NULL,

	timestamp TIMESTAMP NOT NULL
);

CREATE INDEX patches_md5_index ON patches (md5);
CREATE INDEX patches_timestamp_index ON patches (timestamp);

s/md5/sha1/, no difference. This will automatically raise fatal errors if there are hash collisions, and it enforces a bit of checking. Then you need a few lines of Python to insert/look up. Example for psycopg2:

import pwd, os, socket
[..]
patch = {'commiter_name': pwd.getpwuid(os.getuid())[4],
	 'commiter_email': pwd.getpwuid(os.getuid())[0] + '@' + socket.getfqdn(),
	 'md5': md5.new(data).hexdigest(),
	 'len': len(data),
	 'payload': data,
	 'timestamp': 'now'}
curs.execute("""INSERT INTO patches
		(commiter_name, commiter_email, md5, len, payload, timestamp)
		VALUES (%(commiter_name)s, %(commiter_email)s, %(md5)s,
			%(len)s, %(payload)s, %(timestamp)s)""", patch)

('now' will be evaluated by the SQL server, who knows about the time too) The speed I don't know for sure, but especially with lots of data the SQL way should at least not be significantly slower; pgsql scales to terabytes without apparent problems (modulo the annoyance of running vacuum once per day in cron, to avoid internal sequence number overflows after >4 giga commits, and running the analyser once per day too, so it learns about your usage patterns and can optimize the disk format for them). For sure the Python part isn't going to be noticeable; you can still write it in C if you prefer (it'll clearly run faster if you want to run tons of inserts for a benchmark), so then everything will run at bare-hardware speed and there will be no time wasted interpreting bytecode (only the SQL commands have to be interpreted). 
The backup should also be tiny (the runtime size is going to be somewhat larger due to the additional data structures it has; how much larger I don't know). I know for sure this kind of setup works like a charm on ppc64 (32bit userland) and x86 (32bit and 64bit userland). monotone using sqlite sounds a good idea in fact (IMHO they could use a real DBMS too, so that you also get parallelism, and you could attach another app to the backing store at the same time, or run a live backup, and get all the other high-end performance features). If you feel this is too bloated feel free to ignore this email of course! If instead you'd like to give this a spin, let me know and I can help set it up quickly (either today or from Monday). I also like quick dedicated solutions, and I was about to write a backing store with a tree of dirs + hashes similar to yours for a similar problem, but I gave it up while planning the userland journaling part and, even worse, the userland fs locking with live backups, when a DBMS gets everything right including live backups (and it provides an async interface too, via sockets). OTOH for this usage journaling and locking aren't a big issue, since you can hash the patch by hand to find any potentially half-corrupted bits after a reboot, and you probably run it serially. About your compression of the data, I don't think you want to do that. The size of the live image isn't the issue; the issue is the size of the _backups_, and you want to compress one huge thing (i.e. the tarball of the cleartext, or the SQL cleartext backup), not many tiny patches. Comparing the size of the repositories isn't interesting; the interesting thing is to compare the size of the backups. BTW, this fixed compilation for my system.

--- ./Makefile.orig	2005-04-08 09:07:17.000000000 +0200
+++ ./Makefile	2005-04-08 08:52:35.000000000 +0200
@@ -8,7 +8,7 @@
 all: $(PROG)

 install: $(PROG)
 	install $(PROG) $(HOME)/bin/

-LIBS= -lssl
+LIBS= -lssl -lz
 init-db: init-db.o

Thanks. 
^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 7:14 ` Andrea Arcangeli @ 2005-04-08 12:02 ` Matthias Andree 2005-04-08 12:21 ` Florian Weimer 2005-04-08 14:26 ` Linus Torvalds 1 sibling, 1 reply; 201+ messages in thread From: Matthias Andree @ 2005-04-08 12:02 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Linus Torvalds, Chris Wedgwood, Kernel Mailing List Andrea Arcangeli wrote on 2005-04-08: > On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote: > > play with something _really_ nasty (but also very _very_ fast), take a > > look at kernel.org:/pub/linux/kernel/people/torvalds/. > > Why not use SQL as the backend instead of the tree of directories? That solves > userland journaling too (one still has to be careful to know the > read-committed semantics of SQL, which is not obvious stuff, but 99% of > common cases like this one just work safely and automatically, since all > inserts/deletes/updates are atomic). > > You can keep the design of your db exactly the same and even the command line > of your script the same, except you won't have to deal with the implementation of > it anymore, and the end result may run even faster with proper btrees, and you > won't have scalability issues if the directory of hashes fills up, and it'll > get userland journaling, live backups, runtime analysis of your queries with > genetic algorithms (pgsql 8 seems to have it), etc. > > I seem to recall there's a way to do delayed commits too, so you won't > be synchronous, but you'll still have journaling. You clearly don't care > about synchronous writes; all you care about is that the commit is > either committed completely or not committed at all (i.e. not a half-write > of the patch that leaves your db corrupt). 
>
> Example:
>
> CREATE TABLE patches (
>     patch BIGSERIAL PRIMARY KEY,
>
>     commiter_name VARCHAR(32) NOT NULL CHECK(commiter_name != ''),
>     commiter_email VARCHAR(32) NOT NULL CHECK(commiter_email != ''),

The length is too optimistic and insufficient to import the current BK stuff. I'd vote for 64 or at least 48 for each, although 48 is going to be a tight fit. It costs a bit, but considering the expected payload size it's irrelevant. Committer (double t) email is up to 36 characters at the moment and the name up to 43 characters when analyzing the shortlog script with this little Perl snippet:

------------------------------------------------------------------------
while (($k, $v) = each %addresses) {
    $lk = length $k;
    $lv = length $v;
    if ($lk > $mk) { $mk = $lk; }
    if ($lv > $mv) { $mv = $lv; }
}
print "max key len $mk, max val len $mv\n";
------------------------------------------------------------------------

which prints: (key is the email, val the name)

max key len 43, max val len 36

-- Matthias Andree ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 12:02 ` Matthias Andree @ 2005-04-08 12:21 ` Florian Weimer 0 siblings, 0 replies; 201+ messages in thread From: Florian Weimer @ 2005-04-08 12:21 UTC (permalink / raw) To: Kernel Mailing List * Matthias Andree: >> commiter_name VARCHAR(32) NOT NULL CHECK(commiter_name != ''), >> commiter_email VARCHAR(32) NOT NULL CHECK(commiter_email != ''), > > The length is too optimistic and insufficient to import the current BK > stuff. I'd vote for 64 or at least 48 for each, although 48 is going to > be a tight fit. It costs a bit but considering the expected payload > size it's irrelevant. You should also check your database documentation to see whether VARCHAR(n) is actually implemented in the same way as TEXT (or whatever the unbounded string type is called), plus an additional length check. It doesn't make much sense to use VARCHAR if there isn't a performance (or disk space) benefit, IMHO, especially for such data. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 7:14 ` Andrea Arcangeli 2005-04-08 12:02 ` Matthias Andree @ 2005-04-08 14:26 ` Linus Torvalds 2005-04-08 16:15 ` Matthias-Christian Ott 1 sibling, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 14:26 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Chris Wedgwood, Kernel Mailing List On Fri, 8 Apr 2005, Andrea Arcangeli wrote: > > Why not to use sql as backend instead of the tree of directories? Because it sucks? I can come up with millions of ways to slow things down on my own. Please come up with ways to speed things up instead. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 14:26 ` Linus Torvalds @ 2005-04-08 16:15 ` Matthias-Christian Ott 2005-04-08 17:14 ` Linus Torvalds 2005-04-09 1:00 ` Marcin Dalecki 0 siblings, 2 replies; 201+ messages in thread From: Matthias-Christian Ott @ 2005-04-08 16:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List Linus Torvalds wrote: >On Fri, 8 Apr 2005, Andrea Arcangeli wrote: > > >>Why not to use sql as backend instead of the tree of directories? >> >> > >Because it sucks? > >I can come up with millions of ways to slow things down on my own. Please >come up with ways to speed things up instead. > > Linus SQL databases like SQLite aren't slow. But maybe a Berkeley Database v.4 is a better solution. Matthias-Christian Ott ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 16:15 ` Matthias-Christian Ott @ 2005-04-08 17:14 ` Linus Torvalds 2005-04-08 17:15 ` Chris Wedgwood ` (3 more replies) 2005-04-09 1:00 ` Marcin Dalecki 1 sibling, 4 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 17:14 UTC (permalink / raw) To: Matthias-Christian Ott Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List On Fri, 8 Apr 2005, Matthias-Christian Ott wrote: > > SQL Databases like SQLite aren't slow. After applying a patch, I can do a complete "show-diff" on the kernel tree to see the effect of it in about 0.15 seconds. Also, I can use rsync to efficiently replicate my database without having to re-send the whole crap - it only needs to send the new stuff. You do that with an sql database, and I'll be impressed. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:14 ` Linus Torvalds @ 2005-04-08 17:15 ` Chris Wedgwood 2005-04-08 17:46 ` Linus Torvalds 2005-04-08 17:25 ` Matthias-Christian Ott ` (2 subsequent siblings) 3 siblings, 1 reply; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 17:15 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, Apr 08, 2005 at 10:14:22AM -0700, Linus Torvalds wrote: > After applying a patch, I can do a complete "show-diff" on the kernel tree > to see the effect of it in about 0.15 seconds. How does that work? Can you stat the entire tree in that time? I measure it as being higher than that. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:15 ` Chris Wedgwood @ 2005-04-08 17:46 ` Linus Torvalds 2005-04-08 18:05 ` Chris Wedgwood 0 siblings, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 17:46 UTC (permalink / raw) To: Chris Wedgwood Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, 8 Apr 2005, Chris Wedgwood wrote: > On Fri, Apr 08, 2005 at 10:14:22AM -0700, Linus Torvalds wrote: > > > After applying a patch, I can do a complete "show-diff" on the kernel tree > > to see the effect of it in about 0.15 seconds. > > How does that work? Can you stat the entire tree in that time? I > measure it as being higher than that. I can indeed stat the entire tree in that time (assuming it's in memory, of course, but my kernel trees are _always_ in memory ;), but in order to do so, I have to be good at finding the names to stat. In particular, you have to be extremely careful. You need to make sure that you don't stat anything you don't need to. We're not talking just blindly recursing the tree here, and that's exactly the point. You have to know what you're doing, but the whole point of keeping track of directory contents is that dammit, that's your whole job. Anybody who can't list the files they work on _instantly_ is doing something damn wrong. "git" is really trivial, written in four days. Most of that was not actually spent coding, but thinking about the data structures. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:46 ` Linus Torvalds @ 2005-04-08 18:05 ` Chris Wedgwood 2005-04-08 19:03 ` Linus Torvalds 0 siblings, 1 reply; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 18:05 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, Apr 08, 2005 at 10:46:40AM -0700, Linus Torvalds wrote: > I can indeed stat the entire tree in that time (assuming it's in memory, > of course, but my kernel trees are _always_ in memory ;), but in order to > do so, I have to be good at finding the names to stat. <pause ... tapity tap> I just tested this (I wanted to be sure you didn't have some 47GHz LiHe cooled Xeon or something). On my somewhat slowish machine[1] (by today's standards anyhow) I can stat a checked-out tree (ie. the source files and not SCM files) in about 0.10s it seems, and 0.26s for an entire tree with BK files in it. > In particular, you have to be extremely careful. You need to make > sure that you don't stat anything you don't need to. Actually, I could probably make this still *much* faster, with a caveat. Given that my editor, when I write a file, will write a temporary file and rename it, for files in directories where nlink==2 I can check that first and skip the stat of the individual files. And I guess if I was bored I could have my editor or some daemon sitting in the background intelligently using dnotify to have this information on hand more or less instantly. For this purpose though that seems like a lot of effort for no real gain right now. > Anybody who can't list the files they work on _instantly_ is doing > something damn wrong. Well, I do like to do "bk sfiles -x" fairly often. But then again I can stat dirs and compare against a cache to make that fast too. [1] Dual AthlonMP 2200 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 18:05 ` Chris Wedgwood @ 2005-04-08 19:03 ` Linus Torvalds 2005-04-08 19:16 ` Chris Wedgwood ` (2 more replies) 0 siblings, 3 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 19:03 UTC (permalink / raw) To: Chris Wedgwood Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, 8 Apr 2005, Chris Wedgwood wrote: > > Actually, I could probably make this still *much* faster, with a > caveat. Given that my editor, when I write a file, will write a > temporary file and rename it, for files in directories where nlink==2 > I can check that first and skip the stat of the individual files. Yes, doing the stat just on the directory (on leaf directories only, of course, but nlink==2 does say that on most filesystems) is indeed a huge potential speedup. It doesn't matter so much for the cached case, but it _does_ matter for the uncached one. Makes a huge difference, in fact (I was playing with exactly that back when I started doing "bkr" in BK/tools - three years ago). It turns out that I expect to cache my source tree (at least the main one), and that guides my optimizations, but yes, your dir stat does help in the case of "occasionally working with lots of large projects" rather than "mostly working on the same ones with enough RAM to cache it all". And "git" is actually fairly anal in this respect: it not only stats all files, but the index file contains a lot more of the stat info than you'd expect. So for example, it checks both ctime and mtime to the nanosecond (did I mention that I didn't worry too much about portability?) exactly so that it can catch any changes except for actively malicious things. And if you do actively malicious things in your own directory, you get what you deserve. 
It's actually _hard_ to try to fool git into believing a file hasn't changed: you need to not only replace it with the exact same file length and ctime/mtime, you need to reuse the same inode/dev numbers (again - I didn't worry about portability, and filesystems where those aren't stable are a "don't do that then") and keep the mode the same. Oh, and uid/gid, but that was mostly me being silly. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:03 ` Linus Torvalds @ 2005-04-08 19:16 ` Chris Wedgwood 2005-04-08 19:38 ` Florian Weimer ` (2 more replies) 2005-04-09 7:20 ` Willy Tarreau 2005-04-09 15:15 ` Paul Jackson 2 siblings, 3 replies; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 19:16 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, Apr 08, 2005 at 12:03:49PM -0700, Linus Torvalds wrote: > Yes, doing the stat just on the directory (on leaf directories only, of > course, but nlink==2 does say that on most filesystems) is indeed a huge > potential speedup. Here I measure about 6ms for the cached case --- essentially below the noise threshold for something that does real work. > It doesn't matter so much for the cached case, but it _does_ matter > for the uncached one. Doing the minimal stat cold-cache here is about 6s for local disk. I'm somewhat surprised it's that bad actually. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:16 ` Chris Wedgwood @ 2005-04-08 19:38 ` Florian Weimer 2005-04-08 19:48 ` Chris Wedgwood 2005-04-08 19:39 ` Linus Torvalds 2005-04-08 20:50 ` Kernel SCM saga Luck, Tony 2 siblings, 1 reply; 201+ messages in thread From: Florian Weimer @ 2005-04-08 19:38 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Kernel Mailing List * Chris Wedgwood: >> It doesn't matter so much for the cached case, but it _does_ matter >> for the uncached one. > > Doing the minimal stat cold-cache here is about 6s for local disk. Does sorting by inode number make a difference? ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:38 ` Florian Weimer @ 2005-04-08 19:48 ` Chris Wedgwood 0 siblings, 0 replies; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 19:48 UTC (permalink / raw) To: Florian Weimer; +Cc: Kernel Mailing List On Fri, Apr 08, 2005 at 09:38:09PM +0200, Florian Weimer wrote: > Does sorting by inode number make a difference? It almost certainly would. But I can sort more intelligently than that even (all the world isn't ext2/3). ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:16 ` Chris Wedgwood 2005-04-08 19:38 ` Florian Weimer @ 2005-04-08 19:39 ` Linus Torvalds 2005-04-08 20:11 ` Uncached stat performance [ Was: Re: Kernel SCM saga.. ] Ragnar Kjørstad 2005-04-08 20:50 ` Kernel SCM saga Luck, Tony 2 siblings, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 19:39 UTC (permalink / raw) To: Chris Wedgwood Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, 8 Apr 2005, Chris Wedgwood wrote: > > > It doesn't matter so much for the cached case, but it _does_ matter > > for the uncached one. > > Doing the minimal stat cold-cache here is about 6s for local disk. > I'm somewhat surprised it's that bad actually. One of the reasons I do inode numbers in the "index" file (apart from checking that the inode hasn't changed) is in fact that "stat()" is damn slow if it causes seeks. You can optimize your stat() patterns on traditional unix-like filesystems by just sorting the stats by inode number (since the inode number is historically a special index into the inode table - even when filesystems distribute the inodes over several tables, sorting will generally do the right thing from a seek perspective). It's a disgusting hack, but it literally gets you orders-of-magnitude performance improvements in many real-life cases. It does have some downsides: - it buys you nothing when it's cached (and obviously you have the sorting overhead, although that's pretty cheap) - on other filesystems it can make things slower. But if the cold-cache case actually is a concern, I do have the solution for it. Just a simple "prime-cache" program that does a qsort on the index file entries and does the stat() on them all will bring the numbers down. Those 6 seconds you see are the disk head seeking around like mad. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Uncached stat performance [ Was: Re: Kernel SCM saga.. ] 2005-04-08 19:39 ` Linus Torvalds @ 2005-04-08 20:11 ` Ragnar Kjørstad 2005-04-08 20:14 ` Chris Wedgwood 0 siblings, 1 reply; 201+ messages in thread From: Ragnar Kjørstad @ 2005-04-08 20:11 UTC (permalink / raw) To: Linus Torvalds Cc: Chris Wedgwood, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, Apr 08, 2005 at 12:39:26PM -0700, Linus Torvalds wrote: > One of the reasons I do inode numbers in the "index" file (apart from > checking that the inode hasn't changed) is in fact that "stat()" is damn > slow if it causes seeks. > > You can optimize your stat() patterns on traditional unix-like filesystems > by just sorting the stats by inode number (since the inode number is > historically a special index into the inode table - even when filesystems > distribute the inodes over several tables, sorting will generally do the > right thing from a seek perspective). It's a disgusting hack, but it > literally gets you orders-of-magnitude performance improvements in many > real-life cases. It does, so why isn't there a way to do this without the disgusting hack? (Your words, not mine :) ) E.g., wouldn't an aio_stat() allow similar or better speedups in a way that doesn't depend on ext2/3 internals? I bet it would make a significant difference for things like "ls -l" in large uncached directories and imap-servers with maildir? -- Ragnar Kjørstad Software Engineer Scali - http://www.scali.com Scaling the Linux Datacenter ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Uncached stat performance [ Was: Re: Kernel SCM saga.. ] 2005-04-08 20:11 ` Uncached stat performance [ Was: Re: Kernel SCM saga.. ] Ragnar Kjørstad @ 2005-04-08 20:14 ` Chris Wedgwood 0 siblings, 0 replies; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 20:14 UTC (permalink / raw) To: Ragnar Kjørstad Cc: Linus Torvalds, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, Apr 08, 2005 at 10:11:51PM +0200, Ragnar Kjørstad wrote: > It does, so why isn't there a way to do this without the disgusting > hack? (Your words, not mine :) ) inode sorting is probably a good guess for a number of filesystems; you can map the blocks used to do better still (somewhat fs-specific); you can do better still if you do multiple stats in parallel (up to a point) and let the elevator sort things out > I bet it would make a significant difference for things like "ls -l" in > large uncached directories and imap-servers with maildir? sort + concurrent stats would help here i think i'm not sure i like the idea of ls using lots of threads though :) ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:16 ` Chris Wedgwood 2005-04-08 19:38 ` Florian Weimer 2005-04-08 19:39 ` Linus Torvalds @ 2005-04-08 20:50 ` Luck, Tony 2005-04-08 21:27 ` Linus Torvalds 2 siblings, 1 reply; 201+ messages in thread From: Luck, Tony @ 2005-04-08 20:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List It looks like an operation like "show me the history of mm/memory.c" will be pretty expensive using git. I'd need to look at the current tree, and then trace backwards through all 60,000 changesets to see which ones had actual changes to this file. Could you expand the tuple in the tree object to include a back pointer to the previous tree in which the tuple changed? Or does adding history to the tree violate other goals of the tree type? -Tony ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 20:50 ` Kernel SCM saga Luck, Tony @ 2005-04-08 21:27 ` Linus Torvalds 2005-04-09 17:14 ` Roman Zippel 0 siblings, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 21:27 UTC (permalink / raw) To: Luck, Tony; +Cc: Kernel Mailing List On Fri, 8 Apr 2005 Luck@unix-os.sc.intel.com wrote: > > It looks like an operation like "show me the history of mm/memory.c" will > be pretty expensive using git. Yes. Per-file history is expensive in git, because of the way it is indexed. Things are indexed by tree and by changeset, and there are no per-file indexes. You could create per-file _caches_ (*) on top of git if you wanted to make it behave more like a real SCM, but yes, it's all definitely optimized for the things that _I_ tend to care about, which is the whole-repository operations. Linus (*) Doing caching on that level is probably fine, especially since most people really tend to want it for just the relatively few files that they work on anyway. Limiting the caches to a subset of the tree should be quite effective. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 21:27 ` Linus Torvalds @ 2005-04-09 17:14 ` Roman Zippel 0 siblings, 0 replies; 201+ messages in thread From: Roman Zippel @ 2005-04-09 17:14 UTC (permalink / raw) To: Linus Torvalds; +Cc: Luck, Tony, Kernel Mailing List Hi, On Fri, 8 Apr 2005, Linus Torvalds wrote: > Yes. Per-file history is expensive in git, because of the way it is > indexed. Things are indexed by tree and by changeset, and there are no > per-file indexes. > > You could create per-file _caches_ (*) on top of git if you wanted to make > it behave more like a real SCM, but yes, it's all definitely optimized for > the things that _I_ tend to care about, which is the whole-repository > operations. Per-file history is also expensive for another reason. The basic reason is that I think that a hash based storage is not the best approach for SCM. It's lacking locality, so the more it grows the more it has to seek to collect all the data. To reduce the space usage you could replace the parent file with a sha1 reference + delta to the new file. This is basically what monotone does and might cause performance problems if you need to restore old versions (e.g. if you want to annotate a file). bye, Roman ^ permalink raw reply [flat|nested] 201+ messages in thread
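[Editorial note] The space/time trade-off Roman describes can be made concrete with a toy delta scheme: store a child version as copy-spans against its parent plus literal inserts. This is an editorial sketch (not monotone's actual encoding) using Python's standard difflib; the second function shows the cost he warns about, since rebuilding a version requires its parent, and therefore the whole chain:

```python
from difflib import SequenceMatcher

def make_delta(parent, child):
    """Encode child as instructions against parent: spans copied from
    the parent plus literal inserts.  Shared text is stored once."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, parent, child).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes from the parent
        elif tag in ("replace", "insert"):
            ops.append(("literal", child[j1:j2]))
        # "delete": parent bytes are simply not copied
    return ops

def apply_delta(parent, ops):
    """Rebuild the child.  Note this *requires* the parent text, so
    restoring an old version means walking the whole delta chain --
    the performance concern for operations like annotate."""
    out = []
    for op in ops:
        out.append(parent[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(out)
```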
* Re: Kernel SCM saga.. 2005-04-08 19:03 ` Linus Torvalds 2005-04-08 19:16 ` Chris Wedgwood @ 2005-04-09 7:20 ` Willy Tarreau 2005-04-09 15:15 ` Paul Jackson 2 siblings, 0 replies; 201+ messages in thread From: Willy Tarreau @ 2005-04-09 7:20 UTC (permalink / raw) To: Linus Torvalds Cc: Chris Wedgwood, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Fri, Apr 08, 2005 at 12:03:49PM -0700, Linus Torvalds wrote: > And if you do actively malicious things in your own directory, you get > what you deserve. It's actually _hard_ to try to fool git into believing a > file hasn't changed: you need to not only replace it with the exact same > file length and ctime/mtime, you need to reuse the same inode/dev numbers > (again - I didn't worry about portability, and filesystems where those > aren't stable are a "don't do that then") and keep the mode the same. Oh, > and uid/gid, but that was much me being silly. It would be even easier to touch the tree with a known date before patching (eg: 1/1/70). It would protect against any accidental date change if for any reason your system time went backwards while working on the tree. Another trick I use when I build the 2.4-hf patches is to build a list of filenames from the patches. It works only because I want to keep all original patches and no change should appear outside those patches. Using this + cp -al + diff -pruN makes the process very fast. It would not work if I had to rebuild those patches from hand-edited files of course. Last but not least, it only takes 0.26 seconds on my dual athlon 1800 to find date/size changes between 2.6.11{,.7}, and 4.7s if the tool includes the md5 sum in its checks:

$ time flx check --ignore-owner --ignore-mode --ignore-ldate --ignore-dir \
    --ignore-dot --only-new --ignore-sum linux-2.6.11/. linux-2.6.11.7/. | wc -l
47

real    0m0.255s
user    0m0.094s
sys     0m0.162s

$ time flx check --ignore-owner --ignore-mode --ignore-ldate --ignore-dir \
    --ignore-dot --only-new linux-2.6.11/. 
linux-2.6.11.7/. | wc -l
47

real    0m4.705s
user    0m3.398s
sys     0m1.310s

(This was with 'flx', a tool a friend developed for file-system integrity checking, which we also use to build our packages.) Anyway, what I wanted to show is that once the trees are cached, even somewhat heavy operations such as checksumming can be done occasionally (such as md5 for double checking) without you waiting too long. And I don't think that a database would provide all the comfort of a standard file-system (cp -al, rsync, choice of tools, etc...). Willy ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:03 ` Linus Torvalds 2005-04-08 19:16 ` Chris Wedgwood 2005-04-09 7:20 ` Willy Tarreau @ 2005-04-09 15:15 ` Paul Jackson 2 siblings, 0 replies; 201+ messages in thread From: Paul Jackson @ 2005-04-09 15:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: cw, matthias.christian, andrea, linux-kernel Linus wrote: > you need to reuse the same inode/dev numbers > (again - I didn't worry about portability, and filesystems where those > aren't stable are a "don't do that then") On filesystems that don't have a stable inode number, I use the md5sum of the full (relative to mount point) pathname as the inode number. Since these same file systems (not surprisingly) lack hard links as well, the pathname _is_ essentially the stable inode number. Off-topic details ... This is on my backup program, which does a full snapshot of my 90 Gb system, including some FAT file systems, in 6 or 7 minutes, plus time proportional to actual changes. I have given up finding a backup program I can tolerate, and wrote my own. It stores each md5sum-unique blob exactly once, but uses the same sort of tricks you describe to detect changes from examining just the stat information, so as to avoid reading every damn byte on the disk. It works with smb, fat, vfat, ntfs, reiserfs, xfs, ext2/3, ... A single manifest file, in plain ascii, one file per line, captures a full snapshot, disk-to-disk, every few hours. This comment from my backup source explains more:

# Unfortunately, fat, vfat, smb, and ncpfs (Netware) file systems
# do not have unique disk-based persistent inode numbers.
# The kernel constructs transient inode numbers for inodes
# in its cache.  But after an umount and re-mount, the inode
# numbers are all different.  So we would end up recalculating
# the md5sums of all files in any such file systems. 
#
# To avoid this, we keep track of which directories are on such
# file systems, and for files in any such directory, instead
# of using the inode value from stat'ing a file, we use the
# md5sum of its path as a pseudo-inode number.  This digest of
# a file's path has improved persistence over its transiently
# assigned inode number.  Fields 5,6,7 (files total, free and
# avail) happen to be zero on file systems (fat, vfat, smb,
# ...) with no real inodes, so we use this fallback means
# of getting a persistent pseudo-inode if a statvfs() call on
# its directory has fields 5,6,7 summing to zero:
#     sum(os.statvfs(dir)[5:8]) == 0
# We include that dir in the fat_directories set in this case.

fat_directories = sets.Set()  # set of directory paths on FAT file systems

# The Python statvfs() on Linux is a tad expensive - the
# glibc statvfs(2) code does several system calls, including
# scanning /proc/mounts and stat'ing its entries.  We need
# to know for each file whether it is on a "fat" file system
# (see above), but for efficiency we only statvfs at mount
# points, then propagate the file system type from there down.

mountpoints = [m.split()[1] for m in open("/proc/mounts")]

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
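[Editorial note] Paul's fallback reduces to one function: when the filesystem invents inode numbers per mount, derive a stable pseudo-inode from the md5 of the path relative to the mount point, which works precisely because such filesystems lack hard links, so the path identifies the file. A hedged sketch of that rule (hypothetical helper, not Paul's actual code, which keys the decision off the statvfs fields he quotes):

```python
import hashlib

def pseudo_inode(rel_path, st_ino, fs_has_stable_inodes):
    """Return a per-file identifier that survives umount/re-mount.

    On normal filesystems the real inode number is already stable.  On
    FAT/SMB-style filesystems the kernel assigns transient inode numbers,
    so we substitute the md5 of the mount-relative path instead."""
    if fs_has_stable_inodes:
        return st_ino
    return int(hashlib.md5(rel_path.encode("utf-8")).hexdigest(), 16)
```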
* Re: Kernel SCM saga.. 2005-04-08 17:14 ` Linus Torvalds 2005-04-08 17:15 ` Chris Wedgwood @ 2005-04-08 17:25 ` Matthias-Christian Ott 2005-04-08 18:14 ` Linus Torvalds 2005-04-08 17:35 ` Jeff Garzik 2005-04-09 1:04 ` Marcin Dalecki 3 siblings, 1 reply; 201+ messages in thread From: Matthias-Christian Ott @ 2005-04-08 17:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List Linus Torvalds wrote: >On Fri, 8 Apr 2005, Matthias-Christian Ott wrote: > > >>SQL Databases like SQLite aren't slow. >> >> > >After applying a patch, I can do a complete "show-diff" on the kernel tree >to see the effect of it in about 0.15 seconds. > >Also, I can use rsync to efficiently replicate my database without having >to re-send the whole crap - it only needs to send the new stuff. > >You do that with an sql database, and I'll be impressed. > > Linus > > > Ok, but if you want to search for information in such big text files it is slow, because you do a linear search -- most databases use faster search algorithms like hashing. And if you have multiple files (I don't know if your system uses multiple files (like bitkeeper) or not) which need a system call to be opened, this will be very slow, because system calls themselves are slow. And using rsync is also possible because most databases store their information as plain text with meta information. Matthias-Christian Ott ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:25 ` Matthias-Christian Ott @ 2005-04-08 18:14 ` Linus Torvalds 2005-04-08 18:28 ` Jon Smirl ` (2 more replies) 0 siblings, 3 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 18:14 UTC (permalink / raw) To: Matthias-Christian Ott Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List On Fri, 8 Apr 2005, Matthias-Christian Ott wrote: > > Ok, but if you want to search for information in such big text files it > is slow, because you do a linear search No I don't. I don't search for _anything_. I have my own content-addressable filesystem, and I guarantee you that it's faster than mysql, because it depends on the kernel doing the right thing (which it does). I never do a single "readdir". It's all direct data lookup, no "searching" anywhere. Databases aren't magical. Quite the reverse. They easily end up being _slower_ than doing it by hand, simply because they have to solve a much more generic issue. If you design your data structures and abstractions right, a database is pretty much guaranteed to only incur overhead. The advantage of a database is the abstraction and management it gives you. But I did my own special-case abstraction in git. Yeah, I bet "git" might suck if your OS sucks. I definitely depend on name caching at an OS level so that I know that opening a file is fast. In other words, there _is_ an indexing and caching database in there, and it's called the Linux VFS layer and the dentry cache. The proof is in the pudding. git is designed for _one_ thing, and one thing only: tracking a series of directory states in a way that can be replicated. It's very very fast at that. A database with a more flexible abstraction might be faster at other things, but the fact is, you do take a hit. The problems with databases are: - they are damn hard to just replicate wildly and without control. The database backing file inherently has a lot of internal state. 
You may be able to "just copy it", but you have to copy the whole damn thing. In "git", the data is all there in immutable blobs that you can just rsync. In fact, you don't even need rsync: you can just look at the filenames, and anything new you copy. No need for any fancy "read the files to see that they match". They _will_ match, or you can tell immediately that a file is corrupt. Look at this: torvalds@ppc970:~/git> sha1sum .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678 e7bfaadd5d2331123663a8f14a26604a3cdcb678 .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678 see a pattern anywhere? Imagine that you know the list of files you have, and the list of files the other side has (never mind the contents), and how _easy_ it is to synchronize. Without ever having to even read the remote files that you know you already have. How do you replicate your database incrementally? I've given you enough clues to do it for "git" in probably five lines of perl. - they tend to take time to set up and prime. In contrast, the filesystem is always there. Sure, you effectively have to "prime" that one too, but the thing is, if your OS is doing its job, you basically only need to prime it once per reboot. No need to prime it for each process you start or play games with connecting to servers etc. It's just there. Always. So if you think of the filesystem as a database, you're all set. If you design your data structure so that there is just one index, you make that the name, and the kernel will do all the O(1) hashed lookups etc for you. You do have to limit yourself in some ways. Oh, and you have to be willing to waste diskspace. "git" is _not_ space-efficient. The good news is that it is cache-friendly, since it is also designed to never actually look at any old files that aren't part of the immediate history, so while it wastes diskspace, it does not waste the (much more precious) page cache. 
IOW big file-sets are always bad for performance if you need to traverse them to get anywhere, but if you index things so that you only read the stuff you really really _need_ (which git does), big file-sets are just an excuse to buy a new disk ;) Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
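The properties Linus describes — objects addressed by their hash, immutable once written, replicated by comparing nothing but filenames — can be sketched in a few lines. This is an illustrative toy, not git's actual code; the `objects/xx/rest-of-hash` layout mimics the sha1sum transcript above, and all function names are made up for the sketch.

```python
import hashlib
import os
import shutil

def object_path(root, name):
    # The object's name *is* its address: two hex digits pick the fan-out
    # directory, the rest is the filename. One O(1) path construction,
    # no readdir, no searching -- the dentry cache does the rest.
    return os.path.join(root, "objects", name[:2], name[2:])

def store(root, data):
    name = hashlib.sha1(data).hexdigest()
    path = object_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:   # immutable: written once, never edited
        f.write(data)
    return name

def replicate(src, dst):
    # Incremental replication: since blobs are immutable and named by their
    # content, the set difference of the *filenames* is exactly the set of
    # objects to copy. No file contents are ever read or compared.
    def names(root):
        return {os.path.relpath(os.path.join(d, f), root)
                for d, _, fs in os.walk(root) for f in fs}
    for rel in names(src) - names(dst):
        target = os.path.join(dst, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(os.path.join(src, rel), target)
```

This is roughly the "five lines of perl" Linus alludes to: list names on both sides, copy the difference.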
* Re: Kernel SCM saga.. 2005-04-08 18:14 ` Linus Torvalds @ 2005-04-08 18:28 ` Jon Smirl 2005-04-08 18:58 ` Florian Weimer 2005-04-09 1:11 ` Marcin Dalecki 2005-04-08 19:16 ` Matthias-Christian Ott 2005-04-09 1:09 ` Marcin Dalecki 2 siblings, 2 replies; 201+ messages in thread From: Jon Smirl @ 2005-04-08 18:28 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias-Christian Ott, Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote: > How do you replicate your database incrementally? I've given you enough > clues to do it for "git" in probably five lines of perl. Efficient database replication is achieved by copying the transaction logs and then replaying them. Most mid to high end databases support this. You only need to copy the parts of the logs that you don't already have. -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 18:28 ` Jon Smirl @ 2005-04-08 18:58 ` Florian Weimer 2005-04-09 1:11 ` Marcin Dalecki 1 sibling, 0 replies; 201+ messages in thread From: Florian Weimer @ 2005-04-08 18:58 UTC (permalink / raw) To: Jon Smirl; +Cc: Kernel Mailing List * Jon Smirl: > On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote: >> How do you replicate your database incrementally? I've given you enough >> clues to do it for "git" in probably five lines of perl. > > Efficient database replication is achieved by copying the transaction > logs and then replaying them. Works only if the databases are in sync. Even if the transaction logs are pretty high-level, you risk violating constraints specified by the application. General multi-master replication is an unsolved problem. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 18:28 ` Jon Smirl 2005-04-08 18:58 ` Florian Weimer @ 2005-04-09 1:11 ` Marcin Dalecki 2005-04-09 1:50 ` David Lang 1 sibling, 1 reply; 201+ messages in thread From: Marcin Dalecki @ 2005-04-09 1:11 UTC (permalink / raw) To: Jon Smirl Cc: Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List, Linus Torvalds, Matthias-Christian Ott

On 2005-04-08, at 20:28, Jon Smirl wrote:
> On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote:
>> How do you replicate your database incrementally? I've given you
>> enough clues to do it for "git" in probably five lines of perl.
>
> Efficient database replication is achieved by copying the transaction
> logs and then replaying them. Most mid to high end databases support
> this. You only need to copy the parts of the logs that you don't
> already have.

Databases supporting replication are called high end. And you forgot the cat's dance around the network that this issue involves.

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 1:11 ` Marcin Dalecki @ 2005-04-09 1:50 ` David Lang 2005-04-09 22:12 ` Florian Weimer 0 siblings, 1 reply; 201+ messages in thread From: David Lang @ 2005-04-09 1:50 UTC (permalink / raw) To: Marcin Dalecki Cc: Jon Smirl, Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List, Linus Torvalds, Matthias-Christian Ott

On Sat, 9 Apr 2005, Marcin Dalecki wrote:
> On 2005-04-08, at 20:28, Jon Smirl wrote:
>> On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote:
>>> How do you replicate your database incrementally? I've given you enough
>>> clues to do it for "git" in probably five lines of perl.
>>
>> Efficient database replication is achieved by copying the transaction
>> logs and then replaying them. Most mid to high end databases support
>> this. You only need to copy the parts of the logs that you don't
>> already have.
>
> Databases supporting replication are called high end. You forgot the cats
> dance around the network this issue involves.

And Postgres (which is Free in all senses of the word) is high end by this definition. I'm not saying that it's an efficient thing to use for this task, but don't be fooled into thinking you need something at the price of Oracle to do this job.

David Lang

-- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 1:50 ` David Lang @ 2005-04-09 22:12 ` Florian Weimer 0 siblings, 0 replies; 201+ messages in thread From: Florian Weimer @ 2005-04-09 22:12 UTC (permalink / raw) To: David Lang; +Cc: Kernel Mailing List * David Lang: >> Databases supporting replication are called high end. You forgot >> the cats dance around the network this issue involves. > > And Postgres (which is Free in all senses of the word) is high end by this > definition. I'm not aware of *any* DBMS, commercial or not, which can perform meaningful multi-master replication on tables which mainly consist of text files as records. All you can get is single-master replication (which is well-understood), or some rather scary stuff which involves throwing away updates, or taking extrema or averages (even automatic 3-way merges aren't available). ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 18:14 ` Linus Torvalds 2005-04-08 18:28 ` Jon Smirl @ 2005-04-08 19:16 ` Matthias-Christian Ott 2005-04-08 19:32 ` Linus Torvalds 2005-04-09 1:09 ` Marcin Dalecki 2 siblings, 1 reply; 201+ messages in thread From: Matthias-Christian Ott @ 2005-04-08 19:16 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List

Linus Torvalds wrote:
>On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>
>>Ok, but if you want to search for information in such big text files it
>>slow, because you do linear search
>
>No I don't. I don't search for _anything_. I have my own
>content-addressable filesystem, and I guarantee you that it's faster than
>mysql, because it depends on the kernel doing the right thing (which it
>does).
>
I'm not talking about mysql, I'm talking about fast databases like sqlite or db4.

>I never do a single "readdir". It's all direct data lookup, no "searching"
>anywhere.
>
>Databases aren't magical. Quite the reverse. They easily end up being
>_slower_ than doing it by hand, simply because they have to solve a much
>more generic issue. If you design your data structures and abstractions
>right, a database is pretty much guaranteed to only incur overhead.
>
>The advantage of a database is the abstraction and management it gives
>you. But I did my own special-case abstraction in git.
>
>Yeah, I bet "git" might suck if your OS sucks. I definitely depend on name
>caching at an OS level so that I know that opening a file is fast. In
>other words, there _is_ an indexing and caching database in there, and
>it's called the Linux VFS layer and the dentry cache.
>
>The proof is in the pudding. git is designed for _one_ thing, and one
>thing only: tracking a series of directory states in a way that can be
>replicated. It's very very fast at that. A database with a more flexible
>abstraction might be faster at other things, but the fact is, you do take a
>hit.
>
>The problem with databases are:
>
> - they are damn hard to just replicate wildly and without control. The
>   database backing file inherently has a lot of internal state. You may
>   be able to "just copy it", but you have to copy the whole damn thing.
>
This is _not_ true for every database (especially plain-text databases with meta information).

> In "git", the data is all there in immutable blobs that you can just
> rsync. In fact, you don't even need rsync: you can just look at the
> filenames, and anything new you copy. No need for any fancy "read the
> files to see that they match". They _will_ match, or you can tell
> immediately that a file is corrupt.
>
> Look at this:
>
> torvalds@ppc970:~/git> sha1sum .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678
> e7bfaadd5d2331123663a8f14a26604a3cdcb678 .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678
>
> see a pattern anywhere? Imagine that you know the list of files you
> have, and the list of files the other side has (never mind the
> contents), and how _easy_ it is to synchronize. Without ever having to
> even read the remote files that you know you already have.
>
> How do you replicate your database incrementally? I've given you enough
> clues to do it for "git" in probably five lines of perl.
>
I replicate my database incrementally by using a hash list like you do: the client sends its hash list, the server compares the lists and tells the client which data (data = hash + data) has to be added and where (this is like your solution -- you also submit the data and the location (you have directories too, right?)).

A database is in some cases (like this one) like a filesystem, but it's built on top of a better filesystem like XFS, Reiser4 or ext3, which supports features like LVM, quotas or journaling. (Is your filesystem also built on top of an existing filesystem? I don't think so, because you're talking about VFS operations on the filesystem.)

> - they tend to take time to set up and prime.
>
> In contrast, the filesystem is always there. Sure, you effectively have
> to "prime" that one too, but the thing is, if your OS is doing its job,
> you basically only need to prime it once per reboot. No need to prime
> it for each process you start or play games with connecting to servers
> etc. It's just there. Always.
>
The database -- a single file (sqlite or db4) -- is always there too, because it's on the filesystem and doesn't need a server.

>So if you think of the filesystem as a database, you're all set. If you
>design your data structure so that there is just one index, you make that
>the name, and the kernel will do all the O(1) hashed lookups etc for you.
>You do have to limit yourself in some ways.
>
But as mentioned, you need to _open_ each file (it doesn't matter if it's cached -- that only speeds up reading it -- you need a _slow_ system call and _very slow_ hardware access anyway). Have a look at this comparison: if you have one big chest and lots of small chests containing the same amount of gold, it's more work to collect the gold from the small chests than from the big one (which would contain as many compartments as there are small chests). You can find your gold faster because you don't have to walk to the other chests, and you don't have to open as many lids, which also saves time.

>Oh, and you have to be willing to waste diskspace. "git" is _not_
>space-efficient. The good news is that it is cache-friendly, since it is
>also designed to never actually look at any old files that aren't part of
>the immediate history, so while it wastes diskspace, it does not waste the
>(much more precious) page cache.
>
>IOW big file-sets are always bad for performance if you need to traverse
>them to get anywhere, but if you index things so that you only read the
>stuff you really really _need_ (which git does), big file-sets are just an
>excuse to buy a new disk ;)
>
> Linus
>
I hope my idea/opinion is clear now.
Matthias-Christian ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:16 ` Matthias-Christian Ott @ 2005-04-08 19:32 ` Linus Torvalds 2005-04-08 19:44 ` Matthias-Christian Ott 0 siblings, 1 reply; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 19:32 UTC (permalink / raw) To: Matthias-Christian Ott Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List On Fri, 8 Apr 2005, Matthias-Christian Ott wrote: > > But as mentioned you need to _open_ each file (It doesn't matter if it's > cached (this speeds up only reading it) -- you need a _slow_ system call > and _very slow_ hardware access anyway). Nope. System calls aren't slow. What crappy OS are you running? > I hope my idea/opinion is clear now. Numbers talk. I've got something that you can test ;) Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 19:32 ` Linus Torvalds @ 2005-04-08 19:44 ` Matthias-Christian Ott 0 siblings, 0 replies; 201+ messages in thread From: Matthias-Christian Ott @ 2005-04-08 19:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List

Linus Torvalds wrote:
>On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>
>>But as mentioned you need to _open_ each file (It doesn't matter if it's
>>cached (this speeds up only reading it) -- you need a _slow_ system call
>>and _very slow_ hardware access anyway).
>
>Nope. System calls aren't slow. What crappy OS are you running?
>
But they're slower than plain function calls, because there are several layers of checks involved.

>>I hope my idea/opinion is clear now.
>
>Numbers talk. I've got something that you can test ;)
>
This doesn't mean it's better just because you had the time to develop it ;). But anyhow, people need something they can test to see whether it's good or not; most don't believe in concepts.

> Linus

We will see which solution wins the "race". But I think your solution will "win", because you're Linus Torvalds -- the "boss" of Linux, who has to work with this system every day (usually people use what they have developed :)) -- and I don't have the time to develop a database-based solution (maybe someone else is interested in developing it).

Matthias-Christian

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 18:14 ` Linus Torvalds 2005-04-08 18:28 ` Jon Smirl 2005-04-08 19:16 ` Matthias-Christian Ott @ 2005-04-09 1:09 ` Marcin Dalecki 2 siblings, 0 replies; 201+ messages in thread From: Marcin Dalecki @ 2005-04-09 1:09 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias-Christian Ott, Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List

On 2005-04-08, at 20:14, Linus Torvalds wrote:
>
> On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>>
>> Ok, but if you want to search for information in such big text files
>> it slow, because you do linear search
>
> No I don't. I don't search for _anything_. I have my own
> content-addressable filesystem, and I guarantee you that it's faster
> than mysql, because it depends on the kernel doing the right thing
> (which it does).

Linus... Sorry, but you mistake the frequently seen abuse of SQL databases as DATA storage for what SQL databases are good at storing: well-defined RELATIONS. Sure, a filesystem is for data. SQL is for relations.

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:14 ` Linus Torvalds 2005-04-08 17:15 ` Chris Wedgwood 2005-04-08 17:25 ` Matthias-Christian Ott @ 2005-04-08 17:35 ` Jeff Garzik 2005-04-08 18:47 ` Linus Torvalds 2005-04-09 1:04 ` Marcin Dalecki 3 siblings, 1 reply; 201+ messages in thread From: Jeff Garzik @ 2005-04-08 17:35 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias-Christian Ott, Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List Linus Torvalds wrote: > > On Fri, 8 Apr 2005, Matthias-Christian Ott wrote: > >>SQL Databases like SQLite aren't slow. > > > After applying a patch, I can do a complete "show-diff" on the kernel tree > to see the effect of it in about 0.15 seconds. > > Also, I can use rsync to efficiently replicate my database without having > to re-send the whole crap - it only needs to send the new stuff. > > You do that with an sql database, and I'll be impressed. Well... it took me over 30 seconds just to 'rm -rf' the unpacked tarballs of git and sparse-git, over my LAN's NFS. Granted that this sort of stuff works well with (a) rsync and (b) hardlinks, but it's still punishment on the i/dcache. Jeff ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:35 ` Jeff Garzik @ 2005-04-08 18:47 ` Linus Torvalds 2005-04-08 18:56 ` Chris Wedgwood 2005-04-09 15:40 ` Paul Jackson 0 siblings, 2 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 18:47 UTC (permalink / raw) To: Jeff Garzik Cc: Matthias-Christian Ott, Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List On Fri, 8 Apr 2005, Jeff Garzik wrote: > > Well... it took me over 30 seconds just to 'rm -rf' the unpacked > tarballs of git and sparse-git, over my LAN's NFS. Don't use NFS for development. It sucks for BK too. That said, normal _use_ should actually be pretty efficient even over NFS. It will "stat" a hell of a lot of files to do the "show-diff", but that part you really can't avoid unless you depend on all the tools marking their changes somewhere. Which BK does, actually, but that was pretty painful, and means that bk needed to re-implement all the normal ops (ie "bk patch"). What's also nice is that exactly because "git" depends on totally immutable files, they actually cache very well over NFS. Even if you were to share a database across machines (which is _not_ what git is meant to do, but it's certainly possible). So I actually suspect that if you actually _work_ with a tree in "git", you will find performance very good indeed. The fact that it creates a number of files when you pull in a new repository is a different thing. > Granted that this sort of stuff works well with (a) rsync and (b) > hardlinks, but it's still punishment on the i/dcache. Actually, it's not. Not once it is set up. Exactly because "git" doesn't actually access those files unless it literally needs the data in one file, and then it's always set up so that it needs either none or _all_ of the file. There is no data sharing anywhere, so you are never in the situation where it needs "ten bytes from file X" and "25 bytes from file Y". 
For example, if you don't have any changes in your tree, there is exactly _one_ file that a "show-diff" will read: the .dircache/index file. That's it. After that, it will "stat()" exactly the files you are tracking, and nothing more. It will not touch any internal "git" data AT ALL. That "stat" will be somewhat expensive unless your client caches stat data too, but that's it.

And if it turns out that you have changed a file (or even just touched it, so that the data is the same, but the index file can no longer guarantee it with just a single "stat()"), then git will open exactly _one_ file (no searching, no messing around), which contains absolutely nothing except for the compressed (and SHA1-signed) old contents of the file. It obviously _has_ to do that, because in order to know whether you've changed it, it needs to compare it to the original.

IOW, "git" will literally do the minimum IO necessary, with the absolute minimum cache footprint. The fact is, when tracking the 17,000 files in the kernel directory, most of them are never actually changed. They literally are "free". They aren't brought into the cache by "git" - not the file itself, not the backing store. You set up the index file once, and you never ever touch them again. You could put the sha1 files on a tape, for all git cares.

The one exception obviously being when you actually instantiate the git archive for the first time (or when you throw it away). At that time you do touch all of the data, but that should be the only time.

THAT is what git is good at. It is optimized for the "not a lot of changes" case, and pretty much all the operations are O(n) in the "size of change", not in "size of repo". That includes even things like "give me the diff between the top of tree and the tree 10 days ago".
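The fast path just described -- one read of the index, one stat() per tracked file, object I/O only for the suspects -- can be sketched as follows. This is a toy illustration: the "index" here is a plain dict of (size, mtime) pairs, not git's actual on-disk format, and the function names are made up.

```python
import os

def make_index(root, tracked):
    # Record the stat data that later lets "show-diff" skip unchanged files.
    idx = {}
    for rel in tracked:
        st = os.stat(os.path.join(root, rel))
        idx[rel] = (st.st_size, st.st_mtime_ns)
    return idx

def suspects(root, idx):
    # One stat() per tracked file, and *no* object-store I/O at all:
    # only files whose metadata no longer matches the index need the
    # expensive step of unpacking the old blob and comparing contents.
    out = []
    for rel, (size, mtime) in idx.items():
        st = os.stat(os.path.join(root, rel))
        if (st.st_size, st.st_mtime_ns) != (size, mtime):
            out.append(rel)
    return out
```

With zero changes this touches the index plus one stat() per file; the backing store could be on tape, as Linus puts it.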
If you know what your head was 10 days ago, "git" will open exactly _four_ small files for this operation (the current "top" commit, the commit file of ten days ago, and the two "tree" files associated with those). It will then need to open the backing store files for the files that are different between the two versions, but IT WILL NEVER EVEN LOOK at the files that it immediately sees are the same. And that's actually true whether we're talking about the top-of-tree or not. If I had the kernel history in git format (I don't - I estimate that it would be about 1.5GB - 2GB in size, and would take me about ten days to extract from BK ;), I could do a diff between _any_ tagged version (and I mention "tagged" only as a way to look up the commit ID - it doesn't have to be tagged if you know it some other way) in O(n) where 'n' is the number of files that have changed between the revisions. Number of changesets doesn't matter. Number of files doesn't matter. The _only_ thing that matters is the size of the change. Btw, I don't actually have a git command to do this yet. A bit of scripting required to do it, but it's pretty trivial: you open the two "commit" files that are the beginning/end of the thing, you look up what the tree state was at each point, you open up the two tree files involved, and you ignore all entries that match. Since the tree files are already sorted, that "ignoring matches" is basically free (technically that's O(n) in the number of files described, but we're talking about something that even a slow machine can do so fast you probably can't even time it with a stop-watch). You now have the complete list of files that have been changed (removed, added or "exists in both trees, but different contents"), and you can thus trivially create the whole tree with opening up _only_ the indexes for those files. Ergo: O(n) in size of change. Both in work and in disk/cache access (where the latter is often the more important one). 
Absolutely _zero_ indirection anywhere apart from the initial stage to go from "commit" to "tree", so there's no seeking except to actually read the files once you know what they are (and since you know them up-front and there are no dependencies at that point, you could even tell the OS to prefetch them if you really cared about getting minimal disk seeks). Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
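The "bit of scripting" Linus outlines -- open two commits, open their two tree files, skip matching entries in one pass over the sorted lists -- reduces to a classic sorted-list merge. A sketch, with each tree represented as an already-sorted list of (path, object-name) pairs; the on-disk tree format is glossed over:

```python
def tree_diff(tree_a, tree_b):
    # Each tree is a sorted list of (path, object_name) pairs. One
    # merge-style pass: entries whose path *and* object name match are
    # skipped for free, so the work is O(n) in the size of the change.
    diff, i, j = [], 0, 0
    while i < len(tree_a) or j < len(tree_b):
        if j >= len(tree_b) or (i < len(tree_a) and tree_a[i][0] < tree_b[j][0]):
            diff.append(("removed", tree_a[i][0])); i += 1
        elif i >= len(tree_a) or tree_b[j][0] < tree_a[i][0]:
            diff.append(("added", tree_b[j][0])); j += 1
        else:
            if tree_a[i][1] != tree_b[j][1]:
                diff.append(("changed", tree_a[i][0]))
            i += 1; j += 1
    return diff
```

Only the files reported here ever need their object files opened, which is why the number of changesets between the two points is irrelevant.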
* Re: Kernel SCM saga.. 2005-04-08 18:47 ` Linus Torvalds @ 2005-04-08 18:56 ` Chris Wedgwood 2005-04-09 7:37 ` Willy Tarreau 2005-04-09 15:40 ` Paul Jackson 1 sibling, 1 reply; 201+ messages in thread From: Chris Wedgwood @ 2005-04-08 18:56 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff Garzik, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Fri, Apr 08, 2005 at 11:47:10AM -0700, Linus Torvalds wrote:

> Don't use NFS for development. It sucks for BK too.

Sometimes NFS is unavoidable.

In the best case (see previous email wrt only stat'ing the parent directories when you can) for a current kernel though you can get away with 894 stats --- over NFS that would probably be tolerable.

After claiming such an optimization is probably not worthwhile, I'm now thinking that for network filesystems it might be.

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 18:56 ` Chris Wedgwood @ 2005-04-09 7:37 ` Willy Tarreau 2005-04-09 7:47 ` Neil Brown 0 siblings, 1 reply; 201+ messages in thread From: Willy Tarreau @ 2005-04-09 7:37 UTC (permalink / raw) To: Chris Wedgwood Cc: Linus Torvalds, Jeff Garzik, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Fri, Apr 08, 2005 at 11:56:09AM -0700, Chris Wedgwood wrote:
> On Fri, Apr 08, 2005 at 11:47:10AM -0700, Linus Torvalds wrote:
>
> > Don't use NFS for development. It sucks for BK too.
>
> Some times NFS is unavoidable.
>
> In the best case (see previous email wrt to only stat'ing the parent
> directories when you can) for a current kernel though you can get away
> with 894 stats --- over NFS that would probably be tolerable.
>
> After claiming such an optimization is probably not worth while I'm
> now thinking for network filesystems it might be.

I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300 files each) and 1.3s once the trees are cached locally. This is without comparing file contents, just meta-data. And it takes 19.33s to compare the files' md5 sums once the trees are cached. I don't know if there are ways to avoid some NFS operations when everything is cached.

Anyway, the system does not seem very efficient with hard links; it caches the files twice :-(

Willy

^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 7:37 ` Willy Tarreau @ 2005-04-09 7:47 ` Neil Brown 2005-04-09 8:00 ` Willy Tarreau 0 siblings, 1 reply; 201+ messages in thread From: Neil Brown @ 2005-04-09 7:47 UTC (permalink / raw) To: Willy Tarreau Cc: Chris Wedgwood, Linus Torvalds, Jeff Garzik, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Saturday April 9, willy@w.ods.org wrote: > > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300 > files each) and 1.3s once the trees are cached locally. This is without > comparing file contents, just meta-data. And it takes 19.33s to compare > the file's md5 sums once the trees are cached. I don't know if there are > ways to avoid some NFS operations when everything is cached. > > Anyway, the system does not seem much efficient on hard links, it caches > the files twice :-( I suspect you'll be wanting to add a "no_subtree_check" export option on your NFS server... NeilBrown ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 7:47 ` Neil Brown @ 2005-04-09 8:00 ` Willy Tarreau 2005-04-09 9:34 ` Neil Brown 0 siblings, 1 reply; 201+ messages in thread From: Willy Tarreau @ 2005-04-09 8:00 UTC (permalink / raw) To: Neil Brown Cc: Chris Wedgwood, Linus Torvalds, Jeff Garzik, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Sat, Apr 09, 2005 at 05:47:08PM +1000, Neil Brown wrote: > On Saturday April 9, willy@w.ods.org wrote: > > > > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300 > > files each) and 1.3s once the trees are cached locally. This is without > > comparing file contents, just meta-data. And it takes 19.33s to compare > > the file's md5 sums once the trees are cached. I don't know if there are > > ways to avoid some NFS operations when everything is cached. > > > > Anyway, the system does not seem much efficient on hard links, it caches > > the files twice :-( > > I suspect you'll be wanting to add a "no_subtree_check" export option > on your NFS server... Thanks a lot, Neil ! This is very valuable information. I didn't understand such implications from the exports(5) man page, but it makes a great difference. And the diff sped up from 5.7 to 3.9s and from 19.3 to 15.3s. Cheers, Willy ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 8:00 ` Willy Tarreau @ 2005-04-09 9:34 ` Neil Brown 0 siblings, 0 replies; 201+ messages in thread From: Neil Brown @ 2005-04-09 9:34 UTC (permalink / raw) To: Willy Tarreau Cc: Chris Wedgwood, Linus Torvalds, Jeff Garzik, Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List On Saturday April 9, willy@w.ods.org wrote: > On Sat, Apr 09, 2005 at 05:47:08PM +1000, Neil Brown wrote: > > On Saturday April 9, willy@w.ods.org wrote: > > > > > > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300 > > > files each) and 1.3s once the trees are cached locally. This is without > > > comparing file contents, just meta-data. And it takes 19.33s to compare > > > the file's md5 sums once the trees are cached. I don't know if there are > > > ways to avoid some NFS operations when everything is cached. > > > > > > Anyway, the system does not seem much efficient on hard links, it caches > > > the files twice :-( > > > > I suspect you'll be wanting to add a "no_subtree_check" export option > > on your NFS server... > > Thanks a lot, Neil ! This is very valuable information. I didn't > understand such implications from the exports(5) man page, but it > makes a great difference. And the diff sped up from 5.7 to 3.9s > and from 19.3 to 15.3s. No, that implication had never really occurred to me before either. But when you said "caches the file twice" it suddenly made sense. With subtree_check, the NFS file handle contains information about the directory, and NFS uses the filehandle as the primary key to tell if two things are the same or not. Trond keeps prodding me to make no_subtree_check the default. Maybe it is time that I actually did.... NeilBrown ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 18:47 ` Linus Torvalds 2005-04-08 18:56 ` Chris Wedgwood @ 2005-04-09 15:40 ` Paul Jackson 2005-04-09 16:16 ` Linus Torvalds 1 sibling, 1 reply; 201+ messages in thread From: Paul Jackson @ 2005-04-09 15:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel Linus wrote: > then git will open have exactly _one_ > file (no searching, no messing around), which contains absolutely nothing > except for the compressed (and SHA1-signed) old contents of the file. It > obviously _has_ to do that, because in order to know whether you've > changed it, it needs to now compare it to the original. I must be missing something here ... If the stat shows a possible change, then you shouldn't have to open the original version to determine if it really changed - just compute the SHA1 of the new file, and see if that changed from the original SHA1. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
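Paul's shortcut can be sketched in a couple of lines. One simplifying assumption is flagged up front: this hashes the raw file contents, whereas (as Linus's reply goes on to note) the real object name also covers a header and compression, so this illustrates the idea rather than git's actual check.

```python
import hashlib

def file_really_changed(path, recorded_sha1):
    # Paul's point: one hash of the *new* contents answers "did it
    # really change?" without ever opening or decompressing the old
    # object -- just compare against the recorded hash.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest() != recorded_sha1
```

This trades one decompression of the old blob for one hash of the new file; which is cheaper is exactly the question Linus answers next.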
* Re: Kernel SCM saga.. 2005-04-09 15:40 ` Paul Jackson @ 2005-04-09 16:16 ` Linus Torvalds 2005-04-09 17:15 ` Paul Jackson 2005-04-09 17:35 ` Paul Jackson 0 siblings, 2 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-09 16:16 UTC (permalink / raw) To: Paul Jackson; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel On Sat, 9 Apr 2005, Paul Jackson wrote: > > I must be missing something here ... > > If the stat shows a possible change, then you shouldn't have to open the > original version to determine if it really changed - just compute the > SHA1 of the new file, and see if that changed from the original SHA1. Yes. However, I've got two reasons for this: (a) it may actually be cheaper to just unpack the compressed thing than it is to compute the sha, _especially_ since it's very likely that you have to do that anyway (ie if it turns out that they _are_ different, you need the unpacked data to then look at the differences). So when you come from your backup angle, you only care about "has it changed", and you'll do a backup. In "git", you usually care about the old contents too. (b) while I depend on the fact that if the SHA of an object matches, the objects are the same, I generally try to avoid the reverse dependency. Why? Because if I end up changing the way I pack objects, and still want to work with old objects, I may end up in the situation that two identical objects could get different object names. I don't actually know how valid a point "(b)" is, and I don't think it's likely, but imagine that SHA1 ends up being broken (*) and I decide that I want to pack new objects with a new-and-improved-SHA256 or something. Such a thing would obviously mean that you end up with lots of _duplicate_ data (any new data that is repackaged with the new name will now cause a new git object), but "duplicate" is better than "broken". 
I don't actually guarantee that "git" could handle that right, but I've been idly trying to avoid locking myself into the mindset that "file equality has to mean name equality over the long run". So while the system right now works on the 1:1 "name" <-> "content" mapping, it's possible that it _could_ work with a more relaxed 1:n "content" -> "name" mapping. But it's entirely possible that I'm being a git about this. Linus (*) yeah, yeah, I know about the current theoretical case, and I don't care. Not only is it theoretical, the way my objects are packed you'd have to not just generate the same SHA1 for it, it would have to _also_ still be a valid zlib object _and_ get the header to match the "type + length" of object part. IOW, the object validity checks are actually even stricter than just "sha1 matches". ^ permalink raw reply [flat|nested] 201+ messages in thread
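The "stricter than just sha1 matches" packing described in the footnote can be sketched as follows. Assumptions flagged: this follows the behaviour the footnote implies -- the object name is the SHA-1 of the *compressed* stream, and the header carries "type + length" -- but the exact header layout (`"<type> <length>\0<body>"`) and the function names are illustrative, not a claim about git's precise format.

```python
import hashlib
import zlib

def pack_object(obj_type, body):
    # Object = "<type> <length>\0<body>", deflated; its name here is the
    # SHA-1 of the compressed stream, so a colliding object would also
    # have to inflate cleanly and carry a consistent header.
    raw = obj_type.encode() + b" " + str(len(body)).encode() + b"\0" + body
    packed = zlib.compress(raw)
    return hashlib.sha1(packed).hexdigest(), packed

def validate_object(name, packed):
    # Three independent checks, as in the footnote: hash matches, data is
    # a valid zlib object, and the header agrees with the body length.
    if hashlib.sha1(packed).hexdigest() != name:
        return False
    try:
        raw = zlib.decompress(packed)
    except zlib.error:
        return False
    header, _, body = raw.partition(b"\0")
    parts = header.split(b" ")
    return len(parts) == 2 and parts[1].isdigit() and int(parts[1]) == len(body)
```

A forged object would have to pass all three checks at once, which is the point of the footnote.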
* Re: Kernel SCM saga.. 2005-04-09 16:16 ` Linus Torvalds @ 2005-04-09 17:15 ` Paul Jackson 2005-04-09 17:35 ` Paul Jackson 1 sibling, 0 replies; 201+ messages in thread From: Paul Jackson @ 2005-04-09 17:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel Linus wrote: > In "git", you usually care about > the old contents too. True - in your case, you probably want the old contents so might as well dig them out as soon as it becomes convenient to have them. I was objecting to your claim that you _had_ to dig out the old contents to determine if a file changed. You don't _have_ to ... but I agree that it's a good time to do so. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
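Paul's point above can be made concrete: for a backup-style "did it change?" test, comparing the recorded digest against a fresh digest of the working file is enough, and never touches the stored old contents. File names and contents below are made up for illustration.

```shell
# Backup-style change detection by digest comparison alone.
printf 'int answer = 42;\n' > stored_snapshot.c   # what was recorded
printf 'int answer = 43;\n' > working_copy.c      # current state

recorded_sha=$(sha1sum stored_snapshot.c | cut -d' ' -f1)
current_sha=$(sha1sum working_copy.c | cut -d' ' -f1)

if [ "$recorded_sha" = "$current_sha" ]; then
    status=unchanged
else
    status=changed    # git would now want the old contents anyway
fi
echo "$status"
```

Which is exactly the trade-off in the exchange: the digest comparison settles the boolean cheaply, while git-style use usually proceeds to unpack the old contents regardless.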
* Re: Kernel SCM saga.. 2005-04-09 16:16 ` Linus Torvalds 2005-04-09 17:15 ` Paul Jackson @ 2005-04-09 17:35 ` Paul Jackson 1 sibling, 0 replies; 201+ messages in thread From: Paul Jackson @ 2005-04-09 17:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel > (b) while I depend on the fact that if the SHA of an object matches, the > objects are the same, I generally try to avoid the reverse > dependency. It might be a valid point that you want to leave the door open to using a different (than SHA1) digest. (So this means you're going to store it as an ASCII string, right?) But I don't see how that applies here. Any optimization that avoids rereading old versions if the digests match will never trigger on the day you change digests. No problem here - you're doomed to reread the old version in any case. Either you got your logic backwards, or I need another cup of coffee. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 17:14 ` Linus Torvalds ` (2 preceding siblings ...) 2005-04-08 17:35 ` Jeff Garzik @ 2005-04-09 1:04 ` Marcin Dalecki 2005-04-09 15:42 ` Paul Jackson 3 siblings, 1 reply; 201+ messages in thread From: Marcin Dalecki @ 2005-04-09 1:04 UTC (permalink / raw) To: Linus Torvalds Cc: Matthias-Christian Ott, Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List On 2005-04-08, at 19:14, Linus Torvalds wrote: > > You do that with an sql database, and I'll be impressed. It's possible. But what will impress you are either the price tag the DB comes with or the hardware it runs on :-) ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 1:04 ` Marcin Dalecki @ 2005-04-09 15:42 ` Paul Jackson 2005-04-09 18:45 ` Marcin Dalecki 0 siblings, 1 reply; 201+ messages in thread From: Paul Jackson @ 2005-04-09 15:42 UTC (permalink / raw) To: Marcin Dalecki; +Cc: torvalds, matthias.christian, cw, andrea, linux-kernel Marcin wrote: > But what will impress you are either the price tag the > DB comes with or > the hardware it runs on :-) The payroll for the staffing to care and feed for these babies is often impressive as well. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 15:42 ` Paul Jackson @ 2005-04-09 18:45 ` Marcin Dalecki 0 siblings, 0 replies; 201+ messages in thread From: Marcin Dalecki @ 2005-04-09 18:45 UTC (permalink / raw) To: Paul Jackson; +Cc: linux-kernel, matthias.christian, andrea, cw, torvalds On 2005-04-09, at 17:42, Paul Jackson wrote: > Marcin wrote: >> But what will impress you are either the price tag the >> DB comes with or >> the hardware it runs on :-) > > The payroll for the staffing to care and feed for these > babies is often impressive as well. Please don't forget the bill from the electric plant behind it! ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 16:15 ` Matthias-Christian Ott 2005-04-08 17:14 ` Linus Torvalds @ 2005-04-09 1:00 ` Marcin Dalecki 2005-04-09 1:09 ` Chris Wedgwood 1 sibling, 1 reply; 201+ messages in thread From: Marcin Dalecki @ 2005-04-09 1:00 UTC (permalink / raw) To: Matthias-Christian Ott Cc: Linus Torvalds, Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List On 2005-04-08, at 18:15, Matthias-Christian Ott wrote: > Linus Torvalds wrote: >> > SQL Databases like SQLite aren't slow. > But maybe a Berkeley Database v.4 is a better solution. Yes it sucks less for this purpose. See subversion as reference. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 1:00 ` Marcin Dalecki @ 2005-04-09 1:09 ` Chris Wedgwood 2005-04-09 1:21 ` Marcin Dalecki 0 siblings, 1 reply; 201+ messages in thread From: Chris Wedgwood @ 2005-04-09 1:09 UTC (permalink / raw) To: Marcin Dalecki Cc: Matthias-Christian Ott, Linus Torvalds, Andrea Arcangeli, Kernel Mailing List On Sat, Apr 09, 2005 at 03:00:44AM +0200, Marcin Dalecki wrote: > Yes it sucks less for this purpose. See subversion as reference. Whatever solution people come up with, ideally it should be tolerant to minor amounts of corruption (so I can recover the rest of my data if need be) and it should also have decent sanity checks to find corruption as soon as reasonably possible. I've been bitten by problems that subversion didn't catch but bk did. In the subversion case by the time I noticed much data was lost and none of the subversion tools were able to recover the rest of it. In the bk case, the data-loss was almost immediately noticeable and only affected a few files making recovery much easier. ^ permalink raw reply [flat|nested] 201+ messages in thread
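The early sanity checking Chris asks for falls out naturally from content-addressed storage: re-hash each stored object and compare against the name it lives under. This sketch uses a raw sha1sum over a loose file; the real object format discussed in this thread (compressed, with a type/length header) is checked even more strictly, so the illustration understates what an fsck-style pass can catch.

```shell
# fsck-style check: an object's name should match a fresh hash of it.
mkdir -p objects
printf 'tracked content\n' > obj.tmp
name=$(sha1sum obj.tmp | cut -d' ' -f1)
mv obj.tmp "objects/$name"

printf 'X' >> "objects/$name"    # simulate bit-rot / a corrupted write

actual=$(sha1sum "objects/$name" | cut -d' ' -f1)
if [ "$actual" = "$name" ]; then
    result=ok
else
    result="corrupt: objects/$name"
fi
echo "$result"
```

Corruption of any single object is detected in isolation, which is what makes "recover the rest of my data" possible.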
* Re: Kernel SCM saga.. 2005-04-09 1:09 ` Chris Wedgwood @ 2005-04-09 1:21 ` Marcin Dalecki 0 siblings, 0 replies; 201+ messages in thread From: Marcin Dalecki @ 2005-04-09 1:21 UTC (permalink / raw) To: Chris Wedgwood Cc: Matthias-Christian Ott, Linus Torvalds, Andrea Arcangeli, Kernel Mailing List On 2005-04-09, at 03:09, Chris Wedgwood wrote: > On Sat, Apr 09, 2005 at 03:00:44AM +0200, Marcin Dalecki wrote: > >> Yes it sucks less for this purpose. See subversion as reference. > > Whatever solution people come up with, ideally it should be tolerant > to minor amounts of corruption (so I can recover the rest of my data > if need be) and it should also have decent sanity checks to find > corruption as soon as reasonable possible. Yes this is the reason subversion is moving toward an alternative back-end based on a custom DB mapped closely to the file system. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 4:42 ` Linus Torvalds 2005-04-08 5:04 ` Chris Wedgwood 2005-04-08 7:14 ` Andrea Arcangeli @ 2005-04-08 7:17 ` ross 2005-04-08 15:50 ` Linus Torvalds 2005-04-08 7:34 ` Marcel Lanz ` (2 subsequent siblings) 5 siblings, 1 reply; 201+ messages in thread From: ross @ 2005-04-08 7:17 UTC (permalink / raw) To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 2681 bytes --] On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote: > In the meantime (and because monotone really _is_ that slow), here's a > quick challenge for you, and any crazy hacker out there: if you want to > play with something _really_ nasty (but also very _very_ fast), take a > look at kernel.org:/pub/linux/kernel/people/torvalds/. Interesting. I like it, with one modification (see below). > First one to send me the changelog tree of sparse-git (and a tool to > commit and push/pull further changes) gets a gold star, and an honorable > mention. I've put a hell of a lot of clues in there (*). Here's a partial solution. It does depend on a modified version of cat-file that behaves like cat. I found it easier to have cat-file just dump the object indicated on stdout. Trivial patch for that is included. Two scripts are included: 1) makechlog.sh takes an object and generates a ChangeLog file consisting of all the parents of the given object. It's probably breakable, but correctly outputs the sparse-git changes when run on HEAD. Handles multiple parents and breaks cycles. This adds a line to each object "me <sha1>". This lets a change identify itself. It takes 35 seconds to produce all the change history on my box. It produces a single file named "ChangeLog". 2) chkchlog.sh uses the "me" entries to verify that #1 didn't miss any parents. 
It's mostly to prove my solution reasonably correct :-) The patch is
below, the scripts are attached, and everything is available here:
http://lug.udel.edu/~ross/git/

Now to see what I come up with for commit, push, and pull...

--
Ross Vandegrift
ross@lug.udel.edu

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book
II, xviii, 37

--- cat-file.orig.c	2005-04-08 01:53:54.000000000 -0400
+++ cat-file.c	2005-04-08 01:57:51.000000000 -0400
@@ -11,18 +11,11 @@
 	char type[20];
 	void *buf;
 	unsigned long size;
-	char template[] = "temp_git_file_XXXXXX";
-	int fd;
 
 	if (argc != 2 || get_sha1_hex(argv[1], sha1))
 		usage("cat-file: cat-file <sha1>");
 	buf = read_sha1_file(sha1, type, &size);
 	if (!buf)
 		exit(1);
-	fd = mkstemp(template);
-	if (fd < 0)
-		usage("unable to create tempfile");
-	if (write(fd, buf, size) != size)
-		strcpy(type, "bad");
-	printf("%s: %s\n", template, type);
+	printf ("%s", buf);
 }

[-- Attachment #2: makechlog.sh --] [-- Type: application/x-sh, Size: 1023 bytes --] [-- Attachment #3: chkchlog.sh --] [-- Type: application/x-sh, Size: 208 bytes --] ^ permalink raw reply [flat|nested] 201+ messages in thread
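The ChangeLog walk makechlog.sh is described as doing — follow "parent <sha1>" lines recursively, visiting each commit only once so merges and cycles terminate — can be sketched as follows. Plain files stand in for commit objects so the sketch is self-contained; the real script reads them through cat-file, and the file names here are made up.

```shell
# Tiny stand-in history: c3 -> c2 -> c1 (newest to oldest).
mkdir -p fakeobjs
printf 'tree t1\n\nfirst commit\n'             > fakeobjs/c1
printf 'tree t2\nparent c1\n\nsecond commit\n' > fakeobjs/c2
printf 'tree t3\nparent c2\n\nthird commit\n'  > fakeobjs/c3

: > seen.txt
walk() {
    # break cycles (and shared ancestry): visit each commit only once
    grep -qx "$1" seen.txt && return
    echo "$1" >> seen.txt
    echo "commit $1"
    for p in $(sed -n 's/^parent //p' "fakeobjs/$1"); do
        walk "$p"
    done
}
walk c3
```

The seen-list plays the same role as the "me <sha1>" lines in the message above: it lets a change identify itself so the traversal can skip commits it has already printed.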
* Re: Kernel SCM saga.. 2005-04-08 7:17 ` ross @ 2005-04-08 15:50 ` Linus Torvalds 2005-04-09 2:53 ` Petr Baudis 2005-04-09 15:50 ` Paul Jackson 0 siblings, 2 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-08 15:50 UTC (permalink / raw) To: ross; +Cc: Chris Wedgwood, Kernel Mailing List

On Fri, 8 Apr 2005 ross@jose.lug.udel.edu wrote:
>
> Here's a partial solution. It does depend on a modified version of
> cat-file that behaves like cat. I found it easier to have cat-file
> just dump the object indicated on stdout. Trivial patch for that is included.

Your trivial patch is trivially incorrect, though. First off, some files
may be binary (and definitely are - the "tree" type object contains
pathnames, and in order to avoid having to worry about special characters
they are NUL-terminated), and your modified "cat-file" breaks that.

Secondly, it doesn't check or print the tag.

That said, I think I agree with your concern, and cat-file should not use
a temp-file. I'll fix it, but I'll also make it verify the tag (so you'd
now have to know the tag in advance if you want to cat the data).

Something like

	cat-file -t <sha1>	# output the tag
	cat-file <tag> <sha1>	# output the data

or similar. Easy enough. That way you can do

	torvalds@ppc970:~/git> ./cat-file -t `cat .dircache/HEAD`
	commit

and

	torvalds@ppc970:~/git> ./cat-file commit `cat .dircache/HEAD`
	tree ca30cdf8df2f31545cc1f2c1be62619111b6f6aa
	parent c2474b336d7a96fb4e03e65d229bcddc62b244fc
	author Linus Torvalds <torvalds@ppc970.osdl.org> Fri Apr 8 08:16:38 2005
	committer Linus Torvalds <torvalds@ppc970.osdl.org> Fri Apr 8 08:16:38 2005

	Make "cat-file" output the file contents to stdout. New syntax:
	"cat-file -t <sha1>" shows the tag, while "cat-file <tag> <sha1>"
	outputs the file contents after checking that the supplied tag matches.

I'll rsync the .dircache directory to kernel.org. You'll need to update
your scripts.

> Now to see what I come up with for commit, push, and pull...
A "commit" (*) looks roughly like this:

	# check with "show-diff" what has changed, and check if
	# you need to add any files..
	update-cache <list of files that have been changed/added/deleted>

	# check with "show-diff" that it all looks right

	oldhead=$(cat .dircache/HEAD)
	newhead=$(commit-tree $(write-tree) -p $oldhead < commit-message)

	# update the head information
	if [ "$newhead" != "" ] ; then echo $newhead > .dircache/HEAD; fi

(*) I call this "commit", but it's really something much simpler. It's
really just a "I now have <this directory state>, I got here from
<collection of previous directory states> and the reason was <reason>".

The "push" I use is

	rsync -avz --exclude index .dircache/ <destination-dir>

and you can pull the same way, except when you pull you should save
_your_ HEAD file first (and then you're screwed. There's no way to merge.
If you've made changes and committed them, your changes are still there,
but they are now on a different HEAD than the new one).

That, btw, is kind of the design. "git" really doesn't care about things
like merges. You can use _any_ SCM to do a merge. What "git" does is
track directory state (and how you got to that state), and nothing else.
It doesn't merge, it doesn't really do a whole lot of _anything_.

So when you "pull" or "push" on a git archive, you get the "union" of all
directory states in the destination. The HEAD thing is _one_ pointer into
the "sea of directory states", but you really have to use something else
to merge two directory states together.

Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
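The pull-as-union behaviour Linus describes can be sketched with local directories standing in for the remote side; `cp -r` stands in for the rsync invocation, and the toy repository layout is only illustrative. Note how the puller's own HEAD has to be saved by hand, exactly as the message warns.

```shell
# Two toy repositories with disjoint objects and different heads.
mkdir -p mine/.dircache/objects theirs/.dircache/objects
echo aaaa > mine/.dircache/HEAD
echo bbbb > theirs/.dircache/HEAD
echo one  > mine/.dircache/objects/obj1
echo two  > theirs/.dircache/objects/obj2

# "Pull": save our head, then copy their store over ours.
cp mine/.dircache/HEAD mine/.dircache/HEAD.mine
cp -r theirs/.dircache/. mine/.dircache/

# Result: the union of both object sets. HEAD is now theirs; merging
# HEAD and HEAD.mine together is explicitly somebody else's problem.
ls mine/.dircache/objects
```

Since objects are immutable and content-addressed, copying one store over another can only add objects, never clobber differing ones — which is why the union operation is safe even though HEAD itself is not.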
* Re: Re: Kernel SCM saga.. 2005-04-08 15:50 ` Linus Torvalds @ 2005-04-09 2:53 ` Petr Baudis 2005-04-09 7:08 ` Randy.Dunlap 2005-04-10 1:01 ` Phillip Lougher 2005-04-09 15:50 ` Paul Jackson 1 sibling, 2 replies; 201+ messages in thread From: Petr Baudis @ 2005-04-09 2:53 UTC (permalink / raw) To: Linus Torvalds; +Cc: ross, Kernel Mailing List Hello, Dear diary, on Fri, Apr 08, 2005 at 05:50:21PM CEST, I got a letter where Linus Torvalds <torvalds@osdl.org> told me that... > > > On Fri, 8 Apr 2005 ross@jose.lug.udel.edu wrote: > > > > Here's a partial solution. It does depend on a modified version of > > cat-file that behaves like cat. I found it easier to have cat-file > > just dump the object indicated on stdout. Trivial patch for that is included. > > Your trivial patch is trivially incorrect, though. First off, some files > may be binary (and definitely are - the "tree" type object contains > pathnames, and in order to avoid having to worry about special characters > they are NUL-terminated), and your modified "cat-file" breaks that. > > Secondly, it doesn't check or print the tag. FWIW, I made few small fixes (to prevent some trivial usage errors to cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and gitlog.sh - heavily inspired by what already went through the mailing list. Everything is available at http://pasky.or.cz/~pasky/dev/git/ (including .dircache, even though it isn't shown in the index), the cumulative patch can be found below. The scripts aim to provide some (obviously very interim) more high-level interface for git. I'm now working on tree-diff.c which will (surprise!) produce a diff of two trees (I'll finish it after I get some sleep, though), and then I will probably do some dwimmy gitdiff.sh wrapper for tree-diff and show-diff. At that point I might get my hand on some pull more kind to local changes. 
Kind regards, Petr Baudis

diff -ruN git-0.03/gitadd.sh git-devel-clean/gitadd.sh
--- git-0.03/gitadd.sh	1970-01-01 01:00:00.000000000 +0100
+++ git-devel-clean/gitadd.sh	2005-04-09 03:17:34.220577000 +0200
@@ -0,0 +1,13 @@
+#!/bin/sh
+#
+# Add new file to a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes a list of file names at the command line, and schedules them
+# for addition to the GIT repository at the next commit.
+#
+# FIXME: Those files are omitted from show-diff output!
+
+for file in "$@"; do
+	echo $file >>.dircache/add-queue
+done
diff -ruN git-0.03/gitcommit.sh git-devel-clean/gitcommit.sh
--- git-0.03/gitcommit.sh	1970-01-01 01:00:00.000000000 +0100
+++ git-devel-clean/gitcommit.sh	2005-04-09 03:17:34.220577000 +0200
@@ -0,0 +1,36 @@
+#!/bin/sh
+#
+# Commit into a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+# Based on an example script fragment sent to LKML by Linus Torvalds.
+#
+# Ignores any parameters for now, excepts changelog entry on stdin.
+#
+# FIXME: Gets it wrong for filenames containing spaces.
+
+
+if [ -r .dircache/add-queue ]; then
+	mv .dircache/add-queue .dircache/add-queue-progress
+	addedfiles=$(cat .dircache/add-queue-progress)
+else
+	addedfiles=
+fi
+changedfiles=$(show-diff -s | grep -v ': ok$' | cut -d : -f 1)
+commitfiles="$addedfiles $changedfiles"
+if [ ! "$commitfiles" ]; then
+	echo 'Nothing to commit.' >&2
+	exit
+fi
+update-cache $commitfiles
+rm -f .dircache/add-queue-progress
+
+
+oldhead=$(cat .dircache/HEAD)
+treeid=$(write-tree)
+newhead=$(commit-tree $treeid -p $oldhead)
+
+if [ "$newhead" ]; then
+	echo $newhead >.dircache/HEAD
+else
+	echo "Error during commit (oldhead $oldhead, treeid $treeid)" >&2
+fi
diff -ruN git-0.03/gitlog.sh git-devel-clean/gitlog.sh
--- git-0.03/gitlog.sh	1970-01-01 01:00:00.000000000 +0100
+++ git-devel-clean/gitlog.sh	2005-04-09 04:28:51.227791000 +0200
@@ -0,0 +1,61 @@
+#!/bin/sh
+####
+#### Call this script with an object and it will produce the change
+#### information for all the parents of that object
+####
+#### This script was originally written by Ross Vandegrift.
+# multiple parents test 1d0f4aec21e5b66c441213643426c770dc6dedc0
+# parents: ffa098b2e187b71b86a76d3cd5eb77d074a2503c
+# 6860e0d9197c7f52155466c225baf39b42d62f63
+
+# regex for parent declarations
+PARENTS="^parent [A-z0-9]{40}$"
+
+TMPCL="/tmp/gitlog.$$"
+
+# takes an object and generates the object's parent(s)
+function unpack_parents () {
+	echo "me $1"
+	echo "me $1" >>$TMPCL
+	RENTS=""
+
+	TMPCM=$(mktemp)
+	cat-file commit $1 >$TMPCM
+	while read line; do
+		if echo "$line" | egrep -q "$PARENTS"; then
+			RENTS="$RENTS "$(echo $line | sed 's/parent //g')
+		fi
+		echo $line
+	done <$TMPCM
+	rm $TMPCM
+
+	echo -e "\n--------------------------\n"
+
+	# if the last object had no parents, return
+	if [ ! "$RENTS" ]; then
+		return;
+	fi
+
+	#useful for testing
+	#echo $RENTS
+	#read
+	for i in `echo $RENTS`; do
+		# break cycles
+		if grep -q "me $i" $TMPCL; then
+			echo "Already visited $i" >&2
+			continue
+		else
+			unpack_parents $i
+		fi
+	done
+}
+
+base=$1
+if [ ! "$base" ]; then
+	base=$(cat .dircache/HEAD)
+fi
+
+rm -f $TMPCL
+unpack_parents $base
+rm -f $TMPCL
+
diff -ruN git-0.03/read-cache.c git-devel-clean/read-cache.c
--- git-0.03/read-cache.c	2005-04-08 22:51:35.000000000 +0200
+++ git-devel-clean/read-cache.c	2005-04-09 03:53:44.049642000 +0200
@@ -264,11 +264,12 @@
 	size = 0; // avoid gcc warning
 	map = (void *)-1;
 	if (!fstat(fd, &st)) {
-		map = NULL;
 		size = st.st_size;
 		errno = EINVAL;
 		if (size > sizeof(struct cache_header))
 			map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+		else
+			return (!hdr->entries) ? 0 : error("inconsistent cache");
 	}
 	close(fd);
 	if (-1 == (int)(long)map)
diff -ruN git-0.03/show-diff.c git-devel-clean/show-diff.c
--- git-0.03/show-diff.c	2005-04-08 17:55:09.000000000 +0200
+++ git-devel-clean/show-diff.c	2005-04-09 03:53:44.063638000 +0200
@@ -49,9 +49,17 @@
 int main(int argc, char **argv)
 {
+	int silent = 0;
 	int entries = read_cache();
 	int i;
 
+	while (argc-- > 1) {
+		if (!strcmp(argv[1], "-s"))
+			silent = 1;
+		else if (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help"))
+			usage("show-diff [-s]");
+	}
+
 	if (entries < 0) {
 		perror("read_cache");
 		exit(1);
 	}
@@ -77,6 +85,9 @@
 		for (n = 0; n < 20; n++)
 			printf("%02x", ce->sha1[n]);
 		printf("\n");
+		if (silent)
+			continue;
+
 		new = read_sha1_file(ce->sha1, type, &size);
 		show_differences(ce, &st, new, size);
 		free(new);
diff -ruN git-0.03/update-cache.c git-devel-clean/update-cache.c
--- git-0.03/update-cache.c	2005-04-08 17:53:44.000000000 +0200
+++ git-devel-clean/update-cache.c	2005-04-09 03:53:44.069637000 +0200
@@ -231,6 +231,9 @@
 		return -1;
 	}
 
+	if (argc < 2)
+		usage("update-cache <file>*");
+
 	newfd = open(".dircache/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
 	if (newfd < 0) {
 		perror("unable to create new cachefile");
^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 2:53 ` Petr Baudis @ 2005-04-09 7:08 ` Randy.Dunlap 2005-04-09 18:06 ` [PATCH] " Petr Baudis 2005-04-10 1:01 ` Phillip Lougher 1 sibling, 1 reply; 201+ messages in thread From: Randy.Dunlap @ 2005-04-09 7:08 UTC (permalink / raw) To: Petr Baudis; +Cc: torvalds, ross, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2107 bytes --]

On Sat, 9 Apr 2005 04:53:57 +0200 Petr Baudis wrote:

| Hello,
|
| Dear diary, on Fri, Apr 08, 2005 at 05:50:21PM CEST, I got a letter
| where Linus Torvalds <torvalds@osdl.org> told me that...
| >
| >
| > On Fri, 8 Apr 2005 ross@jose.lug.udel.edu wrote:
| > >
| > > Here's a partial solution. It does depend on a modified version of
| > > cat-file that behaves like cat. I found it easier to have cat-file
| > > just dump the object indicated on stdout. Trivial patch for that is included.
| >
| > Your trivial patch is trivially incorrect, though. First off, some files
| > may be binary (and definitely are - the "tree" type object contains
| > pathnames, and in order to avoid having to worry about special characters
| > they are NUL-terminated), and your modified "cat-file" breaks that.
| >
| > Secondly, it doesn't check or print the tag.
|
| FWIW, I made few small fixes (to prevent some trivial usage errors to
| cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
| gitlog.sh - heavily inspired by what already went through the mailing
| list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
| (including .dircache, even though it isn't shown in the index), the
| cumulative patch can be found below. The scripts aim to provide some
| (obviously very interim) more high-level interface for git.
|
| I'm now working on tree-diff.c which will (surprise!) produce a diff
| of two trees (I'll finish it after I get some sleep, though), and then I
| will probably do some dwimmy gitdiff.sh wrapper for tree-diff and
| show-diff. At that point I might get my hand on some pull more kind to
| local changes.

Hi,

I'll look at your scripts this weekend. I've also been working on some,
but mine are a bit more experimental (cruder) than yours are.
Anyway, here they are (attached) -- also available at
http://developer.osdl.org/rddunlap/git/

gitin : checkin/commit
gitwhat sha1 : what is that sha1 file (type and contents if blob or commit)
gitlist (blob, commit, tree, or all) :
	list all objects with type (commit, tree, blob, or all)

---
~Randy

[-- Attachment #2: gitin --] [-- Type: application/octet-stream, Size: 742 bytes --]

#! /bin/sh
# gitin: checkin for git files

# grep show-diff for +++ => error, print 'run update-cache <filenames>', exit
# (better would be an error exit code)
# write-tree > current_tree_object
# print 'enter commit message:'
# commit-tree `cat current_tree_object` -p `cat .dircache/HEAD` > current_commit_object
# update .dircache/HEAD with current_commit_object

diffs=`show-diff | grep "+++"`
#echo diffs=/$diffs/
if [ x"$diffs" != x ]; then
	echo "run update-cache <filenames>"
	exit
fi

tree_object=`write-tree`
#echo tree_obj=/$tree_object/

head=`cat .dircache/HEAD`

echo "enter commit message: (end with ^D)"
commit_object=`commit-tree $tree_object -p $head`
#echo commit_obj=/$commit_object/

echo $commit_object > .dircache/HEAD

[-- Attachment #3: gitlist --] [-- Type: application/octet-stream, Size: 580 bytes --]

#! /bin/sh
# gitlist: list some git objects/types
# (by selected target type: blob, tree, commit, or all)

target=$1
if [ -z "$target" ]; then
	echo "usage: gitlist type {blob, tree, commit, or all}"
	exit 1
fi

subdir=.dircache/objects/

for high in 0 1 2 3 4 5 6 7 8 9 a b c d e f ; do
	for low in 0 1 2 3 4 5 6 7 8 9 a b c d e f ; do
		top=$high$low
		for f in $subdir/$top/* ; do
			if [ ! -r $f ]; then
				continue
			fi
			base=`basename $f`
			type=`cat-file -t $top$base`
			if [ $target == "all" -o $target == $type ]; then
				echo $top$base : $type
			fi
		done
	done
done

[-- Attachment #4: gitwhat --] [-- Type: application/octet-stream, Size: 533 bytes --]

#! /bin/sh
# gitwhat: what is that file

sha1=$1
if [ -z $sha1 ]; then
	echo "usage: gitwhat sha1"
	exit 1
fi

what=`cat-file -t $sha1`
if [ -z "$what" ]; then
	exit 1
fi
echo "type is: $what"

topdir=${sha1:0:2}
last=${sha1:2}
file=.dircache/objects/$topdir/$last

if [ -z $PAGER ]; then
	pager=more
else
	pager=$PAGER
fi

case $what in
blob)
	#head -10 $file
	#$pager $file
	cat-file blob $sha1 | $pager
	;;
tree)
	echo "cannot print binary tree"
	#cat-file tree $sha1 | $pager
	;;
commit)
	cat-file commit $sha1 | $pager
	;;
esac

^ permalink raw reply [flat|nested] 201+ messages in thread
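gitlist's nested hex loops mirror the object-store layout the scripts above all assume: an object whose name is a 40-hex-digit hash lives under objects/ fanned out by its first two digits. A minimal sketch of mapping a name to its path (the hash below really is the SHA1 of the empty string, used only as a familiar constant; the .dircache path matches the layout used in this thread):

```shell
sha=da39a3ee5e6b4b0d3255bfef95601890afd80709   # sha1("")
top=$(printf '%s' "$sha" | cut -c1-2)          # fan-out directory
rest=$(printf '%s' "$sha" | cut -c3-)          # file name inside it

mkdir -p ".dircache/objects/$top"
: > ".dircache/objects/$top/$rest"             # empty stand-in object

echo ".dircache/objects/$top/$rest"
```

The two-digit fan-out keeps any single directory to at most 256 entries' worth of prefixes, which mattered on 2005-era filesystems with linear directory lookups; gitwhat's `${sha1:0:2}` does the same split with a bashism where `cut` here stays POSIX.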
* [PATCH] Re: Kernel SCM saga.. 2005-04-09 7:08 ` Randy.Dunlap @ 2005-04-09 18:06 ` Petr Baudis 0 siblings, 0 replies; 201+ messages in thread From: Petr Baudis @ 2005-04-09 18:06 UTC (permalink / raw) To: Randy.Dunlap; +Cc: torvalds, ross, linux-kernel Dear diary, on Sat, Apr 09, 2005 at 09:08:59AM CEST, I got a letter where "Randy.Dunlap" <rddunlap@osdl.org> told me that... > On Sat, 9 Apr 2005 04:53:57 +0200 Petr Baudis wrote: ..snip.. > | FWIW, I made few small fixes (to prevent some trivial usage errors to > | cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and > | gitlog.sh - heavily inspired by what already went through the mailing > | list. Everything is available at http://pasky.or.cz/~pasky/dev/git/ > | (including .dircache, even though it isn't shown in the index), the > | cumulative patch can be found below. The scripts aim to provide some > | (obviously very interim) more high-level interface for git. > | > | I'm now working on tree-diff.c which will (surprise!) produce a diff > | of two trees (I'll finish it after I get some sleep, though), and then I > | will probably do some dwimmy gitdiff.sh wrapper for tree-diff and > | show-diff. At that point I might get my hand on some pull more kind to > | local changes. > > Hi, Hi, > I'll look at your scripts this weekend. I've also been > working on some, but mine are a bit more experimental (cruder) > than yours are. Anyway, here they are (attached) -- also > available at http://developer.osdl.org/rddunlap/git/ > > gitin : checkin/commit > gitwhat sha1 : what is that sha1 file (type and contents if blob or commit) > gitlist (blob, commit, tree, or all) : > list all objects with type (commit, tree, blob, or all) thanks - I had a look, but so far I borrowed only the prompt message from your gitin. ;-) I'm not sure if gitwhat would be useful for me in any way and gitlist doesn't appear too practical to me either. In the meantime, I've made some progress too. 
I made ls-tree, which will just convert the tree object to a human readable (and script processable) form, and wrapper gitls.sh, which will also try to guess the tree ID. parent-id will just return the commit ID(s) of the previous commit(s), practical if you want to diff against the previous commit easily etc. And finally, there is gitdiff.sh, which will produce a diff of any two trees. Everything is again available at http://pasky.or.cz/~pasky/dev/git/ and again including .dircache, even though it's invisible in the index. The cumulative patch (against 0.03) is there as well as below, generated by the ./gitdiff.sh 0af20307bb4c634722af0f9203dac7b3222c4a4f command. The empty entries are changed modes (664 vs 644), I will yet have to think about how to denote them if the content didn't change; or I might ignore them altogether...? You can obviously fetch any arbitrary change by doing the appropriate gitdiff.sh call. You can find the ids in the ChangeLog, which was generated by the plain ./gitlog.sh command. (That is for HEAD. 0af20307bb4c634722af0f9203dac7b3222c4a4f is the last commit on the Linus' branch, pass that to gitlog.sh to get his ChangeLog. ;-) Next, I will probably do some bk-style pull tool. Or perhaps first a gitpatch.sh which will verify the sha1s and do the mode changes. Linus, could you please have a look and tell me what you think about it so far?
Thanks, Petr Baudis

Index: Makefile
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/Makefile (mode:100664 sha1:270cd4f8a8bf10cd513b489c4aaf76c14d4504a7)
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/Makefile (mode:100644 sha1:185ff422e68984e68da011509dec116f05fc6f8d)
@@ -1,7 +1,7 @@
 CFLAGS=-g -O3 -Wall
 CC=gcc
 
-PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file fsck-cache
+PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file fsck-cache ls-tree
 
 all: $(PROG)
 
@@ -30,6 +30,9 @@
 cat-file: cat-file.o read-cache.o
 	$(CC) $(CFLAGS) -o cat-file cat-file.o read-cache.o $(LIBS)
 
+ls-tree: ls-tree.o read-cache.o
+	$(CC) $(CFLAGS) -o ls-tree ls-tree.o read-cache.o $(LIBS)
+
 fsck-cache: fsck-cache.o read-cache.o
 	$(CC) $(CFLAGS) -o fsck-cache fsck-cache.o read-cache.o $(LIBS)
 
Index: README
===================================================================
Index: cache.h
===================================================================
Index: cat-file.c
===================================================================
Index: commit-tree.c
===================================================================
Index: fsck-cache.c
===================================================================
Index: gitadd.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitadd.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitadd.sh (mode:100755 sha1:d23be758c0c9fc1cf9756bcd3ee4d7266c60a2c9)
@@ -0,0 +1,13 @@
+#!/bin/sh
+#
+# Add new file to a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes a list of file names at the command line, and schedules them
+# for addition to the GIT repository at the next commit.
+#
+# FIXME: Those files are omitted from show-diff output!
+
+for file in "$@"; do
+	echo $file >>.dircache/add-queue
+done
Index: gitcommit.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitcommit.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitcommit.sh (mode:100755 sha1:67a743c6cbc9dffaa6f571d3dc83ceec2bd0c039)
@@ -0,0 +1,38 @@
+#!/bin/sh
+#
+# Commit into a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+# Based on an example script fragment sent to LKML by Linus Torvalds.
+#
+# Ignores any parameters for now, excepts changelog entry on stdin.
+#
+# FIXME: Gets it wrong for filenames containing spaces.
+
+
+if [ -r .dircache/add-queue ]; then
+	mv .dircache/add-queue .dircache/add-queue-progress
+	addedfiles=$(cat .dircache/add-queue-progress)
+else
+	addedfiles=
+fi
+changedfiles=$(show-diff -s | grep -v ': ok$' | cut -d : -f 1)
+commitfiles="$addedfiles $changedfiles"
+if [ ! "$commitfiles" ]; then
+	echo 'Nothing to commit.' >&2
+	exit
+fi
+update-cache $commitfiles
+rm -f .dircache/add-queue-progress
+
+
+oldhead=$(cat .dircache/HEAD)
+treeid=$(write-tree)
+
+echo "Enter commit message, terminated by ctrl-D on a separate line:" >&2
+newhead=$(commit-tree $treeid -p $oldhead)
+
+if [ "$newhead" ]; then
+	echo $newhead >.dircache/HEAD
+else
+	echo "Error during commit (oldhead $oldhead, treeid $treeid)" >&2
+fi
Index: gitdiff.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitdiff.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitdiff.sh (mode:100755 sha1:17aec840c7c0e0b4e4e78fd94b754fe6bc2f2ff2)
@@ -0,0 +1,104 @@
+#!/bin/sh
+#
+# Make a diff between two GIT trees.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes two parameters identifying the two trees/commits to compare.
+# Empty string will be substitued to HEAD revision.
+#
+# Outputs a diff converting the first tree to the second one.
+
+
+TREE="^tree [A-z0-9]{40}$"
+
+tree1ls=$(mktemp -t gitdiff.XXXXXX)
+tree2ls=$(mktemp -t gitdiff.XXXXXX)
+diffdir=$(mktemp -d -t gitdiff.XXXXXX)
+
+function die () {
+	echo gitdiff: $@ >&2
+	rm -f "$tree1ls" "$tree2ls"
+	rm -rf "$diffdir"
+	exit
+}
+
+function normalize_id () {
+	# XXX: This is basically a copy of gitls.sh
+	id=$1
+	if [ ! "$id" ]; then
+		id=$(cat .dircache/HEAD)
+	fi
+	if [ $(cat-file -t "$id") = "commit" ]; then
+		id=$(cat-file commit $id | egrep "$TREE" | cut -d ' ' -f 2)
+	fi
+	if [ ! $(cat-file -t "$id") = "tree" ]; then
+		die "Invalid ID supplied: $id"
+	fi
+	echo $id
+}
+
+function mkdiff () {
+	loc=$1; treeid=$2; fname=$3; mode=$4; sha1=$5;
+
+	if [ x"$sha1" != x"!" ]; then
+		cat-file blob $sha1 >$loc
+	else
+		>$loc
+	fi
+
+	label="$treeid/$fname";
+
+	labelapp=""
+	[ x"$mode" != x"!" ] && labelapp="$labelapp mode:$mode"
+	[ x"$sha1" != x"!" ] && labelapp="$labelapp sha1:$sha1"
+	labelapp=$(echo "$labelapp" | sed 's/^ *//')
+
+	[ "$labelapp" ] && label="$label ($labelapp)"
+
+	echo $label
+}
+
+id1=$(normalize_id "$1")
+id2=$(normalize_id "$2")
+
+[ "$2" != "$1" ] || die "Cannot diff tree against itself."
+
+ls-tree "$id1" >$tree1ls
+[ -s "$tree1ls" ] || die "Error retrieving the first tree."
+ls-tree "$id2" >$tree2ls
+[ -s "$tree2ls" ] || die "Error retrieving the second tree."
+
+diffdir1="$diffdir/$id1"
+diffdir2="$diffdir/$id2"
+mkdir $diffdir1 $diffdir2
+
+join -e ! -a 1 -a 2 -j 4 -o 0,1.1,1.3,2.1,2.3 $tree1ls $tree2ls | {
+	while read line; do
+		name=$(echo $line | cut -d ' ' -f 1)
+		mode1=$(echo $line | cut -d ' ' -f 2)
+		sha1=$(echo $line | cut -d ' ' -f 3)
+		mode2=$(echo $line | cut -d ' ' -f 4)
+		sha2=$(echo $line | cut -d ' ' -f 5)
+
+		# XXX: The diff format is currently pretty ugly;
+		# ideally, we should print the sha1 and mode at the
+		# +++ and --- lines, but
+
+		if [ "$mode1" != "$mode2" ] || [ "$sha1" != "$sha2" ]; then
+			echo "Index: $name"
+			echo "==================================================================="
+
+			loc1="$diffdir1/$name"
+			loc2="$diffdir2/$name"
+			mkdir -p $(dirname $loc1) $(dirname $loc2)
+
+			label1=$(mkdiff "$loc1" $id1 "$name" $mode1 $sha1)
+			label2=$(mkdiff "$loc2" $id2 "$name" $mode2 $sha2)
+
+			diff -L "$label1" -L "$label2" -u "$loc1" "$loc2"
+		fi
+	done
+}
+
+rm -f "$tree1ls" "$tree2ls"
+rm -rf "$diffdir"
Index: gitlog.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitlog.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitlog.sh (mode:100755 sha1:e7a4eed8c0526821d00b08094c73fabb72eff4df)
@@ -0,0 +1,61 @@
+#!/bin/sh
+####
+#### Call this script with an object and it will produce the change
+#### information for all the parents of that object
+####
+#### This script was originally written by Ross Vandegrift.
+# multiple parents test 1d0f4aec21e5b66c441213643426c770dc6dedc0 +# parents: ffa098b2e187b71b86a76d3cd5eb77d074a2503c +# 6860e0d9197c7f52155466c225baf39b42d62f63 + +# regex for parent declarations +PARENTS="^parent [A-z0-9]{40}$" + +TMPCL="/tmp/gitlog.$$" + +# takes an object and generates the object's parent(s) +function unpack_parents () { + echo "me $1" + echo "me $1" >>$TMPCL + RENTS="" + + TMPCM=$(mktemp) + cat-file commit $1 >$TMPCM + while read line; do + if echo "$line" | egrep -q "$PARENTS"; then + RENTS="$RENTS "$(echo $line | sed 's/parent //g') + fi + echo $line + done <$TMPCM + rm $TMPCM + + echo -e "\n--------------------------\n" + + # if the last object had no parents, return + if [ ! "$RENTS" ]; then + return; + fi + + #useful for testing + #echo $RENTS + #read + for i in `echo $RENTS`; do + # break cycles + if grep -q "me $i" $TMPCL; then + echo "Already visited $i" >&2 + continue + else + unpack_parents $i + fi + done +} + +base=$1 +if [ ! "$base" ]; then + base=$(cat .dircache/HEAD) +fi + +rm -f $TMPCL +unpack_parents $base +rm -f $TMPCL + Index: gitls.sh =================================================================== --- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitls.sh +++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitls.sh (mode:100755 sha1:4fe78b764ac0ab3cdb16631bbfdd65edb138e47b) @@ -0,0 +1,22 @@ +#!/bin/sh +# +# List contents of a particular tree in a GIT repository. +# Copyright (c) Petr Baudis, 2005 +# +# Optionally takes commit or tree id as a parameter, defaulting to HEAD. + +TREE="^tree [A-z0-9]{40}$" + +id=$1 +if [ ! "$id" ]; then + id=$(cat .dircache/HEAD) +fi +if [ $(cat-file -t "$id") = "commit" ]; then + id=$(cat-file commit $id | egrep "$TREE" | cut -d ' ' -f 2) +fi +if [ ! 
$(cat-file -t "$id") = "tree" ]; then + echo "Invalid ID supplied: $id" >&2 + exit +fi + +ls-tree "$id" Index: init-db.c =================================================================== Index: ls-tree.c =================================================================== --- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/ls-tree.c +++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/ls-tree.c (mode:100644 sha1:ed5b82cd7f41c3ea4140fa1ee4b80b786f190151) @@ -0,0 +1,51 @@ +/* + * GIT - The information manager from hell + * + * Copyright (C) Linus Torvalds, 2005 + */ +#include "cache.h" + +static int list(unsigned char *sha1) +{ + void *buffer; + unsigned long size; + char type[20]; + + buffer = read_sha1_file(sha1, type, &size); + if (!buffer) + usage("unable to read sha1 file"); + if (strcmp(type, "tree")) + usage("expected a 'tree' node"); + while (size) { + int len = strlen(buffer)+1; + unsigned char *sha1 = buffer + len; + char *path = strchr(buffer, ' ')+1; + unsigned int mode; + + if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1) + usage("corrupt 'tree' file"); + buffer = sha1 + 20; + size -= len + 20; + /* XXX: We just assume the type is "blob" as it should be. + * It seems worthless to read each file just to get this + * and the file size. 
-- pasky@ucw.cz */ + printf("%03o\t%s\t%s\t%s\n", mode, "blob", sha1_to_hex(sha1), path); + } + return 0; +} + +int main(int argc, char **argv) +{ + unsigned char sha1[20]; + + if (argc != 2) + usage("ls-tree <key>"); + if (get_sha1_hex(argv[1], sha1) < 0) + usage("ls-tree <key>"); + sha1_file_directory = getenv(DB_ENVIRONMENT); + if (!sha1_file_directory) + sha1_file_directory = DEFAULT_DB_ENVIRONMENT; + if (list(sha1) < 0) + usage("list failed"); + return 0; +} Index: parent-id =================================================================== --- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/parent-id +++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/parent-id (mode:100755 sha1:198c551b7367988b48aa7a69876e098d73c19e88) @@ -0,0 +1,15 @@ +#!/bin/sh +# +# Get ID of parent commit to a given revision or HEAD. +# Copyright (c) Petr Baudis, 2005 +# +# Takes ID of the current commit, defaults to HEAD. + +PARENT="^parent [A-z0-9]{40}$" + +id=$1 +if [ ! "$id" ]; then + id=$(cat .dircache/HEAD) +fi + +cat-file commit $id | egrep "$PARENT" | cut -d ' ' -f 2 Index: read-cache.c =================================================================== --- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/read-cache.c (mode:100664 sha1:e51c9ee84874b5ff0f22b11dcd4fe1f905e72a5e) +++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/read-cache.c (mode:100644 sha1:3dbe6db46933683721ceafdcdd70da521a32269a) @@ -264,11 +264,12 @@ size = 0; // avoid gcc warning map = (void *)-1; if (!fstat(fd, &st)) { - map = NULL; size = st.st_size; errno = EINVAL; if (size > sizeof(struct cache_header)) map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + else + return (!hdr->entries) ? 
0 : error("inconsistent cache"); } close(fd); if (-1 == (int)(long)map) Index: read-tree.c =================================================================== Index: show-diff.c =================================================================== --- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/show-diff.c (mode:100664 sha1:45f6e3140b3923497fdec808aec0e86ecf358b92) +++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/show-diff.c (mode:100644 sha1:9beda1382103df29914d965fc135def0e6e7e839) @@ -49,9 +49,17 @@ int main(int argc, char **argv) { + int silent = 0; int entries = read_cache(); int i; + while (argc-- > 1) { + if (!strcmp(argv[1], "-s")) + silent = 1; + else if (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) + usage("show-diff [-s]"); + } + if (entries < 0) { perror("read_cache"); exit(1); @@ -77,6 +85,9 @@ for (n = 0; n < 20; n++) printf("%02x", ce->sha1[n]); printf("\n"); + if (silent) + continue; + new = read_sha1_file(ce->sha1, type, &size); show_differences(ce, &st, new, size); free(new); Index: update-cache.c =================================================================== --- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/update-cache.c (mode:100664 sha1:9dcee6f628d5accaa5219f72a2e790c082d9dd9a) +++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/update-cache.c (mode:100644 sha1:916430a05a9da088dae1ea82eb8d5392033f548a) @@ -231,6 +231,9 @@ return -1; } + if (argc < 2) + usage("update-cache <file>*"); + newfd = open(".dircache/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600); if (newfd < 0) { perror("unable to create new cachefile"); Index: write-tree.c =================================================================== ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Re: Kernel SCM saga.. 2005-04-09 2:53 ` Petr Baudis 2005-04-09 7:08 ` Randy.Dunlap @ 2005-04-10 1:01 ` Phillip Lougher 2005-04-10 1:42 ` Petr Baudis 1 sibling, 1 reply; 201+ messages in thread From: Phillip Lougher @ 2005-04-10 1:01 UTC (permalink / raw) To: Linus Torvalds, ross, Kernel Mailing List; +Cc: rddunlap, Phil Lougher On Apr 9, 2005 3:53 AM, Petr Baudis <pasky@ucw.cz> wrote: > FWIW, I made few small fixes (to prevent some trivial usage errors to > cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and > gitlog.sh - heavily inspired by what already went through the mailing > list. Everything is available at http://pasky.or.cz/~pasky/dev/git/ > (including .dircache, even though it isn't shown in the index), the > cumulative patch can be found below. The scripts aim to provide some > (obviously very interim) more high-level interface for git. I did a bit of playing about with the changelog generation script, trying to produce a faster version. The attached version uses a couple of improvements to be a lot faster (e.g. no recursion in the common case of one parent). FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and 28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my hardware. Your mileage may of course vary. Regards Phillip -------------------------------------- #!/bin/sh changelog() { local parents new_parent declare -a new_parent new_parent[0]=$1 parents=1 while [ $parents -gt 0 ]; do parent=${new_parent[$((parents-1))]} echo $parent >> $TMP cat-file commit $parent > $TMP_FILE echo me $parent cat $TMP_FILE echo -e "\n--------------------------\n" parents=0 while read type text; do if [ $type = 'committer' ]; then break; elif [ $type = 'parent' ] && ! grep -q $text $TMP ; then new_parent[$parents]=$text parents=$((parents+1)) fi done < $TMP_FILE i=0 while [ $i -lt $((parents-1)) ]; do changelog ${new_parent[$i]} i=$((i+1)) done done } TMP=`mktemp` TMP_FILE=`mktemp` base=$1 if [ ! 
"$base" ]; then base=$(cat .dircache/HEAD) fi changelog $base rm -rf $TMP $TMP_FILE ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Re: Re: Kernel SCM saga.. 2005-04-10 1:01 ` Phillip Lougher @ 2005-04-10 1:42 ` Petr Baudis 2005-04-10 1:57 ` Phillip Lougher 0 siblings, 1 reply; 201+ messages in thread From: Petr Baudis @ 2005-04-10 1:42 UTC (permalink / raw) To: Phillip Lougher; +Cc: Linus Torvalds, ross, Kernel Mailing List Dear diary, on Sun, Apr 10, 2005 at 03:01:12AM CEST, I got a letter where Phillip Lougher <phil.lougher@gmail.com> told me that... > On Apr 9, 2005 3:53 AM, Petr Baudis <pasky@ucw.cz> wrote: > > > FWIW, I made few small fixes (to prevent some trivial usage errors to > > cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and > > gitlog.sh - heavily inspired by what already went through the mailing > > list. Everything is available at http://pasky.or.cz/~pasky/dev/git/ > > (including .dircache, even though it isn't shown in the index), the > > cumulative patch can be found below. The scripts aim to provide some > > (obviously very interim) more high-level interface for git. > > I did a bit of playing about with the changelog generate script, > trying to produce a faster version. The attached version uses a > couple of improvements to be a lot faster (e.g. no recursion in the > common case of one parent). > > FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and > 28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my > hardware. You mileage may of course vary. Wow, really impressive! Great work, I've merged it (if you don't object, of course). Wondering why I wasn't in the Cc list, BTW. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Re: Re: Kernel SCM saga.. 2005-04-10 1:42 ` Petr Baudis @ 2005-04-10 1:57 ` Phillip Lougher 0 siblings, 0 replies; 201+ messages in thread From: Phillip Lougher @ 2005-04-10 1:57 UTC (permalink / raw) To: Phillip Lougher, Linus Torvalds, ross, Kernel Mailing List, pasky On Apr 10, 2005 2:42 AM, Petr Baudis <pasky@ucw.cz> wrote: > Dear diary, on Sun, Apr 10, 2005 at 03:01:12AM CEST, I got a letter > where Phillip Lougher <phil.lougher@gmail.com> told me that... > > On Apr 9, 2005 3:53 AM, Petr Baudis <pasky@ucw.cz> wrote: > > > > > FWIW, I made few small fixes (to prevent some trivial usage errors to > > > cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and > > > gitlog.sh - heavily inspired by what already went through the mailing > > > list. Everything is available at http://pasky.or.cz/~pasky/dev/git/ > > > (including .dircache, even though it isn't shown in the index), the > > > cumulative patch can be found below. The scripts aim to provide some > > > (obviously very interim) more high-level interface for git. > > > > I did a bit of playing about with the changelog generate script, > > trying to produce a faster version. The attached version uses a > > couple of improvements to be a lot faster (e.g. no recursion in the > > common case of one parent). > > > > FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and > > 28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my > > hardware. You mileage may of course vary. > > Wow, really impressive! Great work, I've merged it (if you don't object, > of course). Of course I don't object... > > Wondering why I wasn't in the Cc list, BTW. Weird, it wasn't intentional. I read LKML in Gmail (which I don't use for much else), and just clicked "reply", expecting to do the right thing. Replying to this email it's also left you off the CC list. 
Looking at the email source I believe it's probably to do with the following: Mail-Followup-To: Linus Torvalds <torvalds@osdl.org>, ross@jose.lug.udel.edu, Kernel Mailing List <linux-kernel@vger.kernel.org>> I've CC'd you explicitly on this. Phillip ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 15:50 ` Linus Torvalds 2005-04-09 2:53 ` Petr Baudis @ 2005-04-09 15:50 ` Paul Jackson 2005-04-09 16:26 ` Linus Torvalds 1 sibling, 1 reply; 201+ messages in thread From: Paul Jackson @ 2005-04-09 15:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: ross, cw, linux-kernel > in order to avoid having to worry about special characters > they are NUL-terminated) Would this be a possible alternative - newline terminated (convert any newlines embedded in filenames to the 3 chars '%0A', and leave it as an exercise to the reader to de-convert them.) Line formatted ASCII files are really nice - worth pissing on embedded newlines in paths to obtain. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 15:50 ` Paul Jackson @ 2005-04-09 16:26 ` Linus Torvalds 2005-04-09 17:08 ` Paul Jackson ` (2 more replies) 0 siblings, 3 replies; 201+ messages in thread From: Linus Torvalds @ 2005-04-09 16:26 UTC (permalink / raw) To: Paul Jackson; +Cc: ross, cw, linux-kernel On Sat, 9 Apr 2005, Paul Jackson wrote: > > > in order to avoid having to worry about special characters > > they are NUL-terminated) > > Would this be a possible alternative - newline terminated (convert any > newlines embedded in filenames to the 3 chars '%0A', and leave it as an > exercise to the reader to de-convert them.) Sure, you could obviously do escaping (you need to remember to escape '%' too when you do that ;). However, whenever you do escaping, that means that you're already going to have to use a tool to unpack the dang thing. So you didn't actually win anything. I pretty much guarantee that my existing format is easier to unpack than your escaped format. ASCII isn't magical. This is "fsck_tree()", which walks the unpacked tree representation and checks that it looks sane and marks the sha1's it finds as being needed (so that you can do reachability analysis in a second pass). It's not exactly complicated: static int fsck_tree(unsigned char *sha1, void *data, unsigned long size) { while (size) { int len = 1+strlen(data); unsigned char *file_sha1 = data + len; char *path = strchr(data, ' '); if (size < len + 20 || !path) return -1; data += len + 20; size -= len + 20; mark_needs_sha1(sha1, "blob", file_sha1); } return 0; } and there's one HUGE advantage to _not_ having escaping: sorting and comparing. If you escape things, you now have to decide how you sort filenames. Do you sort them by the escaped representation, or by the "raw" representation? Do you always have to escape or unescape the name in order to sort it. So I like ASCII as much as the next guy, but it's not a religion. If there isn't any point to it, there isn't any point to it. 
The biggest irritation I have with the "tree" format I chose is actually not the name (which is trivial), it's the <sha1> part. Almost everything else keeps the <sha1> in the ASCII hexadecimal representation, and I should have done that here too. Why? Not because it's a <sha1> - hey, the binary representation is certainly denser and equivalent - but because an ASCII representation there would have allowed me to much more easily change the key format if I ever wanted to. Now it's very SHA1-specific. Which I guess is fine - I don't really see any reason to change, and if I do change, I could always just re-generate the whole tree. But I think it would have been cleaner to have _that_ part in ASCII. Linus ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 16:26 ` Linus Torvalds @ 2005-04-09 17:08 ` Paul Jackson 2005-04-10 3:41 ` Paul Jackson 2005-04-10 8:39 ` David Lang 2 siblings, 0 replies; 201+ messages in thread From: Paul Jackson @ 2005-04-09 17:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: ross, cw, linux-kernel Linus wrote: > (you need to remember to escape '%' > too when you do that ;). No - don't have to. Not if I don't mind giving fools that embed newlines in paths second class service. In my case, if I create a file named "foo\nbar", then backup and restore it, I end up with a restored file named "foo%0Abar". If I had backed up another file named "foo%0Abar", and now restore it, it collides, and last one to be restored wins. If I really need the "foo\nbar" file back as originally named, I will have to dig it out by hand. I dare say that Linux kernel source does not require first class support for newlines embedded in pathnames. > ASCII isn't magical. No - but it's damn convenient. A lot of tools work on line-oriented ASCII that don't work elsewhere. I guess Perl-hackers won't care much, but those working with either classic shell script tools or Python will find line formatted ASCII more convenient. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 16:26 ` Linus Torvalds 2005-04-09 17:08 ` Paul Jackson @ 2005-04-10 3:41 ` Paul Jackson 2005-04-10 8:39 ` David Lang 2 siblings, 0 replies; 201+ messages in thread From: Paul Jackson @ 2005-04-10 3:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: ross, cw, linux-kernel Linus wrote: > Almost everything > else keeps the <sha1> in the ASCII hexadecimal representation, and I > should have done that here too. Why? Not because it's a <sha1> - hey, the > binary representation is certainly denser and equivalent Since the size of <compressed> ASCII sha1's is only about 18% larger than the size of the same number of binary sha1's <compressed or not>, I don't see you gain much from the binary. I cast my non-existent vote for making the sha1 ascii - while you still can ;). -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-09 16:26 ` Linus Torvalds 2005-04-09 17:08 ` Paul Jackson 2005-04-10 3:41 ` Paul Jackson @ 2005-04-10 8:39 ` David Lang 2005-04-10 9:40 ` Junio C Hamano 2 siblings, 1 reply; 201+ messages in thread From: David Lang @ 2005-04-10 8:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Jackson, ross, cw, linux-kernel On Sat, 9 Apr 2005, Linus Torvalds wrote: > > The biggest irritation I have with the "tree" format I chose is actually > not the name (which is trivial), it's the <sha1> part. Almost everything > else keeps the <sha1> in the ASCII hexadecimal representation, and I > should have done that here too. Why? Not because it's a <sha1> - hey, the > binary representation is certainly denser and equivalent - but because an > ASCII representation there would have allowed me to much more easily > change the key format if I ever wanted to. Now it's very SHA1-specific. > > Which I guess is fine - I don't really see any reason to change, and if I > do change, I could always just re-generate the whole tree. But I think it > would have been cleaner to have _that_ part in ASCII. > just wanted to point out that recent news shows that sha1 isn't as good as it was thought to be (far easier to deliberately create collisions than it should be) this hasn't reached a point where you HAVE to quit using it (especially since you have the other validity checks in place), but it's a good reason to expect that you may want to change to something else in a few years. it's a lot easier to change things now to make that move easier than once this is being used extensively David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 8:39 ` David Lang @ 2005-04-10 9:40 ` Junio C Hamano 2005-04-10 16:46 ` Bill Davidsen 0 siblings, 1 reply; 201+ messages in thread From: Junio C Hamano @ 2005-04-10 9:40 UTC (permalink / raw) To: David Lang; +Cc: linux-kernel >>>>> "DL" == David Lang <dlang@digitalinsight.com> writes: DL> just wanted to point out that recent news shows that sha1 isn't as DL> good as it was thought to be (far easier to deliberatly create DL> collisions then it should be) I suspect there is no need to do so... Message-ID: <Pine.LNX.4.58.0504090902170.1267@ppc970.osdl.org> From: Linus Torvalds <torvalds@osdl.org> Subject: Re: Kernel SCM saga.. Date: Sat, 9 Apr 2005 09:16:22 -0700 (PDT) ... Linus (*) yeah, yeah, I know about the current theoretical case, and I don't care. Not only is it theoretical, the way my objects are packed you'd have to not just generate the same SHA1 for it, it would have to _also_ still be a valid zlib object _and_ get the header to match the "type + length" of object part. IOW, the object validity checks are actually even stricter than just "sha1 matches". ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 9:40 ` Junio C Hamano @ 2005-04-10 16:46 ` Bill Davidsen 2005-04-10 17:50 ` Paul Jackson 0 siblings, 1 reply; 201+ messages in thread From: Bill Davidsen @ 2005-04-10 16:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: David Lang, linux-kernel On Sun, 10 Apr 2005, Junio C Hamano wrote: > >>>>> "DL" == David Lang <dlang@digitalinsight.com> writes: > > DL> just wanted to point out that recent news shows that sha1 isn't as > DL> good as it was thought to be (far easier to deliberatly create > DL> collisions then it should be) > > I suspect there is no need to do so... It's possible to generate another object with the same hash, but: - you can't just take your desired object and do magic to make it hash right - it may not have the same length (almost certainly) - it's still non-trivial in terms of computation needed > > Message-ID: <Pine.LNX.4.58.0504090902170.1267@ppc970.osdl.org> > From: Linus Torvalds <torvalds@osdl.org> > Subject: Re: Kernel SCM saga.. > Date: Sat, 9 Apr 2005 09:16:22 -0700 (PDT) > > ... > > Linus > > (*) yeah, yeah, I know about the current theoretical case, and I don't > care. Not only is it theoretical, the way my objects are packed you'd have > to not just generate the same SHA1 for it, it would have to _also_ still > be a valid zlib object _and_ get the header to match the "type + length" > of object part. IOW, the object validity checks are actually even stricter > than just "sha1 matches". > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 16:46 ` Bill Davidsen @ 2005-04-10 17:50 ` Paul Jackson 2005-04-12 23:20 ` Pavel Machek 0 siblings, 1 reply; 201+ messages in thread From: Paul Jackson @ 2005-04-10 17:50 UTC (permalink / raw) To: Bill Davidsen; +Cc: junkio, dlang, linux-kernel > It's possible to generate another object with the same hash, but: Yeah - the real check is that the modified object has to compile and do something useful for someone (the cracker if no one else). Just getting a random bucket of bits substituted for a real kernel source file isn't going to get me into the cracker hall of fame, only into their odd-news of the day. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-10 17:50 ` Paul Jackson @ 2005-04-12 23:20 ` Pavel Machek 0 siblings, 0 replies; 201+ messages in thread From: Pavel Machek @ 2005-04-12 23:20 UTC (permalink / raw) To: Paul Jackson; +Cc: Bill Davidsen, junkio, dlang, linux-kernel Hi! > > It's possible to generate another object with the same hash, but: > > Yeah - the real check is that the modified object has to > compile and do something useful for someone (the cracker > if no one else). > > Just getting a random bucket of bits substituted for a > real kernel source file isn't going to get me into the > cracker hall of fame, only into their odd-news of the > day. I actually have two different files with the same md5 sum in my local CVS repository. It would be very wrong if CVS did not do the right thing with those files. Yes, I was playing with md5, see "md5 to be considered harmfull today". And I wanted the old versions of my "exploits" to be archived. Pavel -- Boycott Kodak -- for their patent abuse against Java. ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 4:42 ` Linus Torvalds ` (2 preceding siblings ...) 2005-04-08 7:17 ` ross @ 2005-04-08 7:34 ` Marcel Lanz 2005-04-08 9:23 ` Geert Uytterhoeven 2005-04-08 8:38 ` Matt Johnston 2005-04-12 7:14 ` Kernel SCM saga.. (bk license?) Kedar Sovani 5 siblings, 1 reply; 201+ messages in thread From: Marcel Lanz @ 2005-04-08 7:34 UTC (permalink / raw) To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List git on sarge --- git-0.02/Makefile.orig 2005-04-07 23:06:19.000000000 +0200 +++ git-0.02/Makefile 2005-04-08 09:24:28.472672224 +0200 @@ -8,7 +8,7 @@ all: $(PROG) install: $(PROG) install $(PROG) $(HOME)/bin/ -LIBS= -lssl +LIBS= -lssl -lz init-db: init-db.o ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. 2005-04-08 7:34 ` Marcel Lanz @ 2005-04-08 9:23 ` Geert Uytterhoeven 0 siblings, 0 replies; 201+ messages in thread From: Geert Uytterhoeven @ 2005-04-08 9:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Fri, 8 Apr 2005, Marcel Lanz wrote: > git on sarge > > --- git-0.02/Makefile.orig 2005-04-07 23:06:19.000000000 +0200 > +++ git-0.02/Makefile 2005-04-08 09:24:28.472672224 +0200 > @@ -8,7 +8,7 @@ all: $(PROG) > install: $(PROG) > install $(PROG) $(HOME)/bin/ > > -LIBS= -lssl > +LIBS= -lssl -lz > > init-db: init-db.o > I found a few more `issues' after adding `-O3 -Wall'. Most are cosmetic, but the missing return value in remove_file_from_cache() is a real bug. Hmm, upon closer look the caller uses its return value in a weird way, so another bug may be hiding in add_file_to_cache(). Caveat: everything is untested, besides compilation ;-) diff -purN git-0.02.orig/Makefile git-0.02/Makefile --- git-0.02.orig/Makefile 2005-04-07 23:06:19.000000000 +0200 +++ git-0.02/Makefile 2005-04-08 11:02:02.000000000 +0200 @@ -1,4 +1,4 @@ -CFLAGS=-g +CFLAGS=-g -O3 -Wall CC=gcc PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file @@ -8,7 +8,7 @@ all: $(PROG) install: $(PROG) install $(PROG) $(HOME)/bin/ -LIBS= -lssl +LIBS= -lssl -lz init-db: init-db.o diff -purN git-0.02.orig/cat-file.c git-0.02/cat-file.c --- git-0.02.orig/cat-file.c 2005-04-07 23:15:17.000000000 +0200 +++ git-0.02/cat-file.c 2005-04-08 11:07:28.000000000 +0200 @@ -5,6 +5,8 @@ */ #include "cache.h" +#include <string.h> + int main(int argc, char **argv) { unsigned char sha1[20]; @@ -25,4 +27,5 @@ int main(int argc, char **argv) if (write(fd, buf, size) != size) strcpy(type, "bad"); printf("%s: %s\n", template, type); + exit(0); } diff -purN git-0.02.orig/commit-tree.c git-0.02/commit-tree.c --- git-0.02.orig/commit-tree.c 2005-04-07 23:15:17.000000000 +0200 +++ git-0.02/commit-tree.c 2005-04-08 11:06:08.000000000 +0200 @@ -6,6 +6,7 @@ #include 
"cache.h" #include <pwd.h> +#include <string.h> #include <time.h> #define BLOCKING (1ul << 14) diff -purN git-0.02.orig/init-db.c git-0.02/init-db.c --- git-0.02.orig/init-db.c 2005-04-07 23:15:17.000000000 +0200 +++ git-0.02/init-db.c 2005-04-08 11:07:33.000000000 +0200 @@ -5,10 +5,12 @@ */ #include "cache.h" +#include <string.h> + int main(int argc, char **argv) { char *sha1_dir = getenv(DB_ENVIRONMENT), *path; - int len, i, fd; + int len, i; if (mkdir(".dircache", 0700) < 0) { perror("unable to create .dircache"); @@ -25,7 +27,7 @@ int main(int argc, char **argv) if (sha1_dir) { struct stat st; if (!stat(sha1_dir, &st) < 0 && S_ISDIR(st.st_mode)) - return; + exit(1); fprintf(stderr, "DB_ENVIRONMENT set to bad directory %s: ", sha1_dir); } diff -purN git-0.02.orig/read-cache.c git-0.02/read-cache.c --- git-0.02.orig/read-cache.c 2005-04-07 23:23:43.000000000 +0200 +++ git-0.02/read-cache.c 2005-04-08 11:07:37.000000000 +0200 @@ -5,6 +5,8 @@ */ #include "cache.h" +#include <string.h> + const char *sha1_file_directory = NULL; struct cache_entry **active_cache = NULL; unsigned int active_nr = 0, active_alloc = 0; @@ -89,7 +91,7 @@ void * read_sha1_file(unsigned char *sha z_stream stream; char buffer[8192]; struct stat st; - int i, fd, ret, bytes; + int fd, ret, bytes; void *map, *buf; char *filename = sha1_file_name(sha1); @@ -173,7 +175,7 @@ int write_sha1_file(char *buf, unsigned int write_sha1_buffer(unsigned char *sha1, void *buf, unsigned int size) { char *filename = sha1_file_name(sha1); - int i, fd; + int fd; fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666); if (fd < 0) diff -purN git-0.02.orig/read-tree.c git-0.02/read-tree.c --- git-0.02.orig/read-tree.c 2005-04-08 04:58:44.000000000 +0200 +++ git-0.02/read-tree.c 2005-04-08 11:07:41.000000000 +0200 @@ -5,6 +5,8 @@ */ #include "cache.h" +#include <string.h> + static void create_directories(const char *path) { int len = strlen(path); @@ -72,7 +74,6 @@ static int unpack(unsigned char *sha1) int main(int 
argc, char **argv) { - int fd; unsigned char sha1[20]; if (argc != 2) diff -purN git-0.02.orig/show-diff.c git-0.02/show-diff.c --- git-0.02.orig/show-diff.c 2005-04-07 23:15:17.000000000 +0200 +++ git-0.02/show-diff.c 2005-04-08 11:07:44.000000000 +0200 @@ -5,6 +5,8 @@ */ #include "cache.h" +#include <string.h> + #define MTIME_CHANGED 0x0001 #define CTIME_CHANGED 0x0002 #define OWNER_CHANGED 0x0004 @@ -60,7 +62,6 @@ int main(int argc, char **argv) struct stat st; struct cache_entry *ce = active_cache[i]; int n, changed; - unsigned int mode; unsigned long size; char type[20]; void *new; diff -purN git-0.02.orig/update-cache.c git-0.02/update-cache.c --- git-0.02.orig/update-cache.c 2005-04-07 23:15:17.000000000 +0200 +++ git-0.02/update-cache.c 2005-04-08 11:08:55.000000000 +0200 @@ -5,6 +5,8 @@ */ #include "cache.h" +#include <string.h> + static int cache_name_compare(const char *name1, int len1, const char *name2, int len2) { int len = len1 < len2 ? len1 : len2; @@ -50,6 +52,7 @@ static int remove_file_from_cache(char * if (pos < active_nr) memmove(active_cache + pos, active_cache + pos + 1, (active_nr - pos - 1) * sizeof(struct cache_entry *)); } + return 0; } static int add_cache_entry(struct cache_entry *ce) @@ -250,4 +253,5 @@ int main(int argc, char **argv) return 0; out: unlink(".dircache/index.lock"); + exit(0); } diff -purN git-0.02.orig/write-tree.c git-0.02/write-tree.c --- git-0.02.orig/write-tree.c 2005-04-07 23:15:17.000000000 +0200 +++ git-0.02/write-tree.c 2005-04-08 11:07:51.000000000 +0200 @@ -5,6 +5,8 @@ */ #include "cache.h" +#include <string.h> + static int check_valid_sha1(unsigned char *sha1) { char *filename = sha1_file_name(sha1); @@ -31,7 +33,7 @@ static int prepend_integer(char *buffer, int main(int argc, char **argv) { - unsigned long size, offset, val; + unsigned long size, offset; int i, entries = read_cache(); char *buffer; Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In 
personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
  2005-04-08  4:42 ` Linus Torvalds
    ` (3 preceding siblings ...)
  2005-04-08  7:34 ` Marcel Lanz
@ 2005-04-08  8:38 ` Matt Johnston
  2005-04-12  7:14 ` Kernel SCM saga.. (bk license?) Kedar Sovani
  5 siblings, 0 replies; 201+ messages in thread
From: Matt Johnston @ 2005-04-08 8:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List
On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:
>
> On Thu, 7 Apr 2005, Chris Wedgwood wrote:
> >
> > I'm playing with monotone right now. Superficially it looks like it
> > has tons of gee-whiz neato stuff... however, it's *agonizingly* slow.
> > I mean glacial. A heavily sedated sloth with no legs is probably
> > faster.
>
> Yes. The silly thing is, at least in my local tests it doesn't actually
> seem to be _doing_ anything while it's slow (there are no system calls
> except for a few memory allocations and de-allocations). It seems to have
> some exponential function on the number of pathnames involved etc.
>
> I'm hoping they can fix it, though. The basic notions do not sound wrong.
That is indeed correct wrt pathnames. The current head of monotone is a
lot better in this regard (on the order of 2-3 minutes for "monotone
import" on a 2.6 linux untar).
The basic problem is that in the last release (0.17), a huge amount of
sanity checking code was added to ensure that inconsistent or generally
bad revisions can never be written/received/transmitted. The focus is
now on speeding that up - there's a _lot_ of low hanging fruit for us
to look at.
Matt
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. (bk license?)
  2005-04-08  4:42 ` Linus Torvalds
    ` (4 preceding siblings ...)
  2005-04-08  8:38 ` Matt Johnston
@ 2005-04-12  7:14 ` Kedar Sovani
  2005-04-12  9:34   ` Catalin Marinas
  2005-04-13  4:04   ` Ricky Beam
  5 siblings, 2 replies; 201+ messages in thread
From: Kedar Sovani @ 2005-04-12 7:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List
I was wondering if working on git is, in any way, in violation of the
Bitkeeper license, which states that you cannot work on any other SCM
(SCM-like?) tool for "x" amount of time after using Bitkeeper?
Kedar.
On Apr 8, 2005 10:12 AM, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Thu, 7 Apr 2005, Chris Wedgwood wrote:
> >
> > I'm playing with monotone right now. Superficially it looks like it
> > has tons of gee-whiz neato stuff... however, it's *agonizingly* slow.
> > I mean glacial. A heavily sedated sloth with no legs is probably
> > faster.
>
> Yes. The silly thing is, at least in my local tests it doesn't actually
> seem to be _doing_ anything while it's slow (there are no system calls
> except for a few memory allocations and de-allocations). It seems to have
> some exponential function on the number of pathnames involved etc.
>
> I'm hoping they can fix it, though. The basic notions do not sound wrong.
>
> In the meantime (and because monotone really _is_ that slow), here's a
> quick challenge for you, and any crazy hacker out there: if you want to
> play with something _really_ nasty (but also very _very_ fast), take a
> look at kernel.org:/pub/linux/kernel/people/torvalds/.
>
> First one to send me the changelog tree of sparse-git (and a tool to
> commit and push/pull further changes) gets a gold star, and an honorable
> mention. I've put a hell of a lot of clues in there (*).
>
> I've worked on it (and little else) for the last two days. Time for
> somebody else to tell me I'm crazy.
>
> 		Linus
>
> (*) It should be easier than it sounds. The database is designed so that
> you can do the equivalent of a nonmerging (ie pure superset) push/pull
> with just plain rsync, so replication really should be that easy (if
> somewhat bandwidth-intensive due to the whole-file format).
>
> Never mind merging. It's not an SCM, it's a distribution and archival
> mechanism. I bet you could make a reasonable SCM on top of it, though.
> Another way of looking at it is to say that it's really a content-
> addressable filesystem, used to track directory trees.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. (bk license?)
  2005-04-12  7:14 ` Kernel SCM saga.. (bk license?) Kedar Sovani
@ 2005-04-12  9:34   ` Catalin Marinas
  2005-04-13  4:04   ` Ricky Beam
  1 sibling, 0 replies; 201+ messages in thread
From: Catalin Marinas @ 2005-04-12 9:34 UTC (permalink / raw)
To: Kedar Sovani; +Cc: Linus Torvalds, Chris Wedgwood, Kernel Mailing List
Kedar Sovani <kedars@gmail.com> wrote:
> I was wondering if working on git is, in any way, in violation of the
> Bitkeeper license, which states that you cannot work on any other SCM
> (SCM-like?) tool for "x" amount of time after using Bitkeeper?
That's valid for the new BK license only, which probably wasn't
accepted by Linus.
--
Catalin
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga.. (bk license?)
  2005-04-12  7:14 ` Kernel SCM saga.. (bk license?) Kedar Sovani
  2005-04-12  9:34   ` Catalin Marinas
@ 2005-04-13  4:04   ` Ricky Beam
  1 sibling, 0 replies; 201+ messages in thread
From: Ricky Beam @ 2005-04-13 4:04 UTC (permalink / raw)
To: Kedar Sovani; +Cc: Kernel Mailing List
On Tue, 12 Apr 2005, Kedar Sovani wrote:
>I was wondering if working on git is, in any way, in violation of the
>Bitkeeper license, which states that you cannot work on any other SCM
>(SCM-like?) tool for "x" amount of time after using Bitkeeper?
Technically, yes, it is. However, as BitMover has given the community
little other choice, I don't see how they could hold anyone to it.
They'd have a hard time making that 1-year clause stick given their
abandonment of the free product and refusal to grant licenses to OSDL
employees.
Plus, there's nothing in the bkl specifically granting BitMover the
right to revoke the license and thus use of BK/Free at their whim. They
have every right to stop developing, supporting, and distributing
BK/Free, but rescinding all BK/Free licenses just for spite does not
appear to be within their legal rights.
(Sorry Larry, but that's what you're doing. Tridge was working on
taking your toys apart -- he does that, what can I say. He explicitly
lied and said he would stop, but of course didn't. And then you got all
pissed at OSDL for not smiting him when, technically, they can't -- an
employer is not responsible for the actions of their employees on their
own time, on their own property, unrelated to their employ. Sorry, but
I know that one by heart :-))
--Ricky
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
  2005-04-08  4:13 ` Chris Wedgwood
  2005-04-08  4:42   ` Linus Torvalds
@ 2005-04-08 11:42   ` Catalin Marinas
  1 sibling, 0 replies; 201+ messages in thread
From: Catalin Marinas @ 2005-04-08 11:42 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Linus Torvalds, Kernel Mailing List
Chris Wedgwood <cw@f00f.org> wrote:
> I'm playing with monotone right now. Superficially it looks like it
> has tons of gee-whiz neato stuff... however, it's *agonizingly* slow.
> I mean glacial. A heavily sedated sloth with no legs is probably
> faster.
I tried some time ago to import the BKCVS revisions since Linux 2.6.9
into a monotone-0.16 repository. I later tried to upgrade the database
(repository) to monotone version 0.17. The result - converting ~3500
revisions would have taken more than *one year*, a fact confirmed by
the monotone developers.
The bottleneck seemed to be the big size of the manifest (which stores
the file names and the corresponding SHA1 values) and all the
validation performed when converting. An unsafe workaround is to
disable the revision checks in monotone, but you can end up with an
inconsistent repository (haven't tried this).
--
Catalin
^ permalink raw reply	[flat|nested] 201+ messages in thread
[parent not found: <Pine.LNX.4.58.0504060800280.2215 () ppc970 ! osdl ! org>]
* Re: Kernel SCM saga..
  [not found] <Pine.LNX.4.58.0504060800280.2215 () ppc970 ! osdl ! org>
@ 2005-04-06 21:13 ` kfogel
  2005-04-06 22:39   ` Jeff Garzik
  2005-04-09  1:00   ` Marcin Dalecki
  0 siblings, 2 replies; 201+ messages in thread
From: kfogel @ 2005-04-06 21:13 UTC (permalink / raw)
To: linux-kernel
Linus Torvalds wrote:
> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)
By the way, the Subversion developers have no argument with the claim
that Subversion would not be the right choice for Linux kernel
development. We've written an open letter entitled "Please Stop Bugging
Linus Torvalds About Subversion" to explain why:
http://subversion.tigris.org/subversion-linus.html
Best,
-Karl Fogel (on behalf of the Subversion team)
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
  2005-04-06 21:13 ` kfogel
@ 2005-04-06 22:39   ` Jeff Garzik
  2005-04-09  1:00   ` Marcin Dalecki
  1 sibling, 0 replies; 201+ messages in thread
From: Jeff Garzik @ 2005-04-06 22:39 UTC (permalink / raw)
To: kfogel; +Cc: linux-kernel
kfogel@collab.net wrote:
> Linus Torvalds wrote:
>
>> PS. Don't bother telling me about subversion. If you must, start reading
>> up on "monotone". That seems to be the most viable alternative, but don't
>> pester the developers so much that they don't get any work done. They are
>> already aware of my problems ;)
>
> By the way, the Subversion developers have no argument with the claim
> that Subversion would not be the right choice for Linux kernel
> development. We've written an open letter entitled "Please Stop
> Bugging Linus Torvalds About Subversion" to explain why:
>
> http://subversion.tigris.org/subversion-linus.html
A thoughtful post. Thanks for writing this.
	Jeff
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
  2005-04-06 21:13 ` kfogel
  2005-04-06 22:39   ` Jeff Garzik
@ 2005-04-09  1:00   ` Marcin Dalecki
  1 sibling, 0 replies; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09 1:00 UTC (permalink / raw)
To: kfogel; +Cc: linux-kernel
On 2005-04-06, at 23:13, kfogel@collab.net wrote:
> Linus Torvalds wrote:
>> PS. Don't bother telling me about subversion. If you must, start
>> reading up on "monotone". That seems to be the most viable
>> alternative, but don't pester the developers so much that they don't
>> get any work done. They are already aware of my problems ;)
>
> By the way, the Subversion developers have no argument with the claim
> that Subversion would not be the right choice for Linux kernel
> development. We've written an open letter entitled "Please Stop
> Bugging Linus Torvalds About Subversion" to explain why:
>
> http://subversion.tigris.org/subversion-linus.html
Thumbs up, "Subverters"! I just love you. I love your attitude toward
high engineering quality. And I appreciate very much what you provide
as software, both in function and in quality of implementation.
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
@ 2005-04-08 22:27 Rajesh Venkatasubramanian
2005-04-08 23:29 ` Linus Torvalds
0 siblings, 1 reply; 201+ messages in thread
From: Rajesh Venkatasubramanian @ 2005-04-08 22:27 UTC (permalink / raw)
To: torvalds, linux-kernel
Linus wrote:
>> It looks like an operation like "show me the history of mm/memory.c" will
>> be pretty expensive using git.
>
> Yes. Per-file history is expensive in git, because of the way it is
> indexed. Things are indexed by tree and by changeset, and there are no
> per-file indexes.
Although directory changes are tracked using change-sets, there
seems to be no easy way to answer "give me the diff corresponding to
the commit (change-set) object <sha1>". That will be really helpful to
review the changes.
Rajesh
^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
  2005-04-08 22:27 Rajesh Venkatasubramanian
@ 2005-04-08 23:29 ` Linus Torvalds
  2005-04-09  0:29   ` Linus Torvalds
  2005-04-09 16:20   ` Paul Jackson
  0 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 23:29 UTC (permalink / raw)
To: Rajesh Venkatasubramanian; +Cc: linux-kernel
On Fri, 8 Apr 2005, Rajesh Venkatasubramanian wrote:
>
> Although directory changes are tracked using change-sets, there
> seems to be no easy way to answer "give me the diff corresponding to
> the commit (change-set) object <sha1>". That will be really helpful to
> review the changes.
Actually, it is very easy indeed. Here's what you do:
 - look up the commit object ("cat-file commit <sha1>")
   This object starts out with "tree <sha1>", followed by a list of
   parent commit objects: "parent <sha1>"
   Remember the tree object (it defines what the tree looks like at the
   time of the commit). Pick the parent object you want to diff against
   (normally the first one). Also, print the checkin messages at the
   end of the commit object.
 - look up the parent object ("cat-file commit <parentsha1>")
   Here you have the same kind of object, but this time you don't care
   about going deeper, you just pick up the tree <sha1> that describes
   the tree at the parent.
 - look up the two tree objects. Unlike a commit object, a tree object
   is a binary data blob, but the format is an _extremely_ simple table
   of these guys:
	<ascii octal filemode> <space> <pathname> <NUL character> <20-byte sha1>
   and the reason it's binary is really that that way "git" doesn't end
   up having any issues with strange pathnames. If you want to have
   spaces and newlines in your pathname, go wild.
In particular, the tree object is also _sorted_ by the pathname. This
makes things simple, because you now have two sorted trees, and the
first thing you do is just walk the two trees in lock-step, which is
trivial thanks to the sorted nature of the tree "array".
So now you have three cases:
 - you have the same name, and the same sha1: ignore it - the file
   didn't change, you don't even have to look at the contents (although
   if the file mode changed you might want to note that)
 - you have the same name in parent and child tree lists, but the sha
   differs. Now you just need to do a "cat-file" on both of the SHA1
   values, and do a "diff -u" between them.
 - you have the filename in only parent or only child. Do a "create" or
   "delete" diff with the content of the sha1 file.
See? Very efficient. For any files that didn't change, you didn't have
to do anything at all - you didn't even have to look at their data.
Also note that the above algorithm really works for _any_ two commit
points (apart from the two first steps, which are obviously all about
finding the parent tree when you want to diff against a predecessor).
It doesn't have to be parent and child. Pick any commit you have. And
pick them in the other order, and you'll automatically get the reverse
diff.
You can even do diffs between unrelated projects this way if you use
the shared sha1 directory model, although that obviously doesn't tend
to be all that sensible ;)
		Linus
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
  2005-04-08 23:29 ` Linus Torvalds
@ 2005-04-09  0:29   ` Linus Torvalds
  2005-04-09 16:20   ` Paul Jackson
  1 sibling, 0 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-09 0:29 UTC (permalink / raw)
To: Rajesh Venkatasubramanian; +Cc: linux-kernel
On Fri, 8 Apr 2005, Linus Torvalds wrote:
>
> Also note that the above algorithm really works for _any_ two commit
> points (apart for the two first steps, which are obviously all about
> finding the parent tree when you want to diff against a predecessor).
Btw, if you want to try this, you should get an updated copy. I've
pushed a "raw" git archive of both git and sparse (the latter is much
more interesting from an archive standpoint, since it actually has 1400
changesets in it) to kernel.org, but I'm not convinced it gets mirrored
out. I think the mirror scripts may mirror only things they understand.
I've also added a partial "fsck" for the "git filesystem". It doesn't
do the connectivity analysis yet, but that should be pretty
straightforward to add - it already parses all the data, it just
doesn't save it away (and the connectivity analysis will automatically
show how many "root" changesets you have, and what the different HEADs
are).
I'll make a tar-file (git-0.03), although at this point I've actually
been maintaining it in itself, so to some degree it's almost getting
easier if I'd just have a place to rsync it..
		Linus
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
  2005-04-08 23:29 ` Linus Torvalds
  2005-04-09  0:29   ` Linus Torvalds
@ 2005-04-09 16:20   ` Paul Jackson
  1 sibling, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 16:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: vrajesh, linux-kernel
Linus wrote:
> If you want to have spaces
> and newlines in your pathname, go wild.
So long as there is only one pathname in a record, you don't need
nul-terminators to allow spaces in the name. The rest of the record is
well known, so the pathname is just whatever is left after chomping off
the rest of the record.
It's only the support for embedded newlines that forces you to use
nul-terminators.
Not worth it - in my view. Rather, do just enough hackery that such a
pathname doesn't break you, even if it means not giving full service to
such names.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
@ 2005-04-09 4:06 Walter Landry
0 siblings, 0 replies; 201+ messages in thread
From: Walter Landry @ 2005-04-09 4:06 UTC (permalink / raw)
To: linux-kernel; +Cc: arx-users
Linus Torvalds wrote:
> Which is why I'd love to hear from people who have actually used
> various SCM's with the kernel. There's bound to be people who have
> already tried.
At the end of my Codecon talk, there is a performance comparison of a
number of different distributed SCM's with the kernel.
http://superbeast.ucsd.edu/~landry/ArX/codecon/codecon.html
I develop ArX (http://www.nongnu.org/arx). You may find it of
interest ;)
Cheers,
Walter Landry
wlandry@ucsd.edu
^ permalink raw reply [flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
@ 2005-04-09 11:02 Samium Gromoff
  0 siblings, 0 replies; 201+ messages in thread
From: Samium Gromoff @ 2005-04-09 11:02 UTC (permalink / raw)
To: martin; +Cc: linux-kernel
Ok, this was literally screaming for a rebuttal! :-)
> Arch isn't a sound example of software design. Quite contrary to the
> random notes posted by it's author the following issues did strike me
> the time I did evaluate it:
(Note that here you take a stab at the Arch design fundamentals, but
actually fail to substantiate it later.)
> The application (tla) claims to have "intuitive" command names. However
> I didn't see that as given. Most of them were difficult to remember
> and appeared to be just infantile. I stopped looking further after I
> saw:
[ UI issues snipped, not really core design ]
Yes, some people perceive that there _are_ UI issues in Arch. However,
as strange as it may sound, some don't feel so.
> As an added bonus it relies on the applications named by accident
> patch and diff and installed on the host in question as well as few
> other as well to operate.
This is called modularity and code reuse. And given that patch and diff
are installed by default on all of the relevant developer machines, I
fail to see why that is in any way derogatory. (And the rest you speak
about is tar and gzip.)
> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of
> apply commands and you should be still more productive then when using
> Arch.
Sure, you should've come up with something better grounded than that! :-)
Now to the real design issues...
 - Globally unique, meaningful, symbolic revision names -- the core of
   the Arch namespace.
 - "Stone simple" on-disk format to store things -- a hierarchy of
   directories with textual files and tarballs.
 - No smart server -- any sftp, ftp, webdav (or just http for read-only
   access) server is exactly up to the task.
 - O(0) branching -- a branch is simply a tag, a continuation from some
   point of development. A network-capable symlink, if you would like.
   It is actually made possible due to the global Arch namespace.
 - Revision ancestry graph, of course. Enables smart merging.
Now, to the features:
 - Archives/revisions are trivially crypto-signed -- thanks to the
   "stone-simple" on-disk format.
 - Trivial push/pull mirroring -- a mirror is exactly a read-only
   archive, and can be turned into a full-blown archive by removal of a
   single file.
 - Revision libraries as a client-side operation speedup mechanism with
   partially automated updates.
 - Cached revisions as a server-side speedup.
 - Possibility for hardlinked checkouts for local archives. This
   requires that your text editor is smart and deletes the original
   file when it writes changes.
 - Various pre/post/whatever-commit hooks.
That much for starters... :-)
---
cheers, Samium Gromoff
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
@ 2005-04-09 11:29 Samium Gromoff
  0 siblings, 0 replies; 201+ messages in thread
From: Samium Gromoff @ 2005-04-09 11:29 UTC (permalink / raw)
To: linux-kernel
It seems that Tom Lord, the primary architect behind GNU Arch, has
recently published an open letter to Linus Torvalds. Because no open
letter to Linus would be really open without an accompanying reference
post on lkml, here it is:
http://lists.seyza.com/pipermail/gnu-arch-dev/2005-April/001001.html
---
cheers, Samium Gromoff
^ permalink raw reply	[flat|nested] 201+ messages in thread
* Re: Kernel SCM saga..
@ 2005-04-10 4:20 Albert Cahalan
0 siblings, 0 replies; 201+ messages in thread
From: Albert Cahalan @ 2005-04-10 4:20 UTC (permalink / raw)
To: torvalds, linux-kernel mailing list
Linus Torvalds writes:
> NOTE! I detest the centralized SCM model, but if push comes to shove,
> and we just _can't_ get a reasonable parallell merge thing going in
> the short timeframe (ie month or two), I'll use something like SVN
> on a trusted site with just a few committers, and at least try to
> distribute the merging out over a few people rather than making _me_
> be the throttle.
>
> The reason I don't really want to do that is once we start doing
> it that way, I suspect we'll have a _really_ hard time stopping.
> I think it's a broken model. So I'd much rather try to have some
> pain in the short run and get a better model running, but I just
> wanted to let people know that I'm pragmatic enough that I realize
> that we may not have much choice.
I think you at least instinctively know this, but...
Centralized SCM means you have to grant and revoke commit access,
which means that Linux gets the disease of ugly BSD politics.
Under both the old pre-BitKeeper patch system and under BitKeeper,
developer rank is fuzzy. Everyone knows that some developers are
more central than others, but it isn't fully public and well-defined.
You can change things day by day without having to demote anyone.
While Linux development isn't completely without jealousy and pride,
few have stormed off (mostly IDE developers AFAIK) and none have
forked things as severely as OpenBSD and DragonflyBSD.
You may rank developer X higher than developer Y, but they have
only a guess as to how things are. Perhaps developer X would be
a prideful jerk if he knew. Perhaps developer Y would quit in
resentment if he knew.
Whatever you do, please avoid the BSD-style politics.
(the MAINTAINERS file is bad enough; it has caused problems)
^ permalink raw reply [flat|nested] 201+ messages in thread
end of thread, other threads:[~2005-04-13 4:14 UTC | newest] Thread overview: 201+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-04-06 15:42 Kernel SCM saga Linus Torvalds 2005-04-06 16:00 ` Greg KH 2005-04-07 16:40 ` Rik van Riel 2005-04-08 0:53 ` Jesse Barnes 2005-04-06 16:09 ` Daniel Phillips 2005-04-06 19:07 ` Jon Smirl 2005-04-06 19:24 ` Matan Peled 2005-04-06 19:49 ` Jon Smirl 2005-04-06 20:34 ` Hua Zhong 2005-04-07 1:31 ` Christoph Lameter 2005-04-06 19:39 ` Paul P Komkoff Jr 2005-04-07 1:40 ` Martin Pool 2005-04-07 1:47 ` Jeff Garzik 2005-04-07 2:26 ` Martin Pool 2005-04-07 2:32 ` David Lang 2005-04-07 5:38 ` Martin Pool 2005-04-07 23:27 ` Linus Torvalds 2005-04-08 5:56 ` Martin Pool 2005-04-08 6:41 ` Linus Torvalds 2005-04-08 8:38 ` Andrea Arcangeli 2005-04-08 23:38 ` Daniel Phillips 2005-04-09 2:54 ` Andrea Arcangeli 2005-04-09 0:12 ` Linus Torvalds 2005-04-09 2:27 ` Andrea Arcangeli 2005-04-09 2:32 ` David Lang 2005-04-09 3:08 ` Brian Gerst 2005-04-09 3:15 ` Andrea Arcangeli 2005-04-09 5:45 ` Linus Torvalds 2005-04-09 22:55 ` David S. 
Miller 2005-04-09 23:13 ` Linus Torvalds 2005-04-10 0:14 ` Chris Wedgwood 2005-04-10 1:56 ` Paul Jackson 2005-04-10 12:03 ` Ingo Molnar 2005-04-10 17:38 ` Paul Jackson 2005-04-10 17:46 ` Ingo Molnar 2005-04-10 17:56 ` Paul Jackson 2005-04-10 0:22 ` Paul Jackson 2005-04-10 11:33 ` Ingo Molnar 2005-04-10 17:55 ` Matthias Andree 2005-04-09 16:33 ` Roman Zippel 2005-04-09 23:31 ` Tupshin Harper 2005-04-10 17:24 ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr 2005-04-10 18:19 ` Roman Zippel 2005-04-08 16:46 ` Kernel SCM saga Catalin Marinas 2005-04-07 8:14 ` Magnus Damm 2005-04-07 7:53 ` Zwane Mwaikambo 2005-04-07 3:35 ` Daniel Phillips 2005-04-07 15:08 ` Daniel Phillips 2005-04-07 6:36 ` bert hubert 2005-04-06 23:22 ` Jon Masters 2005-04-07 6:51 ` Paul Mackerras 2005-04-07 7:48 ` Arjan van de Ven 2005-04-07 15:10 ` Linus Torvalds 2005-04-07 17:00 ` Daniel Phillips 2005-04-07 17:38 ` Linus Torvalds 2005-04-07 17:47 ` Chris Wedgwood 2005-04-07 18:06 ` Magnus Damm 2005-04-07 18:36 ` Daniel Phillips 2005-04-08 3:35 ` Jeff Garzik 2005-04-07 19:56 ` Sam Ravnborg 2005-04-07 23:21 ` Dave Airlie 2005-04-07 7:18 ` David Woodhouse 2005-04-07 8:50 ` Andrew Morton 2005-04-07 9:20 ` Paul Mackerras 2005-04-07 9:46 ` Andrew Morton 2005-04-07 11:17 ` Paul Mackerras 2005-04-07 10:41 ` Geert Uytterhoeven 2005-04-07 9:25 ` David Woodhouse 2005-04-07 9:49 ` Andrew Morton 2005-04-07 9:55 ` Russell King 2005-04-07 10:11 ` David Woodhouse 2005-04-07 9:40 ` David Vrabel 2005-04-07 9:24 ` Sergei Organov 2005-04-07 10:30 ` Matthias Andree 2005-04-07 10:54 ` Andrew Walrond 2005-04-09 16:17 ` David Roundy 2005-04-10 9:24 ` Giuseppe Bilotta 2005-04-10 13:51 ` David Roundy 2005-04-07 15:32 ` Linus Torvalds 2005-04-07 17:09 ` Daniel Phillips 2005-04-07 17:10 ` Al Viro 2005-04-07 17:47 ` Linus Torvalds 2005-04-07 18:04 ` Jörn Engel 2005-04-07 18:27 ` Daniel Phillips 2005-04-07 20:54 ` Arjan van de Ven 2005-04-08 3:41 ` Jeff Garzik 2005-04-07 17:52 ` Bartlomiej 
Zolnierkiewicz 2005-04-07 17:54 ` Daniel Phillips 2005-04-07 18:13 ` Dmitry Yusupov 2005-04-07 18:29 ` Daniel Phillips 2005-04-10 22:33 ` Troy Benjegerdes 2005-04-11 0:00 ` Christian Parpart 2005-04-08 17:24 ` Jon Masters 2005-04-08 22:05 ` Daniel Phillips 2005-04-08 22:52 ` Roman Zippel 2005-04-08 23:46 ` Tupshin Harper 2005-04-09 1:00 ` Roman Zippel 2005-04-09 1:23 ` Tupshin Harper 2005-04-09 16:52 ` Eric D. Mudama 2005-04-09 17:40 ` Roman Zippel 2005-04-09 18:56 ` Ray Lee 2005-04-07 7:44 ` Jan Hudec 2005-04-08 6:14 ` Matthias Urlichs 2005-04-09 1:01 ` Marcin Dalecki 2005-04-09 8:32 ` Jan Hudec 2005-04-11 2:26 ` Miles Bader 2005-04-11 2:56 ` Marcin Dalecki 2005-04-11 6:36 ` Jan Hudec 2005-04-07 10:56 ` Andrew Walrond 2005-04-08 0:57 ` Ian Wienand 2005-04-08 4:13 ` Chris Wedgwood 2005-04-08 4:42 ` Linus Torvalds 2005-04-08 5:04 ` Chris Wedgwood 2005-04-08 5:14 ` H. Peter Anvin 2005-04-08 7:05 ` Rogan Dawes 2005-04-08 7:21 ` Daniel Phillips 2005-04-08 7:49 ` H. Peter Anvin 2005-04-08 7:14 ` Andrea Arcangeli 2005-04-08 12:02 ` Matthias Andree 2005-04-08 12:21 ` Florian Weimer 2005-04-08 14:26 ` Linus Torvalds 2005-04-08 16:15 ` Matthias-Christian Ott 2005-04-08 17:14 ` Linus Torvalds 2005-04-08 17:15 ` Chris Wedgwood 2005-04-08 17:46 ` Linus Torvalds 2005-04-08 18:05 ` Chris Wedgwood 2005-04-08 19:03 ` Linus Torvalds 2005-04-08 19:16 ` Chris Wedgwood 2005-04-08 19:38 ` Florian Weimer 2005-04-08 19:48 ` Chris Wedgwood 2005-04-08 19:39 ` Linus Torvalds 2005-04-08 20:11 ` Uncached stat performace [ Was: Re: Kernel SCM saga.. 
] Ragnar Kjørstad 2005-04-08 20:14 ` Chris Wedgwood 2005-04-08 20:50 ` Kernel SCM saga Luck, Tony 2005-04-08 21:27 ` Linus Torvalds 2005-04-09 17:14 ` Roman Zippel 2005-04-09 7:20 ` Willy Tarreau 2005-04-09 15:15 ` Paul Jackson 2005-04-08 17:25 ` Matthias-Christian Ott 2005-04-08 18:14 ` Linus Torvalds 2005-04-08 18:28 ` Jon Smirl 2005-04-08 18:58 ` Florian Weimer 2005-04-09 1:11 ` Marcin Dalecki 2005-04-09 1:50 ` David Lang 2005-04-09 22:12 ` Florian Weimer 2005-04-08 19:16 ` Matthias-Christian Ott 2005-04-08 19:32 ` Linus Torvalds 2005-04-08 19:44 ` Matthias-Christian Ott 2005-04-09 1:09 ` Marcin Dalecki 2005-04-08 17:35 ` Jeff Garzik 2005-04-08 18:47 ` Linus Torvalds 2005-04-08 18:56 ` Chris Wedgwood 2005-04-09 7:37 ` Willy Tarreau 2005-04-09 7:47 ` Neil Brown 2005-04-09 8:00 ` Willy Tarreau 2005-04-09 9:34 ` Neil Brown 2005-04-09 15:40 ` Paul Jackson 2005-04-09 16:16 ` Linus Torvalds 2005-04-09 17:15 ` Paul Jackson 2005-04-09 17:35 ` Paul Jackson 2005-04-09 1:04 ` Marcin Dalecki 2005-04-09 15:42 ` Paul Jackson 2005-04-09 18:45 ` Marcin Dalecki 2005-04-09 1:00 ` Marcin Dalecki 2005-04-09 1:09 ` Chris Wedgwood 2005-04-09 1:21 ` Marcin Dalecki 2005-04-08 7:17 ` ross 2005-04-08 15:50 ` Linus Torvalds 2005-04-09 2:53 ` Petr Baudis 2005-04-09 7:08 ` Randy.Dunlap 2005-04-09 18:06 ` [PATCH] " Petr Baudis 2005-04-10 1:01 ` Phillip Lougher 2005-04-10 1:42 ` Petr Baudis 2005-04-10 1:57 ` Phillip Lougher 2005-04-09 15:50 ` Paul Jackson 2005-04-09 16:26 ` Linus Torvalds 2005-04-09 17:08 ` Paul Jackson 2005-04-10 3:41 ` Paul Jackson 2005-04-10 8:39 ` David Lang 2005-04-10 9:40 ` Junio C Hamano 2005-04-10 16:46 ` Bill Davidsen 2005-04-10 17:50 ` Paul Jackson 2005-04-12 23:20 ` Pavel Machek 2005-04-08 7:34 ` Marcel Lanz 2005-04-08 9:23 ` Geert Uytterhoeven 2005-04-08 8:38 ` Matt Johnston 2005-04-12 7:14 ` Kernel SCM saga.. (bk license?) 
Kedar Sovani 2005-04-12 9:34 ` Catalin Marinas 2005-04-13 4:04 ` Ricky Beam 2005-04-08 11:42 ` Kernel SCM saga Catalin Marinas [not found] <Pine.LNX.4.58.0504060800280.2215 () ppc970 ! osdl ! org> 2005-04-06 21:13 ` kfogel 2005-04-06 22:39 ` Jeff Garzik 2005-04-09 1:00 ` Marcin Dalecki 2005-04-08 22:27 Rajesh Venkatasubramanian 2005-04-08 23:29 ` Linus Torvalds 2005-04-09 0:29 ` Linus Torvalds 2005-04-09 16:20 ` Paul Jackson 2005-04-09 4:06 Walter Landry 2005-04-09 11:02 Samium Gromoff 2005-04-09 11:29 Samium Gromoff 2005-04-10 4:20 Albert Cahalan