linux-kernel.vger.kernel.org archive mirror
* Kernel SCM saga..
@ 2005-04-06 15:42 Linus Torvalds
  2005-04-06 16:00 ` Greg KH
                   ` (10 more replies)
  0 siblings, 11 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-06 15:42 UTC (permalink / raw)
  To: Kernel Mailing List


Ok,
 as a number of people are already aware (and in some cases have been
aware over the last several weeks), we've been trying to work out a
conflict over BK usage over the last month or two (and it feels like
longer ;). That hasn't been working out, and as a result, the kernel team
is looking at alternatives.

[ And apparently this just hit slashdot too, so by now _everybody_ knows ]

It's not like my choice of BK has been entirely conflict-free ("No,
really? Do tell! Oh, you mean the gigabytes upon gigabytes of flames we
had?"), so in some sense this was inevitable, but I sure had hoped that it
would have happened only once there was a reasonable open-source
alternative. As it is, we'll have to scramble for a while.

Btw, don't blame BitMover, even if that's probably going to be a very
common reaction. Larry in particular really did try to make things work
out, but it got to the point where I decided that I don't want to be in
the position of trying to hold two pieces together that would need as much
glue as it seemed to require.

We've been using BK for three years, and in fact, the biggest problem
right now is that a number of people have gotten very very picky about
their tools after having used the best. Me included, but in fact the
people that got helped most by BitKeeper usage were often the people
_around_ me who had a much easier time merging with my tree and sending
their trees to me.

Of course, there's also probably a ton of people who just used BK as a
nicer (and much faster) "anonymous CVS" client. We'll get that sorted out,
but the immediate problem is that I'm spending most of my time trying to see
what the best way to co-operate is.

NOTE! BitKeeper isn't going away per se. Right now, the only real thing
that has happened is that I've decided to not use BK mainly because I need
to figure out the alternatives, and rather than continuing "things as
normal", I decided to bite the bullet and just see what life without BK
looks like. So far it's a gray and bleak world ;)

So don't take this to mean anything more than it is. I'm going to be
effectively off-line for a week (think of it as a normal "Linus went on a
vacation" event) and I'm just asking that people who continue to maintain
BK trees at least try to also make sure that they can send me the result
as (individual) patches, since I'll eventually have to merge some other
way.

That "individual patches" is one of the keywords, btw. One thing that BK 
has been extremely good at, and that a lot of people have come to like 
even when they didn't use BK, is how we've been maintaining a much
finer-granularity view of changes. That isn't going to go away.

In fact, one impact BK has had is to very fundamentally make us (and me in
particular) change how we do things. That ranges from the fine-grained
changeset tracking to just how I ended up trusting submaintainers with
much bigger things, and not having to work on a patch-by-patch basis any
more. So the three years with BK are definitely not wasted: I'm convinced 
it caused us to do things in better ways, and one of the things I'm 
looking at is to make sure that those things continue to work.

So I just wanted to say that I'm personally very happy with BK, and with 
Larry. It didn't work out, but it sure as hell made a big difference to 
kernel development. And we'll work out the temporary problem of having to 
figure out a set of tools to allow us to continue to do the things that BK 
allowed us to do.

Let the flames begin.

		Linus

PS. Don't bother telling me about subversion. If you must, start reading
up on "monotone". That seems to be the most viable alternative, but don't
pester the developers so much that they don't get any work done. They are
already aware of my problems ;)

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
@ 2005-04-06 16:00 ` Greg KH
  2005-04-07 16:40   ` Rik van Riel
  2005-04-06 16:09 ` Daniel Phillips
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 201+ messages in thread
From: Greg KH @ 2005-04-06 16:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, Apr 06, 2005 at 08:42:08AM -0700, Linus Torvalds wrote:
> 
> So I just wanted to say that I'm personally very happy with BK, and with 
> Larry. It didn't work out, but it sure as hell made a big difference to 
> kernel development. And we'll work out the temporary problem of having to 
> figure out a set of tools to allow us to continue to do the things that BK 
> allowed us to do.

I'd also like to publicly say that BK has helped out immensely in the
past few years with kernel development, and has been one of the main
reasons we have been able to keep up such a high patch rate over such a
long period of time.  Larry, and his team, have been nothing but great
in dealing with all of the crap that we have been flinging at him due to
the very odd demands that a project as large as the kernel makes.  And
I definitely owe him a beer the next time I see him.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
  2005-04-06 16:00 ` Greg KH
@ 2005-04-06 16:09 ` Daniel Phillips
  2005-04-06 19:07 ` Jon Smirl
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-06 16:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wednesday 06 April 2005 11:42, Linus Torvalds wrote:
> it got to the point where I decided that I don't want to be in
> the position of trying to hold two pieces together that would need as much
> glue as it seemed to require.

Hi Linus,

Well I'm really pleased to hear that you won't be drinking this koolaid any 
more.  This is a really uplifting development for me, thanks.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
  2005-04-06 16:00 ` Greg KH
  2005-04-06 16:09 ` Daniel Phillips
@ 2005-04-06 19:07 ` Jon Smirl
  2005-04-06 19:24   ` Matan Peled
  2005-04-06 19:39 ` Paul P Komkoff Jr
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 201+ messages in thread
From: Jon Smirl @ 2005-04-06 19:07 UTC (permalink / raw)
  To: Linus Torvalds, Larry McVoy; +Cc: Kernel Mailing List

On Apr 6, 2005 11:42 AM, Linus Torvalds <torvalds@osdl.org> wrote:
> So I just wanted to say that I'm personally very happy with BK, and with
> Larry. It didn't work out, but it sure as hell made a big difference to
> kernel development. And we'll work out the temporary problem of having to
> figure out a set of tools to allow us to continue to do the things that BK
> allowed us to do.

Larry has stated several times that most of his revenue comes from
Windows. Has OSDL approached BitMover about simply buying out the
source rights for the Linux version? From my experience in the
industry a fair price would probably be around $2M, but that should be
within OSDL's capabilities. OSDL could then GPL the code and quiet the
critics.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 19:07 ` Jon Smirl
@ 2005-04-06 19:24   ` Matan Peled
  2005-04-06 19:49     ` Jon Smirl
  0 siblings, 1 reply; 201+ messages in thread
From: Matan Peled @ 2005-04-06 19:24 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Linus Torvalds, Larry McVoy, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

Jon Smirl wrote:
> OSDL could then GPL the code and quiet the
> critics.

And also cause said GPL'ed code to be immediately ported over to Windows. I don't
think BitMover could ever agree to that.

-- 
[Name      ]   ::  [Matan I. Peled    ]
[Location  ]   ::  [Israel            ]
[Public Key]   ::  [0xD6F42CA5        ]
[Keyserver ]   ::  [keyserver.kjsl.com]
encrypted/signed  plain text  preferred


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (2 preceding siblings ...)
  2005-04-06 19:07 ` Jon Smirl
@ 2005-04-06 19:39 ` Paul P Komkoff Jr
  2005-04-07  1:40   ` Martin Pool
  2005-04-07  6:36   ` bert hubert
  2005-04-06 23:22 ` Jon Masters
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 201+ messages in thread
From: Paul P Komkoff Jr @ 2005-04-06 19:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Replying to Linus Torvalds:
> Ok,
>  as a number of people are already aware (and in some cases have been

Actually, I'm very disappointed that things have gone in such a
counter-productive way. All along, I was against Larry's opponents, but
in the end they turned out to be right. That's a pity. To quote Vin
Diesel's character Riddick, "there's no such word as friend", or
something like that.

Anyway, it seems the folks at Canonical were aware of this, and here's
the result of that awareness: http://bazaar-ng.org/
It needs some testing though, along with the really hard part - transferring
all of the history, nonlinear as it is ... I don't know how anyone can do
that by 1 Jul 2005, sorry :(

> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)

Monotone is good, but I don't really know the limits of sqlite3 with
respect to the kernel case. And again, what would we need to do to
retain the history ...


-- 
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
 This message represents the official view of the voices in my head

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 19:24   ` Matan Peled
@ 2005-04-06 19:49     ` Jon Smirl
  2005-04-06 20:34       ` Hua Zhong
  2005-04-07  1:31       ` Christoph Lameter
  0 siblings, 2 replies; 201+ messages in thread
From: Jon Smirl @ 2005-04-06 19:49 UTC (permalink / raw)
  To: chaosite; +Cc: Linus Torvalds, Larry McVoy, Kernel Mailing List

On Apr 6, 2005 3:24 PM, Matan Peled <chaosite@gmail.com> wrote:
> Jon Smirl wrote:
> > OSDL could then GPL the code and quiet the
> > critics.
> 
> And also cause said GPL'ed code to be immediately ported over to Windows. I don't
> think BitMover could ever agree to that.

Windows BitKeeper licenses are not that expensive; wouldn't you rather
keep your source in a licensed, supported version? Who is going to do
this backport, then support it and track new releases? Why do people
pay for RHEL when they can get it for free? They want support and a
guarantee that their data won't be lost. Even with a GPL'd Linux
BitKeeper I'll bet half of the existing Linux paying customers will
continue to use a paid version.

There is a large difference in the behavior of corporations with huge
source bases and college students with no money. The corporations will
pay to have someone responsible for ensuring that the product works.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 201+ messages in thread

* RE: Kernel SCM saga..
  2005-04-06 19:49     ` Jon Smirl
@ 2005-04-06 20:34       ` Hua Zhong
  2005-04-07  1:31       ` Christoph Lameter
  1 sibling, 0 replies; 201+ messages in thread
From: Hua Zhong @ 2005-04-06 20:34 UTC (permalink / raw)
  To: 'Jon Smirl', chaosite
  Cc: 'Linus Torvalds', 'Larry McVoy',
	'Kernel Mailing List'

> Even with a GPL'd Linux BitKeeper I'll bet half of the existing Linux 
> paying customers will continue to use a paid version.

By what? How much do you plan to put down to pay Larry in case you lose your
bet?

Hua

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (3 preceding siblings ...)
  2005-04-06 19:39 ` Paul P Komkoff Jr
@ 2005-04-06 23:22 ` Jon Masters
  2005-04-07  6:51 ` Paul Mackerras
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 201+ messages in thread
From: Jon Masters @ 2005-04-06 23:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Apr 6, 2005 4:42 PM, Linus Torvalds <torvalds@osdl.org> wrote:

> as a number of people are already aware (and in some
> cases have been aware over the last several weeks), we've
> been trying to work out a conflict over BK usage over the last
> month or two (and it feels like longer ;). That hasn't been
> working out, and as a result, the kernel team is looking at
> alternatives.

What about the 64K changeset limitation in current releases?

Did I miss something (like the fixes promised) or is there going to be
another interim release before the end of support?

Jon.

P.S. Apologies if this already got addressed.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 19:49     ` Jon Smirl
  2005-04-06 20:34       ` Hua Zhong
@ 2005-04-07  1:31       ` Christoph Lameter
  1 sibling, 0 replies; 201+ messages in thread
From: Christoph Lameter @ 2005-04-07  1:31 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Kernel Mailing List

On Wed, 6 Apr 2005, Jon Smirl wrote:

> There is a large difference in the behavior of corporations with huge
> source bases and college students with no money. The corporations will
> pay to have someone responsible for ensuring that the product works.

Or they will merge with some other entity on the whim of some executive
and the corporation then decides to kill the product for good without
releasing the source leaving you out in the cold.


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 19:39 ` Paul P Komkoff Jr
@ 2005-04-07  1:40   ` Martin Pool
  2005-04-07  1:47     ` Jeff Garzik
  2005-04-07  3:35     ` Daniel Phillips
  2005-04-07  6:36   ` bert hubert
  1 sibling, 2 replies; 201+ messages in thread
From: Martin Pool @ 2005-04-07  1:40 UTC (permalink / raw)
  To: linux-kernel

On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote:

> http://bazaar-ng.org/

I'd like bazaar-ng to be considered too.  It is not ready for adoption
yet, but I am working (more than) full time on it and hope to have it
be usable in a couple of months.  

bazaar-ng is trying to integrate a lot of the work done in other systems
to make something that is simple to use but also fast and powerful enough
to handle large projects.

The operations that are already done are pretty fast: ~60s to import a
kernel tree, ~10s to import a new revision from a patch.  

Please check it out and do pester me with any suggestions about whatever
you think it needs to suit your work.

-- 
Martin



^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  1:40   ` Martin Pool
@ 2005-04-07  1:47     ` Jeff Garzik
  2005-04-07  2:26       ` Martin Pool
  2005-04-07  7:53       ` Zwane Mwaikambo
  2005-04-07  3:35     ` Daniel Phillips
  1 sibling, 2 replies; 201+ messages in thread
From: Jeff Garzik @ 2005-04-07  1:47 UTC (permalink / raw)
  To: Martin Pool; +Cc: linux-kernel

On Thu, Apr 07, 2005 at 11:40:23AM +1000, Martin Pool wrote:
> On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote:
> 
> > http://bazaar-ng.org/
> 
> I'd like bazaar-ng to be considered too.  It is not ready for adoption
> yet, but I am working (more than) full time on it and hope to have it
> be usable in a couple of months.  
> 
> bazaar-ng is trying to integrate a lot of the work done in other systems
> to make something that is simple to use but also fast and powerful enough
> to handle large projects.
> 
> The operations that are already done are pretty fast: ~60s to import a
> kernel tree, ~10s to import a new revision from a patch.  

By "importing", are you saying that importing all 60,000+ changesets of
the current kernel tree took only 60 seconds?

	Jeff




^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  1:47     ` Jeff Garzik
@ 2005-04-07  2:26       ` Martin Pool
  2005-04-07  2:32         ` David Lang
  2005-04-07  7:53       ` Zwane Mwaikambo
  1 sibling, 1 reply; 201+ messages in thread
From: Martin Pool @ 2005-04-07  2:26 UTC (permalink / raw)
  To: linux-kernel

On Wed, 06 Apr 2005 21:47:27 -0400, Jeff Garzik wrote:

>> The operations that are already done are pretty fast: ~60s to import a
>> kernel tree, ~10s to import a new revision from a patch.
> 
> By "importing", are you saying that importing all 60,000+ changesets of
> the current kernel tree took only 60 seconds?

Now that would be impressive.

No, I mean this:

 % bzcat ../linux.pkg/patch-2.5.14.bz2| patch -p1 

 % time bzr add -v .   
 (find any new non-ignored files; deleted files automatically noticed) 
 6.06s user 0.41s system 89% cpu 7.248 total

 % time bzr commit -v -m 'import 2.5.14'
 7.71s user 0.71s system 65% cpu 12.893 total

(OK, a bit slower in this case but it wasn't all in core.)

This is only v0.0.3, but I think the interface simplicity and speed
compares well.

I haven't tested importing all 60,000+ changesets of the current bk tree,
partly because I don't *have* all those changesets.  (Larry said
previously that someone (not me) tried to pull all of them using bkclient,
and he considered this abuse and blacklisted them.)

I have been testing pulling in release and rc patches, and it scales to
that level.  It probably could not handle 60,000 changesets yet, but there
is a plan to get there.  In the interim, although it cannot handle the
whole history forever it can handle large trees with moderate numbers of
commits -- perhaps as many as you might deal with in developing a feature
over the course of a few months.

The most sensible place to try out bzr, if people want to, is as a way to
keep your own revisions before mailing a patch to Linus or the subsystem
maintainer.

-- 
Martin



^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  2:26       ` Martin Pool
@ 2005-04-07  2:32         ` David Lang
  2005-04-07  5:38           ` Martin Pool
  2005-04-07  8:14           ` Magnus Damm
  0 siblings, 2 replies; 201+ messages in thread
From: David Lang @ 2005-04-07  2:32 UTC (permalink / raw)
  To: Martin Pool; +Cc: linux-kernel

On Thu, 7 Apr 2005, Martin Pool wrote:

> I haven't tested importing all 60,000+ changesets of the current bk tree,
> partly because I don't *have* all those changesets.  (Larry said
> previously that someone (not me) tried to pull all of them using bkclient,
> and he considered this abuse and blacklisted them.)

Pull the patches from the BK2CVS server. Yes, some patches are combined, 
but it will get you in the ballpark.

David Lang

> I have been testing pulling in release and rc patches, and it scales to
> that level.  It probably could not handle 60,000 changesets yet, but there
> is a plan to get there.  In the interim, although it cannot handle the
> whole history forever it can handle large trees with moderate numbers of
> commits -- perhaps as many as you might deal with in developing a feature
> over a course of a few months.
>
> The most sensible place to try out bzr, if people want to, is as a way to
> keep your own revisions before mailing a patch to linus or the subsystem
> maintainer.
>
> -- 
> Martin

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  1:40   ` Martin Pool
  2005-04-07  1:47     ` Jeff Garzik
@ 2005-04-07  3:35     ` Daniel Phillips
  2005-04-07 15:08       ` Daniel Phillips
  1 sibling, 1 reply; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07  3:35 UTC (permalink / raw)
  To: Martin Pool; +Cc: linux-kernel

On Wednesday 06 April 2005 21:40, Martin Pool wrote:
> On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote:
> > http://bazaar-ng.org/
>
> I'd like bazaar-ng to be considered too.  It is not ready for adoption
> yet, but I am working (more than) full time on it and hope to have it
> be usable in a couple of months.
>
> bazaar-ng is trying to integrate a lot of the work done in other systems
> to make something that is simple to use but also fast and powerful enough
> to handle large projects.
>
> The operations that are already done are pretty fast: ~60s to import a
> kernel tree, ~10s to import a new revision from a patch.

Hi Martin,

When I tried it, it took 13 seconds to 'bzr add' the 2.6.11.3 tree on a 
relatively slow machine.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  2:32         ` David Lang
@ 2005-04-07  5:38           ` Martin Pool
  2005-04-07 23:27             ` Linus Torvalds
  2005-04-07  8:14           ` Magnus Damm
  1 sibling, 1 reply; 201+ messages in thread
From: Martin Pool @ 2005-04-07  5:38 UTC (permalink / raw)
  To: linux-kernel, David Lang

[-- Attachment #1: Type: text/plain, Size: 1459 bytes --]

On Wed, 2005-04-06 at 19:32 -0700, David Lang wrote:
> On Thu, 7 Apr 2005, Martin Pool wrote:
> 
> > I haven't tested importing all 60,000+ changesets of the current bk tree,
> > partly because I don't *have* all those changesets.  (Larry said
> > previously that someone (not me) tried to pull all of them using bkclient,
> > and he considered this abuse and blacklisted them.)
> 
> Pull the patches from the BK2CVS server. Yes, some patches are combined, 
> but it will get you in the ballpark.

OK, I just tried that.  I know there are scripts to resynthesize
changesets from the CVS info but I skipped that for now and just pulled
each day's work into a separate bzr revision.  It's up to the end of
March and still running.

Importing the first snapshot (2004-01-01) took 41.77s user, 1:23.79
total.  Each subsequent day takes about 10s user, 30s elapsed to commit
into bzr.  The speeds are comparable to CVS or a bit faster, and may be
faster than other distributed systems. (This on a laptop with a 5400rpm
disk.)  Pulling out a complete copy of the tree as it was on a previous
date takes about 14s user, 60s elapsed.

I don't want to get too distracted by benchmarks now because there are
more urgent things to do and anyhow there is still lots of scope for
optimization.  I wouldn't be at all surprised if those times could be
more than halved.  I just wanted to show it is in (I hope) the right
ballpark.

-- 
Martin


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 19:39 ` Paul P Komkoff Jr
  2005-04-07  1:40   ` Martin Pool
@ 2005-04-07  6:36   ` bert hubert
  1 sibling, 0 replies; 201+ messages in thread
From: bert hubert @ 2005-04-07  6:36 UTC (permalink / raw)
  To: Kernel Mailing List

On Wed, Apr 06, 2005 at 11:39:11PM +0400, Paul P Komkoff Jr wrote:

> Monotone is good, but I don't really know the limits of sqlite3 with
> respect to the kernel case. And again, what would we need to do to
> retain the history ...

I wouldn't fret over that :-) The big issue I have with sqlite3 is that it
interacts horribly with ext3, resulting in dismal journalled write
performance compared to ext2. I do not know whether this is a sqlite3 or an
ext3 problem though.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (4 preceding siblings ...)
  2005-04-06 23:22 ` Jon Masters
@ 2005-04-07  6:51 ` Paul Mackerras
  2005-04-07  7:48   ` Arjan van de Ven
  2005-04-07 15:10   ` Linus Torvalds
  2005-04-07  7:18 ` David Woodhouse
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 201+ messages in thread
From: Paul Mackerras @ 2005-04-07  6:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Linus,

> That "individual patches" is one of the keywords, btw. One thing that BK 
> has been extremely good at, and that a lot of people have come to like 
> even when they didn't use BK, is how we've been maintaining a much
> finer-granularity view of changes. That isn't going to go away.

Are you happy with processing patches + descriptions, one per mail?
Do you have it automated to the point where processing emailed patches
involves little more overhead than doing a bk pull?  If so, then your
mailbox (or patch queue) becomes a natural serialization point for the
changes, and the need for a tool that can handle a complex graph of
changes is much reduced.
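This thread doesn't describe Linus's actual tooling (if any), but the mailbox-as-patch-queue idea needs nothing beyond csplit(1) and patch(1): split the mbox on its "From " envelope lines and apply each message in order, since patch skips leading mail headers by itself. A toy sketch, with the tree, file names, and one-message mbox all invented for illustration:

```shell
# Hypothetical sketch of an emailed-patch queue; nothing here is anyone's
# real setup.  Build a toy tree and a one-message mbox, split the mbox
# into one file per message, and apply the pieces in mbox order.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p proj/dir queue
printf 'old\n' > proj/dir/file.txt

cat > mbox <<'EOF'
From developer Wed Apr  6 15:42:08 2005
Subject: [PATCH] dir/file.txt: update contents

--- a/dir/file.txt
+++ b/dir/file.txt
@@ -1 +1 @@
-old
+new
EOF

# One file per message (queue/patch-00, patch-01, ...), still in mbox order.
csplit -s -z -f queue/patch- mbox '/^From /' '{*}'

cd proj
for p in ../queue/patch-*; do
    patch -p1 -s < "$p"   # patch(1) ignores the leading mail headers
done
cat dir/file.txt          # now "new"
```

The serialization falls out for free: the split files sort lexically in mbox order, so the mailbox really is the patch queue.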

> In fact, one impact BK has had is to very fundamentally make us (and me in
> particular) change how we do things.

From my point of view, the benefits that flowed from your using BK
were:

* Visibility into what you had accepted and committed to your
  repository
* Lower latency of patches going into your repository
* Much reduced rate of patches being dropped

Those things are what have enabled us PPC developers to move away from
having our own trees (with all the synchronization problems that
entailed) and work directly with your tree.  I don't see that it is
the distinctive features of BK (such as the ability to do merges
between peer repositories) that are directly responsible for producing
those benefits, so I have hope that things can work just as well with
some other system.

Paul.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (5 preceding siblings ...)
  2005-04-07  6:51 ` Paul Mackerras
@ 2005-04-07  7:18 ` David Woodhouse
  2005-04-07  8:50   ` Andrew Morton
                     ` (2 more replies)
  2005-04-07  7:44 ` Jan Hudec
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 201+ messages in thread
From: David Woodhouse @ 2005-04-07  7:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 2005-04-06 at 08:42 -0700, Linus Torvalds wrote:
> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)

One feature I'd want to see in a replacement version control system is
the ability to _re-order_ patches, and to cherry-pick patches from my
tree to be sent onwards. The lack of that capability is the main reason
I always hated BitKeeper.
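For comparison, the series-file workflow of Andrew Morton's patch scripts (and of quilt, which grew out of them) gives roughly this: patches are ordinary files, and an ordered `series` file decides which apply and in what order, so re-ordering means editing a text file and cherry-picking means copying a line into another series. A toy sketch with invented file and patch names:

```shell
# Sketch of a quilt-style series file; all paths and patch names invented.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p tree patches
printf 'line1\n' > tree/file.txt

cat > patches/add-line2.patch <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1 +1,2 @@
 line1
+line2
EOF

cat > patches/add-line3.patch <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1,2 +1,3 @@
 line1
 line2
+line3
EOF

# The series file is the whole database: edit, drop, or copy its lines
# to reorder, drop, or cherry-pick patches.
printf 'add-line2.patch\nadd-line3.patch\n' > patches/series

cd tree
while read p; do
    patch -p1 -s < "../patches/$p"
done < ../patches/series
cat file.txt   # line1, line2, line3
```

(The second patch's context depends on the first, which is exactly why a tool that re-orders patches also has to re-merge them.)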

-- 
dwmw2


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (6 preceding siblings ...)
  2005-04-07  7:18 ` David Woodhouse
@ 2005-04-07  7:44 ` Jan Hudec
  2005-04-08  6:14   ` Matthias Urlichs
  2005-04-09  1:01   ` Marcin Dalecki
  2005-04-07 10:56 ` Andrew Walrond
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 201+ messages in thread
From: Jan Hudec @ 2005-04-07  7:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1435 bytes --]

On Wed, Apr 06, 2005 at 08:42:08 -0700, Linus Torvalds wrote:
> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)

I have looked at most systems currently available. I would suggest
taking a closer look at the following:

1) GNU Arch/Bazaar. They use the same archive format, are simple, and
   have the concepts right. They may need some scripts or add-ons. When
   Bazaar-NG is ready, it will be able to read GNU Arch/Bazaar archives,
   so switching should be easy.
2) SVK. True, it is built on Subversion, but it adds all the distributed
   features necessary. It keeps a mirror of the repository locally (but
   can mirror only some branches) - BitKeeper did that too. It just hit
   1.0beta1, but development is progressing rapidly. There was a post on
   their mailing list lately about the ability to track changeset
   dependencies.

I have looked at Monotone too, of course, but I did not find any way to
do cherry-picking (i.e. skipping some changes while pulling others) in
it, and I feel it will need more rework of the meta-data before that is
possible. As for the sqlite backend, I'd not consider that a problem.

-------------------------------------------------------------------------------
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  6:51 ` Paul Mackerras
@ 2005-04-07  7:48   ` Arjan van de Ven
  2005-04-07 15:10   ` Linus Torvalds
  1 sibling, 0 replies; 201+ messages in thread
From: Arjan van de Ven @ 2005-04-07  7:48 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, Kernel Mailing List

On Thu, 2005-04-07 at 16:51 +1000, Paul Mackerras wrote:
> Linus,
> 
> > That "individual patches" is one of the keywords, btw. One thing that BK 
> > has been extremely good at, and that a lot of people have come to like 
> > even when they didn't use BK, is how we've been maintaining a much
> > finer-granularity view of changes. That isn't going to go away.
> 
> Are you happy with processing patches + descriptions, one per mail?
> Do you have it automated to the point where processing emailed patches
> involves little more overhead than doing a bk pull?  If so, then your
> mailbox (or patch queue) becomes a natural serialization point for the
> changes, and the need for a tool that can handle a complex graph of
> changes is much reduced.

Alternatively you could send an mbox with your patch series in it... that
has a natural sequence ;)


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  1:47     ` Jeff Garzik
  2005-04-07  2:26       ` Martin Pool
@ 2005-04-07  7:53       ` Zwane Mwaikambo
  1 sibling, 0 replies; 201+ messages in thread
From: Zwane Mwaikambo @ 2005-04-07  7:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Martin Pool, linux-kernel

On Wed, 6 Apr 2005, Jeff Garzik wrote:

> On Thu, Apr 07, 2005 at 11:40:23AM +1000, Martin Pool wrote:
> > On Wed, 06 Apr 2005 23:39:11 +0400, Paul P Komkoff Jr wrote:
> > 
> > > http://bazaar-ng.org/
> > 
> > I'd like bazaar-ng to be considered too.  It is not ready for adoption
> > yet, but I am working (more than) full time on it and hope to have it
> > be usable in a couple of months.  
> > 
> > bazaar-ng is trying to integrate a lot of the work done in other systems
> > to make something that is simple to use but also fast and powerful enough
> > to handle large projects.
> > 
> > The operations that are already done are pretty fast: ~60s to import a
> > kernel tree, ~10s to import a new revision from a patch.  
> 
> By "importing", are you saying that importing all 60,000+ changesets of
> the current kernel tree took only 60 seconds?

Probably `cvs import` equivalent.


* Re: Kernel SCM saga..
  2005-04-07  2:32         ` David Lang
  2005-04-07  5:38           ` Martin Pool
@ 2005-04-07  8:14           ` Magnus Damm
  1 sibling, 0 replies; 201+ messages in thread
From: Magnus Damm @ 2005-04-07  8:14 UTC (permalink / raw)
  To: David Lang; +Cc: Martin Pool, linux-kernel

On Apr 7, 2005 4:32 AM, David Lang <dlang@digitalinsight.com> wrote:
> On Thu, 7 Apr 2005, Martin Pool wrote:
> 
> > I haven't tested importing all 60,000+ changesets of the current bk tree,
> > partly because I don't *have* all those changesets.  (Larry said
> > previously that someone (not me) tried to pull all of them using bkclient,
> > and he considered this abuse and blacklisted them.)
> 
> Pull the patches from the BK2CVS server. Yes, some patches are combined,
> but it will get you in the ballpark.

While at it, is there any ongoing effort to convert/export the kernel
BK repository to some well-known format like broken-out patches and a
series file? I think keeping the complete repository public in a
well-known format is important regardless of SCM taste.

/ magnus


* Re: Kernel SCM saga..
  2005-04-07  7:18 ` David Woodhouse
@ 2005-04-07  8:50   ` Andrew Morton
  2005-04-07  9:20     ` Paul Mackerras
                       ` (2 more replies)
  2005-04-07  9:24   ` Sergei Organov
  2005-04-07 15:32   ` Linus Torvalds
  2 siblings, 3 replies; 201+ messages in thread
From: Andrew Morton @ 2005-04-07  8:50 UTC (permalink / raw)
  To: David Woodhouse; +Cc: torvalds, linux-kernel

David Woodhouse <dwmw2@infradead.org> wrote:
>
> One feature I'd want to see in a replacement version control system is
>  the ability to _re-order_ patches, and to cherry-pick patches from my
>  tree to be sent onwards.

You just described quilt & patch-scripts.

The problem with those is letting other people get access to it.  I guess
that could be fixed with a bit of scripting and rsyncing.

(I don't do that for -mm because -mm basically doesn't work for 99% of the
time.  Takes 4-5 hours to get a release out assuming that nothing's busted,
and usually something is).
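That "bit of scripting and rsyncing" might look something like the sketch below. The function names and the rsync destination are invented; it assumes a quilt-style patches/ directory whose series file may contain comments and per-patch options.

```shell
#!/bin/sh
# Sketch: publish a quilt patches/ directory (the series file plus
# every patch it names) so other people can fetch and replay it.

# Print the files a reader needs: the series file and each patch it
# lists, skipping blank lines and '#' comments, and ignoring any
# per-patch options (e.g. "-p0") after the patch name.
series_files() {
    dir=${1:-patches}
    echo "$dir/series"
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$dir/series" |
        while read -r p _; do echo "$dir/$p"; done
}

publish_series() {
    dir=${1:-patches}
    dest=${2:?usage: publish_series <patchdir> <rsync-destination>}
    series_files "$dir" | rsync -av --files-from=- . "$dest"
}

# Example (hypothetical destination):
#   publish_series patches me@example.org:public_html/patches/
```

Anyone could then fetch the published directory and replay it with quilt or plain patch.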



* Re: Kernel SCM saga..
  2005-04-07  8:50   ` Andrew Morton
@ 2005-04-07  9:20     ` Paul Mackerras
  2005-04-07  9:46       ` Andrew Morton
  2005-04-07 10:41       ` Geert Uytterhoeven
  2005-04-07  9:25     ` David Woodhouse
  2005-04-07  9:40     ` David Vrabel
  2 siblings, 2 replies; 201+ messages in thread
From: Paul Mackerras @ 2005-04-07  9:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Woodhouse, torvalds, linux-kernel

Andrew Morton writes:

> The problem with those is letting other people get access to it.  I guess
> that could be fixed with a bit of scripting and rsyncing.

Yes.

> (I don't do that for -mm because -mm basically doesn't work for 99% of the
> time.  Takes 4-5 hours to get a release out assuming that nothing's busted,
> and usually something is).

With -mm we get those nice little automatic emails saying you've put
the patch into -mm, which removes one of the main reasons for wanting
to be able to get an up-to-date image of your tree.  The other reason,
of course, is to be able to see if a patch I'm about to send conflicts
with something you have already taken, and rebase it if necessary.

Paul.


* Re: Kernel SCM saga..
  2005-04-07  7:18 ` David Woodhouse
  2005-04-07  8:50   ` Andrew Morton
@ 2005-04-07  9:24   ` Sergei Organov
  2005-04-07 10:30     ` Matthias Andree
  2005-04-07 15:32   ` Linus Torvalds
  2 siblings, 1 reply; 201+ messages in thread
From: Sergei Organov @ 2005-04-07  9:24 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Kernel Mailing List

David Woodhouse <dwmw2@infradead.org> writes:

> On Wed, 2005-04-06 at 08:42 -0700, Linus Torvalds wrote:
> > PS. Don't bother telling me about subversion. If you must, start reading
> > up on "monotone". That seems to be the most viable alternative, but don't
> > pester the developers so much that they don't get any work done. They are
> > already aware of my problems ;)
> 
> One feature I'd want to see in a replacement version control system is
> the ability to _re-order_ patches, and to cherry-pick patches from my
> tree to be sent onwards. The lack of that capability is the main reason
> I always hated BitKeeper.

darcs? <http://www.abridgegame.org/darcs/>



* Re: Kernel SCM saga..
  2005-04-07  8:50   ` Andrew Morton
  2005-04-07  9:20     ` Paul Mackerras
@ 2005-04-07  9:25     ` David Woodhouse
  2005-04-07  9:49       ` Andrew Morton
  2005-04-07  9:55       ` Russell King
  2005-04-07  9:40     ` David Vrabel
  2 siblings, 2 replies; 201+ messages in thread
From: David Woodhouse @ 2005-04-07  9:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel

On Thu, 2005-04-07 at 01:50 -0700, Andrew Morton wrote:
> (I don't do that for -mm because -mm basically doesn't work for 99% of
> the time.  Takes 4-5 hours to get a release out assuming that
> nothing's busted, and usually something is).

On the subject of -mm: are you going to keep doing the BK imports to
that for the time being, or would it be better to leave the BK trees
alone now and send you individual patches?

For that matter, will there be a brief amnesty after 2.6.12 where Linus
will use BK to pull those trees which were waiting for that, or will we
all need to export from BK manually?

-- 
dwmw2



* Re: Kernel SCM saga..
  2005-04-07  8:50   ` Andrew Morton
  2005-04-07  9:20     ` Paul Mackerras
  2005-04-07  9:25     ` David Woodhouse
@ 2005-04-07  9:40     ` David Vrabel
  2 siblings, 0 replies; 201+ messages in thread
From: David Vrabel @ 2005-04-07  9:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Woodhouse, linux-kernel

Andrew Morton wrote:
> David Woodhouse <dwmw2@infradead.org> wrote:
> 
>> One feature I'd want to see in a replacement version control system is
>> the ability to _re-order_ patches, and to cherry-pick patches from my
>> tree to be sent onwards.
> 
> You just described quilt & patch-scripts.
> 
> The problem with those is letting other people get access to it.  I guess
> that could be fixed with a bit of scripting and rsyncing.

Where I work we've been using quilt for a while now and storing the
patch-set in CVS.  To limit the number of potential stuff-ups due to two
people working on the same patch at the same time (the chance that CVS's
merge will get it right is zero) we use CVS's locking feature to ensure
that only one person can edit/update a patch or the series file at any
one time.  It seems to work quite well (though admittedly there's only
two developers working on the patch-set and it currently contains a mere
61 patches).

We also have a few scripts to ensure we always do the correct locking.
The main ones are:

qec -- to edit a file either as part of the top 'working' patch or as an
existing patch.  It does the quilt push which I always forget to do
otherwise.

qrefr -- like quilt refresh only it locks the patch first.

qimport -- like quilt import only it locks the series file first.
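The scripts themselves aren't shown in the mail, but the lock-then-operate pattern they implement can be sketched generically. This stand-in uses an atomic mkdir as the mutex purely for illustration; the real qrefr locks through CVS, so everything below is an assumption about the shape of the workflow, not the actual code.

```shell
#!/bin/sh
# Rough sketch of the lock-then-refresh pattern behind a wrapper like
# qrefr. mkdir is atomic: exactly one caller can create the lock
# directory, so only one person edits a given patch at a time.

lock_patch() {
    mkdir "patches/.lock-$1" 2>/dev/null
}

unlock_patch() {
    rmdir "patches/.lock-$1" 2>/dev/null
}

qrefr_sketch() {
    patch=${1:?usage: qrefr_sketch <patch-name>}
    if ! lock_patch "$patch"; then
        echo "$patch is being edited by someone else" >&2
        return 1
    fi
    quilt refresh "$patch"   # the real work: regenerate the patch
    unlock_patch "$patch"
}
```

The same wrap-with-lock shape would apply to qec and qimport, just around different quilt commands.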

You can grab a tarball of these (and other, less interesting ones) from

http://www.davidvrabel.org.uk/quilt-n-cvs-scripts-1.tar.gz

Note that I'm providing this purely on an as-is basis in case anyone is
interested.

And I've just realized I can't remember how exactly to set up the CVS
repository of the patch-set.  I think you need to do a cvs watch on when
it's checked-out.

David Vrabel


* Re: Kernel SCM saga..
  2005-04-07  9:20     ` Paul Mackerras
@ 2005-04-07  9:46       ` Andrew Morton
  2005-04-07 11:17         ` Paul Mackerras
  2005-04-07 10:41       ` Geert Uytterhoeven
  1 sibling, 1 reply; 201+ messages in thread
From: Andrew Morton @ 2005-04-07  9:46 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: dwmw2, torvalds, linux-kernel

Paul Mackerras <paulus@samba.org> wrote:
>
> With -mm we get those nice little automatic emails saying you've put
>  the patch into -mm, which removes one of the main reasons for wanting
>  to be able to get an up-to-date image of your tree.

Should have done that ages ago..

>  The other reason,
>  of course, is to be able to see if a patch I'm about to send conflicts
>  with something you have already taken, and rebase it if necessary.

<hack, hack>

How's this?


This is a note to let you know that I've just added the patch titled

     ppc32: Fix AGP and sleep again

to the -mm tree.  Its filename is

     ppc32-fix-agp-and-sleep-again.patch

Patches currently in -mm which might be from yourself are

add-suspend-method-to-cpufreq-core.patch
ppc32-fix-cpufreq-problems.patch
ppc32-fix-agp-and-sleep-again.patch
ppc32-fix-errata-for-some-g3-cpus.patch
ppc64-fix-semantics-of-__ioremap.patch
ppc64-improve-mapping-of-vdso.patch
ppc64-detect-altivec-via-firmware-on-unknown-cpus.patch
ppc64-remove-bogus-f50-hack-in-promc.patch


* Re: Kernel SCM saga..
  2005-04-07  9:25     ` David Woodhouse
@ 2005-04-07  9:49       ` Andrew Morton
  2005-04-07  9:55       ` Russell King
  1 sibling, 0 replies; 201+ messages in thread
From: Andrew Morton @ 2005-04-07  9:49 UTC (permalink / raw)
  To: David Woodhouse; +Cc: torvalds, linux-kernel

David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Thu, 2005-04-07 at 01:50 -0700, Andrew Morton wrote:
> > (I don't do that for -mm because -mm basically doesn't work for 99% of
> > the time.  Takes 4-5 hours to get a release out assuming that
> > nothing's busted, and usually something is).
> 
> On the subject of -mm: are you going to keep doing the BK imports to
> that for the time being, or would it be better to leave the BK trees
> alone now and send you individual patches?

I really don't know - I'll continue to pull the bk trees for a while, until
we work out what the new (probably interim) regime looks like.

> For that matter, will there be a brief amnesty after 2.6.12 where Linus
> will use BK to pull those trees which were waiting for that, or will we
> all need to export from BK manually?
> 

I think Linus has stopped using bk already.



* Re: Kernel SCM saga..
  2005-04-07  9:25     ` David Woodhouse
  2005-04-07  9:49       ` Andrew Morton
@ 2005-04-07  9:55       ` Russell King
  2005-04-07 10:11         ` David Woodhouse
  1 sibling, 1 reply; 201+ messages in thread
From: Russell King @ 2005-04-07  9:55 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Morton, torvalds, linux-kernel

On Thu, Apr 07, 2005 at 10:25:18AM +0100, David Woodhouse wrote:
> On Thu, 2005-04-07 at 01:50 -0700, Andrew Morton wrote:
> > (I don't do that for -mm because -mm basically doesn't work for 99% of
> > the time.  Takes 4-5 hours to get a release out assuming that
> > nothing's busted, and usually something is).
> 
> On the subject of -mm: are you going to keep doing the BK imports to
> that for the time being, or would it be better to leave the BK trees
> alone now and send you individual patches?
> 
> For that matter, will there be a brief amnesty after 2.6.12 where Linus
> will use BK to pull those trees which were waiting for that, or will we
> all need to export from BK manually?

Linus indicated (maybe privately) that the end of his BK usage would
be immediately after the -rc2 release.  I'm taking that to mean "no
more BK usage from Linus, period."

Thinking about it a bit, if you're asking Linus to pull your tree,
Linus would then have to extract the individual change sets as patches
to put into his newfangled patch management system.  Is that a
reasonable expectation?

However, it's ultimately up to Linus to decide. 8)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


* Re: Kernel SCM saga..
  2005-04-07  9:55       ` Russell King
@ 2005-04-07 10:11         ` David Woodhouse
  0 siblings, 0 replies; 201+ messages in thread
From: David Woodhouse @ 2005-04-07 10:11 UTC (permalink / raw)
  To: Russell King; +Cc: Andrew Morton, torvalds, linux-kernel

On Thu, 2005-04-07 at 10:55 +0100, Russell King wrote:
> Thinking about it a bit, if you're asking Linus to pull your tree,
> Linus would then have to extract the individual change sets as patches
> to put into his new fangled patch management system.  Is that a
> reasonable expectation?

I don't know if it's a reasonable expectation; that's why I'm asking.

I could live with having to export everything to patches; it's not so
hard. It's just that if the export to whatever ends up replacing BK can
be done in a way (or at a time) which allows the existing forest of BK
trees to be pulled from one last time, that may save a fair amount of
work all round, so I thought it was worth mentioning.

-- 
dwmw2



* Re: Kernel SCM saga..
  2005-04-07  9:24   ` Sergei Organov
@ 2005-04-07 10:30     ` Matthias Andree
  2005-04-07 10:54       ` Andrew Walrond
  2005-04-09 16:17       ` David Roundy
  0 siblings, 2 replies; 201+ messages in thread
From: Matthias Andree @ 2005-04-07 10:30 UTC (permalink / raw)
  To: Kernel Mailing List

On Thu, 07 Apr 2005, Sergei Organov wrote:

> David Woodhouse <dwmw2@infradead.org> writes:
> 
> > On Wed, 2005-04-06 at 08:42 -0700, Linus Torvalds wrote:
> > > PS. Don't bother telling me about subversion. If you must, start reading
> > > up on "monotone". That seems to be the most viable alternative, but don't
> > > pester the developers so much that they don't get any work done. They are
> > > already aware of my problems ;)
> > 
> > One feature I'd want to see in a replacement version control system is
> > the ability to _re-order_ patches, and to cherry-pick patches from my
> > tree to be sent onwards. The lack of that capability is the main reason
> > I always hated BitKeeper.
> 
> darcs? <http://www.abridgegame.org/darcs/>

Close. Some things:

1. It's rather slow and quite CPU consuming and certainly I/O consuming
   at times - I keep, to try it out, leafnode-2 in a DARCS repo, which
   has a mere 20,000 lines in 140 files, with 1,436 changes so far, on a
   RAID-1 with two 7200/min disk drives, with an Athlon XP 2500+ with
   512 MB RAM. The repo has 1,700 files in 11.5 MB, the source itself
   189 files in 1.8 MB.

   Example: darcs annotate nntpd.c takes 23 s. (2,660 lines, 60 kByte)

   The maintainer himself states that there's still optimization required.

2. It has an impressive set of dependencies around Glasgow Haskell
   Compiler. I don't personally have issues with that, but I can already
   hear the moaning and bitching.

3. DARCS is written in Haskell. This is not a problem either, but I'd
   think there are fewer people who can hack Haskell than people who
   can hack C, C++, Java, Python or similar. It is still better than
   BitKeeper from the hacking POV as the code is available and under an
   acceptable license.

Getting DARCS up to the task would probably require some polishing, and
should probably be discussed with the DARCS maintainer before making
this decision.

Don't get me wrong, DARCS looks promising, but I'm not convinced it's
ready for the linux kernel yet.

-- 
Matthias Andree


* Re: Kernel SCM saga..
  2005-04-07  9:20     ` Paul Mackerras
  2005-04-07  9:46       ` Andrew Morton
@ 2005-04-07 10:41       ` Geert Uytterhoeven
  1 sibling, 0 replies; 201+ messages in thread
From: Geert Uytterhoeven @ 2005-04-07 10:41 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, David Woodhouse, Linus Torvalds, Linux Kernel Development

On Thu, 7 Apr 2005, Paul Mackerras wrote:
> Andrew Morton writes:
> > The problem with those is letting other people get access to it.  I guess
> > that could be fixed with a bit of scripting and rsyncing.
> 
> Yes.

Me too ;-)

> > (I don't do that for -mm because -mm basically doesn't work for 99% of the
> > time.  Takes 4-5 hours to get a release out assuming that nothing's busted,
> > and usually something is).
> 
> With -mm we get those nice little automatic emails saying you've put
> the patch into -mm, which removes one of the main reasons for wanting
> to be able to get an up-to-date image of your tree.  The other reason,

FYI, for Linus' BK tree, procmail would tell me if it encountered a patch on
the commits list that was signed off by me.

> of course, is to be able to see if a patch I'm about to send conflicts
> with something you have already taken, and rebase it if necessary.

And yet another reason: to monitor if files/subsystems I'm interested in are
changed.

Summarized: I'd be happy with a mailing list that would send out all patches
(incl. full comment headers, cfr. bk-commit) that Linus commits.

An added bonus would be that people would really be able to reconstruct the
full tree from the mails, unlike with bk-commits (due to `strange' csets caused
by merges). Just make sure there are strictly monotone sequence numbers in the
individual mails.
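The gap detection that strictly monotone sequence numbers buy you is trivial to script. A sketch, where the X-Commit-Seq header name is invented for illustration:

```shell
#!/bin/sh
# check_sequence: read one sequence number per line from stdin and
# fail if any number does not follow its predecessor by exactly one,
# so a lost commit mail is detected instead of silently producing a
# wrong tree.
check_sequence() {
    expected=''
    while read -r seq; do
        if [ -n "$expected" ] && [ "$seq" -ne "$expected" ]; then
            echo "gap: expected $expected, got $seq" >&2
            return 1
        fi
        expected=$((seq + 1))
    done
}

# Hypothetical usage against an archive of commit mails:
#   grep '^X-Commit-Seq:' commits.mbox | awk '{print $2}' | check_sequence
```

Only when the check passes would you replay the patches to reconstruct the tree.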

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


* Re: Kernel SCM saga..
  2005-04-07 10:30     ` Matthias Andree
@ 2005-04-07 10:54       ` Andrew Walrond
  2005-04-09 16:17       ` David Roundy
  1 sibling, 0 replies; 201+ messages in thread
From: Andrew Walrond @ 2005-04-07 10:54 UTC (permalink / raw)
  To: Kernel Mailing List

I recently switched from bk to darcs (actually looked into it after the author 
mentioned on LKML that he had imported the kernel tree). Very impressed so 
far, but as you say,

> 1. It's rather slow and quite CPU consuming and certainly I/O consuming

I expect something as large as the kernel tree would cause problems in this 
respect.

> 2. It has an impressive set of dependencies around Glasgow Haskell
>    Compiler. I don't personally have issues with that, but I can already
>    hear the moaning and bitching.

:) I try to build everything from the original source, but in this case I
couldn't. The GHC needs the GHC + some GHC addons in order to compile
itself...

>
> 3. DARCS is written in Haskell. This is not a problem either, but I'd
>    think there are fewer people who can hack Haskell than people who
>    can hack C, C++, Java, Python or similar. It is still better than

True, though as you say, not a show-stopper.

From a functionality standpoint, darcs seems very similar to monotone, with a
couple of minor trade-offs in either direction.

I wonder if Linus would mind publishing his feature requests to the monotone 
developers, so that other projects, like darcs, would know what needs working 
on.

Andrew Walrond


* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (7 preceding siblings ...)
  2005-04-07  7:44 ` Jan Hudec
@ 2005-04-07 10:56 ` Andrew Walrond
  2005-04-08  0:57 ` Ian Wienand
  2005-04-08  4:13 ` Chris Wedgwood
  10 siblings, 0 replies; 201+ messages in thread
From: Andrew Walrond @ 2005-04-07 10:56 UTC (permalink / raw)
  To: linux-kernel

On Wednesday 06 April 2005 16:42, Linus Torvalds wrote:
>
> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)

Care to share your monotone wishlist?

Andrew Walrond


* Re: Kernel SCM saga..
  2005-04-07  9:46       ` Andrew Morton
@ 2005-04-07 11:17         ` Paul Mackerras
  0 siblings, 0 replies; 201+ messages in thread
From: Paul Mackerras @ 2005-04-07 11:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dwmw2, torvalds, linux-kernel

Andrew Morton writes:

> >  The other reason,
> >  of course, is to be able to see if a patch I'm about to send conflicts
> >  with something you have already taken, and rebase it if necessary.
> 
> <hack, hack>
> 
> How's this?

Nice; but in fact I meant that I want to be able to see if a patch of
mine conflicts with one from somebody else.

Paul.


* Re: Kernel SCM saga..
  2005-04-07  3:35     ` Daniel Phillips
@ 2005-04-07 15:08       ` Daniel Phillips
  0 siblings, 0 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07 15:08 UTC (permalink / raw)
  To: Martin Pool; +Cc: linux-kernel

On Wednesday 06 April 2005 23:35, Daniel Phillips wrote:
> When I tried it, it took 13 seconds to 'bzr add' the 2.6.11.3 tree on a
> relatively slow machine.

Oh, and 135 seconds to commit, so 148 seconds overall.  Versus 87 seconds
to bunzip the tree in the first place.  So far, you are in the ballpark.

Regards,

Daniel


* Re: Kernel SCM saga..
  2005-04-07  6:51 ` Paul Mackerras
  2005-04-07  7:48   ` Arjan van de Ven
@ 2005-04-07 15:10   ` Linus Torvalds
  2005-04-07 17:00     ` Daniel Phillips
  2005-04-07 23:21     ` Dave Airlie
  1 sibling, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-07 15:10 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Kernel Mailing List



On Thu, 7 Apr 2005, Paul Mackerras wrote:
> 
> Are you happy with processing patches + descriptions, one per mail?

Yes. That's going to be my interim solution; I was just hoping that with
2.6.12-rc2 out the door, and us in a "calming down" period, I could afford
to not even do that for a while.

The real problem with the email thing is that it ends up piling up: what 
BK did in this respect was that anything that piled up in a BK repository 
ended up still being there, and a single "bk pull" got it anyway - so if 
somebody got ignored because I was busy with something else, it didn't add 
any overhead. The queue didn't get "congested".

And that's a big thing. It comes from the "Linus pulls" model where people 
just told me that they were ready, instead of the "everybody pushes to 
Linus" model, where the destination gets congested at times.

So I do not want the "send Linus email patches" (whether mboxes or a 
single patch per email) to be a very long-term strategy. We can handle it 
for a while (in particular, I'm counting on it working up to the real 
release of 2.6.12, since we _should_ be in the calm period for the next 
month anyway), but it doesn't work in the long run.

> Do you have it automated to the point where processing emailed patches
> involves little more overhead than doing a bk pull?

It's more overhead, but not a lot. Especially nice numbered sequences like
Andrew sends (where I don't have to manually try to get the dependencies
right by trying to figure them out and hope I'm right, but instead just
sort by Subject: line) are not a lot of overhead. I can process a hundred
emails almost as easily as one, as long as I trust the maintainer (which,
when it's used as a BK replacement, I obviously do).
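The "sort by Subject: line" trick is easy to sketch; the function name is made up, and it assumes the conventional [PATCH n/m] tag in each mail's Subject: header:

```shell
#!/bin/sh
# patch_order FILES... -- print the given patch-mail files in series
# order, sorted by the n in their "[PATCH n/m]" Subject: tag. File
# names are arbitrary; only the Subject: header decides the order.
# (Limitation of the sketch: file names must not contain spaces.)
patch_order() {
    for f in "$@"; do
        n=$(grep -m1 '^Subject:' "$f" |
            sed 's|.*\[PATCH \([0-9][0-9]*\)/.*|\1|')
        printf '%s %s\n' "$n" "$f"
    done | sort -n | cut -d' ' -f2-
}
```

The ordered list can then be fed to whatever actually applies the patches.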

However, the SCM's I've looked at make this hard. One of the things (the
main thing, in fact) I've been working at is to make that process really
_efficient_. If it takes half a minute to apply a patch and remember the
changeset boundary etc (and quite frankly, that's _fast_ for most SCM's
around for a project the size of Linux), then a series of 250 emails
(which is not unheard of at all when I sync with Andrew, for example)  
takes two hours. If one of the patches in the middle doesn't apply, things
are bad bad bad.

Now, BK wasn't a speed demon either (actually, compared to everything
else, BK _is_ a speed demon, often by one or two orders of magnitude),
and took about 10-15 seconds per email when I merged with Andrew. HOWEVER,
with BK that wasn't as big of an issue, since the BK<->BK merges were so
easy, so I never had the slow email merges with any of the other main
developers. So a patch-application-based SCM "merger" actually would need
to be _faster_ than BK is. Which is really really really hard.

So I'm writing some scripts to try to track things a whole lot faster.  
Initial indications are that I should be able to do it almost as quickly
as I can just apply the patch, but quite frankly, I'm at most half done,
and if I hit a snag maybe that's not true at all. Anyway, the reason I can
do it quickly is that my scripts will _not_ be an SCM, they'll be a very
specific "log Linus' state" kind of thing. That will make the linear patch
merge a lot more time-efficient, and thus possible.

(If a patch apply takes three seconds, even a big series of patches is not
a problem: if I get notified within a minute or two that it failed
half-way, that's fine, I can then just fix it up manually. That's why 
latency is critical - if I'd have to do things effectively "offline", 
I'd by definition not be able to fix it up when problems happen).
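The fail-fast series application described above can be sketched like so. The function name is invented; it assumes -p1 patches and GNU patch's --dry-run:

```shell
#!/bin/sh
# apply_series TREEDIR PATCH... -- apply patches in order, checking
# each with a dry run first, so a long series stops at the first bad
# patch with a clear message instead of half-applying it.
apply_series() {
    tree=$1; shift
    i=0
    for p in "$@"; do
        i=$((i + 1))
        if ! patch -p1 -s -d "$tree" --dry-run < "$p"; then
            echo "patch $i of $#: $p does not apply, stopping" >&2
            return 1
        fi
        patch -p1 -s -d "$tree" < "$p"
    done
}
```

The dry run keeps a failing patch from half-applying; patches earlier in the series stay applied, which is exactly the state you want when fixing things up manually.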

> If so, then your mailbox (or patch queue) becomes a natural
> serialization point for the changes, and the need for a tool that can
> handle a complex graph of changes is much reduced.

Yes. In the short term. See above for why I think the congestion issue
will really mean that we want to have parallel merging in the not _too_
distant future.

NOTE! I detest the centralized SCM model, but if push comes to shove, and
we just _can't_ get a reasonable parallel merge thing going in the short
timeframe (i.e. a month or two), I'll use something like SVN on a trusted site
with just a few committers, and at least try to distribute the merging out
over a few people rather than making _me_ be the throttle.

The reason I don't really want to do that is once we start doing it that
way, I suspect we'll have a _really_ hard time stopping. I think it's a
broken model. So I'd much rather try to have some pain in the short run 
and get a better model running, but I just wanted to let people know that 
I'm pragmatic enough that I realize that we may not have much choice.

> * Visibility into what you had accepted and committed to your
>   repository
> * Lower latency of patches going into your repository
> * Much reduced rate of patches being dropped

Yes. 

		Linus


* Re: Kernel SCM saga..
  2005-04-07  7:18 ` David Woodhouse
  2005-04-07  8:50   ` Andrew Morton
  2005-04-07  9:24   ` Sergei Organov
@ 2005-04-07 15:32   ` Linus Torvalds
  2005-04-07 17:09     ` Daniel Phillips
                       ` (2 more replies)
  2 siblings, 3 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-07 15:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Kernel Mailing List



On Thu, 7 Apr 2005, David Woodhouse wrote:
>
> On Wed, 2005-04-06 at 08:42 -0700, Linus Torvalds wrote:
> > PS. Don't bother telling me about subversion. If you must, start reading
> > up on "monotone". That seems to be the most viable alternative, but don't
> > pester the developers so much that they don't get any work done. They are
> > already aware of my problems ;)
> 
> One feature I'd want to see in a replacement version control system is
> the ability to _re-order_ patches, and to cherry-pick patches from my
> tree to be sent onwards. The lack of that capability is the main reason
> I always hated BitKeeper.

I really disliked that in BitKeeper too originally. I argued with Larry
about it, but Larry (correctly, I believe) argued that efficient and
reliable distribution really requires the concept of "history is
immutable". It makes replication much easier when you know that the known
subset _never_ shrinks or changes - you only add on top of it.

And that implies no cherry-picking.

Also, there's actually a second reason why I've decided that
cherry-picking is wrong, and it's non-technical.

The thing is, cherry-picking very much implies that the people "up" the 
foodchain end up editing the work of the people "below" them. The whole 
reason you want cherry-picking is that you want to fix up somebody elses 
mistakes, ie something you disagree with.

That sounds like an obviously good thing, right? Yes it does.

The problem is, it actually results in the wrong dynamics and psychology 
in the system. First off, it makes the implicit assumption that there is 
an "up" and "down" in the food-chain, and I think that's wrong. It's 
increasingly a "network" in the kernel. I'm less and less "the top", as 
much as a "fairly central" person. And that is how it should be. I used to 
think of kernel development as a hierarchy, but I long since switched to 
thinking about it as a fairly arbitrary network.

The other thing it does is that it implicitly puts the burden of quality 
control on the upper-level maintainer ("I'll pick the good things out of 
your tree"), while _not_ being able to cherry-pick means that there is 
pressure in both directions to keep the tree clean.

And that is IMPORTANT. I realize that not cherry-picking means that people
who want to merge upstream (or sideways or anything) are now forced to do
extra work in trying to keep their tree free of random crap. And that's a
HUGELY IMPORTANT THING! It means that the pressure to keep the tree clean
flows in all directions, and takes pressure off the "central" point. In
other words, it distributes the pain of maintenance.

In other words, somebody who can't keep their act together, and creates 
crappy trees because he has random pieces of crud in it, quite 
automatically gets actively shunned by others. AND THAT IS GOOD! I've 
pushed back on some BK users to clean up their trees, to the point where 
we've had a number of "let's just re-do that" over the years. That's 
WONDERFUL. People are irritated at first, but I've seen what the end 
result is, and the end result is a much better maintainer. 

Some people actually end up doing the cleanup different ways. For example,
Jeff Garzik kept many separate trees, and had a special merge thing.
Others just kept a messy tree for development, and when they are happy,
they throw the messy tree away and re-create a cleaner one. Either is fine
- the point is, different people like to work different ways, and that's
fine, but making _everybody_ work at being clean means that there is no
train wreck down the line when somebody is forced to try to figure out
what to cherry-pick.

So I've actually changed from "I want to cherry-pick" to "cherry-picking
between maintainers is the wrong workflow". Now, as part of cleaning up,
people may end up exporting the "ugly tree" as patches and re-importing it
into the clean tree as the fixed clean series of patches, and that's
"cherry-picking", but it's not between developers.

NOTE! The "no cherry-picking" model absolutely also requires a model of 
"throw-away development trees". The two go together. BK did both, and an 
SCM that does one but not the other would be horribly broken.

(This is my only real conceptual gripe with "monotone". I like the model,
but they make it much harder than it should be to have throw-away trees
due to the fact that they seem to be working on the assumption of "one
database per developer" rather than "one database per tree". You don't 
have to follow that model, but it seems to be what the setup is geared 
for, and together with their "branches" it means that I think a monotone 
database easily gets very cruddy. The other problem with monotone is 
just performance right now, but that's hopefully not _too_ fundamental).

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-06 16:00 ` Greg KH
@ 2005-04-07 16:40   ` Rik van Riel
  2005-04-08  0:53     ` Jesse Barnes
  0 siblings, 1 reply; 201+ messages in thread
From: Rik van Riel @ 2005-04-07 16:40 UTC (permalink / raw)
  To: Greg KH; +Cc: Linus Torvalds, Kernel Mailing List

On Wed, 6 Apr 2005, Greg KH wrote:

> the very odd demands such a large project as the kernel has caused.  And
> I definitely owe him a beer the next time I see him.

Seconded.  Besides, now that the code won't be on bkbits
any more, it's safe to get Larry drunk ;)

Larry, thanks for the help you have given us by making
bitkeeper available for all these years.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 15:10   ` Linus Torvalds
@ 2005-04-07 17:00     ` Daniel Phillips
  2005-04-07 17:38       ` Linus Torvalds
  2005-04-07 19:56       ` Sam Ravnborg
  2005-04-07 23:21     ` Dave Airlie
  1 sibling, 2 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07 17:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, Kernel Mailing List

On Thursday 07 April 2005 11:10, Linus Torvalds wrote:
> On Thu, 7 Apr 2005, Paul Mackerras wrote:
> > Do you have it automated to the point where processing emailed patches
> > involves little more overhead than doing a bk pull?
>
> It's more overhead, but not a lot. Especially nice numbered sequences like
> Andrew sends (where I don't have to manually try to get the dependencies
> right by trying to figure them out and hope I'm right, but instead just
> sort by Subject: line)...

Hi Linus,

In that case, a nice refinement is to put the sequence number at the end of 
the subject line so patch sequences don't interleave:

   Subject: [PATCH] Unbork OOM Killer (1 of 3)
   Subject: [PATCH] Unbork OOM Killer (2 of 3)
   Subject: [PATCH] Unbork OOM Killer (3 of 3)
   Subject: [PATCH] Unbork OOM Killer (v2, 1 of 3)
   Subject: [PATCH] Unbork OOM Killer (v2, 2 of 3)
   ...

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 15:32   ` Linus Torvalds
@ 2005-04-07 17:09     ` Daniel Phillips
  2005-04-07 17:10     ` Al Viro
  2005-04-08 22:52     ` Roman Zippel
  2 siblings, 0 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07 17:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, Kernel Mailing List

On Thursday 07 April 2005 11:32, Linus Torvalds wrote:
> On Thu, 7 Apr 2005, David Woodhouse wrote:
> > On Wed, 2005-04-06 at 08:42 -0700, Linus Torvalds wrote:
> > > PS. Don't bother telling me about subversion. If you must, start
> > > reading up on "monotone". That seems to be the most viable alternative,
> > > but don't pester the developers so much that they don't get any work
> > > done. They are already aware of my problems ;)
> >
> > One feature I'd want to see in a replacement version control system is
> > the ability to _re-order_ patches, and to cherry-pick patches from my
> > tree to be sent onwards. The lack of that capability is the main reason
> > I always hated BitKeeper.
>
> I really disliked that in BitKeeper too originally. I argued with Larry
> about it, but Larry (correctly, I believe) argued that efficient and
> reliable distribution really requires the concept of "history is
> immutable". It makes replication much easier when you know that the known
> subset _never_ shrinks or changes - you only add on top of it.

However, it would be easy to allow reordering before "publishing" a revision, 
which would preserve immutability for all published revisions while allowing 
the patch _author_ the flexibility of reordering/splitting/joining patches 
when creating them.  In other words, a virtuous marriage of the BK model with 
Andrew's Quilt.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 15:32   ` Linus Torvalds
  2005-04-07 17:09     ` Daniel Phillips
@ 2005-04-07 17:10     ` Al Viro
  2005-04-07 17:47       ` Linus Torvalds
                         ` (2 more replies)
  2005-04-08 22:52     ` Roman Zippel
  2 siblings, 3 replies; 201+ messages in thread
From: Al Viro @ 2005-04-07 17:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, Kernel Mailing List

On Thu, Apr 07, 2005 at 08:32:04AM -0700, Linus Torvalds wrote:
> Also, there's actually a second reason why I've decided that cherry-
> picking is wrong, and it's non-technical. 
> 
> The thing is, cherry-picking very much implies that the people "up" the 
> foodchain end up editing the work of the people "below" them. The whole 
> reason you want cherry-picking is that you want to fix up somebody elses 
> mistakes, ie something you disagree with.

No.  There's another reason - when you are cherry-picking and reordering
*your* *own* *patches*.  That's what I had been unable to explain to
Larry and that's what made BK unusable for me.

As for the immutable history...  Ever had to read or grade students'
homework?
	* the dumbest kind: "here's an answer <expression>, whaddya
mean 'where's the solution'?".
	* next one: "here's how I've solved the problem: <pages of text
documenting the attempts, with many 'oops, there had been a mistake,
here's how we fix it'>".  
	* what you really want to see: a series of steps leading to the
answer, with a clean logical structure that allows one to understand what's
being done and verify correctness.

The first corresponds to "here's a half-meg of patch, it fixes everything".
The second is chronological history (aka "this came from our CVS, all bugs
are fixed by now, including those introduced in the middle of it; see
CVS history for details").  The third is a decent patch series.

And to get from "here's how I came up to solution" to "here's a clean way
to reach the solution" you _have_ to reorder.  There's also "here are
misc notes from today, here are misc notes from yesterday, etc." and to
get that into sane shape you will need to split, reorder and probably
collapse several into combined delta (possibly getting an empty delta
as the result, if later ones negate the prior).

The point being, both history and well, publishable result can be expressed
as series of small steps, but they are not the same thing.  So far all I've
seen in the area (and that includes BK) is heavily biased towards history part
and attempts to use this stuff for manipulating patch series turn into fighting
the tool.

I'd *love* to see something that can handle both - preferably with
history of reordering, etc. being kept.  IOW, not just a tree of changesets
but a lattice - with multiple paths leading to the same node.  So far
I've seen nothing of that kind ;-/

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:00     ` Daniel Phillips
@ 2005-04-07 17:38       ` Linus Torvalds
  2005-04-07 17:47         ` Chris Wedgwood
                           ` (3 more replies)
  2005-04-07 19:56       ` Sam Ravnborg
  1 sibling, 4 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-07 17:38 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Paul Mackerras, Kernel Mailing List



On Thu, 7 Apr 2005, Daniel Phillips wrote:
> 
> In that case, a nice refinement is to put the sequence number at the end of 
> the subject line so patch sequences don't interleave:

No. That makes it unsortable, and also much harder to pick out which part 
of the subject line is the explanation, and which part is just metadata 
for me.

So my preference is _overwhelmingly_ for the format that Andrew uses (which 
is partly explained by the fact that I am used to it, but also by the fact 
that I've asked for Andrew to make trivial changes to match my usage).

That canonical format is:

	Subject: [PATCH 001/123] [<area>:] <explanation>

together with the first line of the body being a

	From: Original Author <origa@email.com>

followed by an empty line and then the body of the explanation.

After the body of the explanation comes the "Signed-off-by:" lines, and 
then a simple "---" line, and below that comes the diffstat of the patch 
and then the patch itself.

That's the "canonical email format", and it's that because my normal
scripts (in BK/tools, but now I'm working on making them more generic)  
take input that way. It's very easy to sort the emails alphabetically by
subject line - pretty much any email reader will support that - and since
the sequence number is zero-padded, the numerical and alphabetic sorts
are the same.

If you send several sequences, you either send a simple explaining email
before the second sequence (hey, it's not like I'm a machine - I can use
my brains too, and in particular if the final number of patches in each
sequence is different, even if the sequences got re-ordered and are
overlapping, I can still just extract one from the other by selecting for
"/123] " in the subject line), or you modify the Subject: line subtly to
still sort uniquely and alphabetically in-order, ie the subject lines for
the second series might be

	Subject: [PATCHv2 001/207] x86: fix eflags tracking
	...

All very unambiguous, and my scripts already remove everything inside the 
brackets and will just replace it with "[PATCH]" in the final version.

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:10     ` Al Viro
@ 2005-04-07 17:47       ` Linus Torvalds
  2005-04-07 18:04         ` Jörn Engel
  2005-04-08  3:41         ` Jeff Garzik
  2005-04-07 17:52       ` Bartlomiej Zolnierkiewicz
  2005-04-07 17:54       ` Daniel Phillips
  2 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-07 17:47 UTC (permalink / raw)
  To: Al Viro; +Cc: David Woodhouse, Kernel Mailing List



On Thu, 7 Apr 2005, Al Viro wrote:
> 
> No.  There's another reason - when you are cherry-picking and reordering
> *your* *own* *patches*.

Yes. I agree. There should be some support for cherry-picking in between a
temporary throw-away tree and a "cleaned-up-tree". However, it should be
something you really do need to think about, and in most cases it really
does boil down to "export as patch, re-import from patch". Especially
since you potentially want to edit things in between anyway when you
cherry-pick.

(I do that myself: If I have been a messy boy, and committed mixed-up 
things as one commit, I export it as a patch, and then I split the patch 
by hand into two or more pieces - sometimes by just editing the patch 
directly, but sometimes by applying it, editing the result, and then 
re-exporting it as the new version).

And in the cases where this happens, you in fact often have unrelated
changes to the _same_file_, so you really do end up having that middle 
step.

In other words, this cherry-picking can generally be scripted and done
"outside" the SCM (you can trivially have a script that takes a revision
from one tree and applies it to the other). I don't believe that the SCM
needs to support it in any fundamentally inherent manner. After all, why 
should it, when it really boils down to 

	(cd old-tree ; scm export-as-patch-plus-comments) |
		(cd new-tree ; scm import-patch-plus-comments)

where the "patch-plus-comments" part is just basically an extended patch
(including rename information etc, not just the comments).
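
The export/re-import round trip can be demonstrated with nothing but plain
diff and patch between two working trees. (A minimal sketch with made-up
tree and file names, standing in for the hypothetical scm commands above;
it does not carry comments or rename information the way a real
"patch-plus-comments" exporter would.)

```shell
set -e
rm -rf old-tree new-tree && mkdir old-tree new-tree
printf 'one\n'      > new-tree/file.txt   # the clean tree
printf 'one\ntwo\n' > old-tree/file.txt   # the messy tree, with an extra change
# "export as patch": diff exits 1 when the trees differ, so ignore its status
diff -ruN new-tree old-tree > change.patch || true
# "re-import from patch": -p1 strips the tree-name component from the paths
( cd new-tree && patch -p1 < ../change.patch )
cmp new-tree/file.txt old-tree/file.txt   # the two trees now match
```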

Btw, this method of cherry-picking again requires two _separate_ active 
trees at the same time. BK is great at that, and really, that's what 
distributed SCM's should be all about anyway. It's not just distributed 
between different machines, it's literally distributed even on the same 
machine, and it's actively _used_ that way.

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:38       ` Linus Torvalds
@ 2005-04-07 17:47         ` Chris Wedgwood
  2005-04-07 18:06         ` Magnus Damm
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-07 17:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Phillips, Paul Mackerras, Kernel Mailing List

On Thu, Apr 07, 2005 at 10:38:06AM -0700, Linus Torvalds wrote:

> So my preference is _overwhelmingly_ for the format that Andrew uses
> (which is partly explained by the fact that I am used to it, but
> also by the fact that I've asked for Andrew to make trivial changes
> to match my usage).
>
> That canonical format is:
>
> 	Subject: [PATCH 001/123] [<area>:] <explanation>
>
> together with the first line of the body being a
>
> 	From: Original Author <origa@email.com>
>
> followed by an empty line and then the body of the explanation.

Having a script to check people get this right before sending it via
email would be a nice thing to put into scripts/ or probably
Documentation/ perhaps?

Does such a thing already exist?

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:10     ` Al Viro
  2005-04-07 17:47       ` Linus Torvalds
@ 2005-04-07 17:52       ` Bartlomiej Zolnierkiewicz
  2005-04-07 17:54       ` Daniel Phillips
  2 siblings, 0 replies; 201+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2005-04-07 17:52 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List

On Apr 7, 2005 7:10 PM, Al Viro <viro@parcelfarce.linux.theplanet.co.uk> wrote:
> On Thu, Apr 07, 2005 at 08:32:04AM -0700, Linus Torvalds wrote:
> > Also, there's actually a second reason why I've decided that cherry-
> > picking is wrong, and it's non-technical.
> >
> > The thing is, cherry-picking very much implies that the people "up" the
> > foodchain end up editing the work of the people "below" them. The whole
> > reason you want cherry-picking is that you want to fix up somebody elses
> > mistakes, ie something you disagree with.
> 
> No.  There's another reason - when you are cherry-picking and reordering
> *your* *own* *patches*.  That's what I had been unable to explain to
> Larry and that's what made BK unusable for me.

Yep, I missed this in BK a lot.

There is another situation in which cherry-picking is very useful:
even if you have a clean tree it still may contain bugfixes mixed with
unrelated cleanups and sometimes you want to only apply bugfixes.

Bartlomiej

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:10     ` Al Viro
  2005-04-07 17:47       ` Linus Torvalds
  2005-04-07 17:52       ` Bartlomiej Zolnierkiewicz
@ 2005-04-07 17:54       ` Daniel Phillips
  2005-04-07 18:13         ` Dmitry Yusupov
  2005-04-08 17:24         ` Jon Masters
  2 siblings, 2 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07 17:54 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List

On Thursday 07 April 2005 13:10, Al Viro wrote:
> The point being, both history and well, publishable result can be expressed
> as series of small steps, but they are not the same thing.  So far all I've
> seen in the area (and that includes BK) is heavily biased towards history
> part and attempts to use this stuff for manipulating patch series turn into
> fighting the tool.
>
> I'd *love* to see something that can handle both - preferably with
> history of reordering, etc. being kept.  IOW, not just a tree of changesets
> but a lattice - with multiple paths leading to the same node.  So far
> I've seen nothing of that kind ;-/

Which is a perfect demonstration of why the scm tool has to be free/open 
source.  We should never have had to plead with BitMover to extend BK in a 
direction like that, but instead, just get the source and make it do it, like 
any other open source project.

Three years ago, there was no fully working open source distributed scm code 
base to use as a starting point, so extending BK would have been the only 
easy alternative.  But since then the situation has changed.  There are now 
several working code bases to provide a good starting point: Monotone, Arch, 
SVK, Bazaar-ng and others.

Sure, there are quibbles about all of those, but right now is not the time for 
quibbling, because a functional replacement for BK is needed in roughly two 
months, capable of losslessly importing the kernel version graph.  It only 
has to support a subset of BK functionality, e.g., pulling and cloning.  It 
is ok to be a little slow so long as it is not pathetically slow.  The 
purpose of the interim solution is just to get the patch flow process back 
online.

The key is the _lossless_ part.  So long as the interim solution imports the 
metadata losslessly, we have the flexibility to switch to a better solution 
later, on short notice and without much pain.

So I propose that everybody who is interested, pick one of the above projects 
and join it, to help get it to the point of being able to losslessly import 
the version graph.  Given the importance, I think that _all_ viable 
alternatives need to be worked on in parallel, so that two months from now we 
have several viable options.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:47       ` Linus Torvalds
@ 2005-04-07 18:04         ` Jörn Engel
  2005-04-07 18:27           ` Daniel Phillips
  2005-04-07 20:54           ` Arjan van de Ven
  2005-04-08  3:41         ` Jeff Garzik
  1 sibling, 2 replies; 201+ messages in thread
From: Jörn Engel @ 2005-04-07 18:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, David Woodhouse, Kernel Mailing List

On Thu, 7 April 2005 10:47:18 -0700, Linus Torvalds wrote:
> On Thu, 7 Apr 2005, Al Viro wrote:
> > 
> > No.  There's another reason - when you are cherry-picking and reordering
> > *your* *own* *patches*.
> 
> Yes. I agree. There should be some support for cherry-picking in between a
> temporary throw-away tree and a "cleaned-up-tree". However, it should be
> something you really do need to think about, and in most cases it really
> does boil down to "export as patch, re-import from patch". Especially
> since you potentially want to edit things in between anyway when you
> cherry-pick.

For reordering, using patcher, you can simply edit the sequence file
and move lines around.  Nice and simple interface.

There is no checking involved, though.  If you move dependent patches,
you end up with a mess and either throw it all away or seriously
scratch your head.  So a serious SCM might do something like this:

$ cp series new_series
$ vi new_series
$ SCM --reorder new_series
  # essentially "mv new_series series", if no checks fail

Merging patches isn't that hard either.  Splitting them would remain
manual, as you described it.
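
The "if no checks fail" step could start with something as simple as
verifying that the edited series is a permutation of the old one, so no
patch is silently lost or duplicated. (A sketch only; as noted above, real
dependency checking between patches is the hard part and is not attempted
here.)

```python
def check_reorder(old_series, new_series):
    """Accept a reordered series only if it names exactly the same patches."""
    old = [line.strip() for line in old_series if line.strip()]
    new = [line.strip() for line in new_series if line.strip()]
    if sorted(old) != sorted(new):
        raise ValueError("edited series adds, drops, or duplicates patches")
    return new

series = ["01-fix-oops.patch", "02-cleanup.patch", "03-feature.patch"]
# A pure reordering passes; dropping or duplicating a patch would not.
assert check_reorder(series, series[::-1]) == series[::-1]
```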

> Btw, this method of cherry-picking again requires two _separate_ active 
> trees at the same time. BK is great at that, and really, that's what 
> distributed SCM's should be all about anyway. It's not just distributed 
> between different machines, it's literally distributed even on the same 
> machine, and it's actively _used_ that way.

Amen!

Jörn

-- 
He who knows that enough is enough will always have enough.
-- Lao Tsu

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:38       ` Linus Torvalds
  2005-04-07 17:47         ` Chris Wedgwood
@ 2005-04-07 18:06         ` Magnus Damm
  2005-04-07 18:36         ` Daniel Phillips
  2005-04-08  3:35         ` Jeff Garzik
  3 siblings, 0 replies; 201+ messages in thread
From: Magnus Damm @ 2005-04-07 18:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Phillips, Paul Mackerras, Kernel Mailing List

On Apr 7, 2005 7:38 PM, Linus Torvalds <torvalds@osdl.org> wrote:
> So my preference is _overwhelmingly_ for the format that Andrew uses (which
> is partly explained by the fact that I am used to it, but also by the fact
> that I've asked for Andrew to make trivial changes to match my usage).
> 
> That canonical format is:
> 
>         Subject: [PATCH 001/123] [<area>:] <explanation>
> 
> together with the first line of the body being a
> 
>         From: Original Author <origa@email.com>
> 
> followed by an empty line and then the body of the explanation.
> 
> After the body of the explanation comes the "Signed-off-by:" lines, and
> then a simple "---" line, and below that comes the diffstat of the patch
> and then the patch itself.

While specifying things, wouldn't it be useful to have a line
containing tags that specify whether the patch contains new features, a
bug fix, or a high-priority security fix? Then that information could
be used to find patches for the sucker-tree.

/ magnus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:54       ` Daniel Phillips
@ 2005-04-07 18:13         ` Dmitry Yusupov
  2005-04-07 18:29           ` Daniel Phillips
  2005-04-08 17:24         ` Jon Masters
  1 sibling, 1 reply; 201+ messages in thread
From: Dmitry Yusupov @ 2005-04-07 18:13 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List

On Thu, 2005-04-07 at 13:54 -0400, Daniel Phillips wrote:
> Three years ago, there was no fully working open source distributed scm code 
> base to use as a starting point, so extending BK would have been the only 
> easy alternative.  But since then the situation has changed.  There are now 
> several working code bases to provide a good starting point: Monotone, Arch, 
> SVK, Bazaar-ng and others.

Right. For example, SVK is pretty mature project and very close to 1.0
release now. And it supports all kind of merges including Cherry-Picking
Mergeback:

http://svk.elixus.org/?MergeFeatures

Dmitry


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 18:04         ` Jörn Engel
@ 2005-04-07 18:27           ` Daniel Phillips
  2005-04-07 20:54           ` Arjan van de Ven
  1 sibling, 0 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07 18:27 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Linus Torvalds, Al Viro, David Woodhouse, Kernel Mailing List

On Thursday 07 April 2005 14:04, Jörn Engel wrote:
> On Thu, 7 April 2005 10:47:18 -0700, Linus Torvalds wrote:
>> ... There should be some support for cherry-picking in between
> > a temporary throw-away tree and a "cleaned-up-tree". However, it should
> > be something you really do need to think about, and in most cases it
> > really does boil down to "export as patch, re-import from patch".
> > Especially since you potentially want to edit things in between anyway
> > when you cherry-pick.
>
> For reordering, using patcher, you can simply edit the sequence file
> and move lines around.  Nice and simple interface.
>
> There is no checking involved, though.  If you move dependent patches,
> you end up with a mess and either throw it all away or seriously
> scratch your head.  So a serious SCM might do something like this:
>
> $ cp series new_series
> $ vi new_series
> $ SCM --reorder new_series
>   # essentially "mv new_series series", if no checks fail
>
> Merging patches isn't that hard either.  Splitting them would remain
> manual, as you described it.

Well it's clear that adding cherry picking, patch reordering, splitting and 
merging (two patches into one) is not even hard, it's just a matter of making 
it convenient by _building it into the tool_.  Now, can we just pick a tool 
and do it, please?  :-)

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 18:13         ` Dmitry Yusupov
@ 2005-04-07 18:29           ` Daniel Phillips
  2005-04-10 22:33             ` Troy Benjegerdes
  0 siblings, 1 reply; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07 18:29 UTC (permalink / raw)
  To: Dmitry Yusupov
  Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List

On Thursday 07 April 2005 14:13, Dmitry Yusupov wrote:
> On Thu, 2005-04-07 at 13:54 -0400, Daniel Phillips wrote:
> > Three years ago, there was no fully working open source distributed scm
> > code base to use as a starting point, so extending BK would have been the
> > only easy alternative.  But since then the situation has changed.  There
> > are now several working code bases to provide a good starting point:
> > Monotone, Arch, SVK, Bazaar-ng and others.
>
> Right. For example, SVK is pretty mature project and very close to 1.0
> release now. And it supports all kind of merges including Cherry-Picking
> Mergeback:
>
> http://svk.elixus.org/?MergeFeatures

So for an interim way to get the patch flow back online, SVK is ready to try 
_now_, and we only need a way to import the version graph?  (true/false)

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:38       ` Linus Torvalds
  2005-04-07 17:47         ` Chris Wedgwood
  2005-04-07 18:06         ` Magnus Damm
@ 2005-04-07 18:36         ` Daniel Phillips
  2005-04-08  3:35         ` Jeff Garzik
  3 siblings, 0 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-07 18:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, Kernel Mailing List

On Thursday 07 April 2005 13:38, Linus Torvalds wrote:
> On Thu, 7 Apr 2005, Daniel Phillips wrote:
> > In that case, a nice refinement is to put the sequence number at the end
> > of the subject line so patch sequences don't interleave:
>
> No. That makes it unsortable, and also much harder to pick out which part
> of the subject line is the explanation, and which part is just metadata
> for me.

Well, my list in the parent post _was_ sorted by subject.  But that is a 
quibble, the important point is that you just officially defined the 
canonical format, which everybody should stick to for now:

> That canonical format is:
>
>  Subject: [PATCH 001/123] [<area>:] <explanation>
>
> together with the first line of the body being a
>
>  From: Original Author <origa@email.com>
>
> followed by an empty line and then the body of the explanation.
>
> After the body of the explanation comes the "Signed-off-by:" lines, and
> then a simple "---" line, and below that comes the diffstat of the patch
> and then the patch itself.
>
> That's the "canonical email format", and it's that because my normal
> scripts (in BK/tools, but now I'm working on making them more generic)
> take input that way. It's very easy to sort the emails alphabetically by
> subject line - pretty much any email reader will support that - since
> because the sequence number is zero-padded, the numerical and alphabetic
> sort is the same.
>
> If you send several sequences, you either send a simple explaining email
> before the second sequence (hey, it's not like I'm a machine - I can use
> my brains too, and in particular if the final number of patches in each
> sequence is different, even if the sequences got re-ordered and are
> overlapping, I can still just extract one from the other by selecting for
> "/123] " in the subject line), or you modify the Subject: line subtly to
> still sort uniquely and alphabetically in-order, ie the subject lines for
> the second series might be
>
>  Subject: [PATCHv2 001/207] x86: fix eflags tracking
>  ...
>
> All very unambiguous, and my scripts already remove everything inside the
> brackets and will just replace it with "[PATCH]" in the final version.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:00     ` Daniel Phillips
  2005-04-07 17:38       ` Linus Torvalds
@ 2005-04-07 19:56       ` Sam Ravnborg
  1 sibling, 0 replies; 201+ messages in thread
From: Sam Ravnborg @ 2005-04-07 19:56 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Linus Torvalds, Paul Mackerras, Kernel Mailing List

On Thu, Apr 07, 2005 at 01:00:51PM -0400, Daniel Phillips wrote:
> On Thursday 07 April 2005 11:10, Linus Torvalds wrote:
> > On Thu, 7 Apr 2005, Paul Mackerras wrote:
> > > Do you have it automated to the point where processing emailed patches
> > > involves little more overhead than doing a bk pull?
> >
> > It's more overhead, but not a lot. Especially nice numbered sequences like
> > Andrew sends (where I don't have to manually try to get the dependencies
> > right by trying to figure them out and hope I'm right, but instead just
> > sort by Subject: line)...
> 
> Hi Linus,
> 
> In that case, a nice refinement is to put the sequence number at the end of 
> the subject line so patch sequences don't interleave:
> 
>    Subject: [PATCH] Unbork OOM Killer (1 of 3)
>    Subject: [PATCH] Unbork OOM Killer (2 of 3)
>    Subject: [PATCH] Unbork OOM Killer (3 of 3)
>    Subject: [PATCH] Unbork OOM Killer (v2, 1 of 3)
>    Subject: [PATCH] Unbork OOM Killer (v2, 2 of 3)

This breaks the rule of a descriptive subject for each patch.
Consider 30 subjects all telling you "Subject: PCI updates [001/030]".
That is not good.

	Sam

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 18:04         ` Jörn Engel
  2005-04-07 18:27           ` Daniel Phillips
@ 2005-04-07 20:54           ` Arjan van de Ven
  1 sibling, 0 replies; 201+ messages in thread
From: Arjan van de Ven @ 2005-04-07 20:54 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Linus Torvalds, Al Viro, David Woodhouse, Kernel Mailing List

On Thu, 2005-04-07 at 20:04 +0200, Jörn Engel wrote:
> On Thu, 7 April 2005 10:47:18 -0700, Linus Torvalds wrote:
> > On Thu, 7 Apr 2005, Al Viro wrote:
> > > 
> > > No.  There's another reason - when you are cherry-picking and reordering
> > > *your* *own* *patches*.
> > 
> > Yes. I agree. There should be some support for cherry-picking in between a
> > temporary throw-away tree and a "cleaned-up-tree". However, it should be
> > something you really do need to think about, and in most cases it really
> > does boil down to "export as patch, re-import from patch". Especially
> > since you potentially want to edit things in between anyway when you
> > cherry-pick.
> 
> For reordering, using patcher, you can simply edit the sequence file
> and move lines around.  Nice and simple interface.
> 
> There is no checking involved, though.  If you move dependent patches,
> you end up with a mess and either throw it all away or seriously
> scratch your head.  So a serious SCM might do something like this:


just fyi, patchutils has a tool that can "flip" the order of patches
even if they patch the same line of code in the files.... with it you
can make a "bubble sort" to move stuff about safely...



^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 15:10   ` Linus Torvalds
  2005-04-07 17:00     ` Daniel Phillips
@ 2005-04-07 23:21     ` Dave Airlie
  1 sibling, 0 replies; 201+ messages in thread
From: Dave Airlie @ 2005-04-07 23:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, Kernel Mailing List

> > Are you happy with processing patches + descriptions, one per mail?
> 
> Yes. That's going to be my interim, I was just hoping that with 2.6.12-rc2
> out the door, and us in a "calming down" period, I could afford to not
> even do that for a while.
> 
> The real problem with the email thing is that it ends up piling up: what
> BK did in this respect was that anything that piled up in a BK repository
> ended up still being there, and a single "bk pull" got it anyway - so if
> somebody got ignored because I was busy with something else, it didn't add
> any overhead. The queue didn't get "congested".
> 
> And that's a big thing. It comes from the "Linus pulls" model where people
> just told me that they were ready, instead of the "everybody pushes to
> Linus" model, where the destination gets congested at times.

Something I think we'll miss in the long run is bkbits.net: being able
to just push all patches for Linus to a tree and then forget about
that tree until Linus pulled from it was invaluable.. the fact that
this tree was online the whole time, and that you didn't queue up huge
mails to be missed in Linus's INBOX, meant a lot to me compared to
pre-bk workings..

Maybe now that kernel.org has been 'pimped out' we could set up some
sort of system where maintainers can drop a big load of patchsets, or
even one big patch, into some sort of public area and say "this is my
diffs for Linus for his next pull", and let Linus pull it at his
leisure... some kinda rsync'y type thing comes to mind ...

so I can mail Linus and say hey Linus please grab
rsync://pimpedout.kernel.org/airlied/drm-linus and you grab everything
in there and I get notified perhaps or just a log like the bkbits
stats page, and Andrew can grab the patchsets the same as he does for
bk-drm now ... and I can have airlied/drm-2.6 where I can queue stuff
for -mm then just re-generate the patches for drm-linus later on..

Dave.


* Re: Kernel SCM saga..
  2005-04-07  5:38           ` Martin Pool
@ 2005-04-07 23:27             ` Linus Torvalds
  2005-04-08  5:56               ` Martin Pool
  0 siblings, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-07 23:27 UTC (permalink / raw)
  To: Martin Pool; +Cc: linux-kernel, David Lang



On Thu, 7 Apr 2005, Martin Pool wrote:
> 
> Importing the first snapshot (2004-01-01) took 41.77s user, 1:23.79
> total.  Each subsequent day takes about 10s user, 30s elapsed to commit
> into bzr.  The speeds are comparable to CVS or a bit faster, and may be
> faster than other distributed systems. (This on a laptop with a 5400rpm
> disk.)  Pulling out a complete copy of the tree as it was on a previous
> date takes about 14s user, 60s elapsed.

If you have an exportable tree, can you just make it pseudo-public, tell
me where to get a buildable system that works well enough, point me to
some documentation, and maybe I can get some feel for it?

		Linus


* Re: Kernel SCM saga..
  2005-04-07 16:40   ` Rik van Riel
@ 2005-04-08  0:53     ` Jesse Barnes
  0 siblings, 0 replies; 201+ messages in thread
From: Jesse Barnes @ 2005-04-08  0:53 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Greg KH, Linus Torvalds, Kernel Mailing List

On Thursday, April 7, 2005 9:40 am, Rik van Riel wrote:
> Larry, thanks for the help you have given us by making
> bitkeeper available for all these years.

A big thank you from me too, I've really enjoyed using BK and I think it's 
made me much more productive than I would have been otherwise.  I don't envy 
you having to put up with the frequent flamefests...

Jesse


* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (8 preceding siblings ...)
  2005-04-07 10:56 ` Andrew Walrond
@ 2005-04-08  0:57 ` Ian Wienand
  2005-04-08  4:13 ` Chris Wedgwood
  10 siblings, 0 replies; 201+ messages in thread
From: Ian Wienand @ 2005-04-08  0:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 594 bytes --]

On Wed, Apr 06, 2005 at 08:42:08AM -0700, Linus Torvalds wrote:
> If you must, start reading up on "monotone".

One slightly annoying thing is that monotone doesn't appear to have a
web interface.  I used to use the bk one a lot when tracking down
bugs, because it was really fast to have a web browser window open and
click through the revisions of a file reading checkin comments, etc.
Does anyone know if one is being worked on?

bazaar-ng at least mentions that this is important in its design docs,
and arch has one in development too.

-i
ianw@gelato.unsw.edu.au
http://www.gelato.unsw.edu.au

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: Kernel SCM saga..
  2005-04-07 17:38       ` Linus Torvalds
                           ` (2 preceding siblings ...)
  2005-04-07 18:36         ` Daniel Phillips
@ 2005-04-08  3:35         ` Jeff Garzik
  3 siblings, 0 replies; 201+ messages in thread
From: Jeff Garzik @ 2005-04-08  3:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Phillips, Paul Mackerras, Kernel Mailing List

Linus Torvalds wrote:
> 
> On Thu, 7 Apr 2005, Daniel Phillips wrote:
> 
>>In that case, a nice refinement is to put the sequence number at the end of 
>>the subject line so patch sequences don't interleave:
> 
> 
> No. That makes it unsortable, and also much harder to pick out which part 
> of the subject line is the explanation, and which part is just metadata 
> for me.
> 
> So my preference is _overwhelmingly_ for the format that Andrew uses (which 
> is partly explained by the fact that I am used to it, but also by the fact 
> that I've asked for Andrew to make trivial changes to match my usage).
> 
> That canonical format is:
> 
> 	Subject: [PATCH 001/123] [<area>:] <explanation>
> 
> together with the first line of the body being a
> 
> 	From: Original Author <origa@email.com>


Nod.  For future reference, people can refer to

http://linux.yyz.us/patch-format.html
	and/or
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt

	Jeff
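As a quick sketch (an editor's illustration, not from the thread): the canonical subject format quoted above can be generated mechanically, with the zero-padding implied by the "001/123" example keeping the series sortable in a mailbox.

```python
# Sketch: building "[PATCH nnn/NNN] <area>: <explanation>" subject lines
# as described above.  The sequence number is zero-padded to the width
# of the total, so plain lexical sorting orders the series correctly.

def patch_subject(n, total, explanation, area=None):
    width = len(str(total))
    prefix = "[PATCH %0*d/%d]" % (width, n, total)
    if area:
        return "%s %s: %s" % (prefix, area, explanation)
    return "%s %s" % (prefix, explanation)

print(patch_subject(1, 123, "fix slab corruption on free", area="mm"))
# -> [PATCH 001/123] mm: fix slab corruption on free
```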




* Re: Kernel SCM saga..
  2005-04-07 17:47       ` Linus Torvalds
  2005-04-07 18:04         ` Jörn Engel
@ 2005-04-08  3:41         ` Jeff Garzik
  1 sibling, 0 replies; 201+ messages in thread
From: Jeff Garzik @ 2005-04-08  3:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, David Woodhouse, Kernel Mailing List

Linus Torvalds wrote:
> In other words, this cherry-picking can generally be scripted and done
> "outside" the SCM (you can trivially have a script that takes a revision
> from one tree and applies it to the other). I don't believe that the SCM
> needs to support it in any fundamentally inherent manner. After all, why 
> should it, when it really boils down to 
> 
> 	(cd old-tree ; scm export-as-patch-plus-comments) |
> 		(cd new-tree ; scm import-patch-plus-comments)
> 
> where the "patch-plus-comments" part is just basically an extended patch
> (including rename information etc, not just the comments).


Not that it matters anymore, but that's precisely what the script
	Documentation/BK-usage/cpcset
did, for BitKeeper.

	Jeff




* Re: Kernel SCM saga..
  2005-04-06 15:42 Kernel SCM saga Linus Torvalds
                   ` (9 preceding siblings ...)
  2005-04-08  0:57 ` Ian Wienand
@ 2005-04-08  4:13 ` Chris Wedgwood
  2005-04-08  4:42   ` Linus Torvalds
  2005-04-08 11:42   ` Kernel SCM saga Catalin Marinas
  10 siblings, 2 replies; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08  4:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, Apr 06, 2005 at 08:42:08AM -0700, Linus Torvalds wrote:

> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)

I'm playing with monotone right now.  Superficially it looks like it
has tons of gee-whiz neato stuff...  however, it's *agonizingly* slow.
I mean glacial.  A heavily sedated sloth with no legs is probably
faster.

Using monotone to pull itself took over 2 hours wall-time and 71
minutes of CPU time.

Arguably brand-new CPUs are probably about 2x the speed of what I have
now, and there might have been networking funnies --- but that's still
35 minutes to get ~40MB of data.

The kernel is ten times larger, so does that mean to do a clean pull
of the kernel we are looking at (71/2*10) ~ 355 minutes or 6 hours of
CPU time?



* Re: Kernel SCM saga..
  2005-04-08  4:13 ` Chris Wedgwood
@ 2005-04-08  4:42   ` Linus Torvalds
  2005-04-08  5:04     ` Chris Wedgwood
                       ` (5 more replies)
  2005-04-08 11:42   ` Kernel SCM saga Catalin Marinas
  1 sibling, 6 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08  4:42 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kernel Mailing List



On Thu, 7 Apr 2005, Chris Wedgwood wrote:
> 
> I'm playing with monotone right now.  Superficially it looks like it
> has tons of gee-whiz neato stuff...  however, it's *agonizingly* slow.
> I mean glacial.  A heavily sedated sloth with no legs is probably
> faster.

Yes. The silly thing is, at least in my local tests it doesn't actually
seem to be _doing_ anything while it's slow (there are no system calls
except for a few memory allocations and de-allocations). It seems to have
some exponential function on the number of pathnames involved etc.

I'm hoping they can fix it, though. The basic notions do not sound wrong.

In the meantime (and because monotone really _is_ that slow), here's a
quick challenge for you, and any crazy hacker out there: if you want to
play with something _really_ nasty (but also very _very_ fast), take a
look at kernel.org:/pub/linux/kernel/people/torvalds/.

First one to send me the changelog tree of sparse-git (and a tool to
commit and push/pull further changes) gets a gold star, and an honorable
mention. I've put a hell of a lot of clues in there (*).

I've worked on it (and little else) for the last two days. Time for 
somebody else to tell me I'm crazy.

		Linus

(*) It should be easier than it sounds. The database is designed so that
you can do the equivalent of a nonmerging (ie pure superset) push/pull
with just plain rsync, so replication really should be that easy (if
somewhat bandwidth-intensive due to the whole-file format).

Never mind merging. It's not an SCM, it's a distribution and archival
mechanism. I bet you could make a reasonable SCM on top of it, though.
Another way of looking at it is to say that it's really a content-
addressable filesystem, used to track directory trees.
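The "content-addressable filesystem" idea, and why a plain non-deleting rsync works as a pure-superset pull, can be sketched with a toy in-memory model (an editor's sketch; real git stores compressed files on disk named by their SHA-1):

```python
# Minimal model of a content-addressable object store: each object is
# named by the hash of its contents, so storing the same content twice
# is a no-op, and replicating a repository is just taking the union of
# two object sets -- which is why a pure-superset rsync suffices.

import hashlib

def put(store, data):
    name = hashlib.sha1(data).hexdigest()
    store[name] = data          # idempotent: same content, same name
    return name

def pull(dst, src):
    """Pure-superset pull: copy every object dst is missing, delete nothing."""
    for name, data in src.items():
        dst.setdefault(name, data)

a, b = {}, {}
blob = put(a, b"hello world\n")
pull(b, a)
print(b[blob] == b"hello world\n")  # -> True
```

Because objects are immutable and named by content, two replicas can never conflict on an object; merging histories is a separate problem, exactly as the mail says.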


* Re: Kernel SCM saga..
  2005-04-08  4:42   ` Linus Torvalds
@ 2005-04-08  5:04     ` Chris Wedgwood
  2005-04-08  5:14       ` H. Peter Anvin
  2005-04-08  7:14     ` Andrea Arcangeli
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08  5:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:

> Yes. The silly thing is, at least in my local tests it doesn't
> actually seem to be _doing_ anything while it's slow (there are no
> system calls except for a few memory allocations and
> de-allocations). It seems to have some exponential function on the
> number of pathnames involved etc.

I see lots of brk calls changing the heap size, up, down, up, down,
over and over.

This smells a bit like C++ new/delete behavior to me.


* Re: Kernel SCM saga..
  2005-04-08  5:04     ` Chris Wedgwood
@ 2005-04-08  5:14       ` H. Peter Anvin
  2005-04-08  7:05         ` Rogan Dawes
  0 siblings, 1 reply; 201+ messages in thread
From: H. Peter Anvin @ 2005-04-08  5:14 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20050408050458.GB8720@taniwha.stupidest.org>
By author:    Chris Wedgwood <cw@f00f.org>
In newsgroup: linux.dev.kernel
>
> On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:
> 
> > Yes. The silly thing is, at least in my local tests it doesn't
> > actually seem to be _doing_ anything while it's slow (there are no
> > system calls except for a few memory allocations and
> > de-allocations). It seems to have some exponential function on the
> > number of pathnames involved etc.
> 
> I see lots of brk calls changing the heap size, up, down, up, down,
> over and over.
> 
> This smells a bit like c++ new/delete behavior to me.
> 

Hmmm... can glibc be clued in to do some hysteresis on the memory
allocation?

	-hpa


* Re: Kernel SCM saga..
  2005-04-07 23:27             ` Linus Torvalds
@ 2005-04-08  5:56               ` Martin Pool
  2005-04-08  6:41                 ` Linus Torvalds
  0 siblings, 1 reply; 201+ messages in thread
From: Martin Pool @ 2005-04-08  5:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, David Lang

[-- Attachment #1: Type: text/plain, Size: 2102 bytes --]

On Thu, 2005-04-07 at 16:27 -0700, Linus Torvalds wrote:
> 
> On Thu, 7 Apr 2005, Martin Pool wrote:
> > 
> > Importing the first snapshot (2004-01-01) took 41.77s user, 1:23.79
> > total.  Each subsequent day takes about 10s user, 30s elapsed to commit
> > into bzr.  The speeds are comparable to CVS or a bit faster, and may be
> > faster than other distributed systems. (This on a laptop with a 5400rpm
> > disk.)  Pulling out a complete copy of the tree as it was on a previous
> > date takes about 14s user, 60s elapsed.
> 
> If you have an exportable tree, can you just make it pseudo-public, tell
> me where to get a buildable system that works well enough, point me to
> some documentation, and maybe I can get some feel for it?

Hi,

There is a "stable" release here:
http://www.bazaar-ng.org/pkg/bzr-0.0.3.tgz

All you should need to do is unpack that and symlink bzr onto your path.

You can get the current bzr development tree, stored in itself, by
rsync:

  rsync -av ozlabs.org::mbp/bzr/dev ~/bzr.dev

Inside that directory you can run 'bzr info', 'bzr status --all', 'bzr
unknowns', 'bzr log', 'bzr ignored'.  

Repeated rsyncs will bring you up to date with what I've done -- and
will of course overwrite any local changes. 

If someone were going to do development on this, then the method would
typically be to have two copies of the tree, one tracking my version and
another for your own work -- much as with bk.  In your own tree, you can
do 'bzr add', 'bzr remove', 'bzr diff', 'bzr commit'.

At the moment all you can do is diff against the previous revision, or
manually diff the two trees, or use quilt, so it is just an archival
system not a full SCM system.  In the near future there will be some
code to extract the differences as changesets to be mailed off.

I have done a rough-as-guts import from bkcvs into this, and I can
advertise that when it's on a server that can handle the load. 

At a glance this looks very similar to git -- I can go into the
differences and why I did them the other way if you want.

-- 
Martin


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: Kernel SCM saga..
  2005-04-07  7:44 ` Jan Hudec
@ 2005-04-08  6:14   ` Matthias Urlichs
  2005-04-09  1:01   ` Marcin Dalecki
  1 sibling, 0 replies; 201+ messages in thread
From: Matthias Urlichs @ 2005-04-08  6:14 UTC (permalink / raw)
  To: linux-kernel

Hi,   Jan Hudec wrote on Thu, 07 Apr 2005 09:44:08 +0200:

> 1) GNU Arch/Bazaar. They use the same archive format, simple, have the
>    concepts right. It may need some scripts or add ons. When Bazaar-NG is
>    ready, it will be able to read the GNU Arch/Bazaar archives so
>    switching should be easy.

Plus Bazaar has multiple implementations (C and Python). Plus arch can
trivially export single patches. Plus ... well, you get the idea. ;-)

Linus: Care to share your SCM feature requirement list?

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de




* Re: Kernel SCM saga..
  2005-04-08  5:56               ` Martin Pool
@ 2005-04-08  6:41                 ` Linus Torvalds
  2005-04-08  8:38                   ` Andrea Arcangeli
  2005-04-08 16:46                   ` Kernel SCM saga Catalin Marinas
  0 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08  6:41 UTC (permalink / raw)
  To: Martin Pool; +Cc: linux-kernel, David Lang



On Fri, 8 Apr 2005, Martin Pool wrote:
> 
> You can get the current bzr development tree, stored in itself, by
> rsync:

I was thinking more of an exportable kernel tree in addition to the tool.

The reason I mention that is just that I know several SCM's bog down under 
load horribly, so it actually matters what the size of the tree is.

And I'm absolutely _not_ asking you for the 60,000 changesets that are in
the BK tree, I'd be perfectly happy with a 2.6.12-rc2-based one for
testing.

I know I can import things myself, but the reason I ask is because I've
got several SCM's I should check out _and_ I've been spending the last two
days writing my own fallback system so that I don't get screwed if nothing
out there works right now. 

Which is why I'd love to hear from people who have actually used various 
SCM's with the kernel. There's bound to be people who have already tried.

I've gotten a lot of email of the kind "I love XYZ, you should try it 
out", but so far I've not seen anybody say "I've tracked the kernel with 
XYZ, and it does ..."

So, this is definitely not a "Martin Pool should do this" kind of issue: 
I'd like many people to test out many alternatives, to get a feel for 
where they are especially for a project the size of the kernel..

		Linus


* Re: Kernel SCM saga..
  2005-04-08  5:14       ` H. Peter Anvin
@ 2005-04-08  7:05         ` Rogan Dawes
  2005-04-08  7:21           ` Daniel Phillips
  0 siblings, 1 reply; 201+ messages in thread
From: Rogan Dawes @ 2005-04-08  7:05 UTC (permalink / raw)
  To: H. Peter Anvin, cw, linux-kernel

H. Peter Anvin wrote:
> Followup to:  <20050408050458.GB8720@taniwha.stupidest.org>
> By author:    Chris Wedgwood <cw@f00f.org>
> In newsgroup: linux.dev.kernel
> 
>>On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:
>>
>>
>>>Yes. The silly thing is, at least in my local tests it doesn't
>>>actually seem to be _doing_ anything while it's slow (there are no
>>>system calls except for a few memory allocations and
>>>de-allocations). It seems to have some exponential function on the
>>>number of pathnames involved etc.
>>
>>I see lots of brk calls changing the heap size, up, down, up, down,
>>over and over.
>>
>>This smells a bit like c++ new/delete behavior to me.
>>
> 
> 
> Hmmm... can glibc be clued in to do some hysteresis on the memory
> allocation?
> 
> 	-hpa

Take a look at 
http://www.linuxshowcase.org/2001/full_papers/ezolt/ezolt_html/

Abstract

GNU libc's default setting for malloc can cause a significant 
performance penalty for applications that use it extensively, such as 
Compaq's high performance extended math library, CXML.  The default 
malloc tuning can cause a significant number of minor page faults, and 
result in application performance of only half of the true potential. 
This paper describes how to remove the performance penalty using 
environmental variables and the method used to discover the cause of the 
malloc performance penalty.

Regards,

Rogan
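The paper's fix boils down to tuning glibc malloc through environment variables, which answers hpa's hysteresis question without touching the C runtime. A hedged sketch (the variable names are glibc's documented malloc knobs; `/bin/true` stands in for the actual slow program):

```python
# Sketch: launching a program under tuned glibc malloc.
# MALLOC_TRIM_THRESHOLD_=-1 stops glibc from shrinking the heap back on
# every free (the repeated brk up/down pattern seen with monotone), and
# MALLOC_MMAP_MAX_=0 keeps large allocations on the heap rather than in
# fresh mmaps.  "/bin/true" is a stand-in for the real program.

import os
import subprocess

env = dict(os.environ,
           MALLOC_TRIM_THRESHOLD_="-1",
           MALLOC_MMAP_MAX_="0")
rc = subprocess.call(["/bin/true"], env=env)
print(rc)  # -> 0
```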


* Re: Kernel SCM saga..
  2005-04-08  4:42   ` Linus Torvalds
  2005-04-08  5:04     ` Chris Wedgwood
@ 2005-04-08  7:14     ` Andrea Arcangeli
  2005-04-08 12:02       ` Matthias Andree
  2005-04-08 14:26       ` Linus Torvalds
  2005-04-08  7:17     ` ross
                       ` (3 subsequent siblings)
  5 siblings, 2 replies; 201+ messages in thread
From: Andrea Arcangeli @ 2005-04-08  7:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List

On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:
> play with something _really_ nasty (but also very _very_ fast), take a
> look at kernel.org:/pub/linux/kernel/people/torvalds/.

Why not use SQL as the backend instead of the tree of directories? That solves
userland journaling too (one still has to be careful to know the
read-committed semantics of SQL, which is not obvious stuff, but 99% of
common cases like this one just work safely and automatically, since all
inserts/deletes/updates are always atomic).

You can keep the design of your db exactly the same, and even the command line
of your script the same, except you won't have to deal with the implementation
of it anymore, and the end result may run even faster with proper btrees and you
won't have scalability issues if the directory of hashes fills up, and it'll
get userland journaling, live backups, runtime analyses of your queries with
genetic algorithms (pgsql 8 seems to have it) etc...

I seem to recall there's a way to do delayed commits too, so you won't
be synchronous, but you'll still have journaling. You clearly don't care
about doing synchronous writes; all you care about is that the commit is
either committed completely or not committed at all (i.e. not a half-written
patch that leaves your db corrupt).

Example:

CREATE TABLE patches (
	patch			BIGSERIAL	PRIMARY KEY,

	commiter_name		VARCHAR(32)	NOT NULL CHECK(commiter_name != ''),
	commiter_email		VARCHAR(32)	NOT NULL CHECK(commiter_email != ''),

	md5			CHAR(32)	NOT NULL CHECK(md5 != ''),
	len			INTEGER		NOT NULL CHECK(len > 0),
	UNIQUE(md5, len),

	payload			BYTEA		NOT NULL,

	timestamp		TIMESTAMP	NOT NULL
);
CREATE INDEX patches_md5_index ON patches (md5);
CREATE INDEX patches_timestamp_index ON patches (timestamp);

s/md5/sha1/, no difference.

This will automatically spawn fatal errors if there are hash collisions and it
enforces a bit of checking.
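One small caveat with the s/md5/sha1/ substitution (an editor's note, not from the original mail): a SHA-1 hex digest is 40 characters rather than 32, so the `CHAR(32)` column and its check would need widening to `CHAR(40)`:

```python
# SHA-1 hex digests are 40 chars vs. 32 for MD5, so a CHAR(32) column
# would reject (or truncate) them -- the schema swap is not quite free.

import hashlib

print(len(hashlib.md5(b"patch").hexdigest()),
      len(hashlib.sha1(b"patch").hexdigest()))  # -> 32 40
```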

Then you need a few lines of Python to insert/lookup. Example for psycopg2:

import pwd, os, socket, md5

[..]

patch = {'commiter_name': pwd.getpwuid(os.getuid())[4],
         'commiter_email': pwd.getpwuid(os.getuid())[0] + '@' + socket.getfqdn(),
         'md5' : md5.new(data).hexdigest(), 'len' : len(data),
         'payload' : data, 'timestamp' : 'now'}
curs.execute("""INSERT INTO patches
                  (commiter_name, commiter_email, md5, len, payload, timestamp)
                VALUES (%(commiter_name)s, %(commiter_email)s,
                  %(md5)s, %(len)s, %(payload)s, %(timestamp)s)""", patch)

('now' will be evaluated by the sql server, who knows about the time too)

The speed I don't know for sure, but especially with lots of data the sql way
should at least not be significantly slower, pgsql scales with terabytes
without apparent problems (modulo the annoyance of running vacuum once per day
in cron, to avoid internal sequence number overflows after >4 giga
commits, and once per day the analyser too, so it learns about your
usage patterns and can optimize the disk format for it).

For sure the python part isn't going to be noticeable, you can still write it
in C if you prefer (it'll clearly run faster if you want to run tons of
inserts for a benchmark), so then everything will run at bare-hardware
speed and there will be no time wasted interpreting bytecode (only the
sql commands have to be interpreted).

The backup should also be tiny (the runtime size is going to be somewhat larger
due to the extra data structures it has; how much larger I don't know). I know for sure
this kind of setup works like a charm on ppc64 (32bit userland), and x86 (32bit
and 64bit userland).

monotone using sqlite sounds like a good idea in fact (IMHO they could use a real
dbms too, so that you also get parallelism and you could attach another app to
the backing store at the same time, or you could run a live backup and
get all the other high-end performance features).

If you feel this is too bloated feel free to ignore this email of course! If
instead you'd like to give this a spin, let me know and I can help to
set it up quick (either today or from Monday).

I also like quick dedicated solutions and I was about to write a backing
store with a tree of dirs + hashes similar to yours for a similar
problem, but I gave it up while planning the userland journaling part
and, even worse, the userland fs locking with live backups, when a DBMS
gets everything right including live backups (and it provides async
interface too via sockets). OTOH for this usage journaling and locking
aren't a big issue, since you can hash the patches by hand to find
any potentially half-corrupted bit after reboot, and you probably run it
serially.

About your compression of the data, I don't think you want to do that.
The size of the live image isn't the issue; the issue is the size of the
_backups_, and you want to compress one huge thing (i.e. the tarball of
the cleartext, or the sql cleartext backup), not many tiny patches.

Comparing the size of the repositories isn't interesting, the
interesting thing is to compare the size of the backups.

BTW, this fixed compilation for my system.

--- ./Makefile.orig	2005-04-08 09:07:17.000000000 +0200
+++ ./Makefile	2005-04-08 08:52:35.000000000 +0200
@@ -8,7 +8,7 @@ all: $(PROG)
 install: $(PROG)
 	install $(PROG) $(HOME)/bin/
 
-LIBS= -lssl
+LIBS= -lssl -lz
 
 init-db: init-db.o
 

Thanks.


* Re: Kernel SCM saga..
  2005-04-08  4:42   ` Linus Torvalds
  2005-04-08  5:04     ` Chris Wedgwood
  2005-04-08  7:14     ` Andrea Arcangeli
@ 2005-04-08  7:17     ` ross
  2005-04-08 15:50       ` Linus Torvalds
  2005-04-08  7:34     ` Marcel Lanz
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 201+ messages in thread
From: ross @ 2005-04-08  7:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2681 bytes --]

On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:
> In the meantime (and because monotone really _is_ that slow), here's a
> quick challenge for you, and any crazy hacker out there: if you want to
> play with something _really_ nasty (but also very _very_ fast), take a
> look at kernel.org:/pub/linux/kernel/people/torvalds/.

Interesting.  I like it, with one modification (see below).

> First one to send me the changelog tree of sparse-git (and a tool to
> commit and push/pull further changes) gets a gold star, and an honorable
> mention. I've put a hell of a lot of clues in there (*).

Here's a partial solution.  It does depend on a modified version of
cat-file that behaves like cat.  I found it easier to have cat-file
just dump the object indicated on stdout.  Trivial patch for that is included.

Two scripts are included:

1) makechlog.sh takes an object and generates a ChangeLog file
consisting of all the parents of the given object.  It's probably
breakable, but correctly outputs the sparse-git changes when run on
HEAD.  Handles multiple parents and breaks cycles.

This adds a line to each object "me <sha1>".  This lets a change
identify itself.

It takes 35 seconds to produce all the change history on my box.  It
produces a single file named "ChangeLog".

2) chkchlog.sh uses the "me" entries to verify that #1 didn't miss any
parents.  It's mostly to prove my solution reasonably correct :-)
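The graph walk a makechlog-style script has to perform (emit every ancestor exactly once, breaking cycles with a visited set) can be modeled briefly. This is an editor's sketch: commits are faked as a dict of name -> (message, parents), where the real script reads the objects with cat-file.

```python
# Sketch of a changelog walk over the commit graph: depth-first along
# parent links, each commit emitted once, cycles broken by a seen-set.

def changelog(commits, head):
    out, seen, stack = [], set(), [head]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        msg, parents = commits[name]
        out.append(msg)
        stack.extend(parents)   # handles multiple parents (merges)
    return "\n".join(out)

commits = {
    "c3": ("third change", ["c2"]),
    "c2": ("second change", ["c1"]),
    "c1": ("initial import", []),
}
print(changelog(commits, "c3"))
```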

The patch is below, the scripts are attached, and everything is
available here:

http://lug.udel.edu/~ross/git/

Now to see what I come up with for commit, push, and pull...

-- 
Ross Vandegrift
ross@lug.udel.edu

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
	--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37


--- cat-file.orig.c     2005-04-08 01:53:54.000000000 -0400
+++ cat-file.c  2005-04-08 01:57:51.000000000 -0400
@@ -11,18 +11,11 @@
        char type[20];
        void *buf;
        unsigned long size;
-       char template[] = "temp_git_file_XXXXXX";
-       int fd;
 
        if (argc != 2 || get_sha1_hex(argv[1], sha1))
                usage("cat-file: cat-file <sha1>");
        buf = read_sha1_file(sha1, type, &size);
        if (!buf)
                exit(1);
-       fd = mkstemp(template);
-       if (fd < 0)
-               usage("unable to create tempfile");
-       if (write(fd, buf, size) != size)
-               strcpy(type, "bad");
-       printf("%s: %s\n", template, type);
+       printf ("%s", buf);
 }


[-- Attachment #2: makechlog.sh --]
[-- Type: application/x-sh, Size: 1023 bytes --]

[-- Attachment #3: chkchlog.sh --]
[-- Type: application/x-sh, Size: 208 bytes --]


* Re: Kernel SCM saga..
  2005-04-08  7:05         ` Rogan Dawes
@ 2005-04-08  7:21           ` Daniel Phillips
  2005-04-08  7:49             ` H. Peter Anvin
  0 siblings, 1 reply; 201+ messages in thread
From: Daniel Phillips @ 2005-04-08  7:21 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: H. Peter Anvin, cw, linux-kernel

On Friday 08 April 2005 03:05, Rogan Dawes wrote:
> Take a look at
> http://www.linuxshowcase.org/2001/full_papers/ezolt/ezolt_html/
>
> Abstract
>
> GNU libc's default setting for malloc can cause a significant
> performance penalty for applications that use it extensively, such as
> Compaq's high performance extended math library, CXML.  The default
> malloc tuning can cause a significant number of minor page faults, and
> result in application performance of only half of the true potential.

This does not smell like an n*2 suckage, more like n^something suckage.  
Finding the elephant under the rug should not be hard.  Profile?

Regards,

Daniel


* Re: Kernel SCM saga..
  2005-04-08  4:42   ` Linus Torvalds
                       ` (2 preceding siblings ...)
  2005-04-08  7:17     ` ross
@ 2005-04-08  7:34     ` Marcel Lanz
  2005-04-08  9:23       ` Geert Uytterhoeven
  2005-04-08  8:38     ` Matt Johnston
  2005-04-12  7:14     ` Kernel SCM saga.. (bk license?) Kedar Sovani
  5 siblings, 1 reply; 201+ messages in thread
From: Marcel Lanz @ 2005-04-08  7:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List

git on sarge

--- git-0.02/Makefile.orig      2005-04-07 23:06:19.000000000 +0200
+++ git-0.02/Makefile   2005-04-08 09:24:28.472672224 +0200
@@ -8,7 +8,7 @@ all: $(PROG)
 install: $(PROG)
        install $(PROG) $(HOME)/bin/
 
-LIBS= -lssl
+LIBS= -lssl -lz
 
 init-db: init-db.o
 


* Re: Kernel SCM saga..
  2005-04-08  7:21           ` Daniel Phillips
@ 2005-04-08  7:49             ` H. Peter Anvin
  0 siblings, 0 replies; 201+ messages in thread
From: H. Peter Anvin @ 2005-04-08  7:49 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Rogan Dawes, cw, linux-kernel

Daniel Phillips wrote:
> On Friday 08 April 2005 03:05, Rogan Dawes wrote:
> 
>>Take a look at
>>http://www.linuxshowcase.org/2001/full_papers/ezolt/ezolt_html/
>>
>>Abstract
>>
>>GNU libc's default setting for malloc can cause a significant
>>performance penalty for applications that use it extensively, such as
>>Compaq's high performance extended math library, CXML.  The default
>>malloc tuning can cause a significant number of minor page faults, and
>>result in application performance of only half of the true potential.
> 
> 
> This does not smell like an n*2 suckage, more like n^something suckage.  
> Finding the elephant under the rug should not be hard.  Profile?
> 

Lack of hysteresis can do that, with large swats of memory constantly 
being claimed and returned to the system.  One way to implement 
hysteresis would be based on a decaying peak-based threshold; 
unfortunately for optimal performance that requires the C runtime to 
have a notion of time, and in extreme cases even be able to do 
asynchronous deallocation, but in reality one can probably assume that 
the rate of malloc/free is roughly constant over time.
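
The decaying-peak idea can be sketched in a few lines (a toy model for
illustration only: the class, its fields, and the 10% per-tick decay rate
are all invented here, not glibc behaviour):

```python
# Toy model of hysteresis via a decaying peak threshold: freed memory
# is cached rather than returned to the OS, and on each "tick" only the
# excess above a slowly decaying peak is released. Integer arithmetic
# keeps the example exact.

class DecayingPeakPool:
    def __init__(self):
        self.in_use = 0
        self.cached = 0          # freed but not yet returned to the OS
        self.peak = 0            # decaying high-water mark
        self.returned_to_os = 0

    def malloc(self, n):
        take = min(n, self.cached)
        self.cached -= take      # satisfy requests from the cache first
        self.in_use += n
        self.peak = max(self.peak, self.in_use + self.cached)

    def free(self, n):
        self.in_use -= n
        self.cached += n         # keep the memory, don't unmap it

    def tick(self):
        # Decay the remembered peak by 10%, then release only the
        # excess above it.
        self.peak = max(self.in_use, self.peak * 9 // 10)
        excess = self.in_use + self.cached - self.peak
        if excess > 0:
            self.cached -= excess
            self.returned_to_os += excess

pool = DecayingPeakPool()
for _ in range(100):             # steady claim/return cycle
    pool.malloc(1000)
    pool.free(1000)
    pool.tick()
```

With the threshold, the steady claim/return cycle gives back only 10% of
the working set per tick instead of all of it, which is the point: the
cost of the constant claiming and returning is damped.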

	-hpa

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  4:42   ` Linus Torvalds
                       ` (3 preceding siblings ...)
  2005-04-08  7:34     ` Marcel Lanz
@ 2005-04-08  8:38     ` Matt Johnston
  2005-04-12  7:14     ` Kernel SCM saga.. (bk license?) Kedar Sovani
  5 siblings, 0 replies; 201+ messages in thread
From: Matt Johnston @ 2005-04-08  8:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List

On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:
> 
> On Thu, 7 Apr 2005, Chris Wedgwood wrote:
> > 
> > I'm playing with monotone right now.  Superficially it looks like it
> > has tons of gee-whiz neato stuff...  however, it's *agonizingly* slow.
> > I mean glacial.  A heavily sedated sloth with no legs is probably
> > faster.
> 
> Yes. The silly thing is, at least in my local tests it doesn't actually
> seem to be _doing_ anything while it's slow (there are no system calls
> except for a few memory allocations and de-allocations). It seems to have
> some exponential function on the number of pathnames involved etc.
> 
> I'm hoping they can fix it, though. The basic notions do not sound wrong.

That is indeed correct wrt pathnames. The current head of
monotone is a lot better in this regard (on the order of 2-3
minutes for "monotone import" on a 2.6 Linux untar). The
basic problem is that in the last release (0.17), a huge
amount of sanity checking code was added to ensure that
inconsistent or generally bad revisions can never be
written/received/transmitted.

The focus is now on speeding that up - there's a _lot_ of
low hanging fruit for us to look at.

Matt


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  6:41                 ` Linus Torvalds
@ 2005-04-08  8:38                   ` Andrea Arcangeli
  2005-04-08 23:38                     ` Daniel Phillips
  2005-04-09  0:12                     ` Linus Torvalds
  2005-04-08 16:46                   ` Kernel SCM saga Catalin Marinas
  1 sibling, 2 replies; 201+ messages in thread
From: Andrea Arcangeli @ 2005-04-08  8:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Pool, linux-kernel, David Lang

On Thu, Apr 07, 2005 at 11:41:29PM -0700, Linus Torvalds wrote:
> I know I can import things myself, but the reason I ask is because I've
> got several SCM's I should check out _and_ I've been spending the last two
> days writing my own fallback system so that I don't get screwed if nothing
> out there works right now. 

I tend to like bzr too (and I tend to like too many things ;), but even
if an export of the data were available, it still seems too early in
development to be able to help you this week; it also seems to lack any
form of network export.

> I'd like many people to test out many alternatives, to get a feel for 
> where they are especially for a project the size of the kernel..

The huge number of changesets is the crucial point: there are good
distributed SCMs already, but they are apparently not efficient enough
at handling 60k changesets.

We'd need a regenerated coherent copy of BKCVS to pipe into those SCM to
evaluate how well they scale.

OTOH if your git project already allows storing the data in there,
that looks nice ;). I don't yet fully understand how the algorithms of
the trees are meant to work (I only understand the backing store well,
and I tend to prefer a DBMS over a tree of dirs with hashes). So I've
no idea how it can plug in well as a SCM replacement or how you want to
use it. It seems a kind of fully lockless thing where you can merge from
one tree to the other without locks and where you can make quick diffs.
It looks similar to a diff -ur of two hardlinked trees, except this one
can save a lot of bandwidth when copying with rsync (i.e. hardlinks
become worthless after going over the network with rsync, but hashes do
not). Clearly a DBMS couldn't use the rsync binary to distribute the
objects, but a network protocol could do the same thing rsync does.
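
The "hashes survive the network, hardlinks don't" observation can be made
concrete with a minimal content-addressed store (a hypothetical sketch,
not git's actual on-disk format):

```python
# Objects are named by the SHA-1 of their contents, so two trees that
# share a file share one object name, and replication only has to copy
# names the destination lacks -- which is exactly what rsync ends up
# doing for a directory of hash-named files.
import hashlib

def store(objects: dict, data: bytes) -> str:
    name = hashlib.sha1(data).hexdigest()
    objects[name] = data          # storing identical content is a no-op
    return name

src, dst = {}, {}
store(src, b"shared file contents")
store(src, b"new in this tree")
store(dst, b"shared file contents")   # destination already has this one

# An rsync-like sync: transfer only the missing names.
missing = [n for n in src if n not in dst]
for n in missing:
    dst[n] = src[n]
```

Only one object crosses the "network" here even though the source tree
holds two, because the shared content hashes to a name the destination
already has.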

Thanks.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  7:34     ` Marcel Lanz
@ 2005-04-08  9:23       ` Geert Uytterhoeven
  0 siblings, 0 replies; 201+ messages in thread
From: Geert Uytterhoeven @ 2005-04-08  9:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Fri, 8 Apr 2005, Marcel Lanz wrote:
> git on sarge
> 
> --- git-0.02/Makefile.orig      2005-04-07 23:06:19.000000000 +0200
> +++ git-0.02/Makefile   2005-04-08 09:24:28.472672224 +0200
> @@ -8,7 +8,7 @@ all: $(PROG)
>  install: $(PROG)
>         install $(PROG) $(HOME)/bin/
>  
> -LIBS= -lssl
> +LIBS= -lssl -lz
>  
>  init-db: init-db.o
>  

I found a few more `issues' after adding `-O3 -Wall'.
Most are cosmetic, but the missing return value in remove_file_from_cache() is
a real bug. Hmm, upon closer look the caller uses its return value in a weird
way, so another bug may be hiding in add_file_to_cache().

Caveat: everything is untested, besides compilation ;-)

diff -purN git-0.02.orig/Makefile git-0.02/Makefile
--- git-0.02.orig/Makefile	2005-04-07 23:06:19.000000000 +0200
+++ git-0.02/Makefile	2005-04-08 11:02:02.000000000 +0200
@@ -1,4 +1,4 @@
-CFLAGS=-g
+CFLAGS=-g -O3 -Wall
 CC=gcc
 
 PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file
@@ -8,7 +8,7 @@ all: $(PROG)
 install: $(PROG)
 	install $(PROG) $(HOME)/bin/
 
-LIBS= -lssl
+LIBS= -lssl -lz
 
 init-db: init-db.o
 
diff -purN git-0.02.orig/cat-file.c git-0.02/cat-file.c
--- git-0.02.orig/cat-file.c	2005-04-07 23:15:17.000000000 +0200
+++ git-0.02/cat-file.c	2005-04-08 11:07:28.000000000 +0200
@@ -5,6 +5,8 @@
  */
 #include "cache.h"
 
+#include <string.h>
+
 int main(int argc, char **argv)
 {
 	unsigned char sha1[20];
@@ -25,4 +27,5 @@ int main(int argc, char **argv)
 	if (write(fd, buf, size) != size)
 		strcpy(type, "bad");
 	printf("%s: %s\n", template, type);
+	exit(0);
 }
diff -purN git-0.02.orig/commit-tree.c git-0.02/commit-tree.c
--- git-0.02.orig/commit-tree.c	2005-04-07 23:15:17.000000000 +0200
+++ git-0.02/commit-tree.c	2005-04-08 11:06:08.000000000 +0200
@@ -6,6 +6,7 @@
 #include "cache.h"
 
 #include <pwd.h>
+#include <string.h>
 #include <time.h>
 
 #define BLOCKING (1ul << 14)
diff -purN git-0.02.orig/init-db.c git-0.02/init-db.c
--- git-0.02.orig/init-db.c	2005-04-07 23:15:17.000000000 +0200
+++ git-0.02/init-db.c	2005-04-08 11:07:33.000000000 +0200
@@ -5,10 +5,12 @@
  */
 #include "cache.h"
 
+#include <string.h>
+
 int main(int argc, char **argv)
 {
 	char *sha1_dir = getenv(DB_ENVIRONMENT), *path;
-	int len, i, fd;
+	int len, i;
 
 	if (mkdir(".dircache", 0700) < 0) {
 		perror("unable to create .dircache");
@@ -25,7 +27,7 @@ int main(int argc, char **argv)
 	if (sha1_dir) {
 		struct stat st;
 		if (!stat(sha1_dir, &st) < 0 && S_ISDIR(st.st_mode))
-			return;
+			exit(1);
 		fprintf(stderr, "DB_ENVIRONMENT set to bad directory %s: ", sha1_dir);
 	}
 
diff -purN git-0.02.orig/read-cache.c git-0.02/read-cache.c
--- git-0.02.orig/read-cache.c	2005-04-07 23:23:43.000000000 +0200
+++ git-0.02/read-cache.c	2005-04-08 11:07:37.000000000 +0200
@@ -5,6 +5,8 @@
  */
 #include "cache.h"
 
+#include <string.h>
+
 const char *sha1_file_directory = NULL;
 struct cache_entry **active_cache = NULL;
 unsigned int active_nr = 0, active_alloc = 0;
@@ -89,7 +91,7 @@ void * read_sha1_file(unsigned char *sha
 	z_stream stream;
 	char buffer[8192];
 	struct stat st;
-	int i, fd, ret, bytes;
+	int fd, ret, bytes;
 	void *map, *buf;
 	char *filename = sha1_file_name(sha1);
 
@@ -173,7 +175,7 @@ int write_sha1_file(char *buf, unsigned 
 int write_sha1_buffer(unsigned char *sha1, void *buf, unsigned int size)
 {
 	char *filename = sha1_file_name(sha1);
-	int i, fd;
+	int fd;
 
 	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
 	if (fd < 0)
diff -purN git-0.02.orig/read-tree.c git-0.02/read-tree.c
--- git-0.02.orig/read-tree.c	2005-04-08 04:58:44.000000000 +0200
+++ git-0.02/read-tree.c	2005-04-08 11:07:41.000000000 +0200
@@ -5,6 +5,8 @@
  */
 #include "cache.h"
 
+#include <string.h>
+
 static void create_directories(const char *path)
 {
 	int len = strlen(path);
@@ -72,7 +74,6 @@ static int unpack(unsigned char *sha1)
 
 int main(int argc, char **argv)
 {
-	int fd;
 	unsigned char sha1[20];
 
 	if (argc != 2)
diff -purN git-0.02.orig/show-diff.c git-0.02/show-diff.c
--- git-0.02.orig/show-diff.c	2005-04-07 23:15:17.000000000 +0200
+++ git-0.02/show-diff.c	2005-04-08 11:07:44.000000000 +0200
@@ -5,6 +5,8 @@
  */
 #include "cache.h"
 
+#include <string.h>
+
 #define MTIME_CHANGED	0x0001
 #define CTIME_CHANGED	0x0002
 #define OWNER_CHANGED	0x0004
@@ -60,7 +62,6 @@ int main(int argc, char **argv)
 		struct stat st;
 		struct cache_entry *ce = active_cache[i];
 		int n, changed;
-		unsigned int mode;
 		unsigned long size;
 		char type[20];
 		void *new;
diff -purN git-0.02.orig/update-cache.c git-0.02/update-cache.c
--- git-0.02.orig/update-cache.c	2005-04-07 23:15:17.000000000 +0200
+++ git-0.02/update-cache.c	2005-04-08 11:08:55.000000000 +0200
@@ -5,6 +5,8 @@
  */
 #include "cache.h"
 
+#include <string.h>
+
 static int cache_name_compare(const char *name1, int len1, const char *name2, int len2)
 {
 	int len = len1 < len2 ? len1 : len2;
@@ -50,6 +52,7 @@ static int remove_file_from_cache(char *
 		if (pos < active_nr)
 			memmove(active_cache + pos, active_cache + pos + 1, (active_nr - pos - 1) * sizeof(struct cache_entry *));
 	}
+	return 0;
 }
 
 static int add_cache_entry(struct cache_entry *ce)
@@ -250,4 +253,5 @@ int main(int argc, char **argv)
 		return 0;
 out:
 	unlink(".dircache/index.lock");
+	exit(0);
 }
diff -purN git-0.02.orig/write-tree.c git-0.02/write-tree.c
--- git-0.02.orig/write-tree.c	2005-04-07 23:15:17.000000000 +0200
+++ git-0.02/write-tree.c	2005-04-08 11:07:51.000000000 +0200
@@ -5,6 +5,8 @@
  */
 #include "cache.h"
 
+#include <string.h>
+
 static int check_valid_sha1(unsigned char *sha1)
 {
 	char *filename = sha1_file_name(sha1);
@@ -31,7 +33,7 @@ static int prepend_integer(char *buffer,
 
 int main(int argc, char **argv)
 {
-	unsigned long size, offset, val;
+	unsigned long size, offset;
 	int i, entries = read_cache();
 	char *buffer;
 

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  4:13 ` Chris Wedgwood
  2005-04-08  4:42   ` Linus Torvalds
@ 2005-04-08 11:42   ` Catalin Marinas
  1 sibling, 0 replies; 201+ messages in thread
From: Catalin Marinas @ 2005-04-08 11:42 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Kernel Mailing List

Chris Wedgwood <cw@f00f.org> wrote:
> I'm playing with monotone right now.  Superficially it looks like it
> has tons of gee-whiz neato stuff...  however, it's *agonizingly* slow.
> I mean glacial.  A heavily sedated sloth with no legs is probably
> faster.

I tried some time ago to import the BKCVS revisions since Linux 2.6.9
into a monotone-0.16 repository. I later tried to upgrade the database
(repository) to monotone version 0.17. The result: converting ~3500
revisions would have taken more than *one year*, a fact confirmed by
the monotone developers. The bottleneck seemed to be the large size of
the manifest (which stores the file names and the corresponding SHA1
values) and all the validation performed when converting. The (unsafe)
workaround is to disable the revision checks in monotone, but you can
end up with an inconsistent repository (I haven't tried this).

-- 
Catalin


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  7:14     ` Andrea Arcangeli
@ 2005-04-08 12:02       ` Matthias Andree
  2005-04-08 12:21         ` Florian Weimer
  2005-04-08 14:26       ` Linus Torvalds
  1 sibling, 1 reply; 201+ messages in thread
From: Matthias Andree @ 2005-04-08 12:02 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, Chris Wedgwood, Kernel Mailing List

Andrea Arcangeli schrieb am 2005-04-08:

> On Thu, Apr 07, 2005 at 09:42:04PM -0700, Linus Torvalds wrote:
> > play with something _really_ nasty (but also very _very_ fast), take a
> > look at kernel.org:/pub/linux/kernel/people/torvalds/.
> 
> Why not to use sql as backend instead of the tree of directories? That solves
> userland journaling too (really one still has to be careful to know the
> read-committed semantics of sql, which is not obvious stuff, but 99% of
> common cases like this one just works safe automatically since all
> inserts/delete/update are always atomic).
> 
> You can keep the design of your db exactly the same and even the command line
> of your script the same, except you won't have deal with the implementation of
> it anymore, and the end result may run even faster with proper btrees and you
> won't have scalability issues if the directory of hashes fills up, and it'll
> get userland journaling, live backups, runtime analyses of your queries with
> genetic algorithms (pgsql 8 seems to have it) etc...
> 
> I seem to recall there's a way to do delayed commits too, so you won't
> be sychronous, but you'll still have journaling. You clearly don't care
> to do synchronous writes, all you care about is that the commit is
> either committed completely or not committed at all (i.e. not an half
> write of the patch that leaves your db corrupt).
> 
> Example:
> 
> CREATE TABLE patches (
> 	patch			BIGSERIAL	PRIMARY KEY,
> 
> 	commiter_name		VARCHAR(32)	NOT NULL CHECK(commiter_name != ''),
> 	commiter_email		VARCHAR(32)	NOT NULL CHECK(commiter_email != ''),

The length is too optimistic and insufficient to import the current BK
stuff.  I'd vote for 64 or at least 48 for each, although 48 is going to
be a tight fit.  It costs a bit but considering the expected payload
size it's irrelevant.

Committer (double t) email is up to 43 characters at the moment and the
name up to 36 characters, as found by analyzing the shortlog script with
this little Perl snippet:

------------------------------------------------------------------------
while (($k, $v) = each %addresses) {
    $lk = length $k;
    $lv = length $v;
    if ($lk > $mk) { $mk = $lk; }
    if ($lv > $mv) { $mv = $lv; }
}
print "max key len $mk, max val len $mv\n";
------------------------------------------------------------------------

which prints: (key is the email, val the name)

max key len 43, max val len 36

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 12:02       ` Matthias Andree
@ 2005-04-08 12:21         ` Florian Weimer
  0 siblings, 0 replies; 201+ messages in thread
From: Florian Weimer @ 2005-04-08 12:21 UTC (permalink / raw)
  To: Kernel Mailing List

* Matthias Andree:

>> 	commiter_name		VARCHAR(32)	NOT NULL CHECK(commiter_name != ''),
>> 	commiter_email		VARCHAR(32)	NOT NULL CHECK(commiter_email != ''),
>
> The length is too optimistic and insufficient to import the current BK
> stuff.  I'd vote for 64 or at least 48 for each, although 48 is going to
> be a tight fit.  It costs a bit but considering the expected payload
> size it's irrelevant.

You should also check your database documentation to see whether
VARCHAR(n) is actually implemented in the same way as TEXT (or
whatever the unbounded string type is called), plus an additional
length check.  It doesn't make much sense to use VARCHAR if there
isn't a performance (or disk space) benefit, IMHO, especially for such
data.
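
SQLite is a concrete example of this: the declared VARCHAR length is pure
documentation there, since every VARCHAR(n) column just gets TEXT affinity
and no length check at all (this illustrates one engine; others differ):

```python
# In SQLite, VARCHAR(32) does not truncate or reject longer values --
# the "(32)" is ignored and the column behaves exactly like TEXT.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patches (commiter_email VARCHAR(32))")
db.execute(
    "INSERT INTO patches VALUES (?)",
    ("a-committer-address-well-over-thirty-two-characters@example.org",),
)
(stored,) = db.execute("SELECT commiter_email FROM patches").fetchone()
# The over-long address was stored untruncated.
```

So for such a backend the only effect of picking 32 vs 64 would be on
readers of the schema, not on the data.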

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  7:14     ` Andrea Arcangeli
  2005-04-08 12:02       ` Matthias Andree
@ 2005-04-08 14:26       ` Linus Torvalds
  2005-04-08 16:15         ` Matthias-Christian Ott
  1 sibling, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 14:26 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Chris Wedgwood, Kernel Mailing List



On Fri, 8 Apr 2005, Andrea Arcangeli wrote:
> 
> Why not to use sql as backend instead of the tree of directories?

Because it sucks? 

I can come up with millions of ways to slow things down on my own. Please 
come up with ways to speed things up instead.

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  7:17     ` ross
@ 2005-04-08 15:50       ` Linus Torvalds
  2005-04-09  2:53         ` Petr Baudis
  2005-04-09 15:50         ` Paul Jackson
  0 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 15:50 UTC (permalink / raw)
  To: ross; +Cc: Chris Wedgwood, Kernel Mailing List



On Fri, 8 Apr 2005 ross@jose.lug.udel.edu wrote:
> 
> Here's a partial solution.  It does depend on a modified version of
> cat-file that behaves like cat.  I found it easier to have cat-file
> just dump the object indicated on stdout.  Trivial patch for that is included.

Your trivial patch is trivially incorrect, though. First off, some files
may be binary (and definitely are - the "tree" type object contains
pathnames, and in order to avoid having to worry about special characters
they are NUL-terminated), and your modified "cat-file" breaks that.  

Secondly, it doesn't check or print the tag.

That said, I think I agree with your concern, and cat-file should not use 
a temp-file. I'll fix it, but I'll also make it verify the tag (so you'd 
now have to know the tag in advance if you want to cat the data).

Something like

	cat-file -t <sha1>		# output the tag
	cat-file <tag> <sha1>		# output the data

or similar. Easy enough. That way you can do

	torvalds@ppc970:~/git> ./cat-file -t `cat .dircache/HEAD `
	commit

and

	torvalds@ppc970:~/git> ./cat-file commit `cat .dircache/HEAD `

	tree ca30cdf8df2f31545cc1f2c1be62619111b6f6aa
	parent c2474b336d7a96fb4e03e65d229bcddc62b244fc
	author Linus Torvalds <torvalds@ppc970.osdl.org> Fri Apr  8 08:16:38 2005
	committer Linus Torvalds <torvalds@ppc970.osdl.org> Fri Apr  8 08:16:38 2005

	Make "cat-file" output the file contents to stdout.

	New syntax: "cat-file -t <sha1>" shows the tag, while "cat-file <tag> <sha1>"
	outputs the file contents after checking that the supplied tag matches.

I'll rsync the .dircache directory to kernel.org. You'll need to update 
your scripts.

> Now to see what I come up with for commit, push, and pull...

A "commit" (*) looks roughly like this:

	# check with "show-diff" what has changed, and check if
	# you need to add any files..

	update-cache <list of files that have been changed/added/deleted>

	# check with "show-diff" that it all looks right

	oldhead=$(cat .dircache/HEAD)
	newhead=$(commit-tree $(write-tree) -p $oldhead < commit-message)

	# update the head information
	if [ "$newhead" != "" ] ; then echo $newhead > .dircache/HEAD; fi

(*) I call this "commit", but it's really something much simpler. It's
really just a "I now have <this directory state>, I got here from
<collection of previous directory states> and the reason was <reason>". 

The "push" I use is

	rsync -avz --exclude index .dircache/ <destination-dir>

and you can pull the same way, except when you pull you should save _your_
HEAD file first (and then you're screwed. There's no way to merge. If
you've made changes and committed them, your changes are still there, but
they are now on a different HEAD than the new one).

That, btw, is kind of the design. "git" really doesn't care about things
like merges. You can use _any_ SCM to do a merge. What "git" does is track
directory state (and how you got to that state), and nothing else. It
doesn't merge, it doesn't really do a whole lot of _anything_.

So when you "pull" or "push" on a git archive, you get the "union" of all
directory states in the destination. The HEAD thing is _one_ pointer into 
the "sea of directory states", but you really have to use something else 
to merge two directory states together. 
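
That model can be sketched in a few lines (an editorial toy model of the
described semantics, not git code; the state names are invented):

```python
# A "pull" is just a union of two sets of content-addressed directory
# states, plus separate HEAD pointers. Nothing is merged: after the
# pull both heads name valid states in the combined "sea of states".

def pull(mine: dict, theirs: dict, my_head: str, their_head: str):
    mine.update(theirs)          # union; identical keys hold identical states
    # Saving your old HEAD before adopting the new one is up to you.
    return my_head, their_head   # two heads, no automatic merge

mine = {"s1": "state-1"}
theirs = {"s1": "state-1", "s2": "state-2"}
old_head, new_head = pull(mine, theirs, "s1", "s2")
```

After the pull, the repository holds both states, and reconciling the two
heads is explicitly somebody else's job -- any SCM (or a human) can do it.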

			Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 14:26       ` Linus Torvalds
@ 2005-04-08 16:15         ` Matthias-Christian Ott
  2005-04-08 17:14           ` Linus Torvalds
  2005-04-09  1:00           ` Marcin Dalecki
  0 siblings, 2 replies; 201+ messages in thread
From: Matthias-Christian Ott @ 2005-04-08 16:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List

Linus Torvalds wrote:

>On Fri, 8 Apr 2005, Andrea Arcangeli wrote:
>
>>Why not to use sql as backend instead of the tree of directories?
>
>Because it sucks? 
>
>I can come up with millions of ways to slow things down on my own. Please 
>come up with ways to speed things up instead.
>
>		Linus
SQL Databases like SQLite aren't slow.
But maybe a Berkeley Database v.4 is a better solution.

Matthias-Christian Ott

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  6:41                 ` Linus Torvalds
  2005-04-08  8:38                   ` Andrea Arcangeli
@ 2005-04-08 16:46                   ` Catalin Marinas
  1 sibling, 0 replies; 201+ messages in thread
From: Catalin Marinas @ 2005-04-08 16:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Pool, linux-kernel, David Lang

Linus Torvalds <torvalds@osdl.org> wrote:
> Which is why I'd love to hear from people who have actually used various 
> SCM's with the kernel. There's bound to be people who have already
> tried.

I (successfully) tried GNU Arch with the Linux kernel. I mirrored all
the BKCVS changesets since Linux 2.6.9 (5300+ changesets) using this
script:
http://wiki.gnuarch.org/BKCVS_20to_20Arch_20Script_20for_20Linux_20Kernel

My repository size is 1.1GB but this is because the script I use
creates a snapshot (i.e. a full tarball) of every main and -rc
release. For each individual changeset, an arch repository has a
patch-xxx directory with a compressed tarball containing the patch, a
log file and a checksum file.

GNU Arch may have some annoying things (file naming, long commands,
harder to get started, imposed version naming) and I won't try to
advocate them but, for me, it looked like the best (free) option
available regarding both features and speed. Being changeset oriented
also has some advantages from my point of view. Being distributed
means that you can create a branch on your local repository from a
tree stored on a (read-only) remote repository (hosted on an ftp/http
server).

I can't compare it with BK since I haven't used it.

The way I use it:

- a main repository tracking all the changes to the bk-head,
  linux--main--2.6 (for those that never read/heard about arch, a tree
  name has the form "name--branch--version")
- my main branch from the mainline tree, linux-arm--main--2.6, that
  was integrating my patches and was periodically merging the latest
  changes in linux--main--2.6
- different linux-arm--platformX--2.6 or linux-arm--deviceX--2.6 trees
  that were eventually merged into the linux-arm--main--2.6 tree

The main merge algorithm is called star-merge and does a three-way
merge between the local tree, the remote one and the common ancestor
of these. Cherry picking is also supported for those that like it (I
found it very useful if, for example, I fix a general bug in a branch
that should be integrated in the main tree but the branch is not yet
ready for inclusion).

All the standard commands like commit, diff, status etc. are supported
by arch. A useful command is "missing" which shows what changes are
present in a tree and not in the current one. It is handy to see a
summary of the remote changes before doing a merge (and faster than a
full diff). It also supports file/directory renaming.

To speed things up, arch uses a revision library with a directory for
every revision, the files being hard-linked between revisions to save
space. You can also hard-link the working tree to the revision library
(which speeds up the tree diff operation), but you need to make sure
that your editor renames the original file before saving a copy.
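
The rename-before-save requirement follows directly from how hard links
work, and is easy to demonstrate (an editorial illustration using
temporary files, not arch itself):

```python
# Hard-linked revisions share one inode, so editing a file in place
# would corrupt the library copy too. Replace-by-rename (what a
# well-behaved editor does) breaks the link and leaves the library
# revision intact.
import os
import tempfile

lib = tempfile.mkdtemp()
rev1 = os.path.join(lib, "file-rev1")   # the revision-library copy
work = os.path.join(lib, "file-work")   # the working-tree copy
with open(rev1, "w") as f:
    f.write("revision 1\n")
os.link(rev1, work)                     # working tree hard-links the library
assert os.stat(work).st_nlink == 2      # one inode, two names

# Replace-by-rename: write the edit to a new file, rename it over
# the old name. The library copy keeps its own inode and contents.
tmp = work + ".new"
with open(tmp, "w") as f:
    f.write("edited\n")
os.replace(tmp, work)

with open(rev1) as f:
    unchanged = f.read()                # library revision untouched
```

An editor that instead opened the shared file with O_TRUNC and rewrote it
would silently destroy the library revision, which is exactly the failure
mode being warned about.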

Having snapshots takes space, but they are useful both for getting a
revision quickly and for creating a revision in the library.

A diff command usually takes around 1 min (on a P4 at 2.5GHz with IDE
drives) if the current revision is in the library. The tree diff is
the main time consuming operation when committing small changes. If
the revision is not in the library, it will try to create it by
hard-linking with a previous one and applying the corresponding
patches (later version I think can reverse-apply patches from newer
revisions).

The merge operation might take some time (minutes, even 10-20 minutes
for 1000+ changesets) depending on the number of changesets and
whether the revisions are already in the revision library. You can
specify a three-way merge that places conflict markers in the file
(like diff3 or cvs) or a two-way merge which is equivalent to applying
a patch (if you prefer a two-way merge, the "replay" command is
actually the fastest, it takes ~2 seconds to apply a small changeset
and doesn't need to go to the revision library). Once a merge operation
completes, you would need to fix the conflicts and commit the
changes. All the logs are preserved but the newly merged individual
changes are seen as a single commit in the local tree.

In the way I use it (with a linux--main--2.6 tree similar to bk-head)
I think arch would get slow with time as changesets accumulate. The
way its developers advise to be used is to work, for example, on a
linux--main--2.6.12 tree for preparing this release and, once it is
ready, seal it (commit --seal). Further commits need to have a --fix
option and they should mainly be bug fixes. At this point you can
branch the linux--main--2.6.13 and start working on it. This new tree
can easily merge the bug fixes applied to the previous version. Arch
developers also recommend using a new repository every year,
especially if there are many changesets.

A problem I found: even if the library revisions are hard-linked, they
still take a lot of space and should be cleaned periodically (a cron
script that checks the last access time is available).

By default, arch also complains (with exit) about unknown files in the
working tree. Its developer(s) believe that the compilation should be
done in a different directory. I didn't find this a problem since I
use the same tree to compile for several platforms. Anyway, it can be
configured to ignore them, based on regexp.

I also tried monotone and darcs (since these two, unlike svn, can do
proper merging and preserve the merge history) but arch was by far the
fastest (CVS/RCS are hard to beat on speed).

Unfortunately, I can't make my repository public because of IT desk
issues but let me know if you'd like me to benchmark different
operations (or if you'd like a simple list of commands to create your
own).

Hope you find this useful.

-- 
Catalin


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 16:15         ` Matthias-Christian Ott
@ 2005-04-08 17:14           ` Linus Torvalds
  2005-04-08 17:15             ` Chris Wedgwood
                               ` (3 more replies)
  2005-04-09  1:00           ` Marcin Dalecki
  1 sibling, 4 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 17:14 UTC (permalink / raw)
  To: Matthias-Christian Ott
  Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List



On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>
> SQL Databases like SQLite aren't slow.

After applying a patch, I can do a complete "show-diff" on the kernel tree
to see the effect of it in about 0.15 seconds.

Also, I can use rsync to efficiently replicate my database without having 
to re-send the whole crap - it only needs to send the new stuff.

You do that with an sql database, and I'll be impressed.

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:14           ` Linus Torvalds
@ 2005-04-08 17:15             ` Chris Wedgwood
  2005-04-08 17:46               ` Linus Torvalds
  2005-04-08 17:25             ` Matthias-Christian Ott
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08 17:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Fri, Apr 08, 2005 at 10:14:22AM -0700, Linus Torvalds wrote:

> After applying a patch, I can do a complete "show-diff" on the kernel tree
> to see the effect of it in about 0.15 seconds.

How does that work?  Can you stat the entire tree in that time?  I
measure it as being higher than that.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 17:54       ` Daniel Phillips
  2005-04-07 18:13         ` Dmitry Yusupov
@ 2005-04-08 17:24         ` Jon Masters
  2005-04-08 22:05           ` Daniel Phillips
  1 sibling, 1 reply; 201+ messages in thread
From: Jon Masters @ 2005-04-08 17:24 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List

On Apr 7, 2005 6:54 PM, Daniel Phillips <phillips@istop.com> wrote:

> So I propose that everybody who is interested, pick one of the above projects
> and join it, to help get it to the point of being able to losslessly import
> the version graph.  Given the importance, I think that _all_ viable
> alternatives need to be worked on in parallel, so that two months from now we
> have several viable options.

What about BitKeeper licensing constraints on such involvement?

Jon.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:14           ` Linus Torvalds
  2005-04-08 17:15             ` Chris Wedgwood
@ 2005-04-08 17:25             ` Matthias-Christian Ott
  2005-04-08 18:14               ` Linus Torvalds
  2005-04-08 17:35             ` Jeff Garzik
  2005-04-09  1:04             ` Marcin Dalecki
  3 siblings, 1 reply; 201+ messages in thread
From: Matthias-Christian Ott @ 2005-04-08 17:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List

Linus Torvalds wrote:

>On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>
>>SQL Databases like SQLite aren't slow.
>
>After applying a patch, I can do a complete "show-diff" on the kernel tree
>to see the effect of it in about 0.15 seconds.
>
>Also, I can use rsync to efficiently replicate my database without having 
>to re-send the whole crap - it only needs to send the new stuff.
>
>You do that with an sql database, and I'll be impressed.
>
>		Linus
Ok, but if you want to search for information in such big text files,
it's slow, because you do a linear search -- most databases use faster
search algorithms, like hashing. And if you have multiple files (I
don't know whether your system uses multiple files, like BitKeeper, or
not), each of which needs a system call to be opened, this will be very
slow, because system calls themselves are slow. Using rsync is also
possible, because most databases store their information as plain text
with meta information.

Matthias-Christian Ott

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:14           ` Linus Torvalds
  2005-04-08 17:15             ` Chris Wedgwood
  2005-04-08 17:25             ` Matthias-Christian Ott
@ 2005-04-08 17:35             ` Jeff Garzik
  2005-04-08 18:47               ` Linus Torvalds
  2005-04-09  1:04             ` Marcin Dalecki
  3 siblings, 1 reply; 201+ messages in thread
From: Jeff Garzik @ 2005-04-08 17:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Chris Wedgwood,
	Kernel Mailing List

Linus Torvalds wrote:
> 
> On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
> 
>>SQL Databases like SQLite aren't slow.
> 
> 
> After applying a patch, I can do a complete "show-diff" on the kernel tree
> to see the effect of it in about 0.15 seconds.
> 
> Also, I can use rsync to efficiently replicate my database without having 
> to re-send the whole crap - it only needs to send the new stuff.
> 
> You do that with an sql database, and I'll be impressed.

Well...  it took me over 30 seconds just to 'rm -rf' the unpacked 
tarballs of git and sparse-git, over my LAN's NFS.

Granted that this sort of stuff works well with (a) rsync and (b) 
hardlinks, but it's still punishment on the i/dcache.

	Jeff




^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:15             ` Chris Wedgwood
@ 2005-04-08 17:46               ` Linus Torvalds
  2005-04-08 18:05                 ` Chris Wedgwood
  0 siblings, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 17:46 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List



On Fri, 8 Apr 2005, Chris Wedgwood wrote:
> On Fri, Apr 08, 2005 at 10:14:22AM -0700, Linus Torvalds wrote:
> 
> > After applying a patch, I can do a complete "show-diff" on the kernel tree
> > to see the effect of it in about 0.15 seconds.
> 
> How does that work?  Can you stat the entire tree in that time?  I
> measure it as being higher than that.

I can indeed stat the entire tree in that time (assuming it's in memory,
of course, but my kernel trees are _always_ in memory ;), but in order to
do so, I have to be good at finding the names to stat.

In particular, you have to be extremely careful. You need to make sure 
that you don't stat anything you don't need to. We're not talking just 
blindly recursing the tree here, and that's exactly the point. You have to 
know what you're doing, but the whole point of keeping track of directory 
contents is that dammit, that's your whole job.

Anybody who can't list the files they work on _instantly_ is doing 
something damn wrong. 

"git" is really trivial, written in four days. Most of that was not 
actually spent coding, but thinking about the data structures.

			Linus



^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:46               ` Linus Torvalds
@ 2005-04-08 18:05                 ` Chris Wedgwood
  2005-04-08 19:03                   ` Linus Torvalds
  0 siblings, 1 reply; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Fri, Apr 08, 2005 at 10:46:40AM -0700, Linus Torvalds wrote:

> I can indeed stat the entire tree in that time (assuming it's in memory,
> of course, but my kernel trees are _always_ in memory ;), but in order to
> do so, I have to be good at finding the names to stat.

<pause ... tapity tap>

I just tested this (I wanted to be sure you didn't have some 47GHz
LiHe cooled Xeon or something).

On my somewhat slowish machine[1] (by today's standards anyhow) I can
stat a checked out tree (ie. the source files and not SCM files) in
about 0.10s it seems and 0.26s for an entire tree with BK files in it.

> In particular, you have to be extremely careful. You need to make
> sure that you don't stat anything you don't need to.

Actually, I could probably make this still *much* faster, with a
caveat.  Given that my editor, when I write a file, will write a
temporary file and rename it, for files in directories where nlink==2
I can check that first and skip the stat of the individual files.

And I guess if I was bored I could have my editor or some daemon
sitting in the background intelligently using dnotify to have this
information on-hand more or less instantly.  For this purpose though
that seems like a lot of effort for no real gain right now.
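The leaf-directory shortcut described above can be sketched roughly like this (a hypothetical illustration in Python, not code from BK or git; the cache structures and function names are invented). The point is that a rename()-based editor bumps the directory's mtime, so an unchanged mtime on a leaf directory (nlink == 2 on classic Unix filesystems means it has no subdirectories) lets you skip stat'ing every file inside it:

```python
import os

def changed_files(leaf_dir, dir_cache, file_cache):
    """Skip per-file stats when a leaf directory is provably unchanged.

    dir_cache maps directory -> last seen mtime; file_cache maps
    path -> (mtime, size).  Both are hypothetical stand-ins for
    whatever on-disk cache a real tool would keep.
    """
    st = os.stat(leaf_dir)
    if st.st_nlink == 2 and dir_cache.get(leaf_dir) == st.st_mtime:
        return []  # directory untouched: one stat instead of many
    dir_cache[leaf_dir] = st.st_mtime
    changed = []
    for name in sorted(os.listdir(leaf_dir)):
        path = os.path.join(leaf_dir, name)
        fst = os.stat(path)
        if file_cache.get(path) != (fst.st_mtime, fst.st_size):
            file_cache[path] = (fst.st_mtime, fst.st_size)
            changed.append(path)
    return changed
```

The first pass over a directory still stats everything (to populate the cache); the win is on every later pass where nothing was renamed into the directory.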

> Anybody who can't list the files they work on _instantly_ is doing
> something damn wrong.

Well, I do like to do "bk sfiles -x" fairly often.  But then again I
can stat dirs and compare against a cache to make that fast too.


[1] Dual AthlonMP 2200

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:25             ` Matthias-Christian Ott
@ 2005-04-08 18:14               ` Linus Torvalds
  2005-04-08 18:28                 ` Jon Smirl
                                   ` (2 more replies)
  0 siblings, 3 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 18:14 UTC (permalink / raw)
  To: Matthias-Christian Ott
  Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List



On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>
> Ok, but if you want to search for information in such big text files, it's 
> slow, because you do a linear search 

No I don't. I don't search for _anything_. I have my own
content-addressable filesystem, and I guarantee you that it's faster than
mysql, because it depends on the kernel doing the right thing (which it
does).

I never do a single "readdir". It's all direct data lookup, no "searching"  
anywhere.

Databases aren't magical. Quite the reverse. They easily end up being
_slower_ than doing it by hand, simply because they have to solve a much
more generic issue. If you design your data structures and abstractions
right, a database is pretty much guaranteed to only incur overhead.

The advantage of a database is the abstraction and management it gives 
you. But I did my own special-case abstraction in git.

Yeah, I bet "git" might suck if your OS sucks. I definitely depend on name
caching at an OS level so that I know that opening a file is fast. In
other words, there _is_ an indexing and caching database in there, and
it's called the Linux VFS layer and the dentry cache.

The proof is in the pudding. git is designed for _one_ thing, and one 
thing only: tracking a series of directory states in a way that can be 
replicated. It's very very fast at that. A database with a more flexible 
abstraction might be faster at other things, but the fact is, you do take a 
hit.

The problem with databases are:

 - they are damn hard to just replicate wildly and without control. The 
   database backing file inherently has a lot of internal state. You may 
   be able to "just copy it", but you have to copy the whole damn thing.

   In "git", the data is all there in immutable blobs that you can just 
   rsync. In fact, you don't even need rsync: you can just look at the 
   filenames, and anything new you copy. No need for any fancy "read the 
   files to see that they match". They _will_ match, or you can tell 
   immediately that a file is corrupt.

   Look at this:

	torvalds@ppc970:~/git> sha1sum .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678 
	e7bfaadd5d2331123663a8f14a26604a3cdcb678  .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678

   see a pattern anywhere? Imagine that you know the list of files you 
   have, and the list of files the other side has (never mind the 
   contents), and how _easy_ it is to synchronize. Without ever having to 
   even read the remote files that you know you already have.

   How do you replicate your database incrementally? I've given you enough 
   clues to do it for "git" in probably five lines of perl.

 - they tend to take time to set up and prime.

   In contrast, the filesystem is always there. Sure, you effectively have 
   to "prime" that one too, but the thing is, if your OS is doing its job, 
   you basically only need to prime it once per reboot. No need to prime 
   it for each process you start or play games with connecting to servers 
   etc. It's just there. Always. 
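The "five lines of perl" replication hinted at above can be sketched like this (in Python rather than perl; a hypothetical illustration of the idea, with invented function names, and ignoring that the real blobs are compressed). Because an object's file name *is* the SHA-1 of its contents, comparing file lists is enough to synchronize, and re-hashing a file verifies it against its own name:

```python
import hashlib
import os
import shutil

def object_path(store, sha1_hex):
    # Early git layout: .dircache/objects/e7/bfaadd... -- the first two
    # hex digits are a fan-out directory, the rest is the file name.
    return os.path.join(store, sha1_hex[:2], sha1_hex[2:])

def sync(src_store, dst_store):
    """Copy only the objects the destination lacks.

    No file contents are compared: a name present on both sides is
    guaranteed identical (or detectably corrupt), as in the sha1sum
    demonstration above.
    """
    for fanout in os.listdir(src_store):
        for name in os.listdir(os.path.join(src_store, fanout)):
            dst = os.path.join(dst_store, fanout, name)
            if not os.path.exists(dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(os.path.join(src_store, fanout, name), dst)

def verify(store, sha1_hex):
    # An object is corrupt exactly when its contents no longer hash
    # to its own name.
    with open(object_path(store, sha1_hex), "rb") as f:
        return hashlib.sha1(f.read()).hexdigest() == sha1_hex
```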

So if you think of the filesystem as a database, you're all set. If you 
design your data structure so that there is just one index, you make that 
the name, and the kernel will do all the O(1) hashed lookups etc for you. 
You do have to limit yourself in some ways. 

Oh, and you have to be willing to waste diskspace. "git" is _not_
space-efficient. The good news is that it is cache-friendly, since it is
also designed to never actually look at any old files that aren't part of
the immediate history, so while it wastes diskspace, it does not waste the
(much more precious) page cache.

IOW big file-sets are always bad for performance if you need to traverse
them to get anywhere, but if you index things so that you only read the 
stuff you really really _need_ (which git does), big file-sets are just an 
excuse to buy a new disk ;)

			Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:14               ` Linus Torvalds
@ 2005-04-08 18:28                 ` Jon Smirl
  2005-04-08 18:58                   ` Florian Weimer
  2005-04-09  1:11                   ` Marcin Dalecki
  2005-04-08 19:16                 ` Matthias-Christian Ott
  2005-04-09  1:09                 ` Marcin Dalecki
  2 siblings, 2 replies; 201+ messages in thread
From: Jon Smirl @ 2005-04-08 18:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Chris Wedgwood,
	Kernel Mailing List

On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote:
>    How do you replicate your database incrementally? I've given you enough
>    clues to do it for "git" in probably five lines of perl.

Efficient database replication is achieved by copying the transaction
logs and then replaying them. Most mid to high end databases support
this. You only need to copy the parts of the logs that you don't
already have.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:35             ` Jeff Garzik
@ 2005-04-08 18:47               ` Linus Torvalds
  2005-04-08 18:56                 ` Chris Wedgwood
  2005-04-09 15:40                 ` Paul Jackson
  0 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 18:47 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Chris Wedgwood,
	Kernel Mailing List



On Fri, 8 Apr 2005, Jeff Garzik wrote:
> 
> Well...  it took me over 30 seconds just to 'rm -rf' the unpacked 
> tarballs of git and sparse-git, over my LAN's NFS.

Don't use NFS for development. It sucks for BK too. 

That said, normal _use_ should actually be pretty efficient even over NFS.  
It will "stat" a hell of a lot of files to do the "show-diff", but that
part you really can't avoid unless you depend on all the tools marking
their changes somewhere. Which BK does, actually, but that was pretty
painful, and means that bk needed to re-implement all the normal ops (ie
"bk patch").

What's also nice is that exactly because "git" depends on totally 
immutable files, they actually cache very well over NFS. Even if you were 
to share a database across machines (which is _not_ what git is meant to 
do, but it's certainly possible). 

So I actually suspect that if you actually _work_ with a tree in "git", 
you will find performance very good indeed. The fact that it creates a 
number of files when you pull in a new repository is a different thing.

> Granted that this sort of stuff works well with (a) rsync and (b) 
> hardlinks, but it's still punishment on the i/dcache.

Actually, it's not. Not once it is set up. Exactly because "git" doesn't
actually access those files unless it literally needs the data in one
file, and then it's always set up so that it needs either none or _all_ of
the file. There is no data sharing anywhere, so you are never in the
situation where it needs "ten bytes from file X" and "25 bytes from file
Y".

For example, if you don't have any changes in your tree, there is exactly 
_one_ file that a "show-diff" will read: the .dircache/index file. That's 
it. After that, it will "stat()" exactly the files you are tracking, and 
nothing more. It will not touch any internal "git" data AT ALL. That 
"stat" will be somewhat expensive unless your client caches stat data too, 
but that's it.

And if it turns out that you have changed a file (or even just touched it, 
so that the data is the same, but the index file can no longer guarantee 
it with just a single "stat()"), then git will have to open exactly _one_ 
file (no searching, no messing around), which contains absolutely nothing 
except for the compressed (and SHA1-signed) old contents of the file. It 
obviously _has_ to do that, because in order to know whether you've 
changed it, it now needs to compare it to the original.

IOW, "git" will literally touch the minimum IO necessary, and absolutely 
minimum cache-footprint.

The fact is, when tracking the 17,000 files in the kernel directory, most
of them are never actually changed. They literally are "free". They aren't
brought into the cache by "git" - not the file itself, not the backing
store. You set up the index file once, and you never ever touch them
again.  You could put the sha1 files on a tape, for all git cares.

The one exception obviously being when you actually instantiate the git 
archive for the first time (or when you throw it away). At that time you 
do touch all of the data, but that should be the only time.

THAT is what git is good at. It is optimized for the "not a lot of changes"  
things, and pretty much all the operations are O(n) in the "size of
change", not in "size of repo".

That includes even things like "give me the diff between the top of tree
and the tree 10 days ago". If you know what your head was 10 days ago,
"git" will open exactly _four_ small files for this operation (the current
"top"  commit, the commit file of ten days ago, and the two "tree" files
associated with those). It will then need to open the backing store files 
for the files that are different between the two versions, but IT WILL 
NEVER EVEN LOOK at the files that it immediately sees are the same.

And that's actually true whether we're talking about the top-of-tree or
not. If I had the kernel history in git format (I don't - I estimate that
it would be about 1.5GB - 2GB in size, and would take me about ten days to
extract from BK ;), I could do a diff between _any_ tagged version (and I
mention "tagged" only as a way to look up the commit ID - it doesn't have
to be tagged if you know it some other way) in O(n) where 'n' is the
number of files that have changed between the revisions.

Number of changesets doesn't matter. Number of files doesn't matter. The
_only_ thing that matters is the size of the change.

Btw, I don't actually have a git command to do this yet. A bit of
scripting required to do it, but it's pretty trivial: you open the two
"commit" files that are the beginning/end of the thing, you look up what
the tree state was at each point, you open up the two tree files involved,
and you ignore all entries that match.

Since the tree files are already sorted, that "ignoring matches" is
basically free (technically that's O(n) in the number of files described,
but we're talking about something that even a slow machine can do so fast
you probably can't even time it with a stop-watch). You now have the 
complete list of files that have been changed (removed, added or "exists 
in both trees, but different contents"), and you can thus trivially create 
the whole diff by opening up _only_ the indexes for those files.
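The sorted-tree comparison just described is a plain O(n) merge of two sorted lists. A hypothetical sketch in Python (each tree object is modeled as a sorted list of (path, blob hash) pairs; the representation is invented for illustration, not git's on-disk format):

```python
def tree_diff(old_tree, new_tree):
    """Diff two sorted tree listings in a single O(n) pass.

    Entries whose path *and* hash match are skipped without ever
    reading the blobs they point to -- only the names of changed,
    added, or removed files come out.
    """
    added, removed, modified = [], [], []
    i = j = 0
    while i < len(old_tree) and j < len(new_tree):
        (opath, ohash), (npath, nhash) = old_tree[i], new_tree[j]
        if opath == npath:
            if ohash != nhash:
                modified.append(opath)  # same file, different contents
            i += 1; j += 1
        elif opath < npath:
            removed.append(opath); i += 1  # only in the old tree
        else:
            added.append(npath); j += 1    # only in the new tree
    removed += [p for p, _ in old_tree[i:]]
    added += [p for p, _ in new_tree[j:]]
    return added, removed, modified
```

Only after this pass would the backing-store files for the changed paths need to be opened, which is why the whole operation is O(size of change).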

Ergo: O(n) in size of change. Both in work and in disk/cache access (where
the latter is often the more important one). Absolutely _zero_ indirection
anywhere apart from the initial stage to go from "commit" to "tree", so
there's no seeking except to actually read the files once you know what
they are (and since you know them up-front and there are no dependencies
at that point, you could even tell the OS to prefetch them if you really
cared about getting minimal disk seeks).

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:47               ` Linus Torvalds
@ 2005-04-08 18:56                 ` Chris Wedgwood
  2005-04-09  7:37                   ` Willy Tarreau
  2005-04-09 15:40                 ` Paul Jackson
  1 sibling, 1 reply; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08 18:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Matthias-Christian Ott, Andrea Arcangeli,
	Kernel Mailing List

On Fri, Apr 08, 2005 at 11:47:10AM -0700, Linus Torvalds wrote:

> Don't use NFS for development. It sucks for BK too.

Some times NFS is unavoidable.

In the best case (see previous email wrt to only stat'ing the parent
directories when you can) for a current kernel though you can get away
with 894 stats --- over NFS that would probably be tolerable.

After claiming such an optimization is probably not worthwhile, I'm
now thinking that for network filesystems it might be.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:28                 ` Jon Smirl
@ 2005-04-08 18:58                   ` Florian Weimer
  2005-04-09  1:11                   ` Marcin Dalecki
  1 sibling, 0 replies; 201+ messages in thread
From: Florian Weimer @ 2005-04-08 18:58 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Kernel Mailing List

* Jon Smirl:

> On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote:
>>    How do you replicate your database incrementally? I've given you enough
>>    clues to do it for "git" in probably five lines of perl.
>
> Efficient database replication is achieved by copying the transaction
> logs and then replaying them.

Works only if the databases are in sync.  Even if the transaction logs
are pretty high-level, you risk violating constraints specified by the
application.  General multi-master replication is an unsolved problem.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:05                 ` Chris Wedgwood
@ 2005-04-08 19:03                   ` Linus Torvalds
  2005-04-08 19:16                     ` Chris Wedgwood
                                       ` (2 more replies)
  0 siblings, 3 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 19:03 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List



On Fri, 8 Apr 2005, Chris Wedgwood wrote:
> 
> Actually, I could probably make this still *much* faster, with a
> caveat.  Given that my editor, when I write a file, will write a
> temporary file and rename it, for files in directories where nlink==2
> I can check that first and skip the stat of the individual files.

Yes, doing the stat just on the directory (on leaf directories only, of
course, but nlink==2 does say that on most filesystems) is indeed a huge
potential speedup.

It doesn't matter so much for the cached case, but it _does_ matter for
the uncached one. Makes a huge difference, in fact (I was playing with
exactly that back when I started doing "bkr" in BK/tools - three years
ago).

It turns out that I expect to cache my source tree (at least the main
outline), and that guides my optimizations, but yes, your dir stat does
help in the case of "occasionally working with lots of large projects"  
rather than "mostly working on the same ones with enough RAM to cache it
all".

And "git" is actually fairly anal in this respect: it not only stats all
files, but the index file contains a lot more of the stat info than you'd
expect. So for example, it checks both ctime and mtime to the nanosecond
(did I mention that I didn't worry too much about portability?) exactly so
that it can catch any changes except for actively malicious things.

And if you do actively malicious things in your own directory, you get
what you deserve. It's actually _hard_ to try to fool git into believing a
file hasn't changed: you need to not only replace it with the exact same
file length and ctime/mtime, you need to reuse the same inode/dev numbers
(again - I didn't worry about portability, and filesystems where those
aren't stable are a "don't do that then") and keep the mode the same. Oh,
and uid/gid, but that was mostly me being silly.
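The staleness check described above (compare a whole bundle of stat fields against the index, and only read file contents when a single stat() can't vouch for the file) can be sketched as follows. This is a hypothetical illustration: the entry layout and names are invented, not git's actual index format.

```python
import os
from collections import namedtuple

# Hypothetical index entry mirroring the fields the mail describes:
# ctime/mtime (nanosecond resolution where the OS provides it),
# dev/inode, mode, uid/gid, plus size.
IndexEntry = namedtuple(
    "IndexEntry", "ctime_ns mtime_ns dev ino mode uid gid size")

def entry_for(path):
    st = os.stat(path)
    return IndexEntry(st.st_ctime_ns, st.st_mtime_ns, st.st_dev,
                      st.st_ino, st.st_mode, st.st_uid, st.st_gid,
                      st.st_size)

def maybe_changed(path, cached):
    """True if a single stat() cannot guarantee the file is unchanged.

    Only in that case does the content (the compressed, SHA1-named
    backing blob) need to be read and compared.  Fooling this requires
    matching every field at once -- size, ctime/mtime, inode/dev,
    mode, uid/gid.
    """
    return entry_for(path) != cached
```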

			Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:14               ` Linus Torvalds
  2005-04-08 18:28                 ` Jon Smirl
@ 2005-04-08 19:16                 ` Matthias-Christian Ott
  2005-04-08 19:32                   ` Linus Torvalds
  2005-04-09  1:09                 ` Marcin Dalecki
  2 siblings, 1 reply; 201+ messages in thread
From: Matthias-Christian Ott @ 2005-04-08 19:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List

Linus Torvalds wrote:

>On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>  
>
>>Ok, but if you want to search for information in such big text files, it's 
>>slow, because you do a linear search 
>>    
>>
>
>No I don't. I don't search for _anything_. I have my own
>content-addressable filesystem, and I guarantee you that it's faster than
>mysql, because it depends on the kernel doing the right thing (which it
>does).
>
>  
>
I'm not talking about mysql, I'm talking about fast databases like 
sqlite or db4.

>I never do a single "readdir". It's all direct data lookup, no "searching"  
>anywhere.
>
>Databases aren't magical. Quite the reverse. They easily end up being
>_slower_ than doing it by hand, simply because they have to solve a much
>more generic issue. If you design your data structures and abstractions
>right, a database is pretty much guaranteed to only incur overhead.
>
>The advantage of a database is the abstraction and management it gives 
>you. But I did my own special-case abstraction in git.
>
>Yeah, I bet "git" might suck if your OS sucks. I definitely depend on name
>caching at an OS level so that I know that opening a file is fast. In
>other words, there _is_ an indexing and caching database in there, and
>it's called the Linux VFS layer and the dentry cache.
>
>The proof is in the pudding. git is designed for _one_ thing, and one 
>thing only: tracking a series of directory states in a way that can be 
>replicated. It's very very fast at that. A database with a more flexible 
>abstraction might be faster at other things, but the fact is, you do take a 
>hit.
>
>The problem with databases are:
>
> - they are damn hard to just replicate wildly and without control. The 
>   database backing file inherently has a lot of internal state. You may 
>   be able to "just copy it", but you have to copy the whole damn thing.
>  
>
This is _not_ true for every database (especially plain-text databases 
with meta information).

>   In "git", the data is all there in immutable blobs that you can just 
>   rsync. In fact, you don't even need rsync: you can just look at the 
>   filenames, and anything new you copy. No need for any fancy "read the 
>   files to see that they match". They _will_ match, or you can tell 
>   immediately that a file is corrupt.
>
>   Look at this:
>
>	torvalds@ppc970:~/git> sha1sum .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678 
>	e7bfaadd5d2331123663a8f14a26604a3cdcb678  .dircache/objects/e7/bfaadd5d2331123663a8f14a26604a3cdcb678
>
>   see a pattern anywhere? Imagine that you know the list of files you 
>   have, and the list of files the other side has (never mind the 
>   contents), and how _easy_ it is to synchronize. Without ever having to 
>   even read the remote files that you know you already have.
>   How do you replicate your database incrementally? I've given you enough 
>   clues to do it for "git" in probably five lines of perl.
>  
>
I replicate my database incrementally by using a hash list like you do (the 
client sends its hash list, the server compares the lists and tells the 
client after which data (data = hash + data) the new data has to be added; 
this is like your solution -- you also submit the data and the location 
(you have directories too, right?)). A database is in some cases (like 
this one) like a filesystem, but it's built on top of a better filesystem 
like xfs, reiser4 or ext3, which support features like LVM, quotas or 
journaling. (Is your filesystem also built on top of an existing 
filesystem? I don't think so, because you're talking about VFS operations 
on the filesystem.)

> - they tend to take time to set up and prime.
>
>   In contrast, the filesystem is always there. Sure, you effectively have 
>   to "prime" that one too, but the thing is, if your OS is doing its job, 
>   you basically only need to prime it once per reboot. No need to prime 
>   it for each process you start or play games with connecting to servers 
>   etc. It's just there. Always.
>  
>
The database -- a single file (sqlite or db4) -- is always there too, 
because it's on the filesystem and doesn't need a server.

>So if you think of the filesystem as a database, you're all set. If you 
>design your data structure so that there is just one index, you make that 
>the name, and the kernel will do all the O(1) hashed lookups etc for you. 
>You do have to limit yourself in some ways. 
>  
>
But as mentioned, you need to _open_ each file (it doesn't matter if it's 
cached -- that only speeds up reading it -- you need a _slow_ system call 
and _very slow_ hardware access anyway).
Have a look at this comparison:
If you have one big chest and lots of small chests containing the same 
amount of gold, it's more work to collect the gold from the small chests 
than from the big one (which would have as many compartments as there are 
little chests). You can find your gold faster, because you don't have to 
walk to the other chests, and you don't have to open that many lids, which 
also saves time.

>Oh, and you have to be willing to waste diskspace. "git" is _not_
>space-efficient. The good news is that it is cache-friendly, since it is
>also designed to never actually look at any old files that aren't part of
>the immediate history, so while it wastes diskspace, it does not waste the
>(much more precious) page cache.
>
>IOW big file-sets are always bad for performance if you need to traverse
>them to get anywhere, but if you index things so that you only read the 
>stuff you really really _need_ (which git does), big file-sets are just an 
>excuse to buy a new disk ;)
>
>			Linus
>
>  
>
I hope my idea/opinion is clear now.

Matthias-Christian

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:03                   ` Linus Torvalds
@ 2005-04-08 19:16                     ` Chris Wedgwood
  2005-04-08 19:38                       ` Florian Weimer
                                         ` (2 more replies)
  2005-04-09  7:20                     ` Willy Tarreau
  2005-04-09 15:15                     ` Paul Jackson
  2 siblings, 3 replies; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08 19:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Fri, Apr 08, 2005 at 12:03:49PM -0700, Linus Torvalds wrote:

> Yes, doing the stat just on the directory (on leaf directories only, of
> course, but nlink==2 does say that on most filesystems) is indeed a huge
> potential speedup.

Here I measure about 6ms for the cached case --- essentially below the
noise threshold for something that does real work.

> It doesn't matter so much for the cached case, but it _does_ matter
> for the uncached one.

Doing the minimal stat cold-cache here is about 6s for local disk.
I'm somewhat surprised it's that bad actually.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:16                 ` Matthias-Christian Ott
@ 2005-04-08 19:32                   ` Linus Torvalds
  2005-04-08 19:44                     ` Matthias-Christian Ott
  0 siblings, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 19:32 UTC (permalink / raw)
  To: Matthias-Christian Ott
  Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List



On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>
> But as mentioned, you need to _open_ each file (it doesn't matter if it's 
> cached -- that only speeds up reading it -- you need a _slow_ system call 
> and _very slow_ hardware access anyway).

Nope. System calls aren't slow. What crappy OS are you running?

> I hope my idea/opinion is clear now.

Numbers talk. I've got something that you can test ;)

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:16                     ` Chris Wedgwood
@ 2005-04-08 19:38                       ` Florian Weimer
  2005-04-08 19:48                         ` Chris Wedgwood
  2005-04-08 19:39                       ` Linus Torvalds
  2005-04-08 20:50                       ` Kernel SCM saga Luck, Tony
  2 siblings, 1 reply; 201+ messages in thread
From: Florian Weimer @ 2005-04-08 19:38 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Kernel Mailing List

* Chris Wedgwood:

>> It doesn't matter so much for the cached case, but it _does_ matter
>> for the uncached one.
>
> Doing the minimal stat cold-cache here is about 6s for local disk.

Does sorting by inode number make a difference?

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:16                     ` Chris Wedgwood
  2005-04-08 19:38                       ` Florian Weimer
@ 2005-04-08 19:39                       ` Linus Torvalds
  2005-04-08 20:11                         ` Uncached stat performace [ Was: Re: Kernel SCM saga.. ] Ragnar Kjørstad
  2005-04-08 20:50                       ` Kernel SCM saga Luck, Tony
  2 siblings, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 19:39 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List



On Fri, 8 Apr 2005, Chris Wedgwood wrote:
> 
> > It doesn't matter so much for the cached case, but it _does_ matter
> > for the uncached one.
> 
> Doing the minimal stat cold-cache here is about 6s for local disk.
> I'm somewhat surprised it's that bad actually.

One of the reasons I do inode numbers in the "index" file (apart from 
checking that the inode hasn't changed) is in fact that "stat()" is damn 
slow if it causes seeks. Since your stat loop is entirely 

You can optimize your stat() patterns on traditional unix-like filesystems
by just sorting the stats by inode number (since the inode number is
historically a special index into the inode table - even when filesystems
distribute the inodes over several tables, sorting will generally do the
right thing from a seek perspective). It's a disgusting hack, but it
literally gets you orders-of-magnitude performance improvments in many
real-life cases.

It does have some downsides:
 - it buys you nothing when it's cached (and obviously you have the 
   sorting overhead, although that's pretty cheap)
 - on other filesystems it can make things slower.

But if the cold-cache case actually is a concern, I do have the solution 
for it. Just a simple "prime-cache" program that does a qsort on the index 
file entries and does the stat() on them all will bring the numbers down. 
Those 6 seconds you see are the disk head seeking around like mad.
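The "prime-cache" program described above is essentially: read the (path, inode) pairs out of the index file, sort by inode number, and stat in that order so the disk head sweeps the inode table roughly sequentially. A hypothetical sketch (the entry format is invented; a real version would parse the actual index file):

```python
import os

def prime_cache(entries):
    """Warm the kernel's stat caches with minimal head seeking.

    `entries` is a list of (path, inode) pairs as they might be read
    from an index file.  Stat'ing in inode order is the disgusting-but-
    effective hack described above: on traditional Unix filesystems the
    inode number indexes the on-disk inode table, so sorted access
    turns random seeks into a mostly sequential sweep.  Returns the
    number of entries successfully stat'ed.
    """
    primed = 0
    for path, _ino in sorted(entries, key=lambda e: e[1]):
        try:
            os.stat(path)  # served in on-disk inode-table order
            primed += 1
        except FileNotFoundError:
            pass  # index entry went stale; show-diff will report it
    return primed
```

As noted above, this buys nothing when everything is already cached (and on filesystems without a linear inode table it can even hurt), but on a cold cache it can cut those multi-second stat loops dramatically.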

			Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:32                   ` Linus Torvalds
@ 2005-04-08 19:44                     ` Matthias-Christian Ott
  0 siblings, 0 replies; 201+ messages in thread
From: Matthias-Christian Ott @ 2005-04-08 19:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, Chris Wedgwood, Kernel Mailing List

Linus Torvalds wrote:

>On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>  
>
>>But as mentioned you need to _open_ each file (It doesn't matter if it's 
>>cached (this speeds up only reading it) -- you need a _slow_ system call 
>>and _very slow_ hardware access anyway).
>>    
>>
>
>Nope. System calls aren't slow. What crappy OS are you running?
>
>  
>
But they're slower than plain function calls, because there are several layers checking them.

>>I hope my idea/opinion is clear now.
>>    
>>
>
>Numbers talk. I've got something that you can test ;)
>  
>
This doesn't mean it's better just because you had the time to develop it 
;). But anyhow, people need something they can test to see whether it's 
good or not; most don't believe in concepts.

>		Linus
>
>  
>
We will see which solution wins the "race". But I think your solution 
will "win", because you're Linus Torvalds -- the "boss" of Linux -- and 
you have to work with this system every day (usually people use what 
they have developed :)), while I don't have the time to develop a 
database-based solution (maybe someone else is interested in developing it).

Matthias-Christian

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:38                       ` Florian Weimer
@ 2005-04-08 19:48                         ` Chris Wedgwood
  0 siblings, 0 replies; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08 19:48 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Kernel Mailing List

On Fri, Apr 08, 2005 at 09:38:09PM +0200, Florian Weimer wrote:

> Does sorting by inode number make a difference?

It almost certainly would.  But I can sort more intelligently than
that even (all the world isn't ext2/3).

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Uncached stat performance [ Was: Re: Kernel SCM saga.. ]
  2005-04-08 19:39                       ` Linus Torvalds
@ 2005-04-08 20:11                         ` Ragnar Kjørstad
  2005-04-08 20:14                           ` Chris Wedgwood
  0 siblings, 1 reply; 201+ messages in thread
From: Ragnar Kjørstad @ 2005-04-08 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Wedgwood, Matthias-Christian Ott, Andrea Arcangeli,
	Kernel Mailing List

On Fri, Apr 08, 2005 at 12:39:26PM -0700, Linus Torvalds wrote:
> One of the reasons I do inode numbers in the "index" file (apart from 
> checking that the inode hasn't changed) is in fact that "stat()" is damn 
> slow if it causes seeks. Since your stat loop is entirely 
> 
> You can optimize your stat() patterns on traditional unix-like filesystems
> by just sorting the stats by inode number (since the inode number is
> historically a special index into the inode table - even when filesystems
> distribute the inodes over several tables, sorting will generally do the
> right thing from a seek perspective). It's a disgusting hack, but it
> literally gets you orders-of-magnitude performance improvements in many
> real-life cases.

It does, so why isn't there a way to do this without the disgusting
hack? (Your words, not mine :) )

E.g., wouldn't an aio_stat() allow similar or better speedups in a way
that doesn't depend on ext2/3 internals?

I bet it would make a significant difference for things like "ls -l" on
large uncached directories and for imap servers using maildir?



-- 
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
Scaling the Linux Datacenter

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Uncached stat performance [ Was: Re: Kernel SCM saga.. ]
  2005-04-08 20:11                         ` Uncached stat performance [ Was: Re: Kernel SCM saga.. ] Ragnar Kjørstad
@ 2005-04-08 20:14                           ` Chris Wedgwood
  0 siblings, 0 replies; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-08 20:14 UTC (permalink / raw)
  To: Ragnar Kjørstad
  Cc: Linus Torvalds, Matthias-Christian Ott, Andrea Arcangeli,
	Kernel Mailing List

On Fri, Apr 08, 2005 at 10:11:51PM +0200, Ragnar Kjørstad wrote:

> It does, so why isn't there a way to do this without the disgusting
> hack? (Your words, not mine :) )

inode sorting is probably a good guess for a number of filesystems; you
can map the blocks used to do better still (somewhat fs-specific)

you can do better still if you issue multiple stats in parallel (up to a
point) and let the elevator sort things out

> I bet it would make a significant difference from things like "ls -l" in
> large uncached directories and imap-servers with maildir?

sort + concurrent stats would help here i think

i'm not sure i like the idea of ls using lots of threads though :)
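The sort + concurrent stats idea above can be sketched with a small
thread pool (a hedged sketch: the worker count is a guess, and whether
it helps at all depends on the disk and the I/O scheduler):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def stat_all(paths, workers=16):
    """Stat pre-sorted paths with several requests in flight at once.

    os.lstat() releases the GIL while blocked in the kernel, so a small
    pool of threads keeps multiple disk requests queued and lets the
    I/O scheduler (the elevator) reorder them into shorter seeks.
    Results come back in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(os.lstat, paths))
```

Something like "ls -l" could use this internally without the caller ever
seeing the threads, which sidesteps the objection above.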

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:16                     ` Chris Wedgwood
  2005-04-08 19:38                       ` Florian Weimer
  2005-04-08 19:39                       ` Linus Torvalds
@ 2005-04-08 20:50                       ` Luck, Tony
  2005-04-08 21:27                         ` Linus Torvalds
  2 siblings, 1 reply; 201+ messages in thread
From: Luck, Tony @ 2005-04-08 20:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

It looks like an operation like "show me the history of mm/memory.c" will
be pretty expensive using git.  I'd need to look at the current tree, and
then trace backwards through all 60,000 changesets to see which ones had
actual changes to this file.  Could you expand the tuple in the tree object
to include a back pointer to the previous tree in which the tuple changed?
Or does adding history to the tree violate other goals of the tree type?

-Tony

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 20:50                       ` Kernel SCM saga Luck, Tony
@ 2005-04-08 21:27                         ` Linus Torvalds
  2005-04-09 17:14                           ` Roman Zippel
  0 siblings, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 21:27 UTC (permalink / raw)
  To: Luck, Tony; +Cc: Kernel Mailing List



On Fri, 8 Apr 2005 Luck@unix-os.sc.intel.com wrote:
>
> It looks like an operation like "show me the history of mm/memory.c" will
> be pretty expensive using git.

Yes.  Per-file history is expensive in git, because of the way it is 
indexed. Things are indexed by tree and by changeset, and there are no 
per-file indexes.

You could create per-file _caches_ (*) on top of git if you wanted to make
it behave more like a real SCM, but yes, it's all definitely optimized for
the things that _I_ tend to care about, which is the whole-repository
operations.

		Linus

(*) Doing caching on that level is probably fine, especially since most
people really tend to want it for just the relatively few files that they
work on anyway. Limiting the caches to a subset of the tree should be
quite effective.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:24         ` Jon Masters
@ 2005-04-08 22:05           ` Daniel Phillips
  0 siblings, 0 replies; 201+ messages in thread
From: Daniel Phillips @ 2005-04-08 22:05 UTC (permalink / raw)
  To: jonathan; +Cc: Al Viro, Linus Torvalds, David Woodhouse, Kernel Mailing List

On Friday 08 April 2005 13:24, Jon Masters wrote:
> On Apr 7, 2005 6:54 PM, Daniel Phillips <phillips@istop.com> wrote:
> > So I propose that everybody who is interested, pick one of the above
> > projects and join it, to help get it to the point of being able to
> > losslessly import the version graph.  Given the importance, I think that
> > _all_ viable alternatives need to be worked on in parallel, so that two
> > months from now we have several viable options.
>
> What about BitKeeper licensing constraints on such involvement?

They don't apply to me, for one.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 15:32   ` Linus Torvalds
  2005-04-07 17:09     ` Daniel Phillips
  2005-04-07 17:10     ` Al Viro
@ 2005-04-08 22:52     ` Roman Zippel
  2005-04-08 23:46       ` Tupshin Harper
  2005-04-09 16:52       ` Eric D. Mudama
  2 siblings, 2 replies; 201+ messages in thread
From: Roman Zippel @ 2005-04-08 22:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Woodhouse, Kernel Mailing List

Hi,

On Thu, 7 Apr 2005, Linus Torvalds wrote:

> I really disliked that in BitKeeper too originally. I argued with Larry
> about it, but Larry (correctly, I believe) argued that efficient and
> reliable distribution really requires the concept of "history is
> immutable". It makes replication much easier when you know that the known
> subset _never_ shrinks or changes - you only add on top of it.

The problem is you pay a price for this. There must be a reason developers 
were adding another GB of memory just to run BK.
Preserving the complete merge history does indeed make repeated merges 
simpler, but it builds up complex meta data, which has to be managed 
forever. I doubt that this is really an advantage in the long term. I 
suspect that we would be better off serializing changesets in the main 
repository. For example bk does something like this:

	A1 -> A2 -> A3 -> BM
	  \-> B1 -> B2 --^

and instead of creating the merge changeset, one could merge them like 
this:

	A1 -> A2 -> A3 -> B1 -> B2

This results in a simpler repository, which is more scalable and which 
is easier for users to work with (e.g. binary bug search).
The disadvantage is that it will cause more minor conflicts when changes 
are pulled back into the original tree, but these should be easily 
resolvable most of the time.
I'm not saying with this that the bk model is bad, but I think it's a 
problem if it's the only model applied to everything.
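The "binary bug search" that a serialized history enables is just a
textbook bisection over the linear changeset list. A sketch (the
is_bad() tester is hypothetical, standing in for "build and test this
changeset"; with a merge DAG the search would instead have to choose
among parent paths):

```python
def bisect_first_bad(changesets, is_bad):
    """Find the first bad changeset in a linear history.

    Assumes the serialized model above: is_bad() is False for some
    prefix of the list and True from the first bad changeset onward.
    Returns None if every changeset is good.
    """
    lo, hi = 0, len(changesets)   # invariant: first bad one lies in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changesets[mid]):
            hi = mid              # bad at mid: answer is mid or earlier
        else:
            lo = mid + 1          # good at mid: answer is after mid
    return changesets[lo] if lo < len(changesets) else None
```

With N changesets this needs only about log2(N) build-and-test cycles,
which is exactly what gets harder once merge changesets branch the history.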

> The thing is, cherry-picking very much implies that the people "up" the 
> foodchain end up editing the work of the people "below" them. The whole 
> reason you want cherry-picking is that you want to fix up somebody elses 
> mistakes, ie something you disagree with.
> 
> That sounds like an obviously good thing, right? Yes it does.
> 
> The problem is, it actually results in the wrong dynamics and psychology 
> in the system. First off, it makes the implicit assumption that there is 
> an "up" and "down" in the food-chain, and I think that's wrong.

These dynamics do exist and our tools should be able to represent them.
For example, when people post patches, they get reviewed and often need 
more changes, and bk doesn't really help them to redo the patches.
Bk helped you to offload the cherry-picking process to other people, so 
that you only had to do the cherry-collecting, very efficiently.
Another prime example of cherry-picking is Andrew's mm tree: he picks a 
number of patches which are ready for merging and forwards them to you.
Our current basic development model (at least until a few days ago) looks 
something like this:

	linux-mm -> linux-bk -> linux-stable

Ideally most changes would get into the tree via linux-mm and, depending 
on various conditions (e.g. urgency, review state), would get into the 
stable tree. In practice linux-mm is more an aggregation of patches which 
need testing, and since most bk users were developing against linux-bk, 
it got a lot less testing and a lot of problems are only caught at the 
next stage. Changes from the stable tree would even flow in the opposite 
direction.
Bk supports certain aspects of the kernel development process very well, 
but due to its closed nature it was practically impossible to really 
integrate it fully into this process (at least for anyone outside BM). 
In the short term we are probably in for a tough ride and we'll take 
whatever works best for you, but in the long term we need to think about 
how SCM fits into our kernel development model, which includes 
development, review, testing and releasing of kernel changes. This is 
more than just pulling and merging kernel trees. I'm aiming at a tool 
that can also support Andrew's work, so that he can better offload some 
of this work (and take a break sometimes :) ). Unfortunately every 
existing tool I know of is lacking in its own way, so we still have some 
way to go...

bye, Roman

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  8:38                   ` Andrea Arcangeli
@ 2005-04-08 23:38                     ` Daniel Phillips
  2005-04-09  2:54                       ` Andrea Arcangeli
  2005-04-09  0:12                     ` Linus Torvalds
  1 sibling, 1 reply; 201+ messages in thread
From: Daniel Phillips @ 2005-04-08 23:38 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang

On Friday 08 April 2005 04:38, Andrea Arcangeli wrote:
> On Thu, Apr 07, 2005 at 11:41:29PM -0700, Linus Torvalds wrote:
> The huge number of changesets is the crucial point, there are good
> distributed SCM already but they are apparently not efficient enough at
> handling 60k changesets.
>
> We'd need a regenerated coherent copy of BKCVS to pipe into those SCM to
> evaluate how well they scale.
>
> OTOH if your git project already allows storing the data in there,
> that looks nice ;).

Hi Andrea,

For the immediate future, all we need is something that can _losslessly_ 
capture the new metadata that's being generated.  That buys time to bring 
one of the promising open source candidates up to full speed.

By the way, which one are you working on? :-)

Regards,

Daniel

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 22:52     ` Roman Zippel
@ 2005-04-08 23:46       ` Tupshin Harper
  2005-04-09  1:00         ` Roman Zippel
  2005-04-09 16:52       ` Eric D. Mudama
  1 sibling, 1 reply; 201+ messages in thread
From: Tupshin Harper @ 2005-04-08 23:46 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List

Roman Zippel wrote:

>Preserving the complete merge history does indeed make repeated merges 
>simpler, but it builds up complex meta data, which has to be managed 
>forever. I doubt that this is really an advantage in the long term. I 
>expect that we were better off serializing changesets in the main 
>repository. For example bk does something like this:
>
>	A1 -> A2 -> A3 -> BM
>	  \-> B1 -> B2 --^
>
>and instead of creating the merge changeset, one could merge them like 
>this:
>
>	A1 -> A2 -> A3 -> B1 -> B2
>
>This results in a simpler repository, which is more scalable and which 
>is easier for users to work with (e.g. binary bug search).
>The disadvantage would be it will cause more minor conflicts, when changes 
>are pulled back into the original tree, but which should be easily 
>resolvable most of the time.
>
Both darcs and arch (and arch's siblings) have ways of maintaining the 
complete history but speeding up operations.

Arch uses revision libraries:
http://www.gnu.org/software/gnu-arch/tutorial/revision-libraries.html
though I'm not all that up on arch, so I'll just leave it at that.

Darcs uses "darcs optimize --checkpoint"
http://darcs.net/manual/node7.html#SECTION00764000000000000000
which "allows for users to retrieve a working repository with limited 
history with a savings of disk space and bandwidth." In darcs case, you 
can pull a partial repository by doing "darcs get --partial", in which 
case you only grab the state at the point that the repository was 
optimized and subsequent patches, and all operations only need to work 
against the set of patches since that optimize.

Note, that I'm not promoting darcs for kernel usage because of speed (or 
the lack thereof) but I am curious why Linus would consider monotone 
given its speed issues but not consider darcs.

-Tupshin

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08  8:38                   ` Andrea Arcangeli
  2005-04-08 23:38                     ` Daniel Phillips
@ 2005-04-09  0:12                     ` Linus Torvalds
  2005-04-09  2:27                       ` Andrea Arcangeli
  2005-04-09 16:33                       ` Roman Zippel
  1 sibling, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-09  0:12 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Martin Pool, linux-kernel, David Lang



On Fri, 8 Apr 2005, Andrea Arcangeli wrote:
> 
> We'd need a regenerated coherent copy of BKCVS to pipe into those SCM to
> evaluate how well they scale.

Yes, that makes most sense, I believe. Especially as BKCVS does the 
linearization that makes other SCM's _able_ to take the data in the first 
place. Few enough SCM's really understand the BK merge model, although the 
distributed ones obviously have to do something similar.

> OTOH if your git project already allows storing the data in there,
> that looks nice ;).

I can express the data, and I did a sparse .git archive to prove the 
concept. It doesn't even try to save BK-specific details, but as far as I 
can tell, my git-conversion did capture all the basic things (ie not just 
the actual source tree, but hopefully all the "who did what" parts too).

Of course, my git visualization tools are so horribly crappy that it is 
hard to make sure ;)

Also, I suspect that BKCVS actually bothers to get more details out of a
BK tree than I cared about. People have pestered Larry about it, so BKCVS
exports a lot of the nitty-gritty (per-file comments etc) that just
doesn't actually _matter_, but people whine about. Me, I don't care. My
sparse-conversion just took the important parts.

> I don't yet fully understand how the algorithms of the trees are meant
> to work

Well, things like actually merging two git trees is not even something git
tries to do. It leaves that to somebody else - you can see what the
relationship is, and you can see all the data, but as far as I'm
concerned, git is really a "filesystem". It's a way of expressing
revisions, but it's not a way of creating them.

> It looks similar to a diff -ur of two hardlinked trees

Yes. You could really think of it that way. It's not really about
hardlinking, but the fact that objects are named by their content does
mean that two objects (regardless of their type) can be seen as
"hardlinked" whenever their contents match.

But the more interesting part is the hierarchical virtual format it has,
ie it is not only hardlinked, but it also has the three different levels
of "views" into those hardlinked objects ("blob", "tree", "revision").

So even though the hash tree looks flat in the _physical_ filesystem, it 
definitely isn't flat in its own virtual world. It's just flattened to fit 
in a normal filesystem ;)

[ There's also a fourth level view in "trust", but that one hasn't been
  implemented yet since I think it might as well be done at a higher
  level. ]

Btw, the sha1 file format isn't actually designed for "rsync", since rsync 
is really a hell of a lot more capable than my format needs. The format is 
really designed for something like an offline http grabber, in that you can 
just grab files purely by filename (and verify that you got them right by 
running sha1sum on the resulting local copy). So think "wget".
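The content-addressing that makes this "wget" model work can be sketched
in a few lines (hedged: this hashes the raw contents only; the exact
on-disk object format, with its headers and compression, is not modeled
here):

```python
import hashlib

def object_name(data):
    """Name an object by the hex sha1 of its contents.

    Two objects with identical contents get identical names, which is
    the "hardlinked" effect described above.
    """
    return hashlib.sha1(data).hexdigest()

def verify_fetched(name_hex, data):
    """After a dumb wget-style fetch, recompute the hash locally and
    compare it against the filename the object was fetched under."""
    return object_name(data) == name_hex
```

So a grabber only needs plain GET-by-filename plus a local sha1sum to
detect any corruption or tampering in transit.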

				Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 23:46       ` Tupshin Harper
@ 2005-04-09  1:00         ` Roman Zippel
  2005-04-09  1:23           ` Tupshin Harper
  0 siblings, 1 reply; 201+ messages in thread
From: Roman Zippel @ 2005-04-09  1:00 UTC (permalink / raw)
  To: Tupshin Harper; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List

Hi,

On Fri, 8 Apr 2005, Tupshin Harper wrote:

> > 	A1 -> A2 -> A3 -> B1 -> B2
> > 
> > This results in a simpler repository, which is more scalable and which is
> > easier for users to work with (e.g. binary bug search).
> > The disadvantage would be it will cause more minor conflicts, when changes
> > are pulled back into the original tree, but which should be easily
> > resolvable most of the time.
> > 
> Both darcs and arch (and arch's siblings) have ways of maintaining the
> complete history but speeding up operations.

Please show me how you would do a binary search with arch.

I don't really like the arch model, it's far too restrictive and it's 
jumping through hoops to get to an acceptable speed.
What I expect from a SCM is that it maintains both a version index of the 
directory structure and a version index of the individual files. Arch 
makes it especially painful to extract this data quickly. For the common 
cases it throws disk space at the problem and does a lot of caching, but 
there are still enough problems (e.g. annotate), which require scanning of 
lots of tarballs.

bye, Roman

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 16:15         ` Matthias-Christian Ott
  2005-04-08 17:14           ` Linus Torvalds
@ 2005-04-09  1:00           ` Marcin Dalecki
  2005-04-09  1:09             ` Chris Wedgwood
  1 sibling, 1 reply; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09  1:00 UTC (permalink / raw)
  To: Matthias-Christian Ott
  Cc: Linus Torvalds, Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List


On 2005-04-08, at 18:15, Matthias-Christian Ott wrote:

> Linus Torvalds wrote:
>>
> SQL Databases like SQLite aren't slow.
> But maybe a Berkeley Database v.4 is a better solution.

Yes it sucks less for this purpose. See subversion as reference.


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07  7:44 ` Jan Hudec
  2005-04-08  6:14   ` Matthias Urlichs
@ 2005-04-09  1:01   ` Marcin Dalecki
  2005-04-09  8:32     ` Jan Hudec
  2005-04-11  2:26     ` Miles Bader
  1 sibling, 2 replies; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09  1:01 UTC (permalink / raw)
  To: Jan Hudec; +Cc: Linus Torvalds, Kernel Mailing List


On 2005-04-07, at 09:44, Jan Hudec wrote:
>
> I have looked at most systems currently available. I would suggest
> following for closer look on:
>
> 1) GNU Arch/Bazaar. They use the same archive format, simple, have the
>    concepts right. It may need some scripts or add ons. When Bazaar-NG
>    is ready, it will be able to read the GNU Arch/Bazaar archives so
>    switching should be easy.

Arch isn't a sound example of software design. Contrary to the random 
notes posted by its author, the following issues struck me when I 
evaluated it:

The application (tla) claims to have "intuitive" command names. However,
I didn't find that to be the case. Most of them were difficult to 
remember and appeared to be just infantile. I stopped looking further 
after I saw:

tla my-id instead of: tla user-id or even tla set id ...

tla make-archive instead of tla init

tla my-default-archive john@dole.com--2005-VersionPatrol

No more "My Computer" please...

Repository addressing requires you to use informally defined, very 
elaborate and typing-error-prone conventions:

mkdir ~/{archives}
tla make-archive john@dole.com--20005-VersionPatrol 
~/{archives}/2005-VersionPatrol

You notice the requirement for two commands to accomplish a single task 
already well denoted by the second command? There is more of the same at 
quite a few places when you try to use it. Did you notice the triple 
zero it didn't catch?

As an added bonus it relies on the external applications that happen to 
be named patch and diff, plus a few others, being installed on the host 
in question in order to operate.

Better not to waste your time looking at Arch. Stick with patches you 
maintain by hand, combined with some scripts containing a list of apply 
commands, and you will still be more productive than when using Arch.


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 17:14           ` Linus Torvalds
                               ` (2 preceding siblings ...)
  2005-04-08 17:35             ` Jeff Garzik
@ 2005-04-09  1:04             ` Marcin Dalecki
  2005-04-09 15:42               ` Paul Jackson
  3 siblings, 1 reply; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09  1:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias-Christian Ott, Chris Wedgwood, Andrea Arcangeli,
	Kernel Mailing List


On 2005-04-08, at 19:14, Linus Torvalds wrote:
>
> You do that with an sql database, and I'll be impressed.

It's possible. But what will impress you is either the price tag the DB 
comes with or the hardware it runs on :-)


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  1:00           ` Marcin Dalecki
@ 2005-04-09  1:09             ` Chris Wedgwood
  2005-04-09  1:21               ` Marcin Dalecki
  0 siblings, 1 reply; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-09  1:09 UTC (permalink / raw)
  To: Marcin Dalecki
  Cc: Matthias-Christian Ott, Linus Torvalds, Andrea Arcangeli,
	Kernel Mailing List

On Sat, Apr 09, 2005 at 03:00:44AM +0200, Marcin Dalecki wrote:

> Yes it sucks less for this purpose. See subversion as reference.

Whatever solution people come up with, ideally it should be tolerant
to minor amounts of corruption (so I can recover the rest of my data
if need be) and it should also have decent sanity checks to find
corruption as soon as reasonably possible.

I've been bitten by problems that subversion didn't catch but bk did.
In the subversion case by the time I noticed much data was lost and
none of the subversion tools were able to recover the rest of it.

In the bk case, the data-loss was almost immediately noticeable and
only affected a few files making recovery much easier.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:14               ` Linus Torvalds
  2005-04-08 18:28                 ` Jon Smirl
  2005-04-08 19:16                 ` Matthias-Christian Ott
@ 2005-04-09  1:09                 ` Marcin Dalecki
  2 siblings, 0 replies; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09  1:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias-Christian Ott, Chris Wedgwood, Andrea Arcangeli,
	Kernel Mailing List


On 2005-04-08, at 20:14, Linus Torvalds wrote:

>
>
> On Fri, 8 Apr 2005, Matthias-Christian Ott wrote:
>>
>> Ok, but if you want to search for information in such big text files 
>> it
>> slow, because you do linear search
>
> No I don't. I don't search for _anything_. I have my own
> content-addressable filesystem, and I guarantee you that it's faster 
> than
> mysql, because it depends on the kernel doing the right thing (which it
> does).

Linus... sorry, but you're mistaking the frequently seen abuse of SQL 
databases as DATA storage for what SQL databases are actually good at 
storing: well-defined RELATIONS.
Sure, a filesystem is for data. SQL is for relations.


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:28                 ` Jon Smirl
  2005-04-08 18:58                   ` Florian Weimer
@ 2005-04-09  1:11                   ` Marcin Dalecki
  2005-04-09  1:50                     ` David Lang
  1 sibling, 1 reply; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09  1:11 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List,
	Linus Torvalds, Matthias-Christian Ott


On 2005-04-08, at 20:28, Jon Smirl wrote:

> On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote:
>>    How do you replicate your database incrementally? I've given you 
>> enough
>>    clues to do it for "git" in probably five lines of perl.
>
> Efficient database replication is achieved by copying the transaction
> logs and then replaying them. Most mid to high end databases support
> this. You only need to copy the parts of the logs that you don't
> already have.
>
Databases supporting replication are called high end. And you forgot the 
song and dance around the network setup that this involves.
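Linus's "five lines of perl" clue quoted above follows from
content-addressing: an object file never changes once written, so
incremental replication is just "copy whatever names the mirror doesn't
have yet". A sketch under that assumption (hypothetical flat object
directory, no subdirectory fan-out, error handling omitted):

```python
import os
import shutil

def replicate_objects(src_dir, dst_dir):
    """Pull object files that the destination doesn't have yet.

    Because objects are immutable and named by their content hash, a
    filename that already exists on the destination needs no comparison,
    no diffing, and no re-copy: replication only ever adds files.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    for name in os.listdir(src_dir):
        dst = os.path.join(dst_dir, name)
        if not os.path.exists(dst):
            shutil.copyfile(os.path.join(src_dir, name), dst)
            copied += 1
    return copied
```

No transaction log and no replay step are needed, which is the contrast
with the database replication described above.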


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  1:09             ` Chris Wedgwood
@ 2005-04-09  1:21               ` Marcin Dalecki
  0 siblings, 0 replies; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09  1:21 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Matthias-Christian Ott, Linus Torvalds, Andrea Arcangeli,
	Kernel Mailing List


On 2005-04-09, at 03:09, Chris Wedgwood wrote:

> On Sat, Apr 09, 2005 at 03:00:44AM +0200, Marcin Dalecki wrote:
>
>> Yes it sucks less for this purpose. See subversion as reference.
>
> Whatever solution people come up with, ideally it should be tolerant
> to minor amounts of corruption (so I can recover the rest of my data
> if need be) and it should also have decent sanity checks to find
> corruption as soon as reasonable possible.

Yes, this is the reason subversion is moving toward an alternative 
back-end based on a custom DB mapped closely to the file system.


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  1:00         ` Roman Zippel
@ 2005-04-09  1:23           ` Tupshin Harper
  0 siblings, 0 replies; 201+ messages in thread
From: Tupshin Harper @ 2005-04-09  1:23 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List

Roman Zippel wrote:

>
>
>Please show me how you would do a binary search with arch.
>
>I don't really like the arch model, it's far too restrictive and it's 
>jumping through hoops to get to an acceptable speed.
>What I expect from a SCM is that it maintains both a version index of the 
>directory structure and a version index of the individual files. Arch 
>makes it especially painful to extract this data quickly. For the common 
>cases it throws disk space at the problem and does a lot of caching, but 
>there are still enough problems (e.g. annotate), which require scanning of 
>lots of tarballs.
>
>bye, Roman
>  
>
I'm not going to defend or attack arch since I haven't used it enough. I 
will say that darcs largely does suffer from the same problem that you 
describe since its fundamental unit of storage is individual patches 
(though it avoids the tarball issue). This is why David Roundy has 
indicated his intention of eventually having a per-file cache:
http://kerneltrap.org/mailarchive/1/message/24317/flat

You could then make the argument that if you have a per-file 
representation of the history, why do you also need/want a per-patch 
representation as the canonical format, but that's been argued plenty on 
both the darcs and arch mailing lists and probably isn't worth going 
into here.

-Tupshin

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  1:11                   ` Marcin Dalecki
@ 2005-04-09  1:50                     ` David Lang
  2005-04-09 22:12                       ` Florian Weimer
  0 siblings, 1 reply; 201+ messages in thread
From: David Lang @ 2005-04-09  1:50 UTC (permalink / raw)
  To: Marcin Dalecki
  Cc: Jon Smirl, Chris Wedgwood, Andrea Arcangeli, Kernel Mailing List,
	Linus Torvalds, Matthias-Christian Ott

On Sat, 9 Apr 2005, Marcin Dalecki wrote:

> On 2005-04-08, at 20:28, Jon Smirl wrote:
>
>> On Apr 8, 2005 2:14 PM, Linus Torvalds <torvalds@osdl.org> wrote:
>>>    How do you replicate your database incrementally? I've given you enough
>>>    clues to do it for "git" in probably five lines of perl.
>> 
>> Efficient database replication is achieved by copying the transaction
>> logs and then replaying them. Most mid to high end databases support
>> this. You only need to copy the parts of the logs that you don't
>> already have.
>> 
> Databases supporting replication are called high end. You forgot the cats 
> dance
> around the network this issue involves.

And Postgres (which is Free in all senses of the word) is high end by this 
definition.

I'm not saying that it's an efficient thing to use for this task, but 
don't be fooled into thinking you need something at the price of Oracle to 
do this job.
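The incremental-replication point quoted above (immutable, hash-named files mean you only copy whatever the other side is missing) can be sketched in a few lines of shell. The object names and layout here are purely illustrative, not git's actual store:

```shell
# Illustrative sketch, not git's real layout: objects are immutable
# files, so incremental replication is just copying what's missing.
set -e
src=$(mktemp -d); dst=$(mktemp -d)
echo one > "$src/obj-aaa"
echo two > "$src/obj-bbb"
cp "$src/obj-aaa" "$dst/"             # destination already has one object
for f in "$src"/obj-*; do             # sync: copy only the missing objects
	[ -e "$dst/$(basename "$f")" ] || cp "$f" "$dst/"
done
ls "$dst" | sort                      # -> obj-aaa obj-bbb
rm -rf "$src" "$dst"
```

Because objects never change once written, there is no conflict case to handle: existence of the filename is enough.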

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  0:12                     ` Linus Torvalds
@ 2005-04-09  2:27                       ` Andrea Arcangeli
  2005-04-09  2:32                         ` David Lang
                                           ` (3 more replies)
  2005-04-09 16:33                       ` Roman Zippel
  1 sibling, 4 replies; 201+ messages in thread
From: Andrea Arcangeli @ 2005-04-09  2:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Pool, linux-kernel, David Lang

On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote:
> really designed for something like a offline http grabber, in that you can 
> just grab files purely by filename (and verify that you got them right by 
> running sha1sum on the resulting local copy). So think "wget".

I'm not entirely convinced wget is going to be an efficient way to
synchronize and fetch your tree, though its simplicity is great. It's a
tradeoff between optimizing and re-using existing tools (like webservers).
Perhaps that's why you were compressing the stuff too? It sounds better
not to compress the stuff on-disk, and to synchronize with an rsync-like
protocol (an rsync server would do) that handles the compression in
the network protocol itself, and in turn can apply compression to a
large blob (i.e. the diff between the trees), and not to the single tiny
files.
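The effect being described (one large compressed blob versus many tiny compressed files) is easy to measure directly. A quick sketch with made-up file contents:

```shell
# Compare per-file compression against compressing one concatenated
# blob, as an rsync-like protocol could do on the wire. Toy data only.
set -e
dir=$(mktemp -d)
for i in $(seq 1 100); do
	printf 'int register_%s(void) { return %s; }\n' "$i" "$i" > "$dir/f$i.c"
done
# sum of individually gzipped sizes
per_file=$(for f in "$dir"/f*.c; do gzip -c < "$f" | wc -c; done |
	awk '{s += $1} END {print s}')
# one gzip pass over the concatenation
blob=$(cat "$dir"/f*.c | gzip -c | wc -c)
echo "per-file total: $per_file bytes, single blob: $blob bytes"
rm -rf "$dir"
```

The single blob wins both on per-member header overhead and on cross-file redundancy, which is the argument for compressing in the transfer protocol rather than per file on disk.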

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  2:27                       ` Andrea Arcangeli
@ 2005-04-09  2:32                         ` David Lang
  2005-04-09  3:08                         ` Brian Gerst
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 201+ messages in thread
From: David Lang @ 2005-04-09  2:32 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel

On Sat, 9 Apr 2005, Andrea Arcangeli wrote:

> On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote:
>> really designed for something like a offline http grabber, in that you can
>> just grab files purely by filename (and verify that you got them right by
>> running sha1sum on the resulting local copy). So think "wget".
>
> I'm not entirely convinced wget is going to be an efficient way to
> synchronize and fetch your tree, though its simplicity is great. It's a
> tradeoff between optimizing and re-using existing tools (like webservers).
> Perhaps that's why you were compressing the stuff too? It sounds better
> not to compress the stuff on-disk, and to synchronize with an rsync-like
> protocol (an rsync server would do) that handles the compression in
> the network protocol itself, and in turn can apply compression to a
> large blob (i.e. the diff between the trees), and not to the single tiny
> files.

Note that many webservers will compress the data for you on the fly as 
well, so there's even less need to have it pre-compressed.

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Re: Kernel SCM saga..
  2005-04-08 15:50       ` Linus Torvalds
@ 2005-04-09  2:53         ` Petr Baudis
  2005-04-09  7:08           ` Randy.Dunlap
  2005-04-10  1:01           ` Phillip Lougher
  2005-04-09 15:50         ` Paul Jackson
  1 sibling, 2 replies; 201+ messages in thread
From: Petr Baudis @ 2005-04-09  2:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ross, Kernel Mailing List

  Hello,

Dear diary, on Fri, Apr 08, 2005 at 05:50:21PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Fri, 8 Apr 2005 ross@jose.lug.udel.edu wrote:
> > 
> > Here's a partial solution.  It does depend on a modified version of
> > cat-file that behaves like cat.  I found it easier to have cat-file
> > just dump the object indicated on stdout.  Trivial patch for that is included.
> 
> Your trivial patch is trivially incorrect, though. First off, some files
> may be binary (and definitely are - the "tree" type object contains
> pathnames, and in order to avoid having to worry about special characters
> they are NUL-terminated), and your modified "cat-file" breaks that.  
> 
> Secondly, it doesn't check or print the tag.

  FWIW, I made a few small fixes (to prevent some trivial usage errors
from causing cache corruption) and added scripts gitcommit.sh, gitadd.sh and
gitlog.sh - heavily inspired by what already went through the mailing
list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
(including .dircache, even though it isn't shown in the index), the
cumulative patch can be found below. The scripts aim to provide a more
high-level (obviously very interim) interface for git.

  I'm now working on tree-diff.c which will (surprise!) produce a diff
of two trees (I'll finish it after I get some sleep, though), and then I
will probably do some dwimmy gitdiff.sh wrapper for tree-diff and
show-diff. At that point I might get my hands on a pull that is kinder to
local changes.

  Kind regards,
				Petr Baudis

diff -ruN git-0.03/gitadd.sh git-devel-clean/gitadd.sh
--- git-0.03/gitadd.sh	1970-01-01 01:00:00.000000000 +0100
+++ git-devel-clean/gitadd.sh	2005-04-09 03:17:34.220577000 +0200
@@ -0,0 +1,13 @@
+#!/bin/sh
+#
+# Add new file to a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes a list of file names at the command line, and schedules them
+# for addition to the GIT repository at the next commit.
+#
+# FIXME: Those files are omitted from show-diff output!
+
+for file in "$@"; do
+	echo $file >>.dircache/add-queue
+done
diff -ruN git-0.03/gitcommit.sh git-devel-clean/gitcommit.sh
--- git-0.03/gitcommit.sh	1970-01-01 01:00:00.000000000 +0100
+++ git-devel-clean/gitcommit.sh	2005-04-09 03:17:34.220577000 +0200
@@ -0,0 +1,36 @@
+#!/bin/sh
+#
+# Commit into a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+# Based on an example script fragment sent to LKML by Linus Torvalds.
+#
+# Ignores any parameters for now, excepts changelog entry on stdin.
+#
+# FIXME: Gets it wrong for filenames containing spaces.
+
+
+if [ -r .dircache/add-queue ]; then
+	mv .dircache/add-queue .dircache/add-queue-progress
+	addedfiles=$(cat .dircache/add-queue-progress)
+else
+	addedfiles=
+fi
+changedfiles=$(show-diff -s | grep -v ': ok$' | cut -d : -f 1)
+commitfiles="$addedfiles $changedfiles"
+if [ ! "$commitfiles" ]; then
+	echo 'Nothing to commit.' >&2
+	exit
+fi
+update-cache $commitfiles
+rm -f .dircache/add-queue-progress
+
+
+oldhead=$(cat .dircache/HEAD)
+treeid=$(write-tree)
+newhead=$(commit-tree $treeid -p $oldhead)
+
+if [ "$newhead" ]; then
+	echo $newhead >.dircache/HEAD
+else
+	echo "Error during commit (oldhead $oldhead, treeid $treeid)" >&2
+fi
diff -ruN git-0.03/gitlog.sh git-devel-clean/gitlog.sh
--- git-0.03/gitlog.sh	1970-01-01 01:00:00.000000000 +0100
+++ git-devel-clean/gitlog.sh	2005-04-09 04:28:51.227791000 +0200
@@ -0,0 +1,61 @@
+#!/bin/sh
+####
+#### Call this script with an object and it will produce the change
+#### information for all the parents of that object
+####
+#### This script was originally written by Ross Vandegrift.
+# multiple parents test 1d0f4aec21e5b66c441213643426c770dc6dedc0
+# parents: ffa098b2e187b71b86a76d3cd5eb77d074a2503c
+# 6860e0d9197c7f52155466c225baf39b42d62f63
+
+# regex for parent declarations
+PARENTS="^parent [a-f0-9]{40}$"
+
+TMPCL="/tmp/gitlog.$$"
+
+# takes an object and generates the object's parent(s)
+function unpack_parents () {
+	echo "me $1"
+	echo "me $1" >>$TMPCL
+	RENTS=""
+
+	TMPCM=$(mktemp)
+	cat-file commit $1 >$TMPCM
+	while read line; do
+		if echo "$line" | egrep -q "$PARENTS"; then
+			RENTS="$RENTS "$(echo $line | sed 's/parent //g')
+		fi
+		echo $line
+	done <$TMPCM
+	rm $TMPCM
+
+	echo -e "\n--------------------------\n"
+
+	# if the last object had no parents, return
+	if [ ! "$RENTS" ]; then
+		return;
+	fi
+
+	#useful for testing
+	#echo $RENTS
+	#read
+	for i in `echo $RENTS`; do
+		# break cycles
+		if grep -q "me $i" $TMPCL; then
+			echo "Already visited $i" >&2
+			continue
+		else
+			unpack_parents $i
+		fi
+	done
+}
+
+base=$1
+if [ ! "$base" ]; then
+	base=$(cat .dircache/HEAD)
+fi
+
+rm -f $TMPCL
+unpack_parents $base
+rm -f $TMPCL
+
diff -ruN git-0.03/read-cache.c git-devel-clean/read-cache.c
--- git-0.03/read-cache.c	2005-04-08 22:51:35.000000000 +0200
+++ git-devel-clean/read-cache.c	2005-04-09 03:53:44.049642000 +0200
@@ -264,11 +264,12 @@
 	size = 0; // avoid gcc warning
 	map = (void *)-1;
 	if (!fstat(fd, &st)) {
-		map = NULL;
 		size = st.st_size;
 		errno = EINVAL;
 		if (size > sizeof(struct cache_header))
 			map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+		else
+			return (!hdr->entries) ? 0 : error("inconsistent cache");
 	}
 	close(fd);
 	if (-1 == (int)(long)map)
diff -ruN git-0.03/show-diff.c git-devel-clean/show-diff.c
--- git-0.03/show-diff.c	2005-04-08 17:55:09.000000000 +0200
+++ git-devel-clean/show-diff.c	2005-04-09 03:53:44.063638000 +0200
@@ -49,9 +49,17 @@
 
 int main(int argc, char **argv)
 {
+	int silent = 0;
 	int entries = read_cache();
 	int i;
 
+	while (argc-- > 1) {
+		if (!strcmp(argv[1], "-s"))
+			silent = 1;
+		else if (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help"))
+			usage("show-diff [-s]");
+	}
+
 	if (entries < 0) {
 		perror("read_cache");
 		exit(1);
@@ -77,6 +85,9 @@
 		for (n = 0; n < 20; n++)
 			printf("%02x", ce->sha1[n]);
 		printf("\n");
+		if (silent)
+			continue;
+
 		new = read_sha1_file(ce->sha1, type, &size);
 		show_differences(ce, &st, new, size);
 		free(new);
diff -ruN git-0.03/update-cache.c git-devel-clean/update-cache.c
--- git-0.03/update-cache.c	2005-04-08 17:53:44.000000000 +0200
+++ git-devel-clean/update-cache.c	2005-04-09 03:53:44.069637000 +0200
@@ -231,6 +231,9 @@
 		return -1;
 	}
 
+	if (argc < 2)
+		usage("update-cache <file>*");
+
 	newfd = open(".dircache/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
 	if (newfd < 0) {
 		perror("unable to create new cachefile");

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 23:38                     ` Daniel Phillips
@ 2005-04-09  2:54                       ` Andrea Arcangeli
  0 siblings, 0 replies; 201+ messages in thread
From: Andrea Arcangeli @ 2005-04-09  2:54 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang

On Fri, Apr 08, 2005 at 07:38:30PM -0400, Daniel Phillips wrote:
> For the immediate future, all we need is something than can _losslessly_ 
> capture the new metadata that's being generated.  That buys time to bring one 
> of the promising open source candidates up to full speed.

Agreed.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  2:27                       ` Andrea Arcangeli
  2005-04-09  2:32                         ` David Lang
@ 2005-04-09  3:08                         ` Brian Gerst
  2005-04-09  3:15                           ` Andrea Arcangeli
  2005-04-09  5:45                         ` Linus Torvalds
  2005-04-10 17:55                         ` Matthias Andree
  3 siblings, 1 reply; 201+ messages in thread
From: Brian Gerst @ 2005-04-09  3:08 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang

Andrea Arcangeli wrote:
> On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote:
> 
>>really designed for something like a offline http grabber, in that you can 
>>just grab files purely by filename (and verify that you got them right by 
>>running sha1sum on the resulting local copy). So think "wget".
> 
> 
> I'm not entirely convinced wget is going to be an efficient way to
> synchronize and fetch your tree, though its simplicity is great. It's a
> tradeoff between optimizing and re-using existing tools (like webservers).
> Perhaps that's why you were compressing the stuff too? It sounds better
> not to compress the stuff on-disk, and to synchronize with an rsync-like
> protocol (an rsync server would do) that handles the compression in
> the network protocol itself, and in turn can apply compression to a
> large blob (i.e. the diff between the trees), and not to the single tiny
> files.

It's my understanding that the files don't change.  Only new ones are 
created for each revision.

--
				Brian Gerst

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  3:08                         ` Brian Gerst
@ 2005-04-09  3:15                           ` Andrea Arcangeli
  0 siblings, 0 replies; 201+ messages in thread
From: Andrea Arcangeli @ 2005-04-09  3:15 UTC (permalink / raw)
  To: Brian Gerst; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang

On Fri, Apr 08, 2005 at 11:08:58PM -0400, Brian Gerst wrote:
> It's my understanding that the files don't change.  Only new ones are 
> created for each revision.

I said diff between the trees, not diff between files ;). When you fetch
the new changes with rsync, it'll compress better and in turn be faster
(assuming we're network bound, and I am with 1mbit and a 2.5GHz cpu) if
it's rsync applying gzip to the big "combined diff between trees",
instead of us compressing every single small file on disk, which then
won't compress any more inside rsync.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  2:27                       ` Andrea Arcangeli
  2005-04-09  2:32                         ` David Lang
  2005-04-09  3:08                         ` Brian Gerst
@ 2005-04-09  5:45                         ` Linus Torvalds
  2005-04-09 22:55                           ` David S. Miller
  2005-04-10 17:55                         ` Matthias Andree
  3 siblings, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-09  5:45 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Martin Pool, linux-kernel, David Lang



On Sat, 9 Apr 2005, Andrea Arcangeli wrote:
> 
> I'm not entirely convinced wget is going to be an efficient way to
> synchronize and fetch your tree

I don't think it's efficient per se, but I think it's important that 
people can just "pass the files along". I.e. it's a huge benefit if any 
everyday mirror script (whether rsync, wget, homebrew or whatever) will 
just automatically do the right thing.
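The "any mirror script works" property comes from objects being plain files named by the sha1 of their own contents, so the receiver can verify each one independently of how it arrived. A minimal local sketch (the layout is illustrative, not git's actual object store; a real mirror would use wget or rsync in place of cp):

```shell
# Objects are plain files named by the sha1 of their contents, so any
# copy mechanism works and the receiver can check what it got.
set -e
src=$(mktemp -d); dst=$(mktemp -d)
printf 'hello, world\n' > "$src/tmp"
sha=$(sha1sum "$src/tmp" | cut -d' ' -f1)
mv "$src/tmp" "$src/$sha"            # publish under the content hash
cp "$src/$sha" "$dst/$sha"           # stand-in for wget/rsync/cp -al
got=$(sha1sum "$dst/$sha" | cut -d' ' -f1)
[ "$got" = "$sha" ] && echo "object $sha verified"
rm -rf "$src" "$dst"
```

A corrupted or truncated transfer simply fails the sha1sum check, so no trust in the transport is needed.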

> Perhaps that's why you were compressing the stuff too? It sounds better
> not to compress the stuff on-disk

I much prefer to waste some CPU time to save disk cache. Especially since 
the compression is "free" if you do it early on (ie it's done only once, 
since the files are stable). Also, if the difference is a 1.5GB kernel 
repository or a 3GB kernel repository, I know which one I'll pick ;)

Also, I don't want people editing repository files by hand. Sure, the 
sha1 catches it, but still... I'd rather force the low-level ops to use 
the proper helper routines. Which is why it's a raw zlib compressed blob, 
not a gzipped file.

		Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  2:53         ` Petr Baudis
@ 2005-04-09  7:08           ` Randy.Dunlap
  2005-04-09 18:06             ` [PATCH] " Petr Baudis
  2005-04-10  1:01           ` Phillip Lougher
  1 sibling, 1 reply; 201+ messages in thread
From: Randy.Dunlap @ 2005-04-09  7:08 UTC (permalink / raw)
  To: Petr Baudis; +Cc: torvalds, ross, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2107 bytes --]

On Sat, 9 Apr 2005 04:53:57 +0200 Petr Baudis wrote:

|   Hello,
| 
| Dear diary, on Fri, Apr 08, 2005 at 05:50:21PM CEST, I got a letter
| where Linus Torvalds <torvalds@osdl.org> told me that...
| > 
| > 
| > On Fri, 8 Apr 2005 ross@jose.lug.udel.edu wrote:
| > > 
| > > Here's a partial solution.  It does depend on a modified version of
| > > cat-file that behaves like cat.  I found it easier to have cat-file
| > > just dump the object indicated on stdout.  Trivial patch for that is included.
| > 
| > Your trivial patch is trivially incorrect, though. First off, some files
| > may be binary (and definitely are - the "tree" type object contains
| > pathnames, and in order to avoid having to worry about special characters
| > they are NUL-terminated), and your modified "cat-file" breaks that.  
| > 
| > Secondly, it doesn't check or print the tag.
| 
|   FWIW, I made a few small fixes (to prevent some trivial usage errors
| from causing cache corruption) and added scripts gitcommit.sh, gitadd.sh and
| gitlog.sh - heavily inspired by what already went through the mailing
| list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
| (including .dircache, even though it isn't shown in the index), the
| cumulative patch can be found below. The scripts aim to provide a more
| high-level (obviously very interim) interface for git.
| 
|   I'm now working on tree-diff.c which will (surprise!) produce a diff
| of two trees (I'll finish it after I get some sleep, though), and then I
| will probably do some dwimmy gitdiff.sh wrapper for tree-diff and
| show-diff. At that point I might get my hands on a pull that is kinder to
| local changes.

Hi,

I'll look at your scripts this weekend.  I've also been
working on some, but mine are a bit more experimental (cruder)
than yours are.  Anyway, here they are (attached) -- also
available at http://developer.osdl.org/rddunlap/git/

gitin : checkin/commit
gitwhat sha1 : what is that sha1 file (type and contents if blob or commit)
gitlist (blob, commit, tree, or all) :
	list all objects with type (commit, tree, blob, or all)

---
~Randy

[-- Attachment #2: gitin --]
[-- Type: application/octet-stream, Size: 742 bytes --]

#! /bin/sh
# gitin: checkin for git files

# grep show-diff for +++ => error, print 'run update-cache <filenames>', exit
#	(better would be an error exit code)
# write-tree > current_tree_object
# print 'enter commit message:'
# commit-tree `cat current_tree_object` -p `cat .dircache/HEAD` > current_commit_object
# update .dircache/HEAD with current_commit_object

diffs=`show-diff | grep "+++"`
#echo diffs=/$diffs/

if [ x"$diffs" != x ]; then
	echo "run update-cache <filenames>"
	exit
fi

tree_object=`write-tree`
#echo tree_obj=/$tree_object/

head=`cat .dircache/HEAD`
echo "enter commit message: (end with ^D)"
commit_object=`commit-tree $tree_object -p $head`
#echo commit_obj=/$commit_object/

echo $commit_object > .dircache/HEAD

[-- Attachment #3: gitlist --]
[-- Type: application/octet-stream, Size: 580 bytes --]

#! /bin/sh
# gitlist: list some git objects/types
# (by selected target type: blob, tree, commit, all)

target=$1
if [ -z "$target" ]; then
	echo "usage: gitlist type {blob, tree, commit, or all}"
	exit 1
fi


subdir=.dircache/objects/

for high in 0 1 2 3 4 5 6 7 8 9 a b c d e f ; do
    for low in 0 1 2 3 4 5 6 7 8 9 a b c d e f ; do
	top=$high$low

	for f in $subdir/$top/* ; do
		if [ ! -r $f ]; then
			continue
		fi
		base=`basename $f`
		type=`cat-file -t $top$base`
		if [ "$target" = "all" -o "$target" = "$type" ]; then
			echo $top$base : $type 
		fi
	done
    done
done

[-- Attachment #4: gitwhat --]
[-- Type: application/octet-stream, Size: 533 bytes --]

#! /bin/sh
# gitwhat: what is that file

sha1=$1
if [ -z "$sha1" ]; then
	echo "usage: gitwhat sha1"
	exit 1
fi

what=`cat-file -t $sha1`
if [ -z "$what" ]; then
	exit 1
fi
echo "type is: $what"

topdir=${sha1:0:2}
last=${sha1:2}
file=.dircache/objects/$topdir/$last

if [ -z "$PAGER" ]; then
	pager=more
else
	pager=$PAGER
fi

case $what in
blob)
	#head -10 $file
	#$pager $file
	cat-file blob $sha1 | $pager
	;;
tree)
	echo "cannot print binary tree"
	#cat-file tree $sha1 | $pager
	;;
commit)
	cat-file commit $sha1 | $pager
	;;
esac

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:03                   ` Linus Torvalds
  2005-04-08 19:16                     ` Chris Wedgwood
@ 2005-04-09  7:20                     ` Willy Tarreau
  2005-04-09 15:15                     ` Paul Jackson
  2 siblings, 0 replies; 201+ messages in thread
From: Willy Tarreau @ 2005-04-09  7:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Wedgwood, Matthias-Christian Ott, Andrea Arcangeli,
	Kernel Mailing List

On Fri, Apr 08, 2005 at 12:03:49PM -0700, Linus Torvalds wrote:
 
> And if you do actively malicious things in your own directory, you get
> what you deserve. It's actually _hard_ to try to fool git into believing a
> file hasn't changed: you need to not only replace it with the exact same
> file length and ctime/mtime, you need to reuse the same inode/dev numbers
> (again - I didn't worry about portability, and filesystems where those
> aren't stable are a "don't do that then") and keep the mode the same. Oh,
> and uid/gid, but that was much me being silly.

It would be even easier to touch the tree with a known date before
patching (e.g. 1/1/70). It would protect against any accidental date
change if for any reason your system time went backwards while
working on the tree.

Another trick I use when I build the 2.4-hf patches is to build a
list of filenames from the patches. It works only because I want
to keep all original patches and no change should appear outside
those patches. Using this + cp -al + diff -pruN makes the process
very fast. It would not work if I had to rebuild those patches from
hand-edited files of course.
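The cp -al + diff -pruN process above can be sketched like this. It works because patch replaces files (breaking the hard link) rather than editing them in place, which is exactly why hand-edited files would break it. Paths are made up:

```shell
# Hard-link a pristine tree, change a file by replace-and-rename (as
# patch does), and diff the trees. Editing the file in place instead
# would corrupt both copies through the shared inode.
set -e
work=$(mktemp -d)
mkdir "$work/orig"
echo 'old line' > "$work/orig/file.c"
cp -al "$work/orig" "$work/new"               # cheap hard-linked copy
echo 'new line' > "$work/new/file.c.tmp"      # write a fresh file...
mv "$work/new/file.c.tmp" "$work/new/file.c"  # ...and rename over the link
diff -pruN "$work/orig" "$work/new" || true   # diff exits 1 on changes
grep -c 'old line' "$work/orig/file.c"        # original is intact
rm -rf "$work"
```

The hard-linked copy costs only directory entries, so snapshotting a whole kernel tree this way is nearly free.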

Last but not least, it only takes 0.26 seconds on my dual athlon
1800 to find date/size changes between 2.6.11{,.7} and 4.7s if the
tool includes the md5 sum in its checks :

$ time flx check --ignore-owner --ignore-mode --ignore-ldate --ignore-dir \
  --ignore-dot --only-new --ignore-sum linux-2.6.11/. linux-2.6.11.7/. |wc -l
     47

real    0m0.255s
user    0m0.094s
sys     0m0.162s

$ time flx check --ignore-owner --ignore-mode --ignore-ldate --ignore-dir \
  --ignore-dot --only-new linux-2.6.11/. linux-2.6.11.7/. |wc -l
     47

real    0m4.705s
user    0m3.398s
sys     0m1.310s

(This was with 'flx', a tool a friend developed for file-system integrity
checking, which we also use to build our packages). Anyway, what I wanted
to show is that once the trees are cached, even somewhat heavy operations
such as checksumming can be done occasionally (such as md5 for double
checking) without you waiting too long. And I don't think that a database
would provide all the comfort of a standard file-system (cp -al, rsync,
choice of tools, etc...).
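A rough stand-in for the metadata-only comparison flx does, using only standard tools (here just path and size; flx also checks dates, owners, and so on). The trees and filenames are made up:

```shell
# Compare two trees on metadata alone (path and size), no content reads.
set -e
a=$(mktemp -d); b=$(mktemp -d)
echo one > "$a/f1"; echo two > "$a/f2"
cp -p "$a/f1" "$b/f1"
echo CHANGED > "$b/f2"                  # different size in the second tree
list() { (cd "$1" && find . -type f -printf '%p %s\n' | sort); }
list "$a" > /tmp/tree-a.$$
list "$b" > /tmp/tree-b.$$
diff /tmp/tree-a.$$ /tmp/tree-b.$$ || true   # only ./f2 shows up
rm -rf "$a" "$b" /tmp/tree-a.$$ /tmp/tree-b.$$
```

Because no file contents are read, this scales with the number of files rather than the number of bytes, which is why it stays in the sub-second range on a cached tree.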

Willy


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:56                 ` Chris Wedgwood
@ 2005-04-09  7:37                   ` Willy Tarreau
  2005-04-09  7:47                     ` Neil Brown
  0 siblings, 1 reply; 201+ messages in thread
From: Willy Tarreau @ 2005-04-09  7:37 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Linus Torvalds, Jeff Garzik, Matthias-Christian Ott,
	Andrea Arcangeli, Kernel Mailing List

On Fri, Apr 08, 2005 at 11:56:09AM -0700, Chris Wedgwood wrote:
> On Fri, Apr 08, 2005 at 11:47:10AM -0700, Linus Torvalds wrote:
> 
> > Don't use NFS for development. It sucks for BK too.
> 
> Some times NFS is unavoidable.
> 
> In the best case (see previous email wrt to only stat'ing the parent
> directories when you can) for a current kernel though you can get away
> with 894 stats --- over NFS that would probably be tolerable.
> 
> After claiming such an optimization is probably not worth while I'm
> now thinking for network filesystems it might be.

I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
files each) and 1.3s once the trees are cached locally. This is without
comparing file contents, just meta-data. And it takes 19.33s to compare
the file's md5 sums once the trees are cached. I don't know if there are
ways to avoid some NFS operations when everything is cached.

Anyway, the system does not seem very efficient with hard links; it
caches the files twice :-(

Willy


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  7:37                   ` Willy Tarreau
@ 2005-04-09  7:47                     ` Neil Brown
  2005-04-09  8:00                       ` Willy Tarreau
  0 siblings, 1 reply; 201+ messages in thread
From: Neil Brown @ 2005-04-09  7:47 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Chris Wedgwood, Linus Torvalds, Jeff Garzik,
	Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Saturday April 9, willy@w.ods.org wrote:
> 
> I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
> files each) and 1.3s once the trees are cached locally. This is without
> comparing file contents, just meta-data. And it takes 19.33s to compare
> the file's md5 sums once the trees are cached. I don't know if there are
> ways to avoid some NFS operations when everything is cached.
> 
> Anyway, the system does not seem very efficient with hard links; it
> caches the files twice :-(

I suspect you'll be wanting to add a "no_subtree_check" export option
on your NFS server...
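For reference, the option Neil mentions goes on the export line in /etc/exports on the server; the path and client network below are assumptions for illustration, not from this thread:

```
# /etc/exports (illustrative entry, hypothetical path and client)
/srv/src  192.168.0.0/24(ro,no_subtree_check)
# then re-export with:  exportfs -ra
```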

NeilBrown

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  7:47                     ` Neil Brown
@ 2005-04-09  8:00                       ` Willy Tarreau
  2005-04-09  9:34                         ` Neil Brown
  0 siblings, 1 reply; 201+ messages in thread
From: Willy Tarreau @ 2005-04-09  8:00 UTC (permalink / raw)
  To: Neil Brown
  Cc: Chris Wedgwood, Linus Torvalds, Jeff Garzik,
	Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Sat, Apr 09, 2005 at 05:47:08PM +1000, Neil Brown wrote:
> On Saturday April 9, willy@w.ods.org wrote:
> > 
> > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
> > files each) and 1.3s once the trees are cached locally. This is without
> > comparing file contents, just meta-data. And it takes 19.33s to compare
> > the file's md5 sums once the trees are cached. I don't know if there are
> > ways to avoid some NFS operations when everything is cached.
> > 
> > Anyway, the system does not seem very efficient with hard links; it
> > caches the files twice :-(
> 
> I suspect you'll be wanting to add a "no_subtree_check" export option
> on your NFS server...

Thanks a lot, Neil ! This is very valuable information. I didn't
understand such implications from the exports(5) man page, but it
makes a great difference. And the diff sped up from 5.7 to 3.9s
and from 19.3 to 15.3s.

Cheers,
Willy


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  1:01   ` Marcin Dalecki
@ 2005-04-09  8:32     ` Jan Hudec
  2005-04-11  2:26     ` Miles Bader
  1 sibling, 0 replies; 201+ messages in thread
From: Jan Hudec @ 2005-04-09  8:32 UTC (permalink / raw)
  To: Marcin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3284 bytes --]

On Sat, Apr 09, 2005 at 03:01:29 +0200, Marcin Dalecki wrote:
> 
> On 2005-04-07, at 09:44, Jan Hudec wrote:
> >
> >I have looked at most systems currently available. I would suggest
> >following for closer look on:
> >
> >1) GNU Arch/Bazaar. They use the same archive format, simple, have the
> >   concepts right. It may need some scripts or add ons. When Bazaar-NG
> >   is ready, it will be able to read the GNU Arch/Bazaar archives so
> >   switching should be easy.
> 
> Arch isn't a sound example of software design. Quite contrary to the 

I actually _do_ agree with you. I like Arch, but its user interface
certainly is broken and some parts of it sure need some redesign.

> random notes posted by it's author the following issues did strike me 
> the time I did evaluate it:
> 
> The application (tla) claims to have "intuitive" command names. However
> I didn't see that as given. Most of them where difficult to remember
> and appeared to be just infantile. I stopped looking further after I 
> saw:
> 
> tla my-id instead of: tla user-id or even tla set id ...
> 
> tla make-archive instead of tla init

In this case, tla init would be a lot *worse*, because there are two
different things to initialize -- the archive and the tree. But
init-archive would be a little better, for consistency.

> tla my-default-archive john@dole.com--2005-VersionPatrol

This one is kinda broken. Even in concept it is.

> No more "My Compuer" please...
> 
> Repository addressing requires you to use informally defined
> very elaborated and typing error prone conventions:
> 
> mkdir ~/{archives}

*NO*. Using this name is STRONGLY recommended *AGAINST*. Tom once used
it in an example or in one of his archives and people started doing it,
but it's a complete bogosity and it is not required anywhere.

> tla make-archive john@dole.com--20005-VersionPatrol 
> ~/{archives}/2005-VersionPatrol
> 
> You notice the requirement for two commands to accomplish a single task 
> already well denoted by the second command? There is more of the same
> at quite a few places when you try to use it. You notice the triple
> zero it didn't catch?

I sure do. But the folks writing Bazaar are gradually fixing these.
There are a lot of them and it's not that long since they started, so
they have not fixed all of them yet, but I think they eventually will.

> As an added bonus it relies on the applications named, by accident,
> patch and diff, being installed on the host in question, as well as a
> few others, to operate.

No. The build process actually checks that the diff and patch
applications are actually GNU Diff and GNU Patch in a sufficiently
recent version. That was not always the case, but now it does.

> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of 
> apply commands
> and you should be still more productive then when using Arch.

I don't agree with you. Using Arch is more productive (e.g. because it
does merges), but certainly one could do a lot better than Arch does.

-------------------------------------------------------------------------------
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  8:00                       ` Willy Tarreau
@ 2005-04-09  9:34                         ` Neil Brown
  0 siblings, 0 replies; 201+ messages in thread
From: Neil Brown @ 2005-04-09  9:34 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Chris Wedgwood, Linus Torvalds, Jeff Garzik,
	Matthias-Christian Ott, Andrea Arcangeli, Kernel Mailing List

On Saturday April 9, willy@w.ods.org wrote:
> On Sat, Apr 09, 2005 at 05:47:08PM +1000, Neil Brown wrote:
> > On Saturday April 9, willy@w.ods.org wrote:
> > > 
> > > I've just checked, it takes 5.7s to compare 2.4.29{,-hf3} over NFS (13300
> > > files each) and 1.3s once the trees are cached locally. This is without
> > > comparing file contents, just meta-data. And it takes 19.33s to compare
> > > the file's md5 sums once the trees are cached. I don't know if there are
> > > ways to avoid some NFS operations when everything is cached.
> > > 
> > > Anyway, the system does not seem very efficient with hard links; it
> > > caches the files twice :-(
> > 
> > I suspect you'll be wanting to add a "no_subtree_check" export option
> > on your NFS server...
> 
> Thanks a lot, Neil ! This is very valuable information. I didn't
> understand such implications from the exports(5) man page, but it
> makes a great difference: the diff time dropped from 5.7s to 3.9s,
> and from 19.3s to 15.3s.

No, that implication had never really occurred to me before either.
But when you said "caches the file twice" it suddenly made sense.
With subtree_check, the NFS file handle contains information about the
directory, and NFS uses the filehandle as the primary key to tell if
two things are the same or not.

Trond keeps prodding me to make no_subtree_check the default.  Maybe it
is time that I actually did....

NeilBrown

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 19:03                   ` Linus Torvalds
  2005-04-08 19:16                     ` Chris Wedgwood
  2005-04-09  7:20                     ` Willy Tarreau
@ 2005-04-09 15:15                     ` Paul Jackson
  2 siblings, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 15:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: cw, matthias.christian, andrea, linux-kernel

Linus wrote:
> you need to reuse the same inode/dev numbers
> (again - I didn't worry about portability, and filesystems where those
> aren't stable are a "don't do that then") 

On filesystems that don't have a stable inode number, I use the md5sum
of the full (relative to mount point) pathname as the inode number. 

Since these same file systems (not surprisingly) lack hard links as
well, the pathname _is_ essentially the stable inode number.
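That trick can be sketched in a few lines of Python (the helper name and the 64-bit truncation are illustrative assumptions, not the actual backup script):

```python
import hashlib

def pseudo_inode(rel_path):
    """Derive a stable pseudo-inode number for filesystems (fat, vfat,
    smb, ...) whose kernel-assigned inode numbers change across mounts.
    The md5 of the mount-relative path is stable as long as the path is,
    which holds because these filesystems have no hard links."""
    digest = hashlib.md5(rel_path.encode()).hexdigest()
    # Truncate to 64 bits so it fits where a real st_ino would go.
    return int(digest[:16], 16)
```

Same path, same number across runs - which is all the change-detection logic needs.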


Off-topic details ...

This is on my backup program, which does a full snapshot of my 90 Gb
system, including some FAT file systems, in 6 or 7 minutes, plus time
proportional to actual changes.  I gave up on finding a backup
program I could tolerate, and wrote my own.  It stores each unique
md5sum-keyed blob exactly once, but uses the same sort of tricks you describe to
detect changes from examining just the stat information so as to avoid
reading every damn byte on the disk.  It works with smb, fat, vfat,
ntfs, reiserfs, xfs, ext2/3, ...  A single manifest file, in plain
ascii, one file per line, captures a full snapshot, disk-to-disk, every
few hours.

This comment from my backup source explains more:

# Unfortunately, fat, vfat, smb, and ncpfs (Netware) file systems
# do not have unique disk-based persistent inode numbers.
# The kernel constructs transient inode numbers for inodes
# in its cache.  But after an umount and re-mount, the inode
# numbers are all different.  So we would end up recalculating
# the md5sums of all files in any such file systems.
#
# To avoid this, we keep track of which directories are on such
# file systems, and for files in any such directory, instead
# of using the inode value from stat'ing a file, we use the
# md5sum of its path as a pseudo-inode number.  This digest of
# a file's path has better persistence than its transiently
# assigned inode number.  Fields 5,6,7 (files total, free and
# avail) happen to be zero on file systems (fat, vfat, smb,
# ...) with no real inodes, so we use this fallback means
# of getting a persistent pseudo-inode if a statvfs() call on
# its directory has fields 5,6,7 summing to zero:
#       sum(os.statvfs(dir)[5:8]) == 0
# We include that dir in the fat_directories set in this case.

fat_directories = sets.Set()    # set of directory paths on FAT file systems

# The Python statvfs() on Linux is a tad expensive - the
# glibc statvfs(2) code does several system calls, including
# scanning /proc/mounts and stat'ing its entries.  We need
# to know for each file whether it is on a "fat" file system
# (see above), but for efficiency we only statvfs at mount
# points, then propagate the file system type from there down.

mountpoints = [m.split()[1] for m in open("/proc/mounts")]
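Putting the statvfs test and the mount-point scan together, the detection might look roughly like this (a hedged reconstruction of the logic the comments describe, not the actual backup script; runs on Linux):

```python
import os

def is_fat_like(mount_point):
    """A filesystem without real inodes (fat, vfat, smb, ncpfs) reports
    zero for f_files, f_ffree and f_favail, so the sum of those three
    statvfs fields distinguishes it from ext2/3, xfs, reiserfs, etc."""
    st = os.statvfs(mount_point)
    return (st.f_files + st.f_ffree + st.f_favail) == 0

# statvfs only at mount points, then propagate the answer downward
# to every file under each mount.
fat_mounts = set()
for line in open("/proc/mounts"):
    mnt = line.split()[1]
    if is_fat_like(mnt):
        fat_mounts.add(mnt)
```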



-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 18:47               ` Linus Torvalds
  2005-04-08 18:56                 ` Chris Wedgwood
@ 2005-04-09 15:40                 ` Paul Jackson
  2005-04-09 16:16                   ` Linus Torvalds
  1 sibling, 1 reply; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 15:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel

Linus wrote:
> then git will open exactly _one_ 
> file (no searching, no messing around), which contains absolutely nothing 
> except for the compressed (and SHA1-signed) old contents of the file. It 
> obviously _has_ to do that, because in order to know whether you've 
> changed it, it needs to now compare it to the original.

I must be missing something here ...

If the stat shows a possible change, then you shouldn't have to open the
original version to determine if it really changed - just compute the
SHA1 of the new file, and see if that changed from the original SHA1.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  1:04             ` Marcin Dalecki
@ 2005-04-09 15:42               ` Paul Jackson
  2005-04-09 18:45                 ` Marcin Dalecki
  0 siblings, 1 reply; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 15:42 UTC (permalink / raw)
  To: Marcin Dalecki; +Cc: torvalds, matthias.christian, cw, andrea, linux-kernel

Marcin wrote:
> But what will impress you are either the price tag the 
> DB comes with or
> the hardware it runs on :-)

The payroll for the staffing to care and feed for these
babies is often impressive as well.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 15:50       ` Linus Torvalds
  2005-04-09  2:53         ` Petr Baudis
@ 2005-04-09 15:50         ` Paul Jackson
  2005-04-09 16:26           ` Linus Torvalds
  1 sibling, 1 reply; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 15:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ross, cw, linux-kernel

> in order to avoid having to worry about special characters
> they are NUL-terminated)

Would this be a possible alternative - newline-terminated (convert any
newlines embedded in filenames to the three chars '%0A', and leave it as
an exercise to the reader to de-convert them)?

Line formatted ASCII files are really nice - worth pissing on embedded
newlines in paths to obtain.
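The proposed escaping is deliberately lossy, a sketch (my encode/decode pair; '%' itself is intentionally not escaped, which is exactly the "second class service" trade-off):

```python
def escape_name(path):
    # Only newlines are escaped; a filename that literally contains
    # "%0A" will collide on restore, which is accepted as second-class.
    return path.replace("\n", "%0A")

def unescape_name(name):
    return name.replace("%0A", "\n")
```

So `escape_name("foo\nbar")` and a real file named `"foo%0Abar"` restore to the same name, and the last one restored wins.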

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 15:40                 ` Paul Jackson
@ 2005-04-09 16:16                   ` Linus Torvalds
  2005-04-09 17:15                     ` Paul Jackson
  2005-04-09 17:35                     ` Paul Jackson
  0 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-09 16:16 UTC (permalink / raw)
  To: Paul Jackson; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel



On Sat, 9 Apr 2005, Paul Jackson wrote:
> 
> I must be missing something here ...
> 
> If the stat shows a possible change, then you shouldn't have to open the
> original version to determine if it really changed - just compute the
> SHA1 of the new file, and see if that changed from the original SHA1.

Yes. However, I've got two reasons for this:

 (a) it may actually be cheaper to just unpack the compressed thing than
     it is to compute the sha, _especially_ since it's very likely that
     you have to do that anyway (ie if it turns out that they _are_
     different, you need the unpacked data to then look at the
     differences).

     So when you come from your backup angle, you only care about "has it 
     changed", and you'll do a backup. In "git", you usually care about 
     the old contents too.

 (b) while I depend on the fact that if the SHA of an object matches, the 
     objects are the same, I generally try to avoid the reverse 
     dependency. Why? Because if I end up changing the way I pack objects,
     and still want to work with old objects, I may end up in the 
     situation that two identical objects could get different object 
     names.

I don't actually know how valid a point "(b)" is, and I don't think it's 
likely, but imagine that SHA1 ends up being broken (*) and I decide that I 
want to pack new objects with a new-and-improved-SHA256 or something. Such 
a thing would obviously mean that you end up with lots of _duplicate_ data 
(any new data that is repackaged with the new name will now cause a new 
git object), but "duplicate" is better than "broken".

I don't actually guarantee that "git" could handle that right, but I've
been idly trying to avoid locking myself into the mindset that "file
equality has to mean name equality over the long run". So while the system 
right now works on the 1:1 "name" <-> "content" mapping, it's possible 
that it _could_ work with a more relaxed 1:n "content" -> "name" mapping.

But it's entirely possible that I'm being a git about this.

		Linus 

(*) yeah, yeah, I know about the current theoretical case, and I don't
care. Not only is it theoretical, the way my objects are packed you'd have
to not just generate the same SHA1 for it, it would have to _also_ still
be a valid zlib object _and_ get the header to match the "type + length"  
of object part. IOW, the object validity checks are actually even stricter
than just "sha1 matches".
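Those layered validity checks can be sketched as follows (hedged: this assumes an object is a zlib stream whose plaintext starts with an ASCII "type length\0" header, as the footnote describes; the function is illustrative, not git source):

```python
import zlib

def check_object(raw, expected_type):
    """An object must (1) inflate as valid zlib, (2) carry a
    'type length\\0' header, and (3) have a body of exactly the
    advertised length - so a bare SHA1 collision alone is not
    enough to forge one."""
    try:
        plain = zlib.decompress(raw)
    except zlib.error:
        return False
    header, nul, body = plain.partition(b"\0")
    if not nul:
        return False
    try:
        obj_type, length_s = header.decode("ascii").split(" ")
        length = int(length_s)
    except (UnicodeDecodeError, ValueError):
        return False
    return obj_type == expected_type and len(body) == length
```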

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 10:30     ` Matthias Andree
  2005-04-07 10:54       ` Andrew Walrond
@ 2005-04-09 16:17       ` David Roundy
  2005-04-10  9:24         ` Giuseppe Bilotta
  1 sibling, 1 reply; 201+ messages in thread
From: David Roundy @ 2005-04-09 16:17 UTC (permalink / raw)
  To: Kernel Mailing List

On Thu, Apr 07, 2005 at 12:30:18PM +0200, Matthias Andree wrote:
> On Thu, 07 Apr 2005, Sergei Organov wrote:
> > darcs? <http://www.abridgegame.org/darcs/>
> 
> Close. Some things:
> 
> 1. It's rather slow and quite CPU consuming and certainly I/O consuming
>    at times - I keep, to try it out, leafnode-2 in a DARCS repo, which
>    has a mere 20,000 lines in 140 files, with 1,436 changes so far, on a
>    RAID-1 with two 7200/min disk drives, with an Athlon XP 2500+ with
>    512 MB RAM. The repo has 1,700 files in 11.5 MB, the source itself
>    189 files in 1.8 MB.
> 
>    Example: darcs annotate nntpd.c takes 23 s. (2,660 lines, 60 kByte)
> 
>    The maintainer himself states that there's still optimization required.

Indeed, there's still a lot of optimization to be done.  I've recently made
some improvements which will reduce the memory use (and speed
things up) for a few of the worst-performing commands.  No improvement to
the initial record, but on the plus side, that's only done once.  But I was
able to cut down the memory used checking out a kernel repository to 500m.
(Which, sadly enough, is a major improvement.)

You would do much better if you recorded the initial state one directory at
a time, since it's the size of the largest changeset that determines the
memory use on checkout, but that's ugly.

> Getting DARCS up to the task would probably require some polishing, and
> should probably be discussed with the DARCS maintainer before making
> this decision.
> 
> Don't get me wrong, DARCS looks promising, but I'm not convinced it's
> ready for the linux kernel yet.

Indeed, I do believe that darcs has a way to go before it'll perform
acceptably on the kernel.  On the other hand, tar seems to perform
unacceptably slowly on the kernel, so I'm not sure how slow is too slow.
Definitely input from interested kernel developers on which commands are
too slow would be welcome.
-- 
David Roundy
http://www.darcs.net

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 15:50         ` Paul Jackson
@ 2005-04-09 16:26           ` Linus Torvalds
  2005-04-09 17:08             ` Paul Jackson
                               ` (2 more replies)
  0 siblings, 3 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-09 16:26 UTC (permalink / raw)
  To: Paul Jackson; +Cc: ross, cw, linux-kernel



On Sat, 9 Apr 2005, Paul Jackson wrote:
>
> > in order to avoid having to worry about special characters
> > they are NUL-terminated)
> 
> Would this be a possible alternative - newline terminated (convert any
> newlines embedded in filenames to the 3 chars '%0A', and leave it as an
> exercise to the reader to de-convert them.)

Sure, you could obviously do escaping (you need to remember to escape '%' 
too when you do that ;).

However, whenever you do escaping, that means that you're already going to 
have to use a tool to unpack the dang thing. So you didn't actually win 
anything. I pretty much guarantee that my existing format is easier to 
unpack than your escaped format.

ASCII isn't magical.

This is "fsck_tree()", which walks the unpacked tree representation and 
checks that it looks sane and marks the sha1's it finds as being 
needed (so that you can do reachability analysis in a second pass). It's 
not exactly complicated:

	static int fsck_tree(unsigned char *sha1, void *data, unsigned long size)
	{
	        while (size) {
	                int len = 1+strlen(data);
	                unsigned char *file_sha1 = data + len;
	                char *path = strchr(data, ' ');
	                if (size < len + 20 || !path)
	                        return -1;
	                data += len + 20;
	                size -= len + 20;
	                mark_needs_sha1(sha1, "blob", file_sha1);
	        }
	        return 0;
	}

and there's one HUGE advantage to _not_ having escaping: sorting and
comparing.

If you escape things, you now have to decide how you sort filenames. Do
you sort them by the escaped representation, or by the "raw"  
representation? Do you always have to escape or unescape the name in order 
to sort it?
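A concrete two-name demonstration of that ambiguity (my own example): '\n' (0x0A) sorts before '!' (0x21), but its escape "%0A" starts with '%' (0x25), which sorts after it, so the raw order and the escaped order disagree:

```python
raw = ["a\nb", "a!b"]
escaped = [n.replace("\n", "%0A") for n in raw]

# Raw bytes:  "a\nb" < "a!b"    (0x0A < 0x21)
# Escaped:    "a!b"  < "a%0Ab"  (0x21 < 0x25)
assert sorted(raw) == ["a\nb", "a!b"]
assert sorted(escaped) == ["a!b", "a%0Ab"]
# A format that escapes names must therefore also pin down, forever,
# which representation is the sort key - raw or escaped.
```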

So I like ASCII as much as the next guy, but it's not a religion. If there 
isn't any point to it, there isn't any point to it.

The biggest irritation I have with the "tree" format I chose is actually
not the name (which is trivial), it's the <sha1> part. Almost everything
else keeps the <sha1> in the ASCII hexadecimal representation, and I
should have done that here too. Why? Not because it's a <sha1> - hey, the 
binary representation is certainly denser and equivalent - but because an 
ASCII representation there would have allowed me to much more easily 
change the key format if I ever wanted to. Now it's very SHA1-specific.

Which I guess is fine - I don't really see any reason to change, and if I 
do change, I could always just re-generate the whole tree. But I think it 
would have been cleaner to have _that_ part in ASCII.
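For reference, the unpacked tree layout that fsck_tree() above walks - a NUL-terminated ASCII "mode name" string followed by 20 raw SHA1 bytes, the very field discussed here - can be read in a few lines (a hedged sketch inferred from the C code, not git source):

```python
import binascii

def parse_tree(data):
    """Return (mode_and_name, sha1_hex) pairs from an unpacked tree
    object: each entry is "mode name\\0" in ASCII followed by a
    20-byte binary SHA1."""
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\0", pos)
        entry = data[pos:nul].decode()
        sha1 = binascii.hexlify(data[nul + 1:nul + 21]).decode()
        entries.append((entry, sha1))
        pos = nul + 21
    return entries
```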

			Linus

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  0:12                     ` Linus Torvalds
  2005-04-09  2:27                       ` Andrea Arcangeli
@ 2005-04-09 16:33                       ` Roman Zippel
  2005-04-09 23:31                         ` Tupshin Harper
  2005-04-10 17:24                         ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr
  1 sibling, 2 replies; 201+ messages in thread
From: Roman Zippel @ 2005-04-09 16:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrea Arcangeli, Martin Pool, linux-kernel, David Lang

Hi,

On Fri, 8 Apr 2005, Linus Torvalds wrote:

> Also, I suspect that BKCVS actually bothers to get more details out of a
> BK tree than I cared about. People have pestered Larry about it, so BKCVS
> exports a lot of the nitty-gritty (per-file comments etc) that just
> doesn't actually _matter_, but people whine about. Me, I don't care. My
> sparse-conversion just took the important parts.

As soon as you want to synchronize and merge two trees, you will know why 
this information does matter.
(/me looks closer at the sparse-conversion...)
It seems you exported the complete parent information, and this is exactly
the "nitty-gritty" I was "whining" about: it is not available via
bkcvs or bkweb, and it's the most crucial information for making the bk data
useful outside of bk. Larry was previously very clear that he
considers this proprietary bk meta data, and that anyone attempting to
export this information is in violation of the free bk licence. So you
indeed just took the important parts - and that is/was explicitly verboten
for normal bk users.

bye, Roman

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 22:52     ` Roman Zippel
  2005-04-08 23:46       ` Tupshin Harper
@ 2005-04-09 16:52       ` Eric D. Mudama
  2005-04-09 17:40         ` Roman Zippel
  1 sibling, 1 reply; 201+ messages in thread
From: Eric D. Mudama @ 2005-04-09 16:52 UTC (permalink / raw)
  To: Roman Zippel; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List

On Apr 8, 2005 4:52 PM, Roman Zippel <zippel@linux-m68k.org> wrote:
> The problem is you pay a price for this. There must be a reason developers
> were adding another GB of memory just to run BK.
> Preserving the complete merge history does indeed make repeated merges
> simpler, but it builds up complex meta data, which has to be managed
> forever. I doubt that this is really an advantage in the long term. I
> expect that we would be better off serializing changesets in the main
> repository. For example bk does something like this:
> 
>         A1 -> A2 -> A3 -> BM
>           \-> B1 -> B2 --^
> 
> and instead of creating the merge changeset, one could merge them like
> this:
> 
>         A1 -> A2 -> A3 -> B1 -> B2
> 
> This results in a simpler repository, which is more scalable and which
> is easier for users to work with (e.g. binary bug search).
> The disadvantage would be it will cause more minor conflicts, when changes
> are pulled back into the original tree, but which should be easily
> resolvable most of the time.

The kicker is that B1 was developed based on A1, so any test
results were based on B1 being a single changeset delta away from A1. 
If the resulting 'BM' fails testing, and you've converted into the
linear model above where B2 has failed, you lose the ability to
isolate B1's changes and where they came from, to revalidate the
developer's results.

With bugs and fixes that can be validated in a few hours, this may not
be a problem, but when chasing a bug that takes days or weeks to
manifest, that a developer swears they fixed, one has to be able to
reproduce their exact test environment.

I believe that flattening the change graph makes history reproduction
impossible, or alternatively, you are imposing on each developer to test
the merge results at B1 + A1..3 before submission, but in doing so,
the test time may require additional test periods etc and with
sufficient velocity, might never close.  This is the problem CVS has
if you don't create micro branches for every single modification.

--eric

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 16:26           ` Linus Torvalds
@ 2005-04-09 17:08             ` Paul Jackson
  2005-04-10  3:41             ` Paul Jackson
  2005-04-10  8:39             ` David Lang
  2 siblings, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 17:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ross, cw, linux-kernel

Linus wrote:
> (you need to remember to escape '%' 
> too when you do that ;).

No - don't have to.  Not if I don't mind giving fools that embed
newlines in paths second-class service.

In my case, if I create a file named "foo\nbar", then backup and restore
it, I end up with a restored file named "foo%0Abar".  If I had backed up
another file named "foo%0Abar", and now restore it, it collides, and
last one to be restored wins.  If I really need the "foo\nbar" file back
as originally named, I will have to dig it out by hand.

I dare say that Linux kernel source does not require first class support
for newlines embedded in pathnames.

> ASCII isn't magical.

No - but it's damn convenient.  A lot of tools that work on line-oriented
ASCII don't work elsewhere.

I guess Perl-hackers won't care much, but those working with either
classic shell script tools or Python will find line formatted ASCII more
convenient.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-08 21:27                         ` Linus Torvalds
@ 2005-04-09 17:14                           ` Roman Zippel
  0 siblings, 0 replies; 201+ messages in thread
From: Roman Zippel @ 2005-04-09 17:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Luck, Tony, Kernel Mailing List

Hi,

On Fri, 8 Apr 2005, Linus Torvalds wrote:

> Yes.  Per-file history is expensive in git, because of the way it is 
> indexed. Things are indexed by tree and by changeset, and there are no 
> per-file indexes.
> 
> You could create per-file _caches_ (*) on top of git if you wanted to make
> it behave more like a real SCM, but yes, it's all definitely optimized for
> the things that _I_ tend to care about, which is the whole-repository
> operations.

Per-file history is also expensive for another reason. The basic reason is
that I think hash-based storage is not the best approach for an SCM:
it lacks locality, so the more it grows, the more it has to seek to
collect all the data.
To reduce the space usage you could replace the parent file with a sha1 
reference + delta to the new file. This is basically what monotone does 
and might cause performance problems if you need to restore old versions
(e.g. if you want to annotate a file).

bye, Roman

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 16:16                   ` Linus Torvalds
@ 2005-04-09 17:15                     ` Paul Jackson
  2005-04-09 17:35                     ` Paul Jackson
  1 sibling, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 17:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel

Linus wrote:
> In "git", you usually care about 
>     the old contents too.

True - in your case, you probably want the old contents
so might as well dig them out as soon as it becomes
convenient to have them.

I was objecting to your claim that you _had_ to dig out
the old contents to determine if a file changed.

You don't _have_ to ... but I agree that it's a good
time to do so.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 16:16                   ` Linus Torvalds
  2005-04-09 17:15                     ` Paul Jackson
@ 2005-04-09 17:35                     ` Paul Jackson
  1 sibling, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 17:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jgarzik, matthias.christian, andrea, cw, linux-kernel

>  (b) while I depend on the fact that if the SHA of an object matches, the 
>      objects are the same, I generally try to avoid the reverse 
>      dependency.

It might be a valid point that you want to leave the door open to using
a different (than SHA1) digest.  (So this means you're going to store it
as an ASCII string, right?)

But I don't see how that applies here.  Any optimization that avoids
rereading old versions if the digests match will never trigger on the
day you change digests.  No problem here - you're doomed to reread the old
version in any case.

Either you got your logic backwards, or I need another cup of coffee.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 16:52       ` Eric D. Mudama
@ 2005-04-09 17:40         ` Roman Zippel
  2005-04-09 18:56           ` Ray Lee
  0 siblings, 1 reply; 201+ messages in thread
From: Roman Zippel @ 2005-04-09 17:40 UTC (permalink / raw)
  To: Eric D. Mudama; +Cc: Linus Torvalds, David Woodhouse, Kernel Mailing List

Hi,

On Sat, 9 Apr 2005, Eric D. Mudama wrote:

> > For example bk does something like this:
> > 
> >         A1 -> A2 -> A3 -> BM
> >           \-> B1 -> B2 --^
> > 
> > and instead of creating the merge changeset, one could merge them like
> > this:
> > 
> >         A1 -> A2 -> A3 -> B1 -> B2
> > 
> > This results in a simpler repository, which is more scalable and which
> > is easier for users to work with (e.g. binary bug search).
> > The disadvantage would be it will cause more minor conflicts, when changes
> > are pulled back into the original tree, but which should be easily
> > resolvable most of the time.
> 
> The kicker comes that B1 was developed based on A1, so any test
> results were based on B1 being a single changeset delta away from A1. 
> If the resulting 'BM' fails testing, and you've converted into the
> linear model above where B2 has failed, you lose the ability to
> isolate B1's changes and where they came from, to revalidate the
> developer's results.

What good does it do if you can revalidate the original B1? The important
point is that the end result works, and if it only fails in the merged
version, you have a big problem. The serialized version gives you the
chance to test whether it fails in B1 or B2.
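That "binary bug search" on a serialized history is plain bisection; a sketch (the function, the toy history, and the boundary assumptions are all illustrative):

```python
def first_bad(history, is_bad):
    """Classic bisection over a *linear* list of changesets: O(log n)
    builds to find the first one where is_bad() holds.  Assumes
    history[0] is good and history[-1] is bad.  With merge changesets
    in a DAG, the notion of 'midpoint' is no longer this simple."""
    lo, hi = 0, len(history) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(history[mid]):
            hi = mid
        else:
            lo = mid + 1
    return history[lo]
```

On the serialized A1 -> A2 -> A3 -> B1 -> B2 example above, a bug introduced in B1 is pinned down in two test builds.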

> I believe that flattening the change graph makes history reproduction
> impossible, or alternately, you are imposing on each developer to test
> the merge results at B1 + A1..3 before submission, but in doing so,
> the test time may require additional test periods etc and with
> sufficient velocity, might never close.

The merge result has to be tested either way, so I'm not exactly sure, 
what you're trying to say.

bye, Roman

^ permalink raw reply	[flat|nested] 201+ messages in thread

* [PATCH] Re: Kernel SCM saga..
  2005-04-09  7:08           ` Randy.Dunlap
@ 2005-04-09 18:06             ` Petr Baudis
  0 siblings, 0 replies; 201+ messages in thread
From: Petr Baudis @ 2005-04-09 18:06 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: torvalds, ross, linux-kernel

Dear diary, on Sat, Apr 09, 2005 at 09:08:59AM CEST, I got a letter
where "Randy.Dunlap" <rddunlap@osdl.org> told me that...
> On Sat, 9 Apr 2005 04:53:57 +0200 Petr Baudis wrote:
..snip..
> |   FWIW, I made few small fixes (to prevent some trivial usage errors to
> | cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> | gitlog.sh - heavily inspired by what already went through the mailing
> | list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> | (including .dircache, even though it isn't shown in the index), the
> | cumulative patch can be found below. The scripts aim to provide some
> | (obviously very interim) more high-level interface for git.
> | 
> |   I'm now working on tree-diff.c which will (surprise!) produce a diff
> | of two trees (I'll finish it after I get some sleep, though), and then I
> | will probably do some dwimmy gitdiff.sh wrapper for tree-diff and
> | show-diff. At that point I might get my hand on some pull more kind to
> | local changes.
> 
> Hi,

  Hi,

> I'll look at your scripts this weekend.  I've also been
> working on some, but mine are a bit more experimental (cruder)
> than yours are.  Anyway, here they are (attached) -- also
> available at http://developer.osdl.org/rddunlap/git/
> 
> gitin : checkin/commit
> gitwhat sha1 : what is that sha1 file (type and contents if blob or commit)
> gitlist (blob, commit, tree, or all) :
> 	list all objects with type (commit, tree, blob, or all)

  thanks - I had a look, but so far I borrowed only the prompt message
from your gitin. ;-) I'm not sure if gitwhat would be useful for me in
any way and gitlist doesn't appear too practical to me either.

  In the meantime, I've made some progress too. I made ls-tree, which
will just convert the tree object to a human readable (and script
processable) form, and wrapper gitls.sh, which will also try to guess
the tree ID. parent-id will just return the commit ID(s) of the previous
commit(s), practical if you want to diff against the previous commit
easily etc.  And finally, there is gitdiff.sh, which will produce a diff
of any two trees.

  Everything is again available at http://pasky.or.cz/~pasky/dev/git/
and again including .dircache, even though it's invisible in the index.
The cumulative patch (against 0.03) is there as well as below, generated
by the

	./gitdiff.sh 0af20307bb4c634722af0f9203dac7b3222c4a4f

command. The empty entries are changed modes (664 vs. 644); I still have
to think about how to denote them if the content didn't change,
or I might ignore them altogether...?

  You can obviously fetch any arbitrary change by doing the appropriate
gitdiff.sh call. You can find the ids in the ChangeLog, which was
generated by the plain

	./gitlog.sh

command. (That is for HEAD. 0af20307bb4c634722af0f9203dac7b3222c4a4f is
the last commit on the Linus' branch, pass that to gitlog.sh to get his
ChangeLog. ;-)

  Next, I will probably do some bk-style pull tool. Or perhaps first
a gitpatch.sh which will verify the sha1s and do the mode changes.

  Linus, could you please have a look and tell me what you think
about it so far?

  Thanks,

				Petr Baudis

Index: Makefile
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/Makefile (mode:100664 sha1:270cd4f8a8bf10cd513b489c4aaf76c14d4504a7)
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/Makefile (mode:100644 sha1:185ff422e68984e68da011509dec116f05fc6f8d)
@@ -1,7 +1,7 @@
 CFLAGS=-g -O3 -Wall
 CC=gcc
 
-PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file fsck-cache
+PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file fsck-cache ls-tree
 
 all: $(PROG)
 
@@ -30,6 +30,9 @@
 cat-file: cat-file.o read-cache.o
 	$(CC) $(CFLAGS) -o cat-file cat-file.o read-cache.o $(LIBS)
 
+ls-tree: ls-tree.o read-cache.o
+	$(CC) $(CFLAGS) -o ls-tree ls-tree.o read-cache.o $(LIBS)
+
 fsck-cache: fsck-cache.o read-cache.o
 	$(CC) $(CFLAGS) -o fsck-cache fsck-cache.o read-cache.o $(LIBS)
 
Index: README
===================================================================
Index: cache.h
===================================================================
Index: cat-file.c
===================================================================
Index: commit-tree.c
===================================================================
Index: fsck-cache.c
===================================================================
Index: gitadd.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitadd.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitadd.sh (mode:100755 sha1:d23be758c0c9fc1cf9756bcd3ee4d7266c60a2c9)
@@ -0,0 +1,13 @@
+#!/bin/sh
+#
+# Add new file to a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes a list of file names at the command line, and schedules them
+# for addition to the GIT repository at the next commit.
+#
+# FIXME: Those files are omitted from show-diff output!
+
+for file in "$@"; do
+	echo $file >>.dircache/add-queue
+done
Index: gitcommit.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitcommit.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitcommit.sh (mode:100755 sha1:67a743c6cbc9dffaa6f571d3dc83ceec2bd0c039)
@@ -0,0 +1,38 @@
+#!/bin/sh
+#
+# Commit into a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+# Based on an example script fragment sent to LKML by Linus Torvalds.
+#
+# Ignores any parameters for now, expects the changelog entry on stdin.
+#
+# FIXME: Gets it wrong for filenames containing spaces.
+
+
+if [ -r .dircache/add-queue ]; then
+	mv .dircache/add-queue .dircache/add-queue-progress
+	addedfiles=$(cat .dircache/add-queue-progress)
+else
+	addedfiles=
+fi
+changedfiles=$(show-diff -s | grep -v ': ok$' | cut -d : -f 1)
+commitfiles="$addedfiles $changedfiles"
+if [ ! "$commitfiles" ]; then
+	echo 'Nothing to commit.' >&2
+	exit
+fi
+update-cache $commitfiles
+rm -f .dircache/add-queue-progress
+
+
+oldhead=$(cat .dircache/HEAD)
+treeid=$(write-tree)
+
+echo "Enter commit message, terminated by ctrl-D on a separate line:" >&2
+newhead=$(commit-tree $treeid -p $oldhead)
+
+if [ "$newhead" ]; then
+	echo $newhead >.dircache/HEAD
+else
+	echo "Error during commit (oldhead $oldhead, treeid $treeid)" >&2
+fi
Index: gitdiff.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitdiff.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitdiff.sh (mode:100755 sha1:17aec840c7c0e0b4e4e78fd94b754fe6bc2f2ff2)
@@ -0,0 +1,104 @@
+#!/bin/sh
+#
+# Make a diff between two GIT trees.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes two parameters identifying the two trees/commits to compare.
+# An empty string will be substituted with the HEAD revision.
+#
+# Outputs a diff converting the first tree to the second one.
+
+
+TREE="^tree [0-9a-f]{40}$"
+
+tree1ls=$(mktemp -t gitdiff.XXXXXX)
+tree2ls=$(mktemp -t gitdiff.XXXXXX)
+diffdir=$(mktemp -d -t gitdiff.XXXXXX)
+
+function die () {
+	echo gitdiff: $@ >&2
+	rm -f "$tree1ls" "$tree2ls"
+	rm -rf "$diffdir"
+	exit
+}
+
+function normalize_id () {
+	# XXX: This is basically a copy of gitls.sh
+	id=$1
+	if [ ! "$id" ]; then
+		id=$(cat .dircache/HEAD)
+	fi
+	if [ $(cat-file -t "$id") = "commit" ]; then
+		id=$(cat-file commit $id | egrep "$TREE" | cut -d ' ' -f 2)
+	fi
+	if [ ! $(cat-file -t "$id") = "tree" ]; then
+		die "Invalid ID supplied: $id"
+	fi
+	echo $id
+}
+
+function mkdiff () {
+	loc=$1; treeid=$2; fname=$3; mode=$4; sha1=$5;
+
+	if [ x"$sha1" != x"!" ]; then
+		cat-file blob $sha1 >$loc
+	else
+		>$loc
+	fi
+
+	label="$treeid/$fname";
+
+	labelapp=""
+	[ x"$mode" != x"!" ] && labelapp="$labelapp mode:$mode"
+	[ x"$sha1" != x"!" ] && labelapp="$labelapp sha1:$sha1"
+	labelapp=$(echo "$labelapp" | sed 's/^ *//')
+
+	[ "$labelapp" ] && label="$label  ($labelapp)"
+
+	echo $label
+}
+
+id1=$(normalize_id "$1")
+id2=$(normalize_id "$2")
+
+[ "$2" != "$1" ] || die "Cannot diff tree against itself."
+
+ls-tree "$id1" >$tree1ls
+[ -s "$tree1ls" ] || die "Error retrieving the first tree."
+ls-tree "$id2" >$tree2ls
+[ -s "$tree2ls" ] || die "Error retrieving the second tree."
+
+diffdir1="$diffdir/$id1"
+diffdir2="$diffdir/$id2"
+mkdir $diffdir1 $diffdir2
+
+join -e ! -a 1 -a 2 -j 4 -o 0,1.1,1.3,2.1,2.3 $tree1ls $tree2ls | {
+	while read line; do
+		name=$(echo $line | cut -d ' ' -f 1)
+		mode1=$(echo $line | cut -d ' ' -f 2)
+		sha1=$(echo $line | cut -d ' ' -f 3)
+		mode2=$(echo $line | cut -d ' ' -f 4)
+		sha2=$(echo $line | cut -d ' ' -f 5)
+
+		# XXX: The diff format is currently pretty ugly;
+		# ideally, we should print the sha1 and mode at the
+		# +++ and --- lines, but
+
+		if [ "$mode1" != "$mode2" ] || [ "$sha1" != "$sha2" ]; then
+			echo "Index: $name"
+			echo "==================================================================="
+
+			loc1="$diffdir1/$name"
+			loc2="$diffdir2/$name"
+			mkdir -p $(dirname $loc1) $(dirname $loc2)
+
+			label1=$(mkdiff "$loc1" $id1 "$name" $mode1 $sha1)
+			label2=$(mkdiff "$loc2" $id2 "$name" $mode2 $sha2)
+
+			diff -L "$label1" -L "$label2" -u "$loc1" "$loc2"
+		fi
+	done
+}
+
+rm -f "$tree1ls" "$tree2ls"
+rm -rf "$diffdir"
Index: gitlog.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitlog.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitlog.sh (mode:100755 sha1:e7a4eed8c0526821d00b08094c73fabb72eff4df)
@@ -0,0 +1,61 @@
+#!/bin/sh
+####
+#### Call this script with an object and it will produce the change
+#### information for all the parents of that object
+####
+#### This script was originally written by Ross Vandegrift.
+# multiple parents test 1d0f4aec21e5b66c441213643426c770dc6dedc0
+# parents: ffa098b2e187b71b86a76d3cd5eb77d074a2503c
+# 6860e0d9197c7f52155466c225baf39b42d62f63
+
+# regex for parent declarations
+PARENTS="^parent [0-9a-f]{40}$"
+
+TMPCL="/tmp/gitlog.$$"
+
+# takes an object and generates the object's parent(s)
+function unpack_parents () {
+	echo "me $1"
+	echo "me $1" >>$TMPCL
+	RENTS=""
+
+	TMPCM=$(mktemp)
+	cat-file commit $1 >$TMPCM
+	while read line; do
+		if echo "$line" | egrep -q "$PARENTS"; then
+			RENTS="$RENTS "$(echo $line | sed 's/parent //g')
+		fi
+		echo $line
+	done <$TMPCM
+	rm $TMPCM
+
+	echo -e "\n--------------------------\n"
+
+	# if the last object had no parents, return
+	if [ ! "$RENTS" ]; then
+		return;
+	fi
+
+	#useful for testing
+	#echo $RENTS
+	#read
+	for i in `echo $RENTS`; do
+		# break cycles
+		if grep -q "me $i" $TMPCL; then
+			echo "Already visited $i" >&2
+			continue
+		else
+			unpack_parents $i
+		fi
+	done
+}
+
+base=$1
+if [ ! "$base" ]; then
+	base=$(cat .dircache/HEAD)
+fi
+
+rm -f $TMPCL
+unpack_parents $base
+rm -f $TMPCL
+
Index: gitls.sh
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/gitls.sh
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/gitls.sh (mode:100755 sha1:4fe78b764ac0ab3cdb16631bbfdd65edb138e47b)
@@ -0,0 +1,22 @@
+#!/bin/sh
+#
+# List contents of a particular tree in a GIT repository.
+# Copyright (c) Petr Baudis, 2005
+#
+# Optionally takes commit or tree id as a parameter, defaulting to HEAD.
+
+TREE="^tree [0-9a-f]{40}$"
+
+id=$1
+if [ ! "$id" ]; then
+	id=$(cat .dircache/HEAD)
+fi
+if [ $(cat-file -t "$id") = "commit" ]; then
+	id=$(cat-file commit $id | egrep "$TREE" | cut -d ' ' -f 2)
+fi
+if [ ! $(cat-file -t "$id") = "tree" ]; then
+	echo "Invalid ID supplied: $id" >&2
+	exit
+fi
+
+ls-tree "$id"
Index: init-db.c
===================================================================
Index: ls-tree.c
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/ls-tree.c
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/ls-tree.c (mode:100644 sha1:ed5b82cd7f41c3ea4140fa1ee4b80b786f190151)
@@ -0,0 +1,51 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+#include "cache.h"
+
+static int list(unsigned char *sha1)
+{
+	void *buffer;
+	unsigned long size;
+	char type[20];
+
+	buffer = read_sha1_file(sha1, type, &size);
+	if (!buffer)
+		usage("unable to read sha1 file");
+	if (strcmp(type, "tree"))
+		usage("expected a 'tree' node");
+	while (size) {
+		int len = strlen(buffer)+1;
+		unsigned char *sha1 = buffer + len;
+		char *path = strchr(buffer, ' ')+1;
+		unsigned int mode;
+
+		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+			usage("corrupt 'tree' file");
+		buffer = sha1 + 20;
+		size -= len + 20;
+		/* XXX: We just assume the type is "blob" as it should be.
+		 * It seems worthless to read each file just to get this
+		 * and the file size. -- pasky@ucw.cz */
+		printf("%03o\t%s\t%s\t%s\n", mode, "blob", sha1_to_hex(sha1), path);
+	}
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	unsigned char sha1[20];
+
+	if (argc != 2)
+		usage("ls-tree <key>");
+	if (get_sha1_hex(argv[1], sha1) < 0)
+		usage("ls-tree <key>");
+	sha1_file_directory = getenv(DB_ENVIRONMENT);
+	if (!sha1_file_directory)
+		sha1_file_directory = DEFAULT_DB_ENVIRONMENT;
+	if (list(sha1) < 0)
+		usage("list failed");
+	return 0;
+}
Index: parent-id
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/parent-id
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/parent-id (mode:100755 sha1:198c551b7367988b48aa7a69876e098d73c19e88)
@@ -0,0 +1,15 @@
+#!/bin/sh
+#
+# Get ID of parent commit to a given revision or HEAD.
+# Copyright (c) Petr Baudis, 2005
+#
+# Takes ID of the current commit, defaults to HEAD.
+
+PARENT="^parent [0-9a-f]{40}$"
+
+id=$1
+if [ ! "$id" ]; then
+	id=$(cat .dircache/HEAD)
+fi
+
+cat-file commit $id | egrep "$PARENT" | cut -d ' ' -f 2
Index: read-cache.c
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/read-cache.c (mode:100664 sha1:e51c9ee84874b5ff0f22b11dcd4fe1f905e72a5e)
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/read-cache.c (mode:100644 sha1:3dbe6db46933683721ceafdcdd70da521a32269a)
@@ -264,11 +264,12 @@
 	size = 0; // avoid gcc warning
 	map = (void *)-1;
 	if (!fstat(fd, &st)) {
-		map = NULL;
 		size = st.st_size;
 		errno = EINVAL;
 		if (size > sizeof(struct cache_header))
 			map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+		else
+			return (!hdr->entries) ? 0 : error("inconsistent cache");
 	}
 	close(fd);
 	if (-1 == (int)(long)map)
Index: read-tree.c
===================================================================
Index: show-diff.c
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/show-diff.c (mode:100664 sha1:45f6e3140b3923497fdec808aec0e86ecf358b92)
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/show-diff.c (mode:100644 sha1:9beda1382103df29914d965fc135def0e6e7e839)
@@ -49,9 +49,17 @@
 
 int main(int argc, char **argv)
 {
+	int silent = 0;
 	int entries = read_cache();
 	int i;
 
+	while (argc-- > 1) {
+		if (!strcmp(argv[1], "-s"))
+			silent = 1;
+		else if (!strcmp(argv[1], "-h") || !strcmp(argv[1], "--help"))
+			usage("show-diff [-s]");
+	}
+
 	if (entries < 0) {
 		perror("read_cache");
 		exit(1);
@@ -77,6 +85,9 @@
 		for (n = 0; n < 20; n++)
 			printf("%02x", ce->sha1[n]);
 		printf("\n");
+		if (silent)
+			continue;
+
 		new = read_sha1_file(ce->sha1, type, &size);
 		show_differences(ce, &st, new, size);
 		free(new);
Index: update-cache.c
===================================================================
--- 6be98a9e92a3f131a3fdf0dc3a8576fba6421569/update-cache.c (mode:100664 sha1:9dcee6f628d5accaa5219f72a2e790c082d9dd9a)
+++ 3f6cc0ad3e076e05281438b0de69a7d6a5522d17/update-cache.c (mode:100644 sha1:916430a05a9da088dae1ea82eb8d5392033f548a)
@@ -231,6 +231,9 @@
 		return -1;
 	}
 
+	if (argc < 2)
+		usage("update-cache <file>*");
+
 	newfd = open(".dircache/index.lock", O_RDWR | O_CREAT | O_EXCL, 0600);
 	if (newfd < 0) {
 		perror("unable to create new cachefile");
Index: write-tree.c
===================================================================
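
As an aside on the new ls-tree.c above: list() parses tree entries laid
out as an octal mode, a space, and a NUL-terminated path, followed by
the raw 20-byte sha1. A record in that shape can be mocked up like this
(the 20 hash bytes are dummy zeros, purely for illustration):

```shell
#!/bin/sh
# Build one record in the layout ls-tree.c's list() expects:
# "<octal mode> <path>\0" followed by 20 raw sha1 bytes.
printf '100644 Makefile\0' > entry.bin        # 15 chars + NUL = 16 bytes
dd if=/dev/zero bs=1 count=20 >> entry.bin 2>/dev/null   # dummy sha1 bytes
wc -c < entry.bin                             # 16 + 20 = 36 bytes total
```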

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 15:42               ` Paul Jackson
@ 2005-04-09 18:45                 ` Marcin Dalecki
  0 siblings, 0 replies; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09 18:45 UTC (permalink / raw)
  To: Paul Jackson; +Cc: linux-kernel, matthias.christian, andrea, cw, torvalds


On 2005-04-09, at 17:42, Paul Jackson wrote:

> Marcin wrote:
>> But what will impress you are either the price tag the
>> DB comes with or
>> the hardware it runs on :-)
>
> The payroll for the staffing to care and feed for these
> babies is often impressive as well.

Please don't forget the bill from the electric plant behind it!



* Re: Kernel SCM saga..
  2005-04-09 17:40         ` Roman Zippel
@ 2005-04-09 18:56           ` Ray Lee
  0 siblings, 0 replies; 201+ messages in thread
From: Ray Lee @ 2005-04-09 18:56 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Kernel Mailing List, David Woodhouse, Linus Torvalds, Eric D. Mudama

On Sat, 2005-04-09 at 19:40 +0200, Roman Zippel wrote:
> On Sat, 9 Apr 2005, Eric D. Mudama wrote:
> > > For example bk does something like this:
> > > 
> > >         A1 -> A2 -> A3 -> BM
> > >           \-> B1 -> B2 --^
> > > 
> > > and instead of creating the merge changeset, one could merge them like
> > > this:
> > > 
> > >         A1 -> A2 -> A3 -> B1 -> B2

> > I believe that flattening the change graph makes history reproduction
> > impossible, or alternately, you are imposing on each developer to test
> > the merge results at B1 + A1..3 before submission, but in doing so,
> > the test time may require additional test periods etc and with
> > sufficient velocity, might never close.
> 
> The merge result has to be tested either way, so I'm not exactly sure, 
> what you're trying to say.

The kernel changes. A lot. And often.

With that in mind, if (for example) A2 and A3 are simple changes that
are quick to test and B1 is large, or complex, or requires hours (days,
weeks) of testing to validate, then a maintainer's decision can
legitimately be to rebase a tree (say, -mm) upon the B1 line of
development, and toss the A2 branch back to those developers with a
"Sorry it didn't work out, something here causes Unhappiness with B1,
can you track down the problem and try again?"

Ray



* Re: Kernel SCM saga..
  2005-04-09  1:50                     ` David Lang
@ 2005-04-09 22:12                       ` Florian Weimer
  0 siblings, 0 replies; 201+ messages in thread
From: Florian Weimer @ 2005-04-09 22:12 UTC (permalink / raw)
  To: David Lang; +Cc: Kernel Mailing List

* David Lang:

>> Databases supporting replication are called high end. You forgot
>> the cats dance around the network this issue involves.
>
> And Postgres (which is Free in all senses of the word) is high end by this 
> definition.

I'm not aware of *any* DBMS, commercial or not, which can perform
meaningful multi-master replication on tables which mainly consist of
text files as records.  All you can get is single-master replication
(which is well-understood), or some rather scary stuff which involves
throwing away updates, or taking extrema or averages (even automatic
3-way merges aren't available).


* Re: Kernel SCM saga..
  2005-04-09  5:45                         ` Linus Torvalds
@ 2005-04-09 22:55                           ` David S. Miller
  2005-04-09 23:13                             ` Linus Torvalds
                                               ` (2 more replies)
  0 siblings, 3 replies; 201+ messages in thread
From: David S. Miller @ 2005-04-09 22:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: andrea, mbp, linux-kernel, dlang

On Fri, 8 Apr 2005 22:45:18 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> Also, I don't want people editing repostitory files by hand. Sure, the 
> sha1 catches it, but still... I'd rather force the low-level ops to use 
> the proper helper routines. Which is why it's a raw zlib compressed blob, 
> not a gzipped file.

I understand the arguments for compression, but I hate it for one
simple reason: recovery is more difficult when you corrupt some
file in your repository.

It's happened to me more than once and I did lose data.

Without compression, I might be able to recover if something
causes a block of zeros to be written to the middle of some
repository file.  With compression, you pretty much just lose.
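
The failure mode is easy to reproduce. A quick sketch (gzip stands in
for git's raw zlib streams here, and the file names are made up): write
a block of zeros into the middle of a plain file and of its compressed
copy, and see which one survives.

```shell
#!/bin/sh
# gzip stands in for raw zlib; the effect on recovery is the same.
seq 1 1000 > plain.txt
gzip -c plain.txt > packed.gz

# simulate "a block of zeros written to the middle" of each file
dd if=/dev/zero of=plain.txt bs=1 seek=100 count=16 conv=notrunc 2>/dev/null
dd if=/dev/zero of=packed.gz bs=1 seek=100 count=16 conv=notrunc 2>/dev/null

# the plain file keeps nearly all of its lines readable
wc -l < plain.txt

# the compressed copy fails the deflate stream or its CRC check,
# and everything from the corruption onward is gone
gzip -dc packed.gz > recovered.txt 2>/dev/null || echo "decompression failed"
```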


* Re: Kernel SCM saga..
  2005-04-09 22:55                           ` David S. Miller
@ 2005-04-09 23:13                             ` Linus Torvalds
  2005-04-10  0:14                               ` Chris Wedgwood
  2005-04-10  0:22                             ` Paul Jackson
  2005-04-10 11:33                             ` Ingo Molnar
  2 siblings, 1 reply; 201+ messages in thread
From: Linus Torvalds @ 2005-04-09 23:13 UTC (permalink / raw)
  To: David S. Miller; +Cc: andrea, mbp, linux-kernel, dlang



On Sat, 9 Apr 2005, David S. Miller wrote:
> 
> I understand the arguments for compression, but I hate it for one
> simple reason: recovery is more difficult when you corrupt some
> file in your repository.

Trust me, the way git does things, you'll have so much redundancy that 
you'll have to really _work_ at losing data.

That's the good news.

The bad news is that this is obviously why it does eat a lot of disk. 
Since it saves full-file commits, you're going to have a lot of 
(compressed) full files around.

		Linus


* Re: Kernel SCM saga..
  2005-04-09 16:33                       ` Roman Zippel
@ 2005-04-09 23:31                         ` Tupshin Harper
  2005-04-10 17:24                         ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr
  1 sibling, 0 replies; 201+ messages in thread
From: Tupshin Harper @ 2005-04-09 23:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Roman Zippel, Linus Torvalds, Andrea Arcangeli, Martin Pool, David Lang

Roman Zippel wrote:

>It seems you exported the complete parent information and this is exactly 
>the "nitty-gritty" I was "whining" about and which is not available via 
>bkcvs or bkweb and it's the most crucial information to make the bk data 
>useful outside of bk. Larry was previously very clear about this that he 
>considers this proprietary bk meta data and anyone attempting to export 
>this information is in violation with the free bk licence, so you indeed 
>just took the important parts and this is/was explicitly verboten for 
>normal bk users.
>  
>
Yes, this is exactly the information that would be necessary to create a 
general interop tool between bk and darcs|arch|monotone, and is the 
fundamental objection I and others have had to open source projects 
using BK. Is Bitmover willing to grant a special dispensation to allow a 
lossless conversion of the linux history to another format?

-Tupshin


* Re: Kernel SCM saga..
  2005-04-09 23:13                             ` Linus Torvalds
@ 2005-04-10  0:14                               ` Chris Wedgwood
  2005-04-10  1:56                                 ` Paul Jackson
  0 siblings, 1 reply; 201+ messages in thread
From: Chris Wedgwood @ 2005-04-10  0:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David S. Miller, andrea, mbp, linux-kernel, dlang

On Sat, Apr 09, 2005 at 04:13:51PM -0700, Linus Torvalds wrote:

> > I understand the arguments for compression, but I hate it for one
> > simple reason: recovery is more difficult when you corrupt some
> > file in your repository.

I've had this too.  Magic binary blobs are horrible here for data loss
which is why I'm not keen on subversion.

> Trust me, the way git does things, you'll have so much redundancy
> that you'll have to really _work_ at losing data.

It's not clear to me that compression should be *required* though.
Shouldn't we be able to turn this off in some cases?

> The bad news is that this is obviously why it does eat a lot of
> disk.

Disk is cheap, but sadly page-cache is not :-(

> Since it saves full-file commits, you're going to have a lot of
> (compressed) full files around.

How many is a lot?  Are we talking 100k, 1m, 10m?


* Re: Kernel SCM saga..
  2005-04-09 22:55                           ` David S. Miller
  2005-04-09 23:13                             ` Linus Torvalds
@ 2005-04-10  0:22                             ` Paul Jackson
  2005-04-10 11:33                             ` Ingo Molnar
  2 siblings, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-10  0:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: torvalds, andrea, mbp, linux-kernel, dlang

David wrote:
> recovery is more difficult when you corrupt some
> file in your repository.

Agreed.  I too have recovered RCS and SCCS files by hand editing.


Linus wrote:
> I don't want people editing repostitory files by hand.

Tyrant !;)

From Wikipedia:

    A tyrant is a usurper of rightful power.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: Re: Kernel SCM saga..
  2005-04-09  2:53         ` Petr Baudis
  2005-04-09  7:08           ` Randy.Dunlap
@ 2005-04-10  1:01           ` Phillip Lougher
  2005-04-10  1:42             ` Petr Baudis
  1 sibling, 1 reply; 201+ messages in thread
From: Phillip Lougher @ 2005-04-10  1:01 UTC (permalink / raw)
  To: Linus Torvalds, ross, Kernel Mailing List; +Cc: rddunlap, Phil Lougher

On Apr 9, 2005 3:53 AM, Petr Baudis <pasky@ucw.cz> wrote:

>   FWIW, I made few small fixes (to prevent some trivial usage errors to
> cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> gitlog.sh - heavily inspired by what already went through the mailing
> list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> (including .dircache, even though it isn't shown in the index), the
> cumulative patch can be found below. The scripts aim to provide some
> (obviously very interim) more high-level interface for git.

I did a bit of playing about with the changelog generate script,
trying to produce a faster version.  The attached version uses a
couple of improvements to be a lot faster (e.g. no recursion in the
common case of one parent).

FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and
28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my
hardware.  Your mileage may of course vary.

Regards

Phillip

--------------------------------------
#!/bin/bash

changelog() {
        local parents new_parent
        declare -a new_parent

        new_parent[0]=$1
        parents=1

        while [ $parents -gt 0 ]; do
                parent=${new_parent[$((parents-1))]}
                echo $parent >> $TMP
                cat-file commit $parent > $TMP_FILE

                echo me $parent
                cat $TMP_FILE
                echo -e "\n--------------------------\n"

                parents=0
                while read type text; do
                        if [ $type = 'committer' ]; then
                                break;
                        elif [ $type = 'parent' ] &&
                                        ! grep -q $text $TMP ; then
                                new_parent[$parents]=$text
                                parents=$((parents+1))
                        fi
                done < $TMP_FILE

                i=0
                while [ $i -lt $((parents-1)) ]; do
                        changelog ${new_parent[$i]}
                        i=$((i+1))
                done
        done
}

TMP=`mktemp`
TMP_FILE=`mktemp`

base=$1
if [ ! "$base" ]; then
        base=$(cat .dircache/HEAD)
fi
changelog $base
rm -rf $TMP $TMP_FILE


* Re: Re: Re: Kernel SCM saga..
  2005-04-10  1:01           ` Phillip Lougher
@ 2005-04-10  1:42             ` Petr Baudis
  2005-04-10  1:57               ` Phillip Lougher
  0 siblings, 1 reply; 201+ messages in thread
From: Petr Baudis @ 2005-04-10  1:42 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: Linus Torvalds, ross, Kernel Mailing List

Dear diary, on Sun, Apr 10, 2005 at 03:01:12AM CEST, I got a letter
where Phillip Lougher <phil.lougher@gmail.com> told me that...
> On Apr 9, 2005 3:53 AM, Petr Baudis <pasky@ucw.cz> wrote:
> 
> >   FWIW, I made few small fixes (to prevent some trivial usage errors to
> > cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> > gitlog.sh - heavily inspired by what already went through the mailing
> > list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> > (including .dircache, even though it isn't shown in the index), the
> > cumulative patch can be found below. The scripts aim to provide some
> > (obviously very interim) more high-level interface for git.
> 
> I did a bit of playing about with the changelog generate script,
> trying to produce a faster version.  The attached version uses a
> couple of improvements to be a lot faster (e.g. no recursion in the
> common case of one parent).
> 
> FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and
> 28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my
> hardware.  You mileage may of course vary.

Wow, really impressive! Great work, I've merged it (if you don't object,
of course).

Wondering why I wasn't in the Cc list, BTW.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.


* Re: Kernel SCM saga..
  2005-04-10  0:14                               ` Chris Wedgwood
@ 2005-04-10  1:56                                 ` Paul Jackson
  2005-04-10 12:03                                   ` Ingo Molnar
  0 siblings, 1 reply; 201+ messages in thread
From: Paul Jackson @ 2005-04-10  1:56 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: torvalds, davem, andrea, mbp, linux-kernel, dlang

Chris wrote:
> How many is a lot?  Are we talking 100k, 1m, 10m?

I pulled some numbers out of my bk tree for Linux.

I have 16817 source files.

They average 12.2 bitkeeper changes per file (counting the number of
changes visible from doing 'bk sccslog' on each of the 16817 files). 

These 16817 files consume:

	224 MBytes uncompressed and
	 95 MBytes compressed

(using zlib's minigzip, on a 4 KB page reiserfs.)

Since each change will get its own copy of the file, multiplying these
two sizes (224 and 95) by 12.2 changes per file means the disk cost
would be:

	2.73 GByte uncompressed, or
	1.16 GBytes compressed.

I was pleasantly surprised at the degree of compression, shrinking files
to 42% of their original size.  I expected we would save fewer disk
blocks than this, since the classic rule of thumb of archiving before
compressing wasn't being followed (nor should it be) and we were
compressing lots of little files.

Of course, since as Linus reminds us, it's disk buffers in memory,
not blocks on disk, that are precious, it's more like we will save
224 - 95 == 129 MBytes of RAM to hold one entire tree.
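
Those figures check out; redoing the multiplication (all numbers copied
from above, not re-measured):

```shell
#!/bin/sh
# Back-of-the-envelope numbers from the message above.
awk 'BEGIN {
	changes = 12.2      # average bk changes per file
	unc     = 224       # MBytes, the 16817 files uncompressed
	comp    = 95        # MBytes, the same files compressed
	printf "history: %.2f GB uncompressed, %.2f GB compressed\n",
	       unc * changes / 1000, comp * changes / 1000
	printf "compressed to %.0f%% of original; %d MBytes of RAM saved\n",
	       100 * comp / unc, unc - comp
}'
```

which reproduces the 2.73 GByte / 1.16 GByte / 42% / 129 MByte figures
quoted above.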

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: Re: Re: Kernel SCM saga..
  2005-04-10  1:42             ` Petr Baudis
@ 2005-04-10  1:57               ` Phillip Lougher
  0 siblings, 0 replies; 201+ messages in thread
From: Phillip Lougher @ 2005-04-10  1:57 UTC (permalink / raw)
  To: Phillip Lougher, Linus Torvalds, ross, Kernel Mailing List, pasky

On Apr 10, 2005 2:42 AM, Petr Baudis <pasky@ucw.cz> wrote:
> Dear diary, on Sun, Apr 10, 2005 at 03:01:12AM CEST, I got a letter
> where Phillip Lougher <phil.lougher@gmail.com> told me that...
> > On Apr 9, 2005 3:53 AM, Petr Baudis <pasky@ucw.cz> wrote:
> >
> > >   FWIW, I made few small fixes (to prevent some trivial usage errors to
> > > cause cache corruption) and added scripts gitcommit.sh, gitadd.sh and
> > > gitlog.sh - heavily inspired by what already went through the mailing
> > > list. Everything is available at http://pasky.or.cz/~pasky/dev/git/
> > > (including .dircache, even though it isn't shown in the index), the
> > > cumulative patch can be found below. The scripts aim to provide some
> > > (obviously very interim) more high-level interface for git.
> >
> > I did a bit of playing about with the changelog generate script,
> > trying to produce a faster version.  The attached version uses a
> > couple of improvements to be a lot faster (e.g. no recursion in the
> > common case of one parent).
> >
> > FWIW it is 7x faster than makechlog.sh (4.342 secs vs 34.129 secs) and
> > 28x faster than gitlog.sh (4.342 secs vs 2 mins 4 secs) on my
> > hardware.  You mileage may of course vary.
> 
> Wow, really impressive! Great work, I've merged it (if you don't object,
> of course).

Of course I don't object...

> 
> Wondering why I wasn't in the Cc list, BTW.

Weird, it wasn't intentional.  I read LKML in Gmail (which I don't use
for much else), and just clicked "reply", expecting it to do the right
thing.  Replying to this email also left you off the CC list.
Looking at the email source I believe it's probably to do with the
following:

Mail-Followup-To: Linus Torvalds <torvalds@osdl.org>,
	ross@jose.lug.udel.edu,
	Kernel Mailing List <linux-kernel@vger.kernel.org>> 

I've CC'd you explicitly on this.

Phillip


* Re: Kernel SCM saga..
  2005-04-09 16:26           ` Linus Torvalds
  2005-04-09 17:08             ` Paul Jackson
@ 2005-04-10  3:41             ` Paul Jackson
  2005-04-10  8:39             ` David Lang
  2 siblings, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-10  3:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: ross, cw, linux-kernel

Linus wrote:
> Almost everything
> else keeps the <sha1> in the ASCII hexadecimal representation, and I
> should have done that here too. Why? Not because it's a <sha1> - hey, the 
> binary representation is certainly denser and equivalent

Since the size of <compressed> ASCII sha1s is only about 18% larger
than the size of the same number of binary sha1s <compressed or not>, I
don't see that you gain much from the binary.

I cast my non-existent vote for making the sha1 ascii - while you still can ;).

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: Kernel SCM saga..
  2005-04-09 16:26           ` Linus Torvalds
  2005-04-09 17:08             ` Paul Jackson
  2005-04-10  3:41             ` Paul Jackson
@ 2005-04-10  8:39             ` David Lang
  2005-04-10  9:40               ` Junio C Hamano
  2 siblings, 1 reply; 201+ messages in thread
From: David Lang @ 2005-04-10  8:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Jackson, ross, cw, linux-kernel

On Sat, 9 Apr 2005, Linus Torvalds wrote:

>
> The biggest irritation I have with the "tree" format I chose is actually
> not the name (which is trivial), it's the <sha1> part. Almost everything
> else keeps the <sha1> in the ASCII hexadecimal representation, and I
> should have done that here too. Why? Not because it's a <sha1> - hey, the
> binary representation is certainly denser and equivalent - but because an
> ASCII representation there would have allowed me to much more easily
> change the key format if I ever wanted to. Now it's very SHA1-specific.
>
> Which I guess is fine - I don't really see any reason to change, and if I
> do change, I could always just re-generate the whole tree. But I think it
> would have been cleaner to have _that_ part in ASCII.
>

just wanted to point out that recent news shows that sha1 isn't as good as 
it was thought to be (far easier to deliberatly create collisions then it 
should be)

this hasn't reached a point where you HAVE to quit useing it (especially 
since you have the other validity checks in place), but it's a good reason 
to expect that you may want to change to something else in a few years.

it's a lot easier to change things now to make that move easier than once 
this is being used extensively.
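[Editor's sketch: the point above — that ASCII hex keys make a later hash change painless — can be illustrated in a few lines. This is not git code; `object_key` is a hypothetical helper, and the only real claim is that a hex digest's length identifies the algorithm while the surrounding format stays unchanged.]

```python
import hashlib

def object_key(data: bytes, algo: str = "sha1") -> str:
    # Store the key as ASCII hex: swapping in another hash later only
    # changes the digest length, not the file format around it.
    return hashlib.new(algo, data).hexdigest()

blob = b"hello world\n"
sha1_key = object_key(blob, "sha1")      # 40 hex characters
sha256_key = object_key(blob, "sha256")  # 64 hex characters
```

A binary key, by contrast, bakes the digest width into every consumer of the format.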

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 16:17       ` David Roundy
@ 2005-04-10  9:24         ` Giuseppe Bilotta
  2005-04-10 13:51           ` David Roundy
  0 siblings, 1 reply; 201+ messages in thread
From: Giuseppe Bilotta @ 2005-04-10  9:24 UTC (permalink / raw)
  To: linux-kernel

On Sat, 9 Apr 2005 12:17:58 -0400, David Roundy wrote:

> I've recently made some improvements
> recently which will reduce the memory use

Does this include check for redundancy? ;)

-- 
Giuseppe "Oblomov" Bilotta

Hic manebimus optime


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10  8:39             ` David Lang
@ 2005-04-10  9:40               ` Junio C Hamano
  2005-04-10 16:46                 ` Bill Davidsen
  0 siblings, 1 reply; 201+ messages in thread
From: Junio C Hamano @ 2005-04-10  9:40 UTC (permalink / raw)
  To: David Lang; +Cc: linux-kernel

>>>>> "DL" == David Lang <dlang@digitalinsight.com> writes:

DL> just wanted to point out that recent news shows that sha1 isn't as
DL> good as it was thought to be (far easier to deliberately create
DL> collisions than it should be)

I suspect there is no need to do so...

  Message-ID: <Pine.LNX.4.58.0504090902170.1267@ppc970.osdl.org>
  From: Linus Torvalds <torvalds@osdl.org>
  Subject: Re: Kernel SCM saga..
  Date: Sat, 9 Apr 2005 09:16:22 -0700 (PDT)

  ...

                  Linus 

  (*) yeah, yeah, I know about the current theoretical case, and I don't
  care. Not only is it theoretical, the way my objects are packed you'd have
  to not just generate the same SHA1 for it, it would have to _also_ still
  be a valid zlib object _and_ get the header to match the "type + length"  
  of object part. IOW, the object validity checks are actually even stricter
  than just "sha1 matches".
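[Editor's sketch: the layered checks Linus describes in the footnote — valid zlib, a matching "type + length" header, and the sha1 itself — can be modelled like this. It is a simplified illustration built from his description, not the actual git source.]

```python
import hashlib
import zlib

def pack_object(obj_type: str, payload: bytes) -> tuple[str, bytes]:
    # "type length\0" header followed by the payload, then deflated.
    raw = f"{obj_type} {len(payload)}\0".encode() + payload
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)

def verify_object(name: str, blob: bytes) -> bool:
    try:
        raw = zlib.decompress(blob)            # must be a valid zlib stream
    except zlib.error:
        return False
    header, _, payload = raw.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(payload):              # header must match content
        return False
    return hashlib.sha1(raw).hexdigest() == name  # and hash to its own name

name, blob = pack_object("blob", b"hello world\n")
```

A forged object would have to pass all three checks at once, which is strictly harder than a bare hash collision.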


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09 22:55                           ` David S. Miller
  2005-04-09 23:13                             ` Linus Torvalds
  2005-04-10  0:22                             ` Paul Jackson
@ 2005-04-10 11:33                             ` Ingo Molnar
  2 siblings, 0 replies; 201+ messages in thread
From: Ingo Molnar @ 2005-04-10 11:33 UTC (permalink / raw)
  To: David S. Miller
  Cc: Linus Torvalds, andrea, mbp, linux-kernel, dlang, Paul Jackson


* David S. Miller <davem@davemloft.net> wrote:

> On Fri, 8 Apr 2005 22:45:18 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
> 
> > Also, I don't want people editing repository files by hand. Sure, the 
> > sha1 catches it, but still... I'd rather force the low-level ops to use 
> > the proper helper routines. Which is why it's a raw zlib compressed blob, 
> > not a gzipped file.
> 
> I understand the arguments for compression, but I hate it for one
> simple reason: recovery is more difficult when you corrupt some
> file in your repository.
> 
> It's happened to me more than once and I did lose data.
> 
> Without compression, I might be able to recover if something
> causes a block of zeros to be written to the middle of some
> repository file.  With compression, you pretty much just lose.

that depends on how you compress. You are perfectly right that with 
default zlib compression, where you start the compression stream and 
stop it at the end of the file, recovery in case of damage is very hard 
for the portion that comes _after_ the damaged section. You'd have to 
reconstruct the compression state which is akin to breaking a key.

But with zlib you can 'flush' the compression state every couple of 
blocks and basically get the same recovery properties, at some very 
minimal extra space cost (because when you flush out compression state 
you get some extra padding bytes).

Flushing has another advantage as well: a small delta (even if it 
increases/decreases the file size!) in the middle of a larger file will 
still be compressed to the same output both before and after the change 
area (modulo flush block size), which rsync can pick up just fine. (IIRC 
that is one of the reasons why Debian, when compressing .deb's, does 
zlib-flushes every couple of blocks, so that rsync/apt-get can pick up 
partial .deb's as well.)

the zlib option is i think Z_PARTIAL_FLUSH, i'm using it in Tux to do 
chunks of compression. The flushing cost ismax 12 bytes or so, so if 
it's done every 4K we maximize the cost to 0.2%.

so flushing is both rsync-friendly and recovery-friendly.

(recovery isn't as simple as with plaintext, as you have to find the next 
'block' and the block length will be inevitably variable. But it should 
be pretty predictable, and tools might even exist.)
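[Editor's sketch of the flush-every-few-blocks idea, using Python's zlib binding. Ingo names Z_PARTIAL_FLUSH; this sketch uses Z_FULL_FLUSH, which additionally resets the dictionary so each segment decompresses independently — the recovery property under discussion. The 4 KB interval is the example figure from the message; the data is synthetic.]

```python
import zlib

def compress_with_flushes(data: bytes, block: int = 4096) -> bytes:
    c = zlib.compressobj()
    out = []
    for i in range(0, len(data), block):
        out.append(c.compress(data[i:i + block]))
        # Z_FULL_FLUSH emits all pending output and resets the dictionary,
        # so damage in one segment doesn't doom everything after it.
        out.append(c.flush(zlib.Z_FULL_FLUSH))
    out.append(c.flush())
    return b"".join(out)

data = b"int x;\n" * 20000
plain = zlib.compress(data)
flushed = compress_with_flushes(data)
overhead = (len(flushed) - len(plain)) / len(data)
```

The flushed stream is larger, but the extra cost stays a small fraction of the input — the tradeoff Ingo is pricing at a fraction of a percent.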

	Ingo

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10  1:56                                 ` Paul Jackson
@ 2005-04-10 12:03                                   ` Ingo Molnar
  2005-04-10 17:38                                     ` Paul Jackson
  0 siblings, 1 reply; 201+ messages in thread
From: Ingo Molnar @ 2005-04-10 12:03 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Chris Wedgwood, torvalds, davem, andrea, mbp, linux-kernel, dlang


* Paul Jackson <pj@engr.sgi.com> wrote:

> These 16817 files consume:
> 
> 	224 MBytes uncompressed and
> 	 95 MBytes compressed
> 
> (using zlib's minigzip, on a 4 KB page reiserfs.)

that's a 42.4% compressed size. Using a (much) more CPU-intense 
compression method (bzip -9), the compressed size is down to 45 MBytes.  
(a ratio of 20.1%)

using default 'gzip' i get 57 MB compressed.

> Since each change will get its own copy of the file, multiplying these
> two sizes (224 and 95) by 12.2 changes per file means the disk cost
> would be:
> 
> 	2.73 GByte uncompressed, or
> 	1.16 GBytes compressed.

with bzip2 -9 it would be 551 MBytes. It might well be practical on 
faster CPUs: a full tree (224 MBytes, 45 MBytes compressed) decompresses 
in 24 seconds on a 3.4GHz P4 - single CPU. (and with dual core likely 
becoming the standard, we might as well divide that by two) With default 
gzip it's 3.3 seconds though, and that still compresses it down to 57 
MB.
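[Editor's sketch: the size tradeoff Ingo is measuring can be reproduced in miniature with the standard library. The sample data is synthetic, so the absolute ratios will not match the kernel-tree figures above; the point is only that the comparison is a few lines of measurement.]

```python
import bz2
import zlib

# Synthetic "source-like" sample; a real corpus would give different numbers.
data = (b"static int foo(struct inode *inode, unsigned long arg)\n"
        b"{\n\treturn arg ? -EINVAL : 0;\n}\n") * 4000

sizes = {
    "zlib -1": len(zlib.compress(data, 1)),   # fastest deflate setting
    "zlib -9": len(zlib.compress(data, 9)),   # best deflate setting
    "bzip2 -9": len(bz2.compress(data, 9)),   # CPU-intense alternative
}
ratios = {name: n / len(data) for name, n in sizes.items()}
```

Decompression time can be measured the same way before committing to a repository format.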

	Ingo

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10  9:24         ` Giuseppe Bilotta
@ 2005-04-10 13:51           ` David Roundy
  0 siblings, 0 replies; 201+ messages in thread
From: David Roundy @ 2005-04-10 13:51 UTC (permalink / raw)
  To: linux-kernel

On Sun, Apr 10, 2005 at 11:24:07AM +0200, Giuseppe Bilotta wrote:
> On Sat, 9 Apr 2005 12:17:58 -0400, David Roundy wrote:
> 
> > I've recently made some improvements recently which will reduce the
> > memory use
> 
> Does this include check for redundancy? ;)

Yeah, the only catch is that if the redundancy checks fail, we now may
leave the repository in an inconsistent, but repairable, state.  (Only a
cache of the pristine tree is affected.)  The recent improvements mostly
came by increasing the laziness of a few operations, which meant we don't
need to store the entire parsed tree (or parsed patch) in memory for
certain operations.
-- 
David Roundy
http://www.darcs.net

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10  9:40               ` Junio C Hamano
@ 2005-04-10 16:46                 ` Bill Davidsen
  2005-04-10 17:50                   ` Paul Jackson
  0 siblings, 1 reply; 201+ messages in thread
From: Bill Davidsen @ 2005-04-10 16:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Lang, linux-kernel

On Sun, 10 Apr 2005, Junio C Hamano wrote:

> >>>>> "DL" == David Lang <dlang@digitalinsight.com> writes:
> 
> DL> just wanted to point out that recent news shows that sha1 isn't as
> DL> good as it was thought to be (far easier to deliberately create
> DL> collisions than it should be)
> 
> I suspect there is no need to do so...

It's possible to generate another object with the same hash, but:
 - you can't just take your desired object and do magic to make it hash
   right
 - it almost certainly won't have the same length
 - it's still non-trivial in terms of computation needed

> 
>   Message-ID: <Pine.LNX.4.58.0504090902170.1267@ppc970.osdl.org>
>   From: Linus Torvalds <torvalds@osdl.org>
>   Subject: Re: Kernel SCM saga..
>   Date: Sat, 9 Apr 2005 09:16:22 -0700 (PDT)
> 
>   ...
> 
>                   Linus 
> 
>   (*) yeah, yeah, I know about the current theoretical case, and I don't
>   care. Not only is it theoretical, the way my objects are packed you'd have
>   to not just generate the same SHA1 for it, it would have to _also_ still
>   be a valid zlib object _and_ get the header to match the "type + length"  
>   of object part. IOW, the object validity checks are actually even stricter
>   than just "sha1 matches".
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Code snippet to reconstruct ancestry graph from bk repo
  2005-04-09 16:33                       ` Roman Zippel
  2005-04-09 23:31                         ` Tupshin Harper
@ 2005-04-10 17:24                         ` Paul P Komkoff Jr
  2005-04-10 18:19                           ` Roman Zippel
  1 sibling, 1 reply; 201+ messages in thread
From: Paul P Komkoff Jr @ 2005-04-10 17:24 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Andrea Arcangeli, Martin Pool, linux-kernel, David Lang

Replying to Roman Zippel:
> the "nitty-gritty" I was "whining" about and which is not available via 
> bkcvs or bkweb and it's the most crucial information to make the bk data 
> useful outside of bk. Larry was previously very clear about this that he 
> considers this proprietary bk meta data and anyone attempting to export 
> this information is in violation with the free bk licence, so you indeed 
> just took the important parts and this is/was explicitly verboten for 
> normal bk users.

(borrowed from Tommi Virtanen)

Code snippet to reconstruct ancestry graph from bk repo:
bk changes -end':I: $if(:PARENT:){:PARENT:$if(:MPARENT:){ :MPARENT:}} $unless(:PARENT:){-}'         |tac

The format is:
newrev parent1 [parent2]
where parent2 is present if the changeset is a merge.
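[Editor's sketch: a small parser for that output, turning it into an ancestry graph. The revision names in the sample are hypothetical; '-' marks a root changeset, per the $unless clause in the bk command above.]

```python
def parse_ancestry(lines):
    """Map each revision to its list of parents ('-' means a root)."""
    graph = {}
    for line in lines:
        rev, *parents = line.split()
        graph[rev] = [] if parents == ["-"] else parents
    return graph

sample = [
    "1.1 -",            # root changeset, no parent
    "1.2 1.1",          # ordinary child
    "1.3 1.2 1.1.1.1",  # merge: two parents
]
graph = parse_ancestry(sample)
```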

-- 
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
 This message represents the official view of the voices in my head

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10 12:03                                   ` Ingo Molnar
@ 2005-04-10 17:38                                     ` Paul Jackson
  2005-04-10 17:46                                       ` Ingo Molnar
  0 siblings, 1 reply; 201+ messages in thread
From: Paul Jackson @ 2005-04-10 17:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: cw, torvalds, davem, andrea, mbp, linux-kernel, dlang

Ingo wrote:
> With default gzip it's 3.3 seconds though,
> and that still compresses it down to 57 MB.

Interesting.  I'm surprised how much a bunch of separate, modest sized
files can be compressed.

I'm unclear what matters most here.

Space on disk certainly isn't much of an issue.  Even with Andrew Morton
on our side, we still can't grow the kernel as fast as the disk drive
manufacturers can grow disk sizes.

Main memory size of the compressed history matters to Linus and his top
20 lieutenants doing full kernel source patching as a primary mission if
they can't fit the source _history_ in main memory.  But those people
are running 1 GByte or more of RAM - so whether it is 95, 57 or 45
MBytes, it fits fine.  The rest of us are mostly concerned with whether
a kernel build fits in memory.

Looking at an arch i386 kernel build tree I have at hand, I see the
following disk usage:

	102 MBytes - BitKeeper/*
	287 MBytes - */SCCS/* (outside of already counted BitKeeper/*)
	232 MBytes - checked out source files
	 94 MBytes - ELF and other build byproducts
	---
	715 MBytes - Total

Converting from bk to git, I guess this becomes:

	 97 MBytes - git (zlib)
	232 MBytes - checked out source files
	 94 MBytes - ELF and other build byproducts
	---
	423 MBytes - Total

Size matters when it's a two-to-one difference, but when we are down to a
10% to 15% difference in the Total, it's presentation that matters.  The
above numbers tell me that this is not a pure size issue for local disk
or memory usage.

What does matter that I can see:

 1) Linus explicitly stated he wanted "a raw zlib compressed blob,
    not a gzipped file", to encourage everyone to use the git tools to
    access this data.  He did not "want people editing repository files
    by hand."  I'm not sure what he gains here - it did annoy me for a
    couple hours before I decided fixing my supper was more important.

 2) The time to compress will be noticed by users as a delay when
    checking in changes (I'm guessing zlib compresses relatively faster).

 3) The time to copy compressed data over the internet will be
    noticed by users when upgrading kernel versions (gzip can
    compress smaller).

 4) Decompress times are smaller so don't matter as much.

 5) Zlib has a nice library, and is patent free.  I don't know about gzip.

 6) As you note, zlib has rsync-friendly, recovery-friendly Z_PARTIAL_FLUSH.
    I don't know about gzip.

My guess is that Linus finds (2) and (3) to balance each other, and that
(1) decides the point, in favor of zlib.  Well, that or a simpler
hypothesis, that he found the nice library (5) convenient, and (1)
sealed the deal, with the other tradeoffs passing through his
subconscious faster than he bothered to verbalize them.

You (Ingo) seem in your second message to be encouraging further
consideration of gzip, for its improved compression.

How will that matter to us, day to day?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10 17:38                                     ` Paul Jackson
@ 2005-04-10 17:46                                       ` Ingo Molnar
  2005-04-10 17:56                                         ` Paul Jackson
  0 siblings, 1 reply; 201+ messages in thread
From: Ingo Molnar @ 2005-04-10 17:46 UTC (permalink / raw)
  To: Paul Jackson; +Cc: cw, torvalds, davem, andrea, mbp, linux-kernel, dlang


* Paul Jackson <pj@engr.sgi.com> wrote:

> Ingo wrote:
> > With default gzip it's 3.3 seconds though,
> > and that still compresses it down to 57 MB.
> 
> Interesting.  I'm surprised how much a bunch of separate, modest sized
> files can be compressed.

sorry, what i measured was in essence the tarball, i.e. not the 
compression of every file separately. I should have been clear about 
that ...

	Ingo

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10 16:46                 ` Bill Davidsen
@ 2005-04-10 17:50                   ` Paul Jackson
  2005-04-12 23:20                     ` Pavel Machek
  0 siblings, 1 reply; 201+ messages in thread
From: Paul Jackson @ 2005-04-10 17:50 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: junkio, dlang, linux-kernel

> It's possible to generate another object with the same hash, but:

Yeah - the real check is that the modified object has to
compile and do something useful for someone (the cracker
if no one else).

Just getting a random bucket of bits substituted for a
real kernel source file isn't going to get me into the
cracker hall of fame, only into their odd-news of the
day.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  2:27                       ` Andrea Arcangeli
                                           ` (2 preceding siblings ...)
  2005-04-09  5:45                         ` Linus Torvalds
@ 2005-04-10 17:55                         ` Matthias Andree
  3 siblings, 0 replies; 201+ messages in thread
From: Matthias Andree @ 2005-04-10 17:55 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Linus Torvalds, Martin Pool, linux-kernel, David Lang

Andrea Arcangeli schrieb am 2005-04-09:

> On Fri, Apr 08, 2005 at 05:12:49PM -0700, Linus Torvalds wrote:
> > really designed for something like a offline http grabber, in that you can 
> > just grab files purely by filename (and verify that you got them right by 
> > running sha1sum on the resulting local copy). So think "wget".
> 
> I'm not entirely convinced wget is going to be an efficient way to
> synchronize and fetch your tree, its simplicity is great though. It's a

wget is probably a VERY UNWISE choice:

<http://www.derkeiler.com/Mailing-Lists/securityfocus/bugtraq/2004-12/0106.html>

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10 17:46                                       ` Ingo Molnar
@ 2005-04-10 17:56                                         ` Paul Jackson
  0 siblings, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-10 17:56 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: cw, torvalds, davem, andrea, mbp, linux-kernel, dlang

Ingo wrote:
> not the compression of every file separately.

ok

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Code snippet to reconstruct ancestry graph from bk repo
  2005-04-10 17:24                         ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr
@ 2005-04-10 18:19                           ` Roman Zippel
  0 siblings, 0 replies; 201+ messages in thread
From: Roman Zippel @ 2005-04-10 18:19 UTC (permalink / raw)
  To: Paul P Komkoff Jr
  Cc: Linus Torvalds, Andrea Arcangeli, Martin Pool, linux-kernel, David Lang

Hi,

On Sun, 10 Apr 2005, Paul P Komkoff Jr wrote:

> (borrowed from Tommi Virtanen)
> 
> Code snippet to reconstruct ancestry graph from bk repo:
> bk changes -end':I: $if(:PARENT:){:PARENT:$if(:MPARENT:){ :MPARENT:}} $unless(:PARENT:){-}'         |tac
> 
> format is:
> newrev parent1 [parent2]
> parent2 present if merge occurs.

I know that this is possible and Larry's response would have been 
something like this:
http://www.ussg.iu.edu/hypermail/linux/kernel/0502.1/0248.html

bye, Roman

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-07 18:29           ` Daniel Phillips
@ 2005-04-10 22:33             ` Troy Benjegerdes
  2005-04-11  0:00               ` Christian Parpart
  0 siblings, 1 reply; 201+ messages in thread
From: Troy Benjegerdes @ 2005-04-10 22:33 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Dmitry Yusupov, Al Viro, Linus Torvalds, David Woodhouse,
	Kernel Mailing List

On Thu, Apr 07, 2005 at 02:29:24PM -0400, Daniel Phillips wrote:
> On Thursday 07 April 2005 14:13, Dmitry Yusupov wrote:
> > On Thu, 2005-04-07 at 13:54 -0400, Daniel Phillips wrote:
> > > Three years ago, there was no fully working open source distributed scm
> > > code base to use as a starting point, so extending BK would have been the
> > > only easy alternative.  But since then the situation has changed.  There
> > > are now several working code bases to provide a good starting point:
> > > Monotone, Arch, SVK, Bazaar-ng and others.
> >
> > Right. For example, SVK is pretty mature project and very close to 1.0
> > release now. And it supports all kind of merges including Cherry-Picking
> > Mergeback:
> >
> > http://svk.elixus.org/?MergeFeatures
> 
> So for an interim way to get the patch flow back online, SVK is ready to try 
> _now_, and we only need a way to import the version graph?  (true/false)

Well, I followed some of the instructions to mirror the kernel tree on
svn.clkao.org/linux/cvs, and although it took around 12 hours to import
28232 versions, I seem to have a mirror of it on my own subversion
server now. I think the svn.clkao.org mirror was taken from bkcvs... the
last log message I see is "Rev 28232 - torvalds - 2005-04-04 09:08:33"

I have no idea what's missing. What is everyone's favorite web frontend
to subversion? I've got websvn (debian package) on there now, and it's a
bit sluggish, but it seems to work.

I hope to have time this week or next to actually make this machine
publicly accessible.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10 22:33             ` Troy Benjegerdes
@ 2005-04-11  0:00               ` Christian Parpart
  0 siblings, 0 replies; 201+ messages in thread
From: Christian Parpart @ 2005-04-11  0:00 UTC (permalink / raw)
  To: Troy Benjegerdes
  Cc: Daniel Phillips, Dmitry Yusupov, Al Viro, Linus Torvalds,
	David Woodhouse, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]

On Monday 11 April 2005 12:33 am, you wrote:
[......]
> Well, I followed some of the instructions to mirror the kernel tree on
> svn.clkao.org/linux/cvs, and although it took around 12 hours to import
> 28232 versions, I seem to have a mirror of it on my own subversion
> server now. I think the svn.clkao.org mirror was taken from bkcvs... the
> last log message I see is "Rev 28232 - torvalds - 2005-04-04 09:08:33"

I'd love to see svk as a real choice for you guys, but I don't mind as long 
as I get a door open using svn/svk ;)

> I have no idea what's missing. What is everyone's favorite web frontend
> to subversion? 

Check out ViewCVS at: http://viewcvs.sourceforge.net/
This seems widely used (not just by me ^o^).

Regards,
Christian Parpart.

-- 
Netiquette: http://www.ietf.org/rfc/rfc1855.txt
 01:55:08 up 18 days, 15:01,  2 users,  load average: 0.27, 0.39, 0.36

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-09  1:01   ` Marcin Dalecki
  2005-04-09  8:32     ` Jan Hudec
@ 2005-04-11  2:26     ` Miles Bader
  2005-04-11  2:56       ` Marcin Dalecki
  1 sibling, 1 reply; 201+ messages in thread
From: Miles Bader @ 2005-04-11  2:26 UTC (permalink / raw)
  To: Marcin Dalecki; +Cc: Jan Hudec, Linus Torvalds, Kernel Mailing List

Marcin Dalecki <martin@dalecki.de> writes:
> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of
> apply commands and you should be still more productive then when using
> Arch.

Arch has its problems, but please lay off the uninformed flamebait (the
"issues" you complain about are so utterly minor as to be laughable).

-Miles
-- 
Ich bin ein Virus. Mach' mit und kopiere mich in Deine .signature.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-11  2:26     ` Miles Bader
@ 2005-04-11  2:56       ` Marcin Dalecki
  2005-04-11  6:36         ` Jan Hudec
  0 siblings, 1 reply; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-11  2:56 UTC (permalink / raw)
  To: Miles Bader; +Cc: Linus Torvalds, Jan Hudec, Kernel Mailing List


On 2005-04-11, at 04:26, Miles Bader wrote:

> Marcin Dalecki <martin@dalecki.de> writes:
>> Better don't waste your time with looking at Arch. Stick with patches
>> you maintain by hand combined with some scripts containing a list of
>> apply commands and you should be still more productive then when using
>> Arch.
>
> Arch has its problems, but please lay off the uninformed flamebait (the
> "issues" you complain about are so utterly minor as to be laughable).

I wish you a lot of laughter after replying to an already three-day-old
message, which was my final word on Arch.


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-11  2:56       ` Marcin Dalecki
@ 2005-04-11  6:36         ` Jan Hudec
  0 siblings, 0 replies; 201+ messages in thread
From: Jan Hudec @ 2005-04-11  6:36 UTC (permalink / raw)
  To: Marcin Dalecki; +Cc: Miles Bader, Linus Torvalds, Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1616 bytes --]

On Mon, Apr 11, 2005 at 04:56:06 +0200, Marcin Dalecki wrote:
> 
> On 2005-04-11, at 04:26, Miles Bader wrote:
> 
> >Marcin Dalecki <martin@dalecki.de> writes:
> >>Better don't waste your time with looking at Arch. Stick with patches
> >>you maintain by hand combined with some scripts containing a list of
> >>apply commands and you should be still more productive then when using
> >>Arch.
> >
> >Arch has its problems, but please lay off the uninformed flamebait (the
> >"issues" you complain about are so utterly minor as to be laughable).
> 
> I wish you a lot of laughter after replying to an already 3 days old 
> message,
> which was my final on Arch.

Marcin Dalecki <martin@dalecki.de> complained:
> Arch isn't a sound example of software design. Quite contrary to the 
> random notes posted by it's author the following issues did strike me 
> the time I did evaluate it:
> [...]

I didn't comment on this first time, but I see I should have. *NONE* of
the issues you complained about were issues of *DESIGN*. They were all
issues of *ENGINEERING*. *ENGINEERING* issues can be fixed. One of the
issues does not even exist any longer (the diff/patch one -- it now
checks they are the right ones -- and in all other respects it is
*exactly* the same as depending on a library)

But what really matters here is the concept. Arch has a simple concept,
that works well. Others have different concepts, that work well or
almost well too (Darcs, Monotone).

-------------------------------------------------------------------------------
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga.. (bk license?)
  2005-04-08  4:42   ` Linus Torvalds
                       ` (4 preceding siblings ...)
  2005-04-08  8:38     ` Matt Johnston
@ 2005-04-12  7:14     ` Kedar Sovani
  2005-04-12  9:34       ` Catalin Marinas
  2005-04-13  4:04       ` Ricky Beam
  5 siblings, 2 replies; 201+ messages in thread
From: Kedar Sovani @ 2005-04-12  7:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Wedgwood, Kernel Mailing List

I was wondering if working on git is in any way in violation of the
BitKeeper license, which states that you cannot work on any other SCM
(SCM-like?) tool for "x" amount of time after using BitKeeper?


Kedar. 

On Apr 8, 2005 10:12 AM, Linus Torvalds <torvalds@osdl.org> wrote:
> 
> 
> On Thu, 7 Apr 2005, Chris Wedgwood wrote:
> >
> > I'm playing with monotone right now.  Superficially it looks like it
> > has tons of gee-whiz neato stuff...  however, it's *agonizingly* slow.
> > I mean glacial.  A heavily sedated sloth with no legs is probably
> > faster.
> 
> Yes. The silly thing is, at least in my local tests it doesn't actually
> seem to be _doing_ anything while it's slow (there are no system calls
> except for a few memory allocations and de-allocations). It seems to have
> some exponential function on the number of pathnames involved etc.
> 
> I'm hoping they can fix it, though. The basic notions do not sound wrong.
> 
> In the meantime (and because monotone really _is_ that slow), here's a
> quick challenge for you, and any crazy hacker out there: if you want to
> play with something _really_ nasty (but also very _very_ fast), take a
> look at kernel.org:/pub/linux/kernel/people/torvalds/.
> 
> First one to send me the changelog tree of sparse-git (and a tool to
> commit and push/pull further changes) gets a gold star, and an honorable
> mention. I've put a hell of a lot of clues in there (*).
> 
> I've worked on it (and little else) for the last two days. Time for
> somebody else to tell me I'm crazy.
> 
>                 Linus
> 
> (*) It should be easier than it sounds. The database is designed so that
> you can do the equivalent of a nonmerging (ie pure superset) push/pull
> with just plain rsync, so replication really should be that easy (if
> somewhat bandwidth-intensive due to the whole-file format).
> 
> Never mind merging. It's not an SCM, it's a distribution and archival
> mechanism. I bet you could make a reasonable SCM on top of it, though.
> Another way of looking at it is to say that it's really a content-
> addressable filesystem, used to track directory trees.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
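[Editor's sketch: the "push/pull with just plain rsync" property Linus describes in (*) follows from content addressing — object names are digests of their contents, so copying in a superset of files can never clobber existing data. The class below is a toy model; the names and in-memory layout are illustrative, not git's actual on-disk scheme.]

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: an object's name is sha1(content)."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha1(data).hexdigest()
        self.objects[name] = data  # idempotent: same content, same name
        return name

    def pull_from(self, other: "ObjectStore") -> None:
        # A dumb "rsync": copy over everything we're missing. Existing
        # objects are never overwritten with different content, so a
        # pure-superset pull cannot lose or corrupt local data.
        for name, data in other.objects.items():
            self.objects.setdefault(name, data)

a, b = ObjectStore(), ObjectStore()
a.put(b"local work\n")
b.put(b"upstream change\n")
a.pull_from(b)  # a now holds the union of both stores
```

Merging history is a separate problem, exactly as the message says — this only distributes and archives.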

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga.. (bk license?)
  2005-04-12  7:14     ` Kernel SCM saga.. (bk license?) Kedar Sovani
@ 2005-04-12  9:34       ` Catalin Marinas
  2005-04-13  4:04       ` Ricky Beam
  1 sibling, 0 replies; 201+ messages in thread
From: Catalin Marinas @ 2005-04-12  9:34 UTC (permalink / raw)
  To: Kedar Sovani; +Cc: Linus Torvalds, Chris Wedgwood, Kernel Mailing List

Kedar Sovani <kedars@gmail.com> wrote:
> I was wondering if working on git is in any way in violation of the
> BitKeeper license, which states that you cannot work on any other SCM
> (SCM-like?) tool for "x" amount of time after using BitKeeper?

That's only valid for the new BK license, which probably wasn't
accepted by Linus.

-- 
Catalin


^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga..
  2005-04-10 17:50                   ` Paul Jackson
@ 2005-04-12 23:20                     ` Pavel Machek
  0 siblings, 0 replies; 201+ messages in thread
From: Pavel Machek @ 2005-04-12 23:20 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Bill Davidsen, junkio, dlang, linux-kernel

Hi!

> > It's possible to generate another object with the same hash, but:
> 
> Yeah - the real check is that the modified object has to
> compile and do something useful for someone (the cracker
> if no one else).
> 
> Just getting a random bucket of bits substituted for a
> real kernel source file isn't going to get me into the
> cracker hall of fame, only into their odd-news of the
> day.

I actually have two different files with the same md5 sum in my local CVS
repository. It would be very wrong if CVS did not do the right thing
with those files.

Yes, I was playing with md5; see "md5 to be considered harmful
today". And I wanted old versions of my "exploits" to be archived.
 
								Pavel
-- 
Boycott Kodak -- for their patent abuse against Java.

^ permalink raw reply	[flat|nested] 201+ messages in thread

* Re: Kernel SCM saga.. (bk license?)
  2005-04-12  7:14     ` Kernel SCM saga.. (bk license?) Kedar Sovani
  2005-04-12  9:34       ` Catalin Marinas
@ 2005-04-13  4:04       ` Ricky Beam
  1 sibling, 0 replies; 201+ messages in thread
From: Ricky Beam @ 2005-04-13  4:04 UTC (permalink / raw)
  To: Kedar Sovani; +Cc: Kernel Mailing List

On Tue, 12 Apr 2005, Kedar Sovani wrote:
>I was wondering if working on git is in any way in violation of the
>BitKeeper license, which states that you cannot work on any other SCM
>(SCM-like?) tool for "x" amount of time after using BitKeeper?

Technically, yes, it is.  However, as BitMover has given the community
little other choice, I don't see how they could hold anyone to it.  They'd
have a hard time making that one-year clause stick given their abandonment
of the free product and refusal to grant licenses to OSDL employees.

Plus, there's nothing in the bkl specifically granting BitMover the
right to revoke the license and thus use of BK/Free at their whim.
They have every right to stop developing, supporting, and distributing
BK/Free, but rescinding all BK/Free licenses just for spite does not
appear to be within their legal rights.

(Sorry Larry, but that's what you're doing.  Tridge was working on taking
 your toys apart -- he does that, what can I say.  He explicitly lied and
 said he would stop, but of course didn't.  And then you got all pissed
 at OSDL for not smiting him when, technically, they can't -- an employer
 is not responsible for the actions of their employees on their own time,
 on their own property, unrelated to their employ.  Sorry, but I know that
 one by heart :-))

--Ricky




* Re: Kernel SCM saga..
@ 2005-04-10  4:20 Albert Cahalan
  0 siblings, 0 replies; 201+ messages in thread
From: Albert Cahalan @ 2005-04-10  4:20 UTC (permalink / raw)
  To: torvalds, linux-kernel mailing list

Linus Torvalds writes:

> NOTE! I detest the centralized SCM model, but if push comes to shove,
> and we just _can't_ get a reasonable parallell merge thing going in
> the short timeframe (ie month or two), I'll use something like SVN
> on a trusted site with just a few committers, and at least try to
> distribute the merging out over a few people rather than making _me_
> be the throttle.
>
> The reason I don't really want to do that is once we start doing
> it that way, I suspect we'll have a _really_ hard time stopping.
> I think it's a broken model. So I'd much rather try to have some
> pain in the short run and get a better model running, but I just
> wanted to let people know that I'm pragmatic enough that I realize
> that we may not have much choice.

I think you at least instinctively know this, but...

Centralized SCM means you have to grant and revoke commit access,
which means that Linux gets the disease of ugly BSD politics.

Under both the old pre-BitKeeper patch system and under BitKeeper,
developer rank is fuzzy. Everyone knows that some developers are
more central than others, but it isn't fully public and well-defined.
You can change things day by day without having to demote anyone.
While Linux development isn't completely without jealousy and pride,
few have stormed off (mostly IDE developers AFAIK) and none have
forked things as severely as OpenBSD and DragonflyBSD.

You may rank developer X higher than developer Y, but they have
only a guess as to how things are. Perhaps developer X would be
a prideful jerk if he knew. Perhaps developer Y would quit in
resentment if he knew.

Whatever you do, please avoid the BSD-style politics.

(the MAINTAINERS file is bad enough; it has caused problems)




* Re: Kernel SCM saga..
  2005-04-08 23:29 ` Linus Torvalds
  2005-04-09  0:29   ` Linus Torvalds
@ 2005-04-09 16:20   ` Paul Jackson
  1 sibling, 0 replies; 201+ messages in thread
From: Paul Jackson @ 2005-04-09 16:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: vrajesh, linux-kernel

Linus wrote:
> If you want to have spaces
>  and newlines in your pathname, go wild.

So long as there is only one pathname in a record, you don't need
nul-terminators to allow spaces in the name.  The rest of the record
is well known, so the pathname is just whatever is left after chomping
off the rest of the record.

It's only the support for embedded newlines that forces you to use
nul-terminators.

Not worth it - in my view.  Rather, do just enough hackery that
such a pathname doesn't break you, even if it means not giving
full service to such names.
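Paul's point is that with exactly one pathname per record, the name can be
recovered as "whatever is left" after the fixed fields, so spaces survive
without NUL-terminators; only embedded newlines break a line-oriented record.
A sketch with a hypothetical record layout (the field names are invented):

```python
def parse_record(line: str):
    """Fixed fields come first; the pathname is whatever is left after
    chomping them off.  Spaces in the name survive; an embedded newline
    in the name would not, since records are newline-delimited."""
    mode, sha, pathname = line.split(" ", 2)  # split only twice, from the left
    return mode, sha, pathname

mode, sha, path = parse_record("100644 deadbeef my file with spaces.c")
print(path)  # my file with spaces.c
```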

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: Kernel SCM saga..
@ 2005-04-09 11:29 Samium Gromoff
  0 siblings, 0 replies; 201+ messages in thread
From: Samium Gromoff @ 2005-04-09 11:29 UTC (permalink / raw)
  To: linux-kernel

It seems that Tom Lord, the primary architect behind GNU Arch,
has recently published an open letter to Linus Torvalds.

Because no open letter to Linus would be really open without an
accompanying reference post on lkml, here it is:

http://lists.seyza.com/pipermail/gnu-arch-dev/2005-April/001001.html

---
cheers,
   Samium Gromoff


* Re: Kernel SCM saga..
@ 2005-04-09 11:02 Samium Gromoff
  0 siblings, 0 replies; 201+ messages in thread
From: Samium Gromoff @ 2005-04-09 11:02 UTC (permalink / raw)
  To: martin; +Cc: linux-kernel

Ok, this was literally screaming for a rebuttal! :-)

> Arch isn't a sound example of software design. Quite contrary to the
> random notes posted by its author, the following issues struck me
> when I evaluated it:
(Note that here you take a stab at the Arch design fundamentals, but
actually fail to substantiate it later.)

> The application (tla) claims to have "intuitive" command names. However
> I didn't see that as given. Most of them were difficult to remember
> and appeared to be just infantile. I stopped looking further after I
> saw:
[ UI issues snipped, not really core design ]

Yes, some people perceive that there _are_ UI issues in Arch.
However, as strange as it may sound, some don't.

> As an added bonus it relies on the applications named by accident
> patch and diff, installed on the host in question, as well as a few
> others, to operate.

This is called modularity and code reuse.

And given that patch and diff are installed by default on all of the
relevant developer machines, I fail to see why that is in any way
derogatory.

(And the rest you speak about is tar and gzip.)

> Better don't waste your time with looking at Arch. Stick with patches
> you maintain by hand combined with some scripts containing a list of
> apply commands
> and you should be still more productive than when using Arch.

Surely you should've come up with something better founded than that! :-)
                                                                                                     
Now to the real design issues...

Globally unique, meaningful, symbolic revision names -- the core of the
Arch namespace.

"Stone simple" on-disk format to store things -- a hierarchy
of directories with textual files and tarballs.

No smart server -- any sftp, ftp, or webdav server (or just http, for
read-only access) is exactly up to the task.

O(0) branching -- a branch is simply a tag, a continuation from some
point of development. A network-capable symlink, if you like.
It is made possible by the global Arch namespace.

Revision ancestry graph, of course. Enables smart merging.

Now, to the features:

Archives/revisions are trivially crypto-signed -- thanks to the
"stone simple" on-disk format.

Trivial push/pull mirroring -- a mirror is exactly a read-only archive,
and can be turned into a full-blown archive by removing a single
file.

Revision libraries as a client-side speedup mechanism, with partially
automated updates.

Cached revisions as a server-side speedup.

The possibility of hardlinked checkouts for local archives. This requires
that your text editor is smart and deletes the original file when it
writes changes.

Various pre/post/whatever-commit hooks.

That's it for starters... :-)

---
cheers,
   Samium Gromoff


* Re: Kernel SCM saga..
@ 2005-04-09  4:06 Walter Landry
  0 siblings, 0 replies; 201+ messages in thread
From: Walter Landry @ 2005-04-09  4:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: arx-users

Linus Torvalds wrote:
> Which is why I'd love to hear from people who have actually used
> various SCM's with the kernel. There's bound to be people who have
> already tried.

At the end of my Codecon talk, there is a performance comparison of a
number of different distributed SCM's with the kernel.

  http://superbeast.ucsd.edu/~landry/ArX/codecon/codecon.html

I develop ArX (http://www.nongnu.org/arx).  You may find it of
interest ;)

Cheers,
Walter Landry
wlandry@ucsd.edu


* Re: Kernel SCM saga..
  2005-04-06 21:13 ` kfogel
  2005-04-06 22:39   ` Jeff Garzik
@ 2005-04-09  1:00   ` Marcin Dalecki
  1 sibling, 0 replies; 201+ messages in thread
From: Marcin Dalecki @ 2005-04-09  1:00 UTC (permalink / raw)
  To: kfogel; +Cc: linux-kernel


On 2005-04-06, at 23:13, kfogel@collab.net wrote:

> Linus Torvalds wrote:
>> PS. Don't bother telling me about subversion. If you must, start 
>> reading
>> up on "monotone". That seems to be the most viable alternative, but 
>> don't
>> pester the developers so much that they don't get any work done. They 
>> are
>> already aware of my problems ;)
>
> By the way, the Subversion developers have no argument with the claim
> that Subversion would not be the right choice for Linux kernel
> development.  We've written an open letter entitled "Please Stop
> Bugging Linus Torvalds About Subversion" to explain why:
>
>    http://subversion.tigris.org/subversion-linus.html

Thumbs up, "Subverters"! I just love you. I love your attitude toward
high engineering quality, and I very much appreciate what you provide
as software, both in function and in quality of implementation.



* Re: Kernel SCM saga..
  2005-04-08 23:29 ` Linus Torvalds
@ 2005-04-09  0:29   ` Linus Torvalds
  2005-04-09 16:20   ` Paul Jackson
  1 sibling, 0 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-09  0:29 UTC (permalink / raw)
  To: Rajesh Venkatasubramanian; +Cc: linux-kernel



On Fri, 8 Apr 2005, Linus Torvalds wrote:
> 
> Also note that the above algorithm really works for _any_ two commit 
> points (apart for the two first steps, which are obviously all about 
> finding the parent tree when you want to diff against a predecessor). 

Btw, if you want to try this, you should get an updated copy. I've pushed 
a "raw" git archive of both git and sparse (the latter is much more 
interesting from an archive standpoint, since it actually has 1400 
changesets in it) to kernel.org, but I'm not convinced it gets mirrored 
out. I think the mirror scripts may mirror only things they understand.

I've also added a partial "fsck" for the "git filesystem". It doesn't do
the connectivity analysis yet, but that should be pretty straightforward
to add - it already parses all the data, it just doesn't save it away (and
the connectivity analysis will automatically show how many "root"
changesets you have, and what the different HEADs are).
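The connectivity analysis Linus describes falls out of the parent links:
commits with no parents are roots, and commits that nobody names as a parent
are the HEADs. A toy sketch (the commit names are invented):

```python
def roots_and_heads(parents):
    """parents maps commit -> list of its parent commits.
    Roots have no parents; HEADs are never referenced as a parent."""
    referenced = {p for ps in parents.values() for p in ps}
    roots = sorted(c for c, ps in parents.items() if not ps)
    heads = sorted(c for c in parents if c not in referenced)
    return roots, heads

# A <- B <- C is one line of development; D is an unrelated root.
graph = {"A": [], "B": ["A"], "C": ["B"], "D": []}
print(roots_and_heads(graph))  # (['A', 'D'], ['C', 'D'])
```

Two roots means two unconnected histories in the same object directory; two
HEADs means two branches that a merge commit (one with two parents) would join.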

I'll make a tar-file (git-0.03), although at this point I've actually been 
maintaining it in itself, so to some degree it's almost getting easier if 
I'd just have a place to rsync it..

		Linus


* Re: Kernel SCM saga..
  2005-04-08 22:27 Rajesh Venkatasubramanian
@ 2005-04-08 23:29 ` Linus Torvalds
  2005-04-09  0:29   ` Linus Torvalds
  2005-04-09 16:20   ` Paul Jackson
  0 siblings, 2 replies; 201+ messages in thread
From: Linus Torvalds @ 2005-04-08 23:29 UTC (permalink / raw)
  To: Rajesh Venkatasubramanian; +Cc: linux-kernel



On Fri, 8 Apr 2005, Rajesh Venkatasubramanian wrote:
> 
> Although directory changes are tracked using change-sets, there 
> seems to be no easy way to answer "give me the diff corresponding to
> the commit (change-set) object <sha1>".  That will be really helpful to
> review the changes.

Actually, it is very easy indeed. Here's what you do:

 - look up the commit object ("cat-file commit <sha1>")

   This object starts out with "tree <sha1>", followed by a list of
   parent commit objects: "parent <sha1>"

   Remember the tree object (it defines what the tree looks like at
   the time of the commit). Pick the parent object you want to diff
   against (normally the first one).

   Also, print the checkin messages at the end of the commit object.

 - look up the parent object ("cat-file commit <parentsha1>")

   Here you have the same kind of object, but this time you don't care
   about going deeper, you just pick up the tree <sha1> that describes
   the tree at the parent.

 - look up the two tree objects. Unlike a commit object, a tree object
   is a binary data blob, but the format is an _extremely_ simple table
   of these guys:

	<ascii octal filemode> <space> <pathname> <NUL character> <20-byte sha1>

  and the reason it's binary is really so that "git" doesn't end
  up having any issues with strange pathnames. If you want to have spaces
  and newlines in your pathname, go wild.
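The entry format above can be decoded in a few lines; this is only a sketch
of the layout as described, with made-up sha1 bytes standing in for real
object hashes:

```python
def parse_tree(data: bytes):
    """Parse a tree blob laid out as repeated
    '<octal mode> <pathname>\\0<20 raw sha1 bytes>' entries."""
    entries = []
    i = 0
    while i < len(data):
        nul = data.index(b"\0", i)
        # Split once from the left: the mode has no spaces, the path may.
        mode, path = data[i:nul].split(b" ", 1)
        sha1 = data[nul + 1:nul + 21]          # raw 20-byte hash
        entries.append((mode.decode(), path.decode(), sha1.hex()))
        i = nul + 21
    return entries

# A tiny hand-built two-entry tree (the hashes are dummies).
blob = (b"100644 Makefile\0" + bytes(range(20)) +
        b"100644 mm/memory.c\0" + bytes(range(20, 40)))
for mode, path, sha1 in parse_tree(blob):
    print(mode, path, sha1[:8])
```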

  In particular, the tree object is also _sorted_ by the pathname. This 
  makes things simple, because you now have two sorted trees, and the 
  first thing you do is just walk the two trees in lock-step, which is 
  trivial thanks to the sorted nature of the tree "array".

  So now you have three cases:
	- you have the same name, and the same sha1

	  ignore it - the file didn't change, you don't even have to look 
	  at the contents (although if the file mode changed you might
	  want to note that)

	- you have the same name in parent and child tree lists, but the
	  sha differs. Now you just need to do a "cat-file" on both of the 
	  SHA1 values, and do a "diff -u" between them.

	- you have the filename in only parent or only child. Do a 
	  "create" or "delete" diff with the content of the sha1 file.

See? Very efficient. For any files that didn't change, you didn't have to 
do anything at all - you didn't even have to look at their data.

Also note that the above algorithm really works for _any_ two commit 
points (apart for the two first steps, which are obviously all about 
finding the parent tree when you want to diff against a predecessor). 

It doesn't have to be parent and child. Pick any commit you have. And pick
them in the other order, and you'll automatically get the reverse diff.

You can even do diffs between unrelated projects this way if you use the
shared sha1 directory model, although that obviously doesn't tend to be
all that sensible ;)
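The lock-step walk with its three cases can be sketched as follows; entries
are simplified here to (path, sha1) pairs already sorted by path, and the
hashes are placeholders:

```python
def tree_diff(parent, child):
    """Lock-step walk of two path-sorted (path, sha1) lists,
    emitting the three cases from the algorithm above."""
    out = []
    i = j = 0
    while i < len(parent) or j < len(child):
        if j >= len(child) or (i < len(parent) and parent[i][0] < child[j][0]):
            out.append(("delete", parent[i][0]))   # name only in parent
            i += 1
        elif i >= len(parent) or child[j][0] < parent[i][0]:
            out.append(("create", child[j][0]))    # name only in child
            j += 1
        else:
            if parent[i][1] != child[j][1]:        # same name, sha differs
                out.append(("modify", parent[i][0]))
            i += 1                                 # same name and sha:
            j += 1                                 # skip without touching data
    return out

parent = [("Makefile", "aaa"), ("kernel/sched.c", "bbb"), ("mm/memory.c", "ccc")]
child  = [("Makefile", "aaa"), ("mm/memory.c", "ddd"), ("mm/slab.c", "eee")]
print(tree_diff(parent, child))
```

For the "modify" case a real tool would then cat-file both sha1 values and
run "diff -u" between them; unchanged files never get read at all.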

		Linus


* Re: Kernel SCM saga..
@ 2005-04-08 22:27 Rajesh Venkatasubramanian
  2005-04-08 23:29 ` Linus Torvalds
  0 siblings, 1 reply; 201+ messages in thread
From: Rajesh Venkatasubramanian @ 2005-04-08 22:27 UTC (permalink / raw)
  To: torvalds, linux-kernel

Linus wrote:
>> It looks like an operation like "show me the history of mm/memory.c" will
>> be pretty expensive using git.
>
> Yes.  Per-file history is expensive in git, because of the way it is
> indexed. Things are indexed by tree and by changeset, and there are no 
> per-file indexes.

Although directory changes are tracked using change-sets, there 
seems to be no easy way to answer "give me the diff corresponding to
the commit (change-set) object <sha1>".  That will be really helpful to
review the changes.
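With no per-file index, "history of mm/memory.c" means walking every
changeset and comparing the file's blob id against the previous one.
A sketch over toy snapshots (commit ids, paths, and hashes are invented;
trees are flattened to path-to-sha dicts):

```python
def file_history(path, commits):
    """commits: oldest-first list of (commit_id, {path: blob_sha}).
    Returns the commits where the file appeared, vanished, or changed."""
    hits, prev = [], None
    for commit_id, tree in commits:
        cur = tree.get(path)        # None means the file is absent
        if cur != prev:             # created, deleted, or content changed
            hits.append(commit_id)
        prev = cur
    return hits

history = [
    ("c1", {"mm/memory.c": "s1"}),
    ("c2", {"mm/memory.c": "s1", "mm/slab.c": "s9"}),  # unrelated change
    ("c3", {"mm/memory.c": "s2", "mm/slab.c": "s9"}),
]
print(file_history("mm/memory.c", history))  # ['c1', 'c3']
```

The cost is one tree lookup per changeset regardless of whether the file
changed, which is exactly why this is expensive compared to a per-file index.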

Rajesh


* Re: Kernel SCM saga..
  2005-04-06 21:13 ` kfogel
@ 2005-04-06 22:39   ` Jeff Garzik
  2005-04-09  1:00   ` Marcin Dalecki
  1 sibling, 0 replies; 201+ messages in thread
From: Jeff Garzik @ 2005-04-06 22:39 UTC (permalink / raw)
  To: kfogel; +Cc: linux-kernel

kfogel@collab.net wrote:
> Linus Torvalds wrote:
> 
>>PS. Don't bother telling me about subversion. If you must, start reading
>>up on "monotone". That seems to be the most viable alternative, but don't
>>pester the developers so much that they don't get any work done. They are
>>already aware of my problems ;)
> 
> 
> By the way, the Subversion developers have no argument with the claim
> that Subversion would not be the right choice for Linux kernel
> development.  We've written an open letter entitled "Please Stop
> Bugging Linus Torvalds About Subversion" to explain why:
> 
>    http://subversion.tigris.org/subversion-linus.html

A thoughtful post.  Thanks for writing this.

	Jeff




* Re: Kernel SCM saga..
       [not found] <Pine.LNX.4.58.0504060800280.2215 () ppc970 ! osdl ! org>
@ 2005-04-06 21:13 ` kfogel
  2005-04-06 22:39   ` Jeff Garzik
  2005-04-09  1:00   ` Marcin Dalecki
  0 siblings, 2 replies; 201+ messages in thread
From: kfogel @ 2005-04-06 21:13 UTC (permalink / raw)
  To: linux-kernel

Linus Torvalds wrote:
> PS. Don't bother telling me about subversion. If you must, start reading
> up on "monotone". That seems to be the most viable alternative, but don't
> pester the developers so much that they don't get any work done. They are
> already aware of my problems ;)

By the way, the Subversion developers have no argument with the claim
that Subversion would not be the right choice for Linux kernel
development.  We've written an open letter entitled "Please Stop
Bugging Linus Torvalds About Subversion" to explain why:

   http://subversion.tigris.org/subversion-linus.html

Best,
-Karl Fogel (on behalf of the Subversion team)


end of thread, other threads:[~2005-04-13  4:14 UTC | newest]

Thread overview: 201+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-06 15:42 Kernel SCM saga Linus Torvalds
2005-04-06 16:00 ` Greg KH
2005-04-07 16:40   ` Rik van Riel
2005-04-08  0:53     ` Jesse Barnes
2005-04-06 16:09 ` Daniel Phillips
2005-04-06 19:07 ` Jon Smirl
2005-04-06 19:24   ` Matan Peled
2005-04-06 19:49     ` Jon Smirl
2005-04-06 20:34       ` Hua Zhong
2005-04-07  1:31       ` Christoph Lameter
2005-04-06 19:39 ` Paul P Komkoff Jr
2005-04-07  1:40   ` Martin Pool
2005-04-07  1:47     ` Jeff Garzik
2005-04-07  2:26       ` Martin Pool
2005-04-07  2:32         ` David Lang
2005-04-07  5:38           ` Martin Pool
2005-04-07 23:27             ` Linus Torvalds
2005-04-08  5:56               ` Martin Pool
2005-04-08  6:41                 ` Linus Torvalds
2005-04-08  8:38                   ` Andrea Arcangeli
2005-04-08 23:38                     ` Daniel Phillips
2005-04-09  2:54                       ` Andrea Arcangeli
2005-04-09  0:12                     ` Linus Torvalds
2005-04-09  2:27                       ` Andrea Arcangeli
2005-04-09  2:32                         ` David Lang
2005-04-09  3:08                         ` Brian Gerst
2005-04-09  3:15                           ` Andrea Arcangeli
2005-04-09  5:45                         ` Linus Torvalds
2005-04-09 22:55                           ` David S. Miller
2005-04-09 23:13                             ` Linus Torvalds
2005-04-10  0:14                               ` Chris Wedgwood
2005-04-10  1:56                                 ` Paul Jackson
2005-04-10 12:03                                   ` Ingo Molnar
2005-04-10 17:38                                     ` Paul Jackson
2005-04-10 17:46                                       ` Ingo Molnar
2005-04-10 17:56                                         ` Paul Jackson
2005-04-10  0:22                             ` Paul Jackson
2005-04-10 11:33                             ` Ingo Molnar
2005-04-10 17:55                         ` Matthias Andree
2005-04-09 16:33                       ` Roman Zippel
2005-04-09 23:31                         ` Tupshin Harper
2005-04-10 17:24                         ` Code snippet to reconstruct ancestry graph from bk repo Paul P Komkoff Jr
2005-04-10 18:19                           ` Roman Zippel
2005-04-08 16:46                   ` Kernel SCM saga Catalin Marinas
2005-04-07  8:14           ` Magnus Damm
2005-04-07  7:53       ` Zwane Mwaikambo
2005-04-07  3:35     ` Daniel Phillips
2005-04-07 15:08       ` Daniel Phillips
2005-04-07  6:36   ` bert hubert
2005-04-06 23:22 ` Jon Masters
2005-04-07  6:51 ` Paul Mackerras
2005-04-07  7:48   ` Arjan van de Ven
2005-04-07 15:10   ` Linus Torvalds
2005-04-07 17:00     ` Daniel Phillips
2005-04-07 17:38       ` Linus Torvalds
2005-04-07 17:47         ` Chris Wedgwood
2005-04-07 18:06         ` Magnus Damm
2005-04-07 18:36         ` Daniel Phillips
2005-04-08  3:35         ` Jeff Garzik
2005-04-07 19:56       ` Sam Ravnborg
2005-04-07 23:21     ` Dave Airlie
2005-04-07  7:18 ` David Woodhouse
2005-04-07  8:50   ` Andrew Morton
2005-04-07  9:20     ` Paul Mackerras
2005-04-07  9:46       ` Andrew Morton
2005-04-07 11:17         ` Paul Mackerras
2005-04-07 10:41       ` Geert Uytterhoeven
2005-04-07  9:25     ` David Woodhouse
2005-04-07  9:49       ` Andrew Morton
2005-04-07  9:55       ` Russell King
2005-04-07 10:11         ` David Woodhouse
2005-04-07  9:40     ` David Vrabel
2005-04-07  9:24   ` Sergei Organov
2005-04-07 10:30     ` Matthias Andree
2005-04-07 10:54       ` Andrew Walrond
2005-04-09 16:17       ` David Roundy
2005-04-10  9:24         ` Giuseppe Bilotta
2005-04-10 13:51           ` David Roundy
2005-04-07 15:32   ` Linus Torvalds
2005-04-07 17:09     ` Daniel Phillips
2005-04-07 17:10     ` Al Viro
2005-04-07 17:47       ` Linus Torvalds
2005-04-07 18:04         ` Jörn Engel
2005-04-07 18:27           ` Daniel Phillips
2005-04-07 20:54           ` Arjan van de Ven
2005-04-08  3:41         ` Jeff Garzik
2005-04-07 17:52       ` Bartlomiej Zolnierkiewicz
2005-04-07 17:54       ` Daniel Phillips
2005-04-07 18:13         ` Dmitry Yusupov
2005-04-07 18:29           ` Daniel Phillips
2005-04-10 22:33             ` Troy Benjegerdes
2005-04-11  0:00               ` Christian Parpart
2005-04-08 17:24         ` Jon Masters
2005-04-08 22:05           ` Daniel Phillips
2005-04-08 22:52     ` Roman Zippel
2005-04-08 23:46       ` Tupshin Harper
2005-04-09  1:00         ` Roman Zippel
2005-04-09  1:23           ` Tupshin Harper
2005-04-09 16:52       ` Eric D. Mudama
2005-04-09 17:40         ` Roman Zippel
2005-04-09 18:56           ` Ray Lee
2005-04-07  7:44 ` Jan Hudec
2005-04-08  6:14   ` Matthias Urlichs
2005-04-09  1:01   ` Marcin Dalecki
2005-04-09  8:32     ` Jan Hudec
2005-04-11  2:26     ` Miles Bader
2005-04-11  2:56       ` Marcin Dalecki
2005-04-11  6:36         ` Jan Hudec
2005-04-07 10:56 ` Andrew Walrond
2005-04-08  0:57 ` Ian Wienand
2005-04-08  4:13 ` Chris Wedgwood
2005-04-08  4:42   ` Linus Torvalds
2005-04-08  5:04     ` Chris Wedgwood
2005-04-08  5:14       ` H. Peter Anvin
2005-04-08  7:05         ` Rogan Dawes
2005-04-08  7:21           ` Daniel Phillips
2005-04-08  7:49             ` H. Peter Anvin
2005-04-08  7:14     ` Andrea Arcangeli
2005-04-08 12:02       ` Matthias Andree
2005-04-08 12:21         ` Florian Weimer
2005-04-08 14:26       ` Linus Torvalds
2005-04-08 16:15         ` Matthias-Christian Ott
2005-04-08 17:14           ` Linus Torvalds
2005-04-08 17:15             ` Chris Wedgwood
2005-04-08 17:46               ` Linus Torvalds
2005-04-08 18:05                 ` Chris Wedgwood
2005-04-08 19:03                   ` Linus Torvalds
2005-04-08 19:16                     ` Chris Wedgwood
2005-04-08 19:38                       ` Florian Weimer
2005-04-08 19:48                         ` Chris Wedgwood
2005-04-08 19:39                       ` Linus Torvalds
2005-04-08 20:11                         ` Uncached stat performace [ Was: Re: Kernel SCM saga.. ] Ragnar Kjørstad
2005-04-08 20:14                           ` Chris Wedgwood
2005-04-08 20:50                       ` Kernel SCM saga Luck, Tony
2005-04-08 21:27                         ` Linus Torvalds
2005-04-09 17:14                           ` Roman Zippel
2005-04-09  7:20                     ` Willy Tarreau
2005-04-09 15:15                     ` Paul Jackson
2005-04-08 17:25             ` Matthias-Christian Ott
2005-04-08 18:14               ` Linus Torvalds
2005-04-08 18:28                 ` Jon Smirl
2005-04-08 18:58                   ` Florian Weimer
2005-04-09  1:11                   ` Marcin Dalecki
2005-04-09  1:50                     ` David Lang
2005-04-09 22:12                       ` Florian Weimer
2005-04-08 19:16                 ` Matthias-Christian Ott
2005-04-08 19:32                   ` Linus Torvalds
2005-04-08 19:44                     ` Matthias-Christian Ott
2005-04-09  1:09                 ` Marcin Dalecki
2005-04-08 17:35             ` Jeff Garzik
2005-04-08 18:47               ` Linus Torvalds
2005-04-08 18:56                 ` Chris Wedgwood
2005-04-09  7:37                   ` Willy Tarreau
2005-04-09  7:47                     ` Neil Brown
2005-04-09  8:00                       ` Willy Tarreau
2005-04-09  9:34                         ` Neil Brown
2005-04-09 15:40                 ` Paul Jackson
2005-04-09 16:16                   ` Linus Torvalds
2005-04-09 17:15                     ` Paul Jackson
2005-04-09 17:35                     ` Paul Jackson
2005-04-09  1:04             ` Marcin Dalecki
2005-04-09 15:42               ` Paul Jackson
2005-04-09 18:45                 ` Marcin Dalecki
2005-04-09  1:00           ` Marcin Dalecki
2005-04-09  1:09             ` Chris Wedgwood
2005-04-09  1:21               ` Marcin Dalecki
2005-04-08  7:17     ` ross
2005-04-08 15:50       ` Linus Torvalds
2005-04-09  2:53         ` Petr Baudis
2005-04-09  7:08           ` Randy.Dunlap
2005-04-09 18:06             ` [PATCH] " Petr Baudis
2005-04-10  1:01           ` Phillip Lougher
2005-04-10  1:42             ` Petr Baudis
2005-04-10  1:57               ` Phillip Lougher
2005-04-09 15:50         ` Paul Jackson
2005-04-09 16:26           ` Linus Torvalds
2005-04-09 17:08             ` Paul Jackson
2005-04-10  3:41             ` Paul Jackson
2005-04-10  8:39             ` David Lang
2005-04-10  9:40               ` Junio C Hamano
2005-04-10 16:46                 ` Bill Davidsen
2005-04-10 17:50                   ` Paul Jackson
2005-04-12 23:20                     ` Pavel Machek
2005-04-08  7:34     ` Marcel Lanz
2005-04-08  9:23       ` Geert Uytterhoeven
2005-04-08  8:38     ` Matt Johnston
2005-04-12  7:14     ` Kernel SCM saga.. (bk license?) Kedar Sovani
2005-04-12  9:34       ` Catalin Marinas
2005-04-13  4:04       ` Ricky Beam
2005-04-08 11:42   ` Kernel SCM saga Catalin Marinas
     [not found] <Pine.LNX.4.58.0504060800280.2215 () ppc970 ! osdl ! org>
2005-04-06 21:13 ` kfogel
2005-04-06 22:39   ` Jeff Garzik
2005-04-09  1:00   ` Marcin Dalecki
2005-04-08 22:27 Rajesh Venkatasubramanian
2005-04-08 23:29 ` Linus Torvalds
2005-04-09  0:29   ` Linus Torvalds
2005-04-09 16:20   ` Paul Jackson
2005-04-09  4:06 Walter Landry
2005-04-09 11:02 Samium Gromoff
2005-04-09 11:29 Samium Gromoff
2005-04-10  4:20 Albert Cahalan
