* kernel.org lies about latest -mm kernel @ 2006-12-14 22:37 Pavel Machek 2006-12-14 23:01 ` Randy Dunlap ` (2 more replies) 0 siblings, 3 replies; 110+ messages in thread From: Pavel Machek @ 2006-12-14 22:37 UTC (permalink / raw) To: kernel list; +Cc: hpa, Andrew Morton Hi! pavel@amd:/data/pavel$ finger @www.kernel.org [zeus-pub.kernel.org] ... The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile VERSION = 2 PATCHLEVEL = 6 SUBLEVEL = 19 EXTRAVERSION = -mm1 ... pavel@amd:/data/pavel$ AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does not understand that. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: kernel.org lies about latest -mm kernel 2006-12-14 22:37 kernel.org lies about latest -mm kernel Pavel Machek @ 2006-12-14 23:01 ` Randy Dunlap 2006-12-14 23:38 ` Sergio Monteiro Basto 2006-12-16 17:44 ` [KORG] " Randy Dunlap 2 siblings, 0 replies; 110+ messages in thread From: Randy Dunlap @ 2006-12-14 23:01 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, hpa, Andrew Morton On Thu, 14 Dec 2006 23:37:18 +0100 Pavel Machek wrote: > Hi! > > pavel@amd:/data/pavel$ finger @www.kernel.org > [zeus-pub.kernel.org] > ... > The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 > pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile > VERSION = 2 > PATCHLEVEL = 6 > SUBLEVEL = 19 > EXTRAVERSION = -mm1 > ... > pavel@amd:/data/pavel$ > > AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does > not understand that. and 2.6.20-rc1 should also be listed there. --- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: kernel.org lies about latest -mm kernel 2006-12-14 22:37 kernel.org lies about latest -mm kernel Pavel Machek 2006-12-14 23:01 ` Randy Dunlap @ 2006-12-14 23:38 ` Sergio Monteiro Basto 2006-12-16 17:44 ` [KORG] " Randy Dunlap 2 siblings, 0 replies; 110+ messages in thread From: Sergio Monteiro Basto @ 2006-12-14 23:38 UTC (permalink / raw) To: Pavel Machek, webmaster; +Cc: kernel list, hpa, Andrew Morton kernel.org doesn't lie, it just isn't updated; shouldn't this problem be sent to webmaster? -- Sérgio M. B. On Thu, 2006-12-14 at 23:37 +0100, Pavel Machek wrote: > Hi! > > pavel@amd:/data/pavel$ finger @www.kernel.org > [zeus-pub.kernel.org] > ... > The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 > pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile > VERSION = 2 > PATCHLEVEL = 6 > SUBLEVEL = 19 > EXTRAVERSION = -mm1 > ... > pavel@amd:/data/pavel$ > > AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does > not understand that. > Pavel > ^ permalink raw reply [flat|nested] 110+ messages in thread
* [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-14 22:37 kernel.org lies about latest -mm kernel Pavel Machek 2006-12-14 23:01 ` Randy Dunlap 2006-12-14 23:38 ` Sergio Monteiro Basto @ 2006-12-16 17:44 ` Randy Dunlap 2006-12-16 17:57 ` Andrew Morton 2 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2006-12-16 17:44 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, hpa, Andrew Morton, webmaster On Thu, 14 Dec 2006 23:37:18 +0100 Pavel Machek wrote: > Hi! > > pavel@amd:/data/pavel$ finger @www.kernel.org > [zeus-pub.kernel.org] > ... > The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 > pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile > VERSION = 2 > PATCHLEVEL = 6 > SUBLEVEL = 19 > EXTRAVERSION = -mm1 > ... > pavel@amd:/data/pavel$ > > AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does > not understand that. Still true (not listed) for 2.6.20-rc1-mm1 :( Could someone explain what the problem is and what it would take to correct it? --- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 17:44 ` [KORG] " Randy Dunlap @ 2006-12-16 17:57 ` Andrew Morton 2006-12-16 18:02 ` Randy Dunlap 0 siblings, 1 reply; 110+ messages in thread From: Andrew Morton @ 2006-12-16 17:57 UTC (permalink / raw) To: Randy Dunlap; +Cc: Pavel Machek, kernel list, hpa, webmaster On Sat, 16 Dec 2006 09:44:21 -0800 Randy Dunlap <randy.dunlap@oracle.com> wrote: > On Thu, 14 Dec 2006 23:37:18 +0100 Pavel Machek wrote: > > > Hi! > > > > pavel@amd:/data/pavel$ finger @www.kernel.org > > [zeus-pub.kernel.org] > > ... > > The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 > > pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile > > VERSION = 2 > > PATCHLEVEL = 6 > > SUBLEVEL = 19 > > EXTRAVERSION = -mm1 > > ... > > pavel@amd:/data/pavel$ > > > > AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does > > not understand that. > > Still true (not listed) for 2.6.20-rc1-mm1 :( > > Could someone explain what the problem is and what it would > take to correct it? 2.6.20-rc1-mm1 still hasn't propagated out to the servers (it's been 36 hours). Presumably the front page non-update is a consequence of that. ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 17:57 ` Andrew Morton @ 2006-12-16 18:02 ` Randy Dunlap 2006-12-16 19:30 ` J.H. 0 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2006-12-16 18:02 UTC (permalink / raw) To: Andrew Morton; +Cc: Pavel Machek, kernel list, hpa, webmaster Andrew Morton wrote: > On Sat, 16 Dec 2006 09:44:21 -0800 > Randy Dunlap <randy.dunlap@oracle.com> wrote: > >> On Thu, 14 Dec 2006 23:37:18 +0100 Pavel Machek wrote: >> >>> Hi! >>> >>> pavel@amd:/data/pavel$ finger @www.kernel.org >>> [zeus-pub.kernel.org] >>> ... >>> The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 >>> pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile >>> VERSION = 2 >>> PATCHLEVEL = 6 >>> SUBLEVEL = 19 >>> EXTRAVERSION = -mm1 >>> ... >>> pavel@amd:/data/pavel$ >>> >>> AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does >>> not understand that. >> Still true (not listed) for 2.6.20-rc1-mm1 :( >> >> Could someone explain what the problem is and what it would >> take to correct it? > > 2.6.20-rc1-mm1 still hasn't propagated out to the servers (it's been 36 > hours). Presumably the front page non-update is a consequence of that. Agreed on the latter part. Can someone address the real problem??? -- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 18:02 ` Randy Dunlap @ 2006-12-16 19:30 ` J.H. 2006-12-16 20:30 ` Russell King ` (6 more replies) 0 siblings, 7 replies; 110+ messages in thread From: J.H. @ 2006-12-16 19:30 UTC (permalink / raw) To: Randy Dunlap; +Cc: Andrew Morton, Pavel Machek, kernel list, hpa, webmaster The problem has been hashed over quite a bit recently, and I would be curious what you would consider the real problem after you see the situation. The root cause boils down to this: with git, gitweb and the normal mirroring on the frontend machines, our basic working set no longer stays resident in memory, which forces more and more accesses to actively go to disk, causing a much higher I/O load. You have the added problem that one of the frontend machines is getting hit harder than the other due to several factors: various DNS servers not round-robining, people explicitly hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason, and probably several other factors we aren't aware of. This has caused the average load on that machine to hover around 150-200, and if for whatever reason we have to take one of the machines down, the load on the remaining machine will skyrocket to 2000+. Since it's apparent not everyone is aware of what we are doing, I'll mention briefly some of the bigger points. - We have contacted HP to see if we can get additional hardware; mind you, this is a long term solution and will take time, but if our request is approved it will double the number of machines kernel.org runs. - Gitweb is causing us no end of headache; there are (known to me anyway) two different things happening on that. I am looking at Jeff Garzik's suggested caching mechanism as a temporary stop-gap, with an eye more on doing a rather heavy re-write of gitweb itself to include semi-intelligent caching. I've already started in on the latter - and I just about have the caching layer put in. 
But this is still at least a week out before we could even remotely consider deploying it. - We've cut back on the number of ftp and rsync users to the machines. Basically we are cutting back where we can in an attempt to keep the load from spiraling out of control; this helped a bit when we recently had to take one of the machines down, and instead of loads spiking into the 2000+ range we peaked at about 500-600 I believe. So we know the problem is there, and we are working on it - we are getting e-mails about it if not daily then every other day or so. If there are suggestions we are willing to hear them - but the general feeling with the admins is that we are probably hitting the biggest problems already. - John 'Warthog9' Hawley Kernel.org Admin On Sat, 2006-12-16 at 10:02 -0800, Randy Dunlap wrote: > Andrew Morton wrote: > > On Sat, 16 Dec 2006 09:44:21 -0800 > > Randy Dunlap <randy.dunlap@oracle.com> wrote: > > > >> On Thu, 14 Dec 2006 23:37:18 +0100 Pavel Machek wrote: > >> > >>> Hi! > >>> > >>> pavel@amd:/data/pavel$ finger @www.kernel.org > >>> [zeus-pub.kernel.org] > >>> ... > >>> The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 > >>> pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile > >>> VERSION = 2 > >>> PATCHLEVEL = 6 > >>> SUBLEVEL = 19 > >>> EXTRAVERSION = -mm1 > >>> ... > >>> pavel@amd:/data/pavel$ > >>> > >>> AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does > >>> not understand that. > >> Still true (not listed) for 2.6.20-rc1-mm1 :( > >> > >> Could someone explain what the problem is and what it would > >> take to correct it? > > > > 2.6.20-rc1-mm1 still hasn't propagated out to the servers (it's been 36 > > hours). Presumably the front page non-update is a consequence of that. > > Agreed on the latter part. Can someone address the real problem??? > ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 19:30 ` J.H. @ 2006-12-16 20:30 ` Russell King 2006-12-26 16:47 ` H. Peter Anvin 2006-12-16 21:21 ` Nigel Cunningham ` (5 subsequent siblings) 6 siblings, 1 reply; 110+ messages in thread From: Russell King @ 2006-12-16 20:30 UTC (permalink / raw) To: J.H. Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote: > The problem has been hashed over quite a bit recently, and I would be > curious what you would consider the real problem after you see the > situation. One thing which isn't helping you is the way folk inevitably end up using ftp.kernel.org rather than ftp.<cc>.kernel.org. [*] Let me illustrate why. Throw http://ftp.uk.kernel.org/pub/linux/kernel into a web browser. The address changes to: http://ftp.uk.kernel.org/sites/ftp.kernel.org/pub/linux/kernel/ Hit reload a few times, and eventually be greeted by: Forbidden You don't have permission to access /sites/ftp.kernel.org/pub/linux/kernel/ on this server. because one of the IPs which "ftp.uk.kernel.org" resolves to isn't a part of the UK mirror service (who are providing most of ftp.uk.kernel.org), and therefore has a different site policy. Ergo, downloads via http from ftp.uk.kernel.org are at best unreliable for multiple requests. I agree that it's not directly your problem, and isn't something you have direct control over. However, if you want to round-robin the <cc>.kernel.org IP addresses between different providers, I suggest that either the name resolves to just one site, or that kernel.org adopts a policy with their mirrors that they only become part of the <cc>.kernel.org DNS entries as long as they do not rewrite their site-specific URLs in terms of that address. 
IOW, that URL above should've been: http://hawking-if-b.mirrorservice.org/sites/ftp.kernel.org/pub/linux/kernel/ to ensure that mirrorservice.org's policy isn't uselessly applied to someone else's mirror site. Maybe then ftp.<cc>.kernel.org would become slightly more attractive. * - I gave up with ftp.uk.kernel.org many years ago when it became unreliable and haven't looked back, despite recent news that it's improved. But as illustrated above it does still have issues. I certainly would _not_ want to use ftp.uk.linux.org to obtain GIT updates from as long as the current DNS situation persists - that would be suicide. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 20:30 ` Russell King @ 2006-12-26 16:47 ` H. Peter Anvin 0 siblings, 0 replies; 110+ messages in thread From: H. Peter Anvin @ 2006-12-26 16:47 UTC (permalink / raw) To: J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster Russell King wrote: > > Ergo, downloads via http from ftp.uk.kernel.org are at best unreliable > for multiple requests. > > I agree that it's not directly your problem, and isn't something you > have direct control over. However, if you want to round-robin the > <cc>.kernel.org IP addresses between different providers, I suggest > that either the name resolves to just one site, or that kernel.org > adopts a policy with their mirrors that they only become part of > the <cc>.kernel.org DNS entries as long as they do not rewrite their > site-specific URLs in terms of that address. > Indeed. I just sent a complaint about this, and if we can figure out a decent way to test for this automatically we'll add it to our automatic tests. There is also always ftp. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 19:30 ` J.H. 2006-12-16 20:30 ` Russell King @ 2006-12-16 21:21 ` Nigel Cunningham 2006-12-26 16:49 ` H. Peter Anvin 2006-12-17 12:32 ` Pavel Machek ` (4 subsequent siblings) 6 siblings, 1 reply; 110+ messages in thread From: Nigel Cunningham @ 2006-12-16 21:21 UTC (permalink / raw) To: J.H. Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster Hi. I have git trees against a few versions besides Linus', and have just moved all but Linus' to staging to help until you can get your new hardware. If others were encouraged to do the same, it might help a lot? Regards, Nigel ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 21:21 ` Nigel Cunningham @ 2006-12-26 16:49 ` H. Peter Anvin 2007-01-07 3:35 ` Nigel Cunningham 0 siblings, 1 reply; 110+ messages in thread From: H. Peter Anvin @ 2006-12-26 16:49 UTC (permalink / raw) To: nigel Cc: J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Nigel Cunningham wrote: > Hi. > > I've have git trees against a few versions besides Linus', and have just > moved all but Linus' to staging to help until you can get your new > hardware. If others were encouraged to do the same, it might help a lot? > Not really. In fact, it would hardly help at all. The two things git users can do to help are: 1. Make sure your alternatives file is set up correctly; 2. Keep your trees packed and pruned, to keep the file count down. If you do this, the load imposed by a single git tree is fairly negligible. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
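hpa's second point (keeping trees packed and pruned) corresponds to a couple of plain git commands. A minimal sketch in a throwaway repository; the repo name and committer identity below are made up for illustration:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
echo hello > file && git add file
git -c user.name=demo -c user.email=demo@example.org commit -qm init
# Pack all reachable objects into a single pack and delete the loose copies:
git repack -a -d -q
# Drop any unreachable loose objects left behind by rebases and the like:
git prune
# Count loose object files remaining outside pack/ and info/:
find .git/objects -type f ! -path '*/pack/*' ! -path '*/info/*' | wc -l
# → 0 once everything lives in the pack
```

The point for kernel.org is file count: an unpacked tree scatters objects across 256 fan-out directories, while a packed one keeps a handful of files under objects/pack/.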
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-26 16:49 ` H. Peter Anvin @ 2007-01-07 3:35 ` Nigel Cunningham 2007-01-07 4:10 ` Jeff Garzik ` (2 more replies) 0 siblings, 3 replies; 110+ messages in thread From: Nigel Cunningham @ 2007-01-07 3:35 UTC (permalink / raw) To: H. Peter Anvin Cc: J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Hi. On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: > Nigel Cunningham wrote: > > Hi. > > > > I've have git trees against a few versions besides Linus', and have just > > moved all but Linus' to staging to help until you can get your new > > hardware. If others were encouraged to do the same, it might help a lot? > > > > Not really. In fact, it would hardly help at all. > > The two things git users can do to help is: > > 1. Make sure your alternatives file is set up correctly; > 2. Keep your trees packed and pruned, to keep the file count down. > > If you do this, the load imposed by a single git tree is fairly negible. Sorry for the slow reply, and the ignorance... what's an alternatives file? I've never heard of them before. Regards, Nigel ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 3:35 ` Nigel Cunningham @ 2007-01-07 4:10 ` Jeff Garzik 2007-01-07 4:47 ` Nigel Cunningham 2007-01-07 4:22 ` Jeff Garzik 2007-01-07 5:17 ` H. Peter Anvin 2 siblings, 1 reply; 110+ messages in thread From: Jeff Garzik @ 2007-01-07 4:10 UTC (permalink / raw) To: nigel Cc: H. Peter Anvin, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Nigel Cunningham wrote: > Hi. > > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: >> Nigel Cunningham wrote: >>> Hi. >>> >>> I've have git trees against a few versions besides Linus', and have just >>> moved all but Linus' to staging to help until you can get your new >>> hardware. If others were encouraged to do the same, it might help a lot? >>> >> Not really. In fact, it would hardly help at all. >> >> The two things git users can do to help is: >> >> 1. Make sure your alternatives file is set up correctly; >> 2. Keep your trees packed and pruned, to keep the file count down. >> >> If you do this, the load imposed by a single git tree is fairly negible. > > Sorry for the slow reply, and the ignorance... what's an alternatives > file? I've never heard of them before. It's a highly useful but poorly documented method of referencing repository B's objects from repository A. When you clone locally git clone --reference linus-2.6 linus-2.6 nigel-2.6 it will create nigel-2.6 with zero objects, and an alternatives file pointing to the 'linus-2.6' local repository. When you commit, only the objects not already in linus-2.6 will be found in nigel-2.6. It's far better than "git clone -l ..." because you don't even have the additional hardlinked inodes, and don't have to run "git relink" locally. Jeff ^ permalink raw reply [flat|nested] 110+ messages in thread
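Jeff's example can be tried end to end in a scratch directory; everything below (paths, committer identity) is illustrative, not taken from kernel.org:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Stand-in for Linus's tree:
git init -q linus-2.6
(cd linus-2.6 && echo x > f && git add f &&
 git -c user.name=demo -c user.email=demo@example.org commit -qm init)
# Clone, recording linus-2.6 as an object source instead of copying its objects:
git clone -q --reference linus-2.6 linus-2.6 nigel-2.6
# The clone records where to borrow objects from:
cat nigel-2.6/.git/objects/info/alternates
# (prints the absolute path of linus-2.6's object store)
```

Any object reachable through the alternates file is simply read from the referenced repository, so only nigel-2.6's own commits need to live in its object store.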
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 4:10 ` Jeff Garzik @ 2007-01-07 4:47 ` Nigel Cunningham 0 siblings, 0 replies; 110+ messages in thread From: Nigel Cunningham @ 2007-01-07 4:47 UTC (permalink / raw) To: Jeff Garzik Cc: H. Peter Anvin, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Hi. On Sat, 2007-01-06 at 23:10 -0500, Jeff Garzik wrote: > Nigel Cunningham wrote: > > Hi. > > > > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: > >> Nigel Cunningham wrote: > >>> Hi. > >>> > >>> I've have git trees against a few versions besides Linus', and have just > >>> moved all but Linus' to staging to help until you can get your new > >>> hardware. If others were encouraged to do the same, it might help a lot? > >>> > >> Not really. In fact, it would hardly help at all. > >> > >> The two things git users can do to help is: > >> > >> 1. Make sure your alternatives file is set up correctly; > >> 2. Keep your trees packed and pruned, to keep the file count down. > >> > >> If you do this, the load imposed by a single git tree is fairly negible. > > > > Sorry for the slow reply, and the ignorance... what's an alternatives > > file? I've never heard of them before. > > It's highly useful but poorly documented method of referencing > repository B's objects from repository A. > > When you clone locally > > git clone --reference linus-2.6 linus-2.6 nigel-2.6 > > it will create nigel-2.6 with zero objects, and an alternatives file > pointing to 'linus-2.6' local repository. When you commit, only the > objects not already in linus-2.6 will be found in nigel-2.6. > > It's far better "git clone -l ..." because you don't even have the > additional hardlinked inodes, and don't have to run "git relink" locally. Cool. I'll have a play :) Thanks! Nigel ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 3:35 ` Nigel Cunningham 2007-01-07 4:10 ` Jeff Garzik @ 2007-01-07 4:22 ` Jeff Garzik 2007-01-07 4:29 ` Linus Torvalds 2007-01-07 20:11 ` Greg KH 2007-01-07 5:17 ` H. Peter Anvin 2 siblings, 2 replies; 110+ messages in thread From: Jeff Garzik @ 2007-01-07 4:22 UTC (permalink / raw) To: nigel, H. Peter Anvin, Andrew Morton, Greg KH, Linus Torvalds Cc: J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: >> Not really. In fact, it would hardly help at all. >> >> The two things git users can do to help is: >> >> 1. Make sure your alternatives file is set up correctly; >> 2. Keep your trees packed and pruned, to keep the file count down. >> >> If you do this, the load imposed by a single git tree is fairly negible. Would kernel hackers be amenable to having their trees auto-repacked, and linked via alternatives to Linus's linux-2.6.git? Looking through kernel.org, we have a ton of repositories, however packed, that carry their own copies of the linux-2.6.git repo. Also, I wonder if "git push" will push only the non-linux-2.6.git objects, if both local and remote sides have the proper alternatives set up? Jeff ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 4:22 ` Jeff Garzik @ 2007-01-07 4:29 ` Linus Torvalds 2007-01-07 20:11 ` Greg KH 1 sibling, 0 replies; 110+ messages in thread From: Linus Torvalds @ 2007-01-07 4:29 UTC (permalink / raw) To: Jeff Garzik Cc: nigel, H. Peter Anvin, Andrew Morton, Greg KH, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List On Sat, 6 Jan 2007, Jeff Garzik wrote: > > Also, I wonder if "git push" will push only the non-linux-2.6.git objects, if > both local and remote sides have the proper alternatives set up? Yes. Linus ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 4:22 ` Jeff Garzik 2007-01-07 4:29 ` Linus Torvalds @ 2007-01-07 20:11 ` Greg KH 2007-01-07 21:30 ` H. Peter Anvin 1 sibling, 1 reply; 110+ messages in thread From: Greg KH @ 2007-01-07 20:11 UTC (permalink / raw) To: Jeff Garzik Cc: nigel, H. Peter Anvin, Andrew Morton, Linus Torvalds, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List On Sat, Jan 06, 2007 at 11:22:31PM -0500, Jeff Garzik wrote: > >On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: > >>Not really. In fact, it would hardly help at all. > >> > >>The two things git users can do to help is: > >> > >>1. Make sure your alternatives file is set up correctly; > >>2. Keep your trees packed and pruned, to keep the file count down. > >> > >>If you do this, the load imposed by a single git tree is fairly negible. > > > Would kernel hackers be amenable to having their trees auto-repacked, > and linked via alternatives to Linus's linux-2.6.git? > > Looking through kernel.org, we have a ton of repositories, however > packed, that carrying their own copies of the linux-2.6.git repo. Well, I create my repos by doing a: git clone -l --bare which makes a hardlink from Linus's tree. But then it gets copied over to the public server, which probably severs that hardlink :( Any shortcut to clone or set up a repo using "alternatives" so that we don't have this issue at all? thanks, greg k-h ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 20:11 ` Greg KH @ 2007-01-07 21:30 ` H. Peter Anvin 0 siblings, 0 replies; 110+ messages in thread From: H. Peter Anvin @ 2007-01-07 21:30 UTC (permalink / raw) To: Greg KH Cc: Jeff Garzik, nigel, Andrew Morton, Linus Torvalds, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, Git Mailing List Greg KH wrote: > > Well, I create my repos by doing a: > git clone -l --bare > which makes a hardlink from Linus's tree. > > But then it gets copied over to the public server, which probably severs > that hardlink :( > > Any shortcut to clone or set up a repo using "alternatives" so that we > don't have this issue at all? > Use the -s option to git clone. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
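hpa's suggestion, sketched with made-up repo names: `-s` (`--shared`) writes an alternates entry instead of hardlinking, so the relationship survives the repo being copied to the public server, as long as the referenced repository sits at the same path there:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Stand-in for Linus's tree:
git init -q linus-2.6
(cd linus-2.6 && echo x > f && git add f &&
 git -c user.name=demo -c user.email=demo@example.org commit -qm init)
# Bare, shared clone: no hardlinks, and no object copies of its own:
git clone -q -s --bare linus-2.6 greg-2.6.git
# Sharing is recorded as a path, not as inode links:
cat greg-2.6.git/objects/info/alternates
```

Because the link is just a pathname in a text file, rsync'ing greg-2.6.git elsewhere cannot sever it the way it severs `clone -l` hardlinks.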
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 3:35 ` Nigel Cunningham 2007-01-07 4:10 ` Jeff Garzik 2007-01-07 4:22 ` Jeff Garzik @ 2007-01-07 5:17 ` H. Peter Anvin 2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin 2007-01-09 4:29 ` [KORG] Re: kernel.org lies about latest -mm kernel Nigel Cunningham 2 siblings, 2 replies; 110+ messages in thread From: H. Peter Anvin @ 2007-01-07 5:17 UTC (permalink / raw) To: nigel Cc: J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Nigel Cunningham wrote: > Hi. > > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: >> Nigel Cunningham wrote: >>> Hi. >>> >>> I've have git trees against a few versions besides Linus', and have just >>> moved all but Linus' to staging to help until you can get your new >>> hardware. If others were encouraged to do the same, it might help a lot? >>> >> Not really. In fact, it would hardly help at all. >> >> The two things git users can do to help is: >> >> 1. Make sure your alternatives file is set up correctly; >> 2. Keep your trees packed and pruned, to keep the file count down. >> >> If you do this, the load imposed by a single git tree is fairly negible. > > Sorry for the slow reply, and the ignorance... what's an alternatives > file? I've never heard of them before. > Just a minor correction; it's the "alternates" file (objects/info/alternates). -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* How git affects kernel.org performance 2007-01-07 5:17 ` H. Peter Anvin @ 2007-01-07 5:24 ` H. Peter Anvin 2007-01-07 5:39 ` Linus Torvalds ` (2 more replies) 2007-01-09 4:29 ` [KORG] Re: kernel.org lies about latest -mm kernel Nigel Cunningham 1 sibling, 3 replies; 110+ messages in thread From: H. Peter Anvin @ 2007-01-07 5:24 UTC (permalink / raw) To: H. Peter Anvin, git Cc: nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Some more data on how git affects kernel.org... During extremely high load, it appears that what slows kernel.org down more than anything else is the time that each individual getdents() call takes. When I've looked at this I've observed times from 200 ms to almost 2 seconds! Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly packed tree, you can do the math yourself. I have tried reducing vm.vfs_cache_pressure down to 1 on the kernel.org machines in order to improve the situation, but even at that point it appears the kernel doesn't readily hold the entire directory hierarchy in memory, even though there is space to do so. I have suggested that we might want to add a sysctl to change the denominator from the default 100. The one thing that we need done locally is to have a smart uploader, instead of relying on rsync. That, unfortunately, is a fairly sizable project. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
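The knob hpa tuned is exposed through /proc, and per-call getdents() latency can be watched with strace. A sketch; the directory path is a placeholder, and on modern kernels the syscall appears as getdents64:

```shell
# Default is 100; lower values make the VM retain dentry/inode caches longer.
cat /proc/sys/vm/vfs_cache_pressure
# Lowering it (root only), as tried on the kernel.org machines:
#   sysctl -w vm.vfs_cache_pressure=1
# Timing individual directory reads; -T appends the wall time <seconds>
# each syscall took:
#   strace -T -e trace=getdents,getdents64 ls /some/large/dir > /dev/null
```

On a loaded box the strace output makes the problem visible directly: each getdents line carries its own latency, so the 200 ms to 2 s figures hpa quotes can be reproduced per call.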
* Re: How git affects kernel.org performance 2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin @ 2007-01-07 5:39 ` Linus Torvalds 2007-01-07 8:55 ` Willy Tarreau 2007-01-07 14:57 ` Robert Fitzsimons 2007-01-07 15:06 ` Krzysztof Halasa 2 siblings, 1 reply; 110+ messages in thread From: Linus Torvalds @ 2007-01-07 5:39 UTC (permalink / raw) To: H. Peter Anvin Cc: git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sat, 6 Jan 2007, H. Peter Anvin wrote: > > During extremely high load, it appears that what slows kernel.org down more > than anything else is the time that each individual getdents() call takes. > When I've looked this I've observed times from 200 ms to almost 2 seconds! > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly > packed tree, you can do the math yourself. "getdents()" is totally serialized by the inode semaphore. It's one of the most expensive system calls in Linux, partly because of that, and partly because it has to call all the way down into the filesystem in a way that almost no other common system call has to (99% of all filesystem calls can be handled basically at the VFS layer with generic caches - but not getdents()). So if there are concurrent readdirs on the same directory, they get serialized. If there is any file creation/deletion activity in the directory, it serializes getdents(). To make matters worse, I don't think it has any read-ahead at all when you use hashed directory entries. So if you have cold-cache case, you'll read every single block totally individually, and serialized. One block at a time (I think the non-hashed case is likely also suspect, but that's a separate issue) In other words, I'm not at all surprised it hits on filldir time. Especially on ext3. Linus ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 5:39 ` Linus Torvalds @ 2007-01-07 8:55 ` Willy Tarreau 2007-01-07 8:58 ` H. Peter Anvin 2007-01-07 9:15 ` Andrew Morton 0 siblings, 2 replies; 110+ messages in thread From: Willy Tarreau @ 2007-01-07 8:55 UTC (permalink / raw) To: Linus Torvalds Cc: H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote: > > > On Sat, 6 Jan 2007, H. Peter Anvin wrote: > > > > During extremely high load, it appears that what slows kernel.org down more > > than anything else is the time that each individual getdents() call takes. > > When I've looked this I've observed times from 200 ms to almost 2 seconds! > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly > > packed tree, you can do the math yourself. > > "getdents()" is totally serialized by the inode semaphore. It's one of the > most expensive system calls in Linux, partly because of that, and partly > because it has to call all the way down into the filesystem in a way that > almost no other common system call has to (99% of all filesystem calls can > be handled basically at the VFS layer with generic caches - but not > getdents()). > > So if there are concurrent readdirs on the same directory, they get > serialized. If there is any file creation/deletion activity in the > directory, it serializes getdents(). > > To make matters worse, I don't think it has any read-ahead at all when you > use hashed directory entries. So if you have cold-cache case, you'll read > every single block totally individually, and serialized. One block at a > time (I think the non-hashed case is likely also suspect, but that's a > separate issue) > > In other words, I'm not at all surprised it hits on filldir time. > Especially on ext3. At work, we had the same problem on a file server with ext3. 
We use rsync to make backups to a local IDE disk, and we noticed that getdents() took about the same time as Peter reports (0.2 to 2 seconds), especially in maildir directories. We tried many things to fix it with no result, including enabling dir indexes. Finally, we made a full backup, and switched over to XFS and the problem totally disappeared. So it seems that the filesystem matters a lot here when there are lots of entries in a directory, and that ext3 is not suitable for use cases with thousands of entries in directories with millions of files on disk. I'm not certain it would be that easy to try other filesystems on kernel.org though :-/ Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
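The ext3 feature Willy refers to is dir_index (hashed b-tree directories). It can be toggled with tune2fs; a sketch against a scratch filesystem image so no real disk is touched (existing directories additionally need an `e2fsck -D` pass to be re-indexed):

```shell
set -e
img=$(mktemp)
# Build a small ext3 image without dir_index, then enable the feature:
mke2fs -q -F -j -O ^dir_index "$img" 16384
tune2fs -O dir_index "$img" >/dev/null
tune2fs -l "$img" | grep -o dir_index
rm -f "$img"
```

As the thread shows, dir_index alone did not save Willy's maildir workload, so this is a mitigation to test, not a guaranteed fix.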
* Re: How git affects kernel.org performance 2007-01-07 8:55 ` Willy Tarreau @ 2007-01-07 8:58 ` H. Peter Anvin 2007-01-07 9:03 ` Willy Tarreau 2007-01-07 9:15 ` Andrew Morton 1 sibling, 1 reply; 110+ messages in thread From: H. Peter Anvin @ 2007-01-07 8:58 UTC (permalink / raw) To: Willy Tarreau Cc: Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Willy Tarreau wrote: > > At work, we had the same problem on a file server with ext3. We use rsync > to make backups to a local IDE disk, and we noticed that getdents() took > about the same time as Peter reports (0.2 to 2 seconds), especially in > maildir directories. We tried many things to fix it with no result, > including enabling dirindexes. Finally, we made a full backup, and switched > over to XFS and the problem totally disappeared. So it seems that the > filesystem matters a lot here when there are lots of entries in a > directory, and that ext3 is not suitable for usages with thousands > of entries in directories with millions of files on disk. I'm not > certain it would be that easy to try other filesystems on kernel.org > though :-/ > Changing filesystems would mean about a week of downtime for a server. It's painful, but it's doable; however, if we get a traffic spike during that time it'll hurt like hell. However, if there are credible reasons to believe XFS will help, I'd be inclined to try it out. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 8:58 ` H. Peter Anvin @ 2007-01-07 9:03 ` Willy Tarreau 2007-01-07 10:28 ` Christoph Hellwig 2007-01-07 10:50 ` Jan Engelhardt 0 siblings, 2 replies; 110+ messages in thread From: Willy Tarreau @ 2007-01-07 9:03 UTC (permalink / raw) To: H. Peter Anvin Cc: Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: > Willy Tarreau wrote: > > > >At work, we had the same problem on a file server with ext3. We use rsync > >to make backups to a local IDE disk, and we noticed that getdents() took > >about the same time as Peter reports (0.2 to 2 seconds), especially in > >maildir directories. We tried many things to fix it with no result, > >including enabling dirindexes. Finally, we made a full backup, and switched > >over to XFS and the problem totally disappeared. So it seems that the > >filesystem matters a lot here when there are lots of entries in a > >directory, and that ext3 is not suitable for usages with thousands > >of entries in directories with millions of files on disk. I'm not > >certain it would be that easy to try other filesystems on kernel.org > >though :-/ > > > > Changing filesystems would mean about a week of downtime for a server. > It's painful, but it's doable; however, if we get a traffic spike during > that time it'll hurt like hell. > > However, if there is credible reasons to believe XFS will help, I'd be > inclined to try it out. The problem is that I don't have sufficient FS knowledge to argue why it helps here. It was a desperate attempt to fix the problem for us and it definitely worked well. Hmmm I'm thinking about something very dirty: would it be possible to reduce the current FS size to get more space to create another FS? 
Supposing you create a XX GB/TB XFS after the current ext3, you would be able to mount it in some directories with --bind and slowly switch some parts to it. The problem with this approach is that it will never be 100% converted, but as an experiment it might be worth it, no? Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 9:03 ` Willy Tarreau @ 2007-01-07 10:28 ` Christoph Hellwig 2007-01-07 10:52 ` Willy Tarreau 2007-01-07 18:17 ` Linus Torvalds 2007-01-07 10:50 ` Jan Engelhardt 1 sibling, 2 replies; 110+ messages in thread From: Christoph Hellwig @ 2007-01-07 10:28 UTC (permalink / raw) To: Willy Tarreau Cc: H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote: > The problem is that I have no sufficient FS knowledge to argument why > it helps here. It was a desperate attempt to fix the problem for us > and it definitely worked well. XFS does rather efficient btree directories, and it does sophisticated readahead for directories. I suspect that's what is helping you there. ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 10:28 ` Christoph Hellwig @ 2007-01-07 10:52 ` Willy Tarreau 2007-01-07 18:17 ` Linus Torvalds 1 sibling, 0 replies; 110+ messages in thread From: Willy Tarreau @ 2007-01-07 10:52 UTC (permalink / raw) To: Christoph Hellwig, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, Jan 07, 2007 at 10:28:53AM +0000, Christoph Hellwig wrote: > On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote: > > The problem is that I have no sufficient FS knowledge to argument why > > it helps here. It was a desperate attempt to fix the problem for us > > and it definitely worked well. > > XFS does rather efficient btree directories, and it does sophisticated > readahead for directories. I suspect that's what is helping you there. Ok. Do you also think it might help (or even solve) the problem on kernel.org? Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 10:28 ` Christoph Hellwig 2007-01-07 10:52 ` Willy Tarreau @ 2007-01-07 18:17 ` Linus Torvalds 2007-01-07 19:13 ` Linus Torvalds 1 sibling, 1 reply; 110+ messages in thread From: Linus Torvalds @ 2007-01-07 18:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Christoph Hellwig wrote: > > On Sun, Jan 07, 2007 at 10:03:36AM +0100, Willy Tarreau wrote: > > The problem is that I have no sufficient FS knowledge to argument why > > it helps here. It was a desperate attempt to fix the problem for us > > and it definitely worked well. > > XFS does rather efficient btree directories, and it does sophisticated > readahead for directories. I suspect that's what is helping you there. The sad part is that this is a long-standing issue, and the directory reading code in ext3 really _should_ be able to do ok. A year or two ago I did a totally half-assed code for the non-hashed readdir that improved performance by an order of magnitude for ext3 for a test-case of mine, but it was subtly buggy and didn't do the hashed case AT ALL. Andrew fixed it up so that it at least wasn't subtly buggy any more, but in the process it also lost all capability of doing fragmented directories (so it doesn't help very much any more under exactly the situation that is the worst case), and it still doesn't do the hashed directory case. It's my personal pet peeve with ext3 (as Andrew can attest). And it's really sad, because I don't think it is fundamental per se, but the way the directory handling and jbd are done, it's apparently very hard to fix. (It's clearly not _impossible_ to do: I think that it should be possible to treat ext3 directories the same way we treat files, except they would always be in "data=journal" mode. 
But I understand ext2, not ext3 (and absolutely not jbd), so I'm not going to be able to do anything about it personally). Anyway, I think that disabling hashing can actually help. And I suspect that even with hashing enabled, there should be some quick hack for making the directory reading at least be able to do multiple outstanding reads in parallel, instead of reading the blocks totally synchronously ("read five blocks, then wait for the one we care about" rather than the current "read one block at a time, wait for it, read the next one, wait for it.." situation). Linus ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 18:17 ` Linus Torvalds @ 2007-01-07 19:13 ` Linus Torvalds [not found] ` <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com> 0 siblings, 1 reply; 110+ messages in thread From: Linus Torvalds @ 2007-01-07 19:13 UTC (permalink / raw) To: Christoph Hellwig Cc: Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Linus Torvalds wrote: > > A year or two ago I did a totally half-assed code for the non-hashed > readdir that improved performance by an order of magnitude for ext3 for a > test-case of mine, but it was subtly buggy and didn't do the hashed case > AT ALL. Btw, this isn't the test-case, but it's a half-way re-creation of something like it. It's _really_ stupid, but here's what you can do: - compile and run this idiotic program. It creates a directory called "throwaway" that is ~44kB in size, and if I did things right, it should not be totally contiguous on disk with the current ext3 allocation logic. - as root, do "echo 3 > /proc/sys/vm/drop_caches" to get a cache-cold scenario. - do "time ls throwaway > /dev/null". I don't know what people consider to be reasonable performance, but for me, it takes about half a second to do a simple "ls". NOTE! This is _not_ reading inode stat information or anything like that. It literally takes 0.3-0.4 seconds to read ~44kB off the disk. That's a whopping 125kB/s throughput on a reasonably fast modern disk. That's what we in the industry call "sad". And that's on a totally unloaded machine. There was _nothing_ else going on. No IO congestion, no nothing. Just the cost of synchronously doing ten or eleven disk reads. The fix? - proper read-ahead. 
Right now, even if the directory is totally contiguous on disk (just remove the thing that writes data to the files, so that you'll have empty files instead of 8kB files), I think we do those reads totally synchronously if the filesystem was mounted with directory hashing enabled. Without hashing, the directory will be much smaller too, so readdir() will have less data to read. And it _should_ do some readahead, although in my testing, the best I could do was still 0.185s for a (now shrunken) 28kB directory. - better directory block allocation patterns would likely help a lot, rather than single blocks. That's true even without any read-ahead (at least the disk wouldn't need to seek, and any on-disk track buffers etc would work better), but with read-ahead and contiguous blocks it should be just a couple of IO's (the indirect stuff means that it's more than one), and so you should see much better IO patterns because the elevator can try to help too. Maybe I just have unrealistic expectations, but I really don't like how a fairly small 50kB directory takes an appreciable fraction of a second to read. Once it's cached, it still takes too long, but at least at that point the individual getdents calls take just tens of microseconds. Here's cold-cache numbers (notice: 34 msec for the first one, and 17 msec in the middle.. The 5-6ms range indicates a single IO for the intermediate ones, which basically says that each call does roughly one IO, except the first one that does ~5 (probably the indirect index blocks), and two in the middle who are able to fill up the buffer from the IO done by the previous one (4kB buffers, so if the previous getdents() happened to just read the beginning of a block, the next one might be able to fill everything from that block without having to do IO). 
getdents(3, /* 103 entries */, 4096)    = 4088 <0.034830>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.006703>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.006719>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000354>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000017>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.005302>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.016957>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000017>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.003530>
getdents(3, /* 83 entries */, 4096)     = 3320 <0.000296>
getdents(3, /* 0 entries */, 4096)      = 0 <0.000006>

Here's the pure CPU overhead: still pretty high (200 usec! For a single system call! That's disgusting! In contrast, a 4kB read() call takes 7 usec on this machine, so the overhead of doing things one dentry at a time, and calling down to several layers of filesystem is quite high):

getdents(3, /* 103 entries */, 4096)    = 4088 <0.000204>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000122>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000112>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000153>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000018>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000103>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000217>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000018>
getdents(3, /* 102 entries */, 4096)    = 4080 <0.000095>
getdents(3, /* 83 entries */, 4096)     = 3320 <0.000089>
getdents(3, /* 0 entries */, 4096)      = 0 <0.000006>

but you can see the difference.. The real cost is obviously the IO. 
Linus

----
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

static char buffer[8192];

static int create_file(const char *name)
{
	int fd = open(name, O_RDWR | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return fd;
	write(fd, buffer, sizeof(buffer));
	close(fd);
	return 0;
}

int main(int argc, char **argv)
{
	int i;
	char name[256];

	/* Fill up the buffer with some random garbage */
	for (i = 0; i < sizeof(buffer); i++)
		buffer[i] = "abcdefghijklmnopqrstuvwxyz\n"[i % 27];

	if (mkdir("throwaway", 0777) < 0 || chdir("throwaway") < 0) {
		perror("throwaway");
		exit(1);
	}

	/*
	 * Create a reasonably big directory by having a number
	 * of files with non-trivial filenames, and with some
	 * real content to fragment the directory blocks..
	 */
	for (i = 0; i < 1000; i++) {
		snprintf(name, sizeof(name), "file-name-%d-%d-%d-%d",
			 i / 1000, (i / 100) % 10, (i / 10) % 10, (i / 1) % 10);
		create_file(name);
	}
	return 0;
}

^ permalink raw reply [flat|nested] 110+ messages in thread
[parent not found: <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com>]
* Re: How git affects kernel.org performance [not found] ` <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com> @ 2007-01-07 19:35 ` Linus Torvalds 0 siblings, 0 replies; 110+ messages in thread From: Linus Torvalds @ 2007-01-07 19:35 UTC (permalink / raw) To: Jon Smirl Cc: Christoph Hellwig, Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Jon Smirl wrote: > > > > - proper read-ahead. Right now, even if the directory is totally > > contiguous on disk (just remove the thing that writes data to the > > files, so that you'll have empty files instead of 8kB files), I think > > we do those reads totally synchronously if the filesystem was mounted > > with directory hashing enabled. > > What's the status on the Adaptive Read-ahead patch from Wu Fengguang > <wfg@mail.ustc.edu.cn> ? That patch really helped with read ahead > problems I was having with mmap. It was in mm forever and I've lost > track of it. Won't help. ext3 does NO readahead at all. It doesn't use the general VFS helper routines to read data (because it doesn't use the page cache), it just does the raw buffer-head IO directly. (In the non-indexed case, it does do some read-ahead, and it uses the generic routines for it, but because it does everything by physical address, even the generic routines will decide that it's just doing random reading if the directory isn't physically contiguous - and stop reading ahead). (I may have missed some case where it does do read-ahead in the index routines, so don't take my word as being unquestionably true. I'm _fairly_ sure, but..) Linus ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 9:03 ` Willy Tarreau 2007-01-07 10:28 ` Christoph Hellwig @ 2007-01-07 10:50 ` Jan Engelhardt 2007-01-07 18:49 ` Randy Dunlap 1 sibling, 1 reply; 110+ messages in thread From: Jan Engelhardt @ 2007-01-07 10:50 UTC (permalink / raw) To: Willy Tarreau Cc: H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Jan 7 2007 10:03, Willy Tarreau wrote: >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: >> >[..] >> >entries in directories with millions of files on disk. I'm not >> >certain it would be that easy to try other filesystems on >> >kernel.org though :-/ >> >> Changing filesystems would mean about a week of downtime for a server. >> It's painful, but it's doable; however, if we get a traffic spike during >> that time it'll hurt like hell. Then make sure no one releases a kernel ;-) >> However, if there is credible reasons to believe XFS will help, I'd be >> inclined to try it out. > >Hmmm I'm thinking about something very dirty : would it be possible >to reduce the current FS size to get more space to create another >FS ? Supposing you create a XX GB/TB XFS after the current ext3, >you would be able to mount it in some directories with --bind and >slowly switch some parts to it. The problem with this approach is >that it will never be 100% converted, but as an experiment it might >be worth it, no ? Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync again to catch any new files that have been added until the ftp upload was closed, then do _one_ (technically two) mountpoint moves (as opposed to Willy's idea of "some directories") in a mere second along the lines of mount --move /oldfs /older; mount --move /newfs /oldfs. let old transfers that still use files in /older complete (lsof or fuser -m), then disconnect the old volume. 
In case /newfs (now /oldfs) is a volume you borrowed from someone and need to return it, well, I guess you need to rsync back somehow. -`J' -- ^ permalink raw reply [flat|nested] 110+ messages in thread
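Jan's recipe can be written down as a script. The volume paths are the hypothetical /oldfs, /newfs and /older from his mail; with DRY_RUN=1 (the default) it only prints the plan, since the real run needs root and the actual volumes.

```shell
DRY_RUN=${DRY_RUN:-1}
PLAN=""
run() {
    PLAN="$PLAN$* ; "
    if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

run rsync -aH /oldfs/ /newfs/       # bulk copy while uploads continue
# ...stop ftp uploads here...
run rsync -aH /oldfs/ /newfs/       # catch-up pass for files added meanwhile
run mount --move /oldfs /older      # park the old filesystem aside
run mount --move /newfs /oldfs      # new filesystem takes over; ~1s switch
# let transfers still using /older finish (watch with lsof or fuser -m), then:
run umount /older
```

The two rsync passes keep the final freeze window short: the second pass only transfers whatever changed while the first one ran.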
* Re: How git affects kernel.org performance 2007-01-07 10:50 ` Jan Engelhardt @ 2007-01-07 18:49 ` Randy Dunlap 2007-01-07 19:07 ` Jan Engelhardt 0 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2007-01-07 18:49 UTC (permalink / raw) To: Jan Engelhardt Cc: Willy Tarreau, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote: > > On Jan 7 2007 10:03, Willy Tarreau wrote: > >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: > >> >[..] > >> >entries in directories with millions of files on disk. I'm not > >> >certain it would be that easy to try other filesystems on > >> >kernel.org though :-/ > >> > >> Changing filesystems would mean about a week of downtime for a server. > >> It's painful, but it's doable; however, if we get a traffic spike during > >> that time it'll hurt like hell. > > Then make sure noone releases a kernel ;-) maybe the week of LCA ? > >> However, if there is credible reasons to believe XFS will help, I'd be > >> inclined to try it out. > > > >Hmmm I'm thinking about something very dirty : would it be possible > >to reduce the current FS size to get more space to create another > >FS ? Supposing you create a XX GB/TB XFS after the current ext3, > >you would be able to mount it in some directories with --bind and > >slowly switch some parts to it. The problem with this approach is > >that it will never be 100% converted, but as an experiment it might > >be worth it, no ? > > Much better: rsync from /oldfs to /newfs, stop all ftp uploads, rsync > again to catch any new files that have been added until the ftp > upload was closed, then do _one_ (technically two) mountpoint moves > (as opposed to Willy's idea of "some directories") in a mere second > along the lines of > > mount --move /oldfs /older; mount --move /newfs /oldfs. 
> > let old transfers that still use files in /older complete (lsof or > fuser -m), then disconnect the old volume. In case /newfs (now > /oldfs) is a volume you borrowed from someone and need to return it, > well, I guess you need to rsync back somehow. --- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 18:49 ` Randy Dunlap @ 2007-01-07 19:07 ` Jan Engelhardt 2007-01-07 19:28 ` Randy Dunlap 0 siblings, 1 reply; 110+ messages in thread From: Jan Engelhardt @ 2007-01-07 19:07 UTC (permalink / raw) To: Randy Dunlap Cc: Willy Tarreau, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster On Jan 7 2007 10:49, Randy Dunlap wrote: >On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote: >> On Jan 7 2007 10:03, Willy Tarreau wrote: >> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: >> >> >[..] >> >> >entries in directories with millions of files on disk. I'm not >> >> >certain it would be that easy to try other filesystems on >> >> >kernel.org though :-/ >> >> >> >> Changing filesystems would mean about a week of downtime for a server. >> >> It's painful, but it's doable; however, if we get a traffic spike during >> >> that time it'll hurt like hell. >> >> Then make sure noone releases a kernel ;-) > >maybe the week of LCA ? I don't know that acronym, but if you ask me when it should happen: _Before_ the next big thing is released, e.g. before 2.6.20-final. Reason: You never know how long they're chewing [downloading] on 2.6.20. Excluding other projects on kernel.org from my hypothesis, I'd expect bandwidth usage to be at its lowest the longer it has been since new files were released. (Because everyone has them then more or less.) -`J' -- ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 19:07 ` Jan Engelhardt @ 2007-01-07 19:28 ` Randy Dunlap 2007-01-07 19:37 ` Linus Torvalds 0 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2007-01-07 19:28 UTC (permalink / raw) To: Jan Engelhardt Cc: Willy Tarreau, H. Peter Anvin, Linus Torvalds, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007 20:07:43 +0100 (MET) Jan Engelhardt wrote: > > On Jan 7 2007 10:49, Randy Dunlap wrote: > >On Sun, 7 Jan 2007 11:50:57 +0100 (MET) Jan Engelhardt wrote: > >> On Jan 7 2007 10:03, Willy Tarreau wrote: > >> >On Sun, Jan 07, 2007 at 12:58:38AM -0800, H. Peter Anvin wrote: > >> >> >[..] > >> >> >entries in directories with millions of files on disk. I'm not > >> >> >certain it would be that easy to try other filesystems on > >> >> >kernel.org though :-/ > >> >> > >> >> Changing filesystems would mean about a week of downtime for a server. > >> >> It's painful, but it's doable; however, if we get a traffic spike during > >> >> that time it'll hurt like hell. > >> > >> Then make sure noone releases a kernel ;-) > > > >maybe the week of LCA ? Sorry, it means Linux.conf.au (Australia): http://lca2007.linux.org.au/ Jan. 15-20, 2007 > I don't know that acronym, but if you ask me when it should happen: > _Before_ the next big thing is released, e.g. before 2.6.20-final. > Reason: You never know how long they're chewing [downloading] on 2.6.20. > Excluding other projects on kernel.org from my hypothesis, I'd suppose the > lowest bandwidth usage the longer no new files have been released. (Because > everyone has them then more or less.) ISTM that Linus is trying to make 2.6.20-final before LCA. We'll see. --- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 19:28 ` Randy Dunlap @ 2007-01-07 19:37 ` Linus Torvalds 0 siblings, 0 replies; 110+ messages in thread From: Linus Torvalds @ 2007-01-07 19:37 UTC (permalink / raw) To: Randy Dunlap Cc: Jan Engelhardt, Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Randy Dunlap wrote: > > ISTM that Linus is trying to make 2.6.20-final before LCA. We'll see. No. Hopefully "final -rc" before LCA, but I'll do the actual 2.6.20 release afterwards. I don't want to have a merge window during LCA, as I and many others will all be out anyway. So it's much better to have LCA happen during the end of the stabilization phase when there's hopefully not a lot going on. (Of course, often at the end of the stabilization phase there is all the "ok, what about regression XyZ?" panic) Linus ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 8:55 ` Willy Tarreau 2007-01-07 8:58 ` H. Peter Anvin @ 2007-01-07 9:15 ` Andrew Morton 2007-01-07 9:38 ` Rene Herman 2007-01-08 3:05 ` Suparna Bhattacharya 1 sibling, 2 replies; 110+ messages in thread From: Andrew Morton @ 2007-01-07 9:15 UTC (permalink / raw) To: Willy Tarreau Cc: Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 On Sun, 7 Jan 2007 09:55:26 +0100 Willy Tarreau <w@1wt.eu> wrote: > On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote: > > > > > > On Sat, 6 Jan 2007, H. Peter Anvin wrote: > > > > > > During extremely high load, it appears that what slows kernel.org down more > > > than anything else is the time that each individual getdents() call takes. > > > When I've looked this I've observed times from 200 ms to almost 2 seconds! > > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly > > > packed tree, you can do the math yourself. > > > > "getdents()" is totally serialized by the inode semaphore. It's one of the > > most expensive system calls in Linux, partly because of that, and partly > > because it has to call all the way down into the filesystem in a way that > > almost no other common system call has to (99% of all filesystem calls can > > be handled basically at the VFS layer with generic caches - but not > > getdents()). > > > > So if there are concurrent readdirs on the same directory, they get > > serialized. If there is any file creation/deletion activity in the > > directory, it serializes getdents(). > > > > To make matters worse, I don't think it has any read-ahead at all when you > > use hashed directory entries. So if you have cold-cache case, you'll read > > every single block totally individually, and serialized. 
One block at a > > time (I think the non-hashed case is likely also suspect, but that's a > > separate issue) > > > > In other words, I'm not at all surprised it hits on filldir time. > > Especially on ext3. > > At work, we had the same problem on a file server with ext3. We use rsync > to make backups to a local IDE disk, and we noticed that getdents() took > about the same time as Peter reports (0.2 to 2 seconds), especially in > maildir directories. We tried many things to fix it with no result, > including enabling dirindexes. Finally, we made a full backup, and switched > over to XFS and the problem totally disappeared. So it seems that the > filesystem matters a lot here when there are lots of entries in a > directory, and that ext3 is not suitable for usages with thousands > of entries in directories with millions of files on disk. I'm not > certain it would be that easy to try other filesystems on kernel.org > though :-/ > Yeah, slowly-growing directories will get splattered all over the disk. Possible short-term fixes would be to just allocate up to (say) eight blocks when we grow a directory by one block. Or teach the directory-growth code to use ext3 reservations. Longer-term people are talking about things like on-disk reservations. But I expect directories are being forgotten about in all of that. ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 9:15 ` Andrew Morton @ 2007-01-07 9:38 ` Rene Herman 2007-01-08 3:05 ` Suparna Bhattacharya 1 sibling, 0 replies; 110+ messages in thread From: Rene Herman @ 2007-01-07 9:38 UTC (permalink / raw) To: Andrew Morton Cc: Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 On 01/07/2007 10:15 AM, Andrew Morton wrote: > Yeah, slowly-growing directories will get splattered all over the > disk. > > Possible short-term fixes would be to just allocate up to (say) eight > blocks when we grow a directory by one block. Or teach the > directory-growth code to use ext3 reservations. > > Longer-term people are talking about things like on-disk > rerservations. But I expect directories are being forgotten about in > all of that. I wish people would just talk about de2fsrag... ;-\ Rene ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 9:15 ` Andrew Morton 2007-01-07 9:38 ` Rene Herman @ 2007-01-08 3:05 ` Suparna Bhattacharya 2007-01-08 12:58 ` Theodore Tso 1 sibling, 1 reply; 110+ messages in thread From: Suparna Bhattacharya @ 2007-01-08 3:05 UTC (permalink / raw) To: Andrew Morton Cc: Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote: > On Sun, 7 Jan 2007 09:55:26 +0100 > Willy Tarreau <w@1wt.eu> wrote: > > > On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote: > > > > > > > > > On Sat, 6 Jan 2007, H. Peter Anvin wrote: > > > > > > > > During extremely high load, it appears that what slows kernel.org down more > > > > than anything else is the time that each individual getdents() call takes. > > > > When I've looked this I've observed times from 200 ms to almost 2 seconds! > > > > Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly > > > > packed tree, you can do the math yourself. > > > > > > "getdents()" is totally serialized by the inode semaphore. It's one of the > > > most expensive system calls in Linux, partly because of that, and partly > > > because it has to call all the way down into the filesystem in a way that > > > almost no other common system call has to (99% of all filesystem calls can > > > be handled basically at the VFS layer with generic caches - but not > > > getdents()). > > > > > > So if there are concurrent readdirs on the same directory, they get > > > serialized. If there is any file creation/deletion activity in the > > > directory, it serializes getdents(). > > > > > > To make matters worse, I don't think it has any read-ahead at all when you > > > use hashed directory entries. So if you have cold-cache case, you'll read > > > every single block totally individually, and serialized. 
One block at a > > > time (I think the non-hashed case is likely also suspect, but that's a > > > separate issue) > > > > > > In other words, I'm not at all surprised it hits on filldir time. > > > Especially on ext3. > > > > At work, we had the same problem on a file server with ext3. We use rsync > > to make backups to a local IDE disk, and we noticed that getdents() took > > about the same time as Peter reports (0.2 to 2 seconds), especially in > > maildir directories. We tried many things to fix it with no result, > > including enabling dirindexes. Finally, we made a full backup, and switched > > over to XFS and the problem totally disappeared. So it seems that the > > filesystem matters a lot here when there are lots of entries in a > > directory, and that ext3 is not suitable for usages with thousands > > of entries in directories with millions of files on disk. I'm not > > certain it would be that easy to try other filesystems on kernel.org > > though :-/ > > > > Yeah, slowly-growing directories will get splattered all over the disk. > > Possible short-term fixes would be to just allocate up to (say) eight > blocks when we grow a directory by one block. Or teach the > directory-growth code to use ext3 reservations. > > Longer-term people are talking about things like on-disk rerservations. > But I expect directories are being forgotten about in all of that. By on-disk reservations, do you mean persistent file preallocation ? (that is explicit preallocation of blocks to a given file) If so, you are right, we haven't really given any thought to the possibility of directories needing that feature. Regards Suparna > > - > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Suparna Bhattacharya (suparna@in.ibm.com) Linux Technology Center IBM Software Lab, India ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 3:05 ` Suparna Bhattacharya @ 2007-01-08 12:58 ` Theodore Tso 2007-01-08 13:41 ` Johannes Stezenbach ` (2 more replies) 0 siblings, 3 replies; 110+ messages in thread From: Theodore Tso @ 2007-01-08 12:58 UTC (permalink / raw) To: Suparna Bhattacharya Cc: Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4

On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > Yeah, slowly-growing directories will get splattered all over the disk.
> >
> > Possible short-term fixes would be to just allocate up to (say) eight
> > blocks when we grow a directory by one block. Or teach the
> > directory-growth code to use ext3 reservations.
> >
> > Longer-term people are talking about things like on-disk reservations.
> > But I expect directories are being forgotten about in all of that.
>
> By on-disk reservations, do you mean persistent file preallocation ? (that
> is explicit preallocation of blocks to a given file) If so, you are
> right, we haven't really given any thought to the possibility of directories
> needing that feature.

The fastest and probably most important thing to add is some readahead
smarts to directories --- both to the htree and non-htree cases. If
you're using some kind of b-tree structure, such as XFS does for
directories, preallocation doesn't help you much. Delayed allocation
can save you if your delayed allocator knows how to structure disk
blocks so that a btree-traversal is efficient, but I'm guessing the
biggest reason why we are losing is because we don't have sufficient
readahead. This also has the advantage that it will help without
needing to do a backup/restore to improve layout.

Allocating some number of empty blocks when we grow the directory
would be a quick hack that I'd probably do as a 2nd priority.
It won't help pre-existing directories, but combined with readahead logic, should help us out greatly in the non-btree case. - Ted ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 12:58 ` Theodore Tso @ 2007-01-08 13:41 ` Johannes Stezenbach 2007-01-08 13:56 ` Theodore Tso 2007-01-08 13:43 ` Jeff Garzik [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn> 2 siblings, 1 reply; 110+ messages in thread From: Johannes Stezenbach @ 2007-01-08 13:41 UTC (permalink / raw) To: Theodore Tso Cc: Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote: > > The fastest and probably most important thing to add is some readahead > smarts to directories --- both to the htree and non-htree cases. If > you're using some kind of b-tree structure, such as XFS does for > directories, preallocation doesn't help you much. Delayed allocation > can save you if your delayed allocator knows how to structure disk > blocks so that a btree-traversal is efficient, but I'm guessing the > biggest reason why we are losing is because we don't have sufficient > readahead. This also has the advantage that it will help without > needing to doing a backup/restore to improve layout. Would e2fsck -D help? What kind of optimization does it perform? Thanks, Johannes ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 13:41 ` Johannes Stezenbach @ 2007-01-08 13:56 ` Theodore Tso 2007-01-08 13:59 ` Pavel Machek 0 siblings, 1 reply; 110+ messages in thread From: Theodore Tso @ 2007-01-08 13:56 UTC (permalink / raw) To: Johannes Stezenbach Cc: Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4

On Mon, Jan 08, 2007 at 02:41:47PM +0100, Johannes Stezenbach wrote:
>
> Would e2fsck -D help? What kind of optimization
> does it perform?

It will help a little; e2fsck -D compresses the logical view of the
directory, but it doesn't optimize the physical layout on disk at all,
and of course, it won't help with the lack of readahead logic.

It's possible to improve how e2fsck -D works; at the moment, it's not
trying to make the directory be contiguous on disk. What it should
probably do is to pull a list of all of the blocks used by the
directory, sort them, and then try to see if it can improve on the
list by allocating some new blocks that would make the directory more
contiguous on disk. I suspect any improvements that would be seen by
doing this would be second-order effects at most, though.

- Ted

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 13:56 ` Theodore Tso @ 2007-01-08 13:59 ` Pavel Machek 2007-01-08 14:17 ` Theodore Tso 0 siblings, 1 reply; 110+ messages in thread From: Pavel Machek @ 2007-01-08 13:59 UTC (permalink / raw) To: Theodore Tso, Johannes Stezenbach, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, kernel list, webmaster, linux-ext4 Hi! > > Would e2fsck -D help? What kind of optimization > > does it perform? > > It will help a little; e2fsck -D compresses the logical view of the > directory, but it doesn't optimize the physical layout on disk at all, > and of course, it won't help with the lack of readahead logic. It's > possible to improve how e2fsck -D works, at the moment, it's not > trying to make the directory be contiguous on disk. What it should > probably do is to pull a list of all of the blocks used by the > directory, sort them, and then try to see if it can improve on the > list by allocating some new blocks that would make the directory more > contiguous on disk. I suspect any improvements that would be seen by > doing this would be second order effects at most, though. ...sounds like a job for e2defrag, not e2fsck... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 13:59 ` Pavel Machek @ 2007-01-08 14:17 ` Theodore Tso 0 siblings, 0 replies; 110+ messages in thread From: Theodore Tso @ 2007-01-08 14:17 UTC (permalink / raw) To: Pavel Machek Cc: Johannes Stezenbach, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, kernel list, webmaster, linux-ext4

On Mon, Jan 08, 2007 at 02:59:52PM +0100, Pavel Machek wrote:
> Hi!
>
> > > Would e2fsck -D help? What kind of optimization
> > > does it perform?
> >
> > It will help a little; e2fsck -D compresses the logical view of the
> > directory, but it doesn't optimize the physical layout on disk at all,
> > and of course, it won't help with the lack of readahead logic. It's
> > possible to improve how e2fsck -D works, at the moment, it's not
> > trying to make the directory be contiguous on disk. What it should
> > probably do is to pull a list of all of the blocks used by the
> > directory, sort them, and then try to see if it can improve on the
> > list by allocating some new blocks that would make the directory more
> > contiguous on disk. I suspect any improvements that would be seen by
> > doing this would be second order effects at most, though.
>
> ...sounds like a job for e2defrag, not e2fsck...

I wasn't proposing to move other data blocks around in order to make
the directory contiguous, but just a "quick and dirty" attempt to make
things better. But yes, in order to really fix layout issues you would
have to do a full defrag, and it's probably more important that we try
to fix things so that defragmentation runs aren't necessary in the
first place....

- Ted

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 12:58 ` Theodore Tso 2007-01-08 13:41 ` Johannes Stezenbach @ 2007-01-08 13:43 ` Jeff Garzik 2007-01-09 1:09 ` Paul Jackson [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn> 2 siblings, 1 reply; 110+ messages in thread From: Jeff Garzik @ 2007-01-08 13:43 UTC (permalink / raw) To: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 Theodore Tso wrote: > The fastest and probably most important thing to add is some readahead > smarts to directories --- both to the htree and non-htree cases. If > you're using some kind of b-tree structure, such as XFS does for > directories, preallocation doesn't help you much. Delayed allocation > can save you if your delayed allocator knows how to structure disk > blocks so that a btree-traversal is efficient, but I'm guessing the > biggest reason why we are losing is because we don't have sufficient > readahead. This also has the advantage that it will help without > needing to doing a backup/restore to improve layout. Something I just thought of: ATA and SCSI hard disks do their own read-ahead. Seeking all over the place to pick up bits of directory will hurt even more with the disk reading and throwing away data (albeit in its internal elevator and cache). Jeff ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-08 13:43 ` Jeff Garzik @ 2007-01-09 1:09 ` Paul Jackson 2007-01-09 2:18 ` Jeremy Higdon 0 siblings, 1 reply; 110+ messages in thread From: Paul Jackson @ 2007-01-09 1:09 UTC (permalink / raw) To: Jeff Garzik Cc: tytso, suparna, akpm, w, torvalds, hpa, git, nigel, warthog9, randy.dunlap, pavel, linux-kernel, webmaster, linux-ext4 Jeff wrote: > Something I just thought of: ATA and SCSI hard disks do their own > read-ahead. Probably this is wishful thinking on my part, but I would have hoped that most of the read-ahead they did was for stuff that happened to be on the cylinder they were reading anyway. So long as their read-ahead doesn't cause much extra or delayed disk head motion, what does it matter? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.925.600.0401 ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-09 1:09 ` Paul Jackson @ 2007-01-09 2:18 ` Jeremy Higdon 0 siblings, 0 replies; 110+ messages in thread From: Jeremy Higdon @ 2007-01-09 2:18 UTC (permalink / raw) To: Paul Jackson Cc: Jeff Garzik, tytso, suparna, akpm, w, torvalds, hpa, git, nigel, warthog9, randy.dunlap, pavel, linux-kernel, webmaster, linux-ext4 On Mon, Jan 08, 2007 at 05:09:34PM -0800, Paul Jackson wrote: > Jeff wrote: > > Something I just thought of: ATA and SCSI hard disks do their own > > read-ahead. > > Probably this is wishful thinking on my part, but I would have hoped > that most of the read-ahead they did was for stuff that happened to be > on the cylinder they were reading anyway. So long as their read-ahead > doesn't cause much extra or delayed disk head motion, what does it > matter? And they usually won't readahead if there is another command to process, though they can be set up to read unrequested data in spite of outstanding commands. When they are reading ahead, they'll only fetch LBAs beyond the last request until a buffer fills or the readahead gets interrupted. jeremy ^ permalink raw reply [flat|nested] 110+ messages in thread
[parent not found: <20070109075945.GA8799@mail.ustc.edu.cn>]
* Re: How git affects kernel.org performance [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn> @ 2007-01-09 7:59 ` Fengguang Wu 2007-01-09 16:23 ` Linus Torvalds 0 siblings, 1 reply; 110+ messages in thread From: Fengguang Wu @ 2007-01-09 7:59 UTC (permalink / raw) To: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, Linus Torvalds, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4

On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
> On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
> > > Yeah, slowly-growing directories will get splattered all over the disk.
> > >
> > > Possible short-term fixes would be to just allocate up to (say) eight
> > > blocks when we grow a directory by one block. Or teach the
> > > directory-growth code to use ext3 reservations.
> > >
> > > Longer-term people are talking about things like on-disk reservations.
> > > But I expect directories are being forgotten about in all of that.
> >
> > By on-disk reservations, do you mean persistent file preallocation ? (that
> > is explicit preallocation of blocks to a given file) If so, you are
> > right, we haven't really given any thought to the possibility of directories
> > needing that feature.
>
> The fastest and probably most important thing to add is some readahead
> smarts to directories --- both to the htree and non-htree cases. If

Here's a quick hack to try out the directory readahead idea.
Comments are welcome, it's a freshman's work :)

Regards,
Wu

---
 fs/ext3/dir.c   |   22 ++++++++++++++++++++++
 fs/ext3/inode.c |    2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,25 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ? 1 : 0;
 }
 
+int ext3_get_block(struct inode *inode, sector_t iblock,
+		struct buffer_head *bh_result, int create);
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	unsigned long sector;
+	unsigned long blk;
+	pgoff_t offset;
+
+	for (blk = 0; blk < inode->i_blocks; blk++) {
+		sector = blk << (inode->i_blkbits - 9);
+		sector = generic_block_bmap(inode->i_mapping, sector, ext3_get_block);
+		offset = sector >> (PAGE_CACHE_SHIFT - 9);
+		do_page_cache_readahead(mapping, filp, offset, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			 void * dirent, filldir_t filldir)
 {
@@ -108,6 +127,9 @@ static int ext3_readdir(struct file * fi
 
 	sb = inode->i_sb;
 
+	if (!filp->f_pos)
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 				    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 		struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-09 7:59 ` Fengguang Wu @ 2007-01-09 16:23 ` Linus Torvalds [not found] ` <20070110015739.GA26978@mail.ustc.edu.cn> 0 siblings, 1 reply; 110+ messages in thread From: Linus Torvalds @ 2007-01-09 16:23 UTC (permalink / raw) To: Fengguang Wu Cc: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 On Tue, 9 Jan 2007, Fengguang Wu wrote: > > > > The fastest and probably most important thing to add is some readahead > > smarts to directories --- both to the htree and non-htree cases. If > > Here's is a quick hack to practice the directory readahead idea. > Comments are welcome, it's a freshman's work :) Well, I'd probably have done it differently, but more important is whether this actually makes a difference performance-wise. Have you benchmarked it at all? Doing an echo 3 > /proc/sys/vm/drop_caches is your friend for testing things like this, to force cold-cache behaviour.. Linus ^ permalink raw reply [flat|nested] 110+ messages in thread
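[Linus's drop_caches tip can be wrapped into a small cold-cache timing helper; a sketch assuming a GNU/Linux userland (`date +%s%N`) and root for the cache drop, which is skipped silently otherwise:]

```shell
#!/bin/sh
# Cold-cache timing wrapper for benchmarking directory traversal.
# Sketch only: dropping caches needs root; GNU date is assumed.
cold_find() {
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches   # force cold-cache behaviour
    fi
    t0=$(date +%s%N)
    find "$1" > /dev/null 2>&1
    t1=$(date +%s%N)
    echo "$(( (t1 - t0) / 1000000 )) ms"
}
cold_find /etc
```

[Run once per kernel under test; without the drop, repeat runs measure the dentry/page caches rather than the disk.]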
[parent not found: <20070110015739.GA26978@mail.ustc.edu.cn>]
* Re: How git affects kernel.org performance [not found] ` <20070110015739.GA26978@mail.ustc.edu.cn> @ 2007-01-10 1:57 ` Fengguang Wu 2007-01-10 3:20 ` Nigel Cunningham 1 sibling, 0 replies; 110+ messages in thread From: Fengguang Wu @ 2007-01-10 1:57 UTC (permalink / raw) To: Linus Torvalds Cc: Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4

On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
>
> On Tue, 9 Jan 2007, Fengguang Wu wrote:
> > >
> > > The fastest and probably most important thing to add is some readahead
> > > smarts to directories --- both to the htree and non-htree cases. If
> >
> > Here's a quick hack to try out the directory readahead idea.
> > Comments are welcome, it's a freshman's work :)
>
> Well, I'd probably have done it differently, but more important is whether
> this actually makes a difference performance-wise. Have you benchmarked it
> at all?

Yes, a trivial test shows a marginal improvement, on a minimal debian system:

# find / | wc -l
13641

# time find / > /dev/null

real    0m10.000s
user    0m0.210s
sys     0m4.370s

# time find / > /dev/null

real    0m9.890s
user    0m0.160s
sys     0m3.270s

> Doing an
>
> 	echo 3 > /proc/sys/vm/drop_caches
>
> is your friend for testing things like this, to force cold-cache
> behaviour..

Thanks, I'll work out numbers on large/concurrent dir accesses soon.

Regards,
Wu

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance [not found] ` <20070110015739.GA26978@mail.ustc.edu.cn> 2007-01-10 1:57 ` Fengguang Wu @ 2007-01-10 3:20 ` Nigel Cunningham [not found] ` <20070110140730.GA986@mail.ustc.edu.cn> 1 sibling, 1 reply; 110+ messages in thread From: Nigel Cunningham @ 2007-01-10 3:20 UTC (permalink / raw) To: Fengguang Wu Cc: Linus Torvalds, Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, H. Peter Anvin, git, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 Hi. On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote: > On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote: > > > > > > On Tue, 9 Jan 2007, Fengguang Wu wrote: > > > > > > > > The fastest and probably most important thing to add is some readahead > > > > smarts to directories --- both to the htree and non-htree cases. If > > > > > > Here's is a quick hack to practice the directory readahead idea. > > > Comments are welcome, it's a freshman's work :) > > > > Well, I'd probably have done it differently, but more important is whether > > this actually makes a difference performance-wise. Have you benchmarked it > > at all? > > Yes, a trivial test shows a marginal improvement, on a minimal debian system: > > # find / | wc -l > 13641 > > # time find / > /dev/null > > real 0m10.000s > user 0m0.210s > sys 0m4.370s > > # time find / > /dev/null > > real 0m9.890s > user 0m0.160s > sys 0m3.270s > > > Doing an > > > > echo 3 > /proc/sys/vm/drop_caches > > > > is your friend for testing things like this, to force cold-cache > > behaviour.. > > Thanks, I'll work out numbers on large/concurrent dir accesses soon. I gave it a try, and I'm afraid the results weren't pretty. 
I did:

time find /usr/src | wc -l

on current git with (3 times) and without (5 times) the patch, and got

with:
real 54.306, 54.327, 53.742s
usr  0.324, 0.284, 0.234s
sys  2.432, 2.484, 2.592s

without:
real 24.413, 24.616, 24.080s
usr  0.208, 0.316, 0.312s
sys  2.496, 2.440, 2.540s

Subsequent runs without dropping caches did give a significant
improvement in both cases (1.821/.188/1.632 is one result I wrote with
the patch applied).

Regards,

Nigel

^ permalink raw reply	[flat|nested] 110+ messages in thread
[parent not found: <20070110140730.GA986@mail.ustc.edu.cn>]
* Re: How git affects kernel.org performance [not found] ` <20070110140730.GA986@mail.ustc.edu.cn> @ 2007-01-10 14:07 ` Fengguang Wu 2007-01-12 10:54 ` Nigel Cunningham 1 sibling, 0 replies; 110+ messages in thread From: Fengguang Wu @ 2007-01-10 14:07 UTC (permalink / raw) To: Nigel Cunningham Cc: Linus Torvalds, Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, H. Peter Anvin, git, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 On Wed, Jan 10, 2007 at 02:20:49PM +1100, Nigel Cunningham wrote: > Hi. > > On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote: > > On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote: > > > > > > > > > On Tue, 9 Jan 2007, Fengguang Wu wrote: > > > > > > > > > > The fastest and probably most important thing to add is some readahead > > > > > smarts to directories --- both to the htree and non-htree cases. If > > > > > > > > Here's is a quick hack to practice the directory readahead idea. > > > > Comments are welcome, it's a freshman's work :) > > > > > > Well, I'd probably have done it differently, but more important is whether > > > this actually makes a difference performance-wise. Have you benchmarked it > > > at all? > > > > Yes, a trivial test shows a marginal improvement, on a minimal debian system: > > > > # find / | wc -l > > 13641 > > > > # time find / > /dev/null > > > > real 0m10.000s > > user 0m0.210s > > sys 0m4.370s > > > > # time find / > /dev/null > > > > real 0m9.890s > > user 0m0.160s > > sys 0m3.270s > > > > > Doing an > > > > > > echo 3 > /proc/sys/vm/drop_caches > > > > > > is your friend for testing things like this, to force cold-cache > > > behaviour.. > > > > Thanks, I'll work out numbers on large/concurrent dir accesses soon. > > I gave it a try, and I'm afraid the results weren't pretty. 
>
> I did:
>
> time find /usr/src | wc -l
>
> on current git with (3 times) and without (5 times) the patch, and got
>
> with:
> real 54.306, 54.327, 53.742s
> usr 0.324, 0.284, 0.234s
> sys 2.432, 2.484, 2.592s
>
> without:
> real 24.413, 24.616, 24.080s
> usr 0.208, 0.316, 0.312s
> sys: 2.496, 2.440, 2.540s
>
> Subsequent runs without dropping caches did give a significant
> improvement in both cases (1.821/.188/1.632 is one result I wrote with
> the patch applied).

Thanks, Nigel.
But I'm very sorry that the calculation in the patch was wrong.

Would you give this new patch a run? It produced pretty numbers here:

#!/bin/zsh

ROOT=/mnt/mnt
TIMEFMT="%E clock %S kernel %U user %w+%c cs %J"

echo 3 > /proc/sys/vm/drop_caches
# 49: enable dir readahead
# 50: disable
echo ${1:-50} > /proc/sys/vm/readahead_ratio

# time find $ROOT/a > /dev/null
time find /etch > /dev/null
# time find $ROOT/a > /dev/null&
# time grep -r asdf $ROOT/b > /dev/null&
# time cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null&

exit 0

# collected results on a SATA disk:

# ./test-parallel-dir-reada.sh 49
4.18s clock 0.08s kernel 0.04s user 418+0 cs find $ROOT/a > /dev/null
4.09s clock 0.10s kernel 0.02s user 410+1 cs find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 50
12.18s clock 0.15s kernel 0.07s user 1520+4 cs find $ROOT/a > /dev/null
11.99s clock 0.13s kernel 0.04s user 1558+6 cs find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 49
4.01s clock 0.06s kernel 0.01s user 1567+2 cs find /etch > /dev/null
4.08s clock 0.07s kernel 0.00s user 1568+0 cs find /etch > /dev/null

# ./test-parallel-dir-reada.sh 50
4.10s clock 0.09s kernel 0.01s user 1578+1 cs find /etch > /dev/null
4.19s clock 0.08s kernel 0.03s user 1578+0 cs find /etch > /dev/null

# ./test-parallel-dir-reada.sh 49
7.73s clock 0.11s kernel 0.06s user 438+2 cs find $ROOT/a > /dev/null
18.92s clock 0.43s kernel 0.02s user 1246+13 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
32.91s clock 4.20s kernel 1.55s user 103564+51 cs grep -r asdf $ROOT/b > /dev/null
8.47s clock 0.10s kernel 0.02s user 442+4 cs find $ROOT/a > /dev/null
19.24s clock 0.53s kernel 0.03s user 1250+23 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
29.93s clock 4.18s kernel 1.61s user 100425+47 cs grep -r asdf $ROOT/b > /dev/null

# ./test-parallel-dir-reada.sh 50
17.87s clock 0.57s kernel 0.02s user 1244+21 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
21.30s clock 0.08s kernel 0.05s user 1517+5 cs find $ROOT/a > /dev/null
49.68s clock 3.94s kernel 1.67s user 101520+57 cs grep -r asdf $ROOT/b > /dev/null
15.66s clock 0.51s kernel 0.00s user 1248+25 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
22.15s clock 0.15s kernel 0.04s user 1520+5 cs find $ROOT/a > /dev/null
46.14s clock 4.08s kernel 1.68s user 101517+63 cs grep -r asdf $ROOT/b > /dev/null

Thanks,
Wu

---
Subject: ext3 readdir readahead

Do readahead for ext3_readdir().

Reasons to be aggressive:
- readdir() users are likely to traverse the whole directory,
  so readahead miss is not a concern.
- most dirs are small, so slow start is not good
- the htree indexing introduces some randomness,
  which can be helped by the aggressiveness.

So we do 128K sized readaheads, at twice the speed of reads.
The following actual readahead pages are collected for a dir with 110000 entries:
	32 31 30 31 28 29 29 28 27 25 29 22 25 30 24 15 19
That means a readahead hit ratio of 454/541 = 84%

The performance is marginally better for a minimal debian system:

	command:  find /
	baseline: 4.10s 4.19s
	patched:  4.01s 4.08s

And considerably better for 100 directories, each with 1000 8K files:

	command:  find /throwaways
	baseline: 12.18s 11.99s
	patched:  4.18s 4.09s

And also noticeably better for parallel operations:

	                                baseline        patched
	find /throwaways &              21.30s 22.15s   7.73s 8.47s
	grep -r asdf /throwaways2 &     49.68s 46.14s   32.91s 29.93s
	cp /KNOPPIX_CD.iso /dev/null &  17.87s 15.66s   18.92s 19.24s

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/ext3/dir.c           |   33 +++++++++++++++++++++++++++++++++
 fs/ext3/inode.c         |    2 +-
 include/linux/ext3_fs.h |    2 ++
 3 files changed, 36 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,28 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ? 1 : 0;
 }
 
+#define DIR_READAHEAD_BYTES	(128*1024)
+#define DIR_READAHEAD_PGMASK	((DIR_READAHEAD_BYTES >> PAGE_CACHE_SHIFT) - 1)
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	int bbits = inode->i_blkbits;
+	unsigned long blk, end;
+
+	blk = filp->f_ra.prev_page << (PAGE_CACHE_SHIFT - bbits);
+	end = min(inode->i_blocks >> (bbits - 9),
+			blk + (DIR_READAHEAD_BYTES >> bbits));
+
+	for (; blk < end; blk++) {
+		pgoff_t phy;
+		phy = generic_block_bmap(inode->i_mapping, blk, ext3_get_block)
+			>> (PAGE_CACHE_SHIFT - bbits);
+		do_page_cache_readahead(mapping, filp, phy, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			 void * dirent, filldir_t filldir)
 {
@@ -108,6 +130,17 @@ static int ext3_readdir(struct file * fi
 
 	sb = inode->i_sb;
 
+	/*
+	 * Reading-ahead at 2x the page fault rate, in hope of reducing
+	 * readahead misses caused by the partially random htree order.
+	 */
+	filp->f_ra.prev_page += 2;
+	filp->f_ra.prev_page &= ~1;
+
+	if (!(filp->f_ra.prev_page & DIR_READAHEAD_PGMASK) &&
+		filp->f_ra.prev_page < (inode->i_blocks >> (PAGE_CACHE_SHIFT-9)))
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 				    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 		struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();
--- linux.orig/include/linux/ext3_fs.h
+++ linux/include/linux/ext3_fs.h
@@ -814,6 +814,8 @@ struct buffer_head * ext3_bread (handle_
 int ext3_get_blocks_handle(handle_t *handle, struct inode *inode,
 	sector_t iblock, unsigned long maxblocks, struct buffer_head *bh_result,
 	int create, int extend_disksize);
+extern int ext3_get_block(struct inode *inode, sector_t iblock,
+	struct buffer_head *bh_result, int create);
 extern void ext3_read_inode (struct inode *);
 extern int ext3_write_inode (struct inode *, int);

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance [not found] ` <20070110140730.GA986@mail.ustc.edu.cn> 2007-01-10 14:07 ` Fengguang Wu @ 2007-01-12 10:54 ` Nigel Cunningham 1 sibling, 0 replies; 110+ messages in thread From: Nigel Cunningham @ 2007-01-12 10:54 UTC (permalink / raw) To: Fengguang Wu Cc: Linus Torvalds, Theodore Tso, Suparna Bhattacharya, Andrew Morton, Willy Tarreau, H. Peter Anvin, git, J.H., Randy Dunlap, Pavel Machek, kernel list, webmaster, linux-ext4 Hi. On Wed, 2007-01-10 at 22:07 +0800, Fengguang Wu wrote: > Thanks, Nigel. > But I'm very sorry that the calculation in the patch was wrong. > > Would you give this new patch a run? Sorry for my slowness. I just did time find /usr/src | wc -l again: Without patch: 35.137, 35.104, 35.351 seconds With patch: 34.518, 34.376, 34.489 seconds So there's about .8 seconds saved. Regards, Nigel ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin 2007-01-07 5:39 ` Linus Torvalds @ 2007-01-07 14:57 ` Robert Fitzsimons 2007-01-07 19:12 ` J.H. 2007-01-08 1:51 ` Jakub Narebski 2007-01-07 15:06 ` Krzysztof Halasa 2 siblings, 2 replies; 110+ messages in thread From: Robert Fitzsimons @ 2007-01-07 14:57 UTC (permalink / raw) To: H. Peter Anvin Cc: git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster > Some more data on how git affects kernel.org... I have a quick question about the gitweb configuration, does the $projects_list config entry point to a directory or a file? When it is a directory gitweb ends up doing the equivalent of a 'find $project_list' to find all the available projects, so it really should be changed to a projects list file. Robert ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 14:57 ` Robert Fitzsimons @ 2007-01-07 19:12 ` J.H. 2007-01-08 1:51 ` Jakub Narebski 1 sibling, 0 replies; 110+ messages in thread From: J.H. @ 2007-01-07 19:12 UTC (permalink / raw) To: Robert Fitzsimons Cc: H. Peter Anvin, git, nigel, Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster With my gitweb caching changes this isn't as big of a deal as the front page is only generated once every 10 minutes or so (and with the changes I'm working on today that timeout will be variable) - John On Sun, 2007-01-07 at 14:57 +0000, Robert Fitzsimons wrote: > > Some more data on how git affects kernel.org... > > I have a quick question about the gitweb configuration, does the > $projects_list config entry point to a directory or a file? > > When it is a directory gitweb ends up doing the equivalent of a 'find > $project_list' to find all the available projects, so it really should > be changed to a projects list file. > > Robert ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 14:57 ` Robert Fitzsimons 2007-01-07 19:12 ` J.H. @ 2007-01-08 1:51 ` Jakub Narebski 1 sibling, 0 replies; 110+ messages in thread From: Jakub Narebski @ 2007-01-08 1:51 UTC (permalink / raw) To: linux-kernel; +Cc: git

Robert Fitzsimons wrote:

>> Some more data on how git affects kernel.org...
>
> I have a quick question about the gitweb configuration, does the
> $projects_list config entry point to a directory or a file?

It can point to both. Usually it is either unset, and then we do a find
over $projectroot, or it is a file (URI-escaped path relative to
$projectroot, SPACE, and URI-escaped owner of a project; you can get the
file by clicking on TXT on the projects_list page).

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 110+ messages in thread
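[The file format Jakub describes can be generated offline instead of letting gitweb scan $projectroot per request. A sketch under stated assumptions: repositories end in `.git`, the owner lives in each repo's config under the `gitweb.owner` key, and only spaces are escaped (real URI escaping covers more characters):]

```shell
#!/bin/sh
# Sketch: build a static $projects_list file so gitweb need not run
# 'find' over $projectroot on every request.
# Output format: URI-escaped repo path, a space, URI-escaped owner.
esc() { printf '%s' "$1" | sed 's/ /%20/g'; }   # spaces only (assumption)

gen_projects_list() {
    root=$1
    (cd "$root" && find . -type d -name '*.git' -prune | sed 's|^\./||') |
    while IFS= read -r repo; do
        owner=$(git config -f "$root/$repo/config" gitweb.owner 2>/dev/null)
        printf '%s %s\n' "$(esc "$repo")" "$(esc "${owner:-unknown}")"
    done
}

# hypothetical path; redirect the output into $projects_list
gen_projects_list /pub/scm 2>/dev/null || true
```

[Regenerating this from cron keeps the list fresh without per-request filesystem walks.]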
* Re: How git affects kernel.org performance 2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin 2007-01-07 5:39 ` Linus Torvalds 2007-01-07 14:57 ` Robert Fitzsimons @ 2007-01-07 15:06 ` Krzysztof Halasa 2007-01-07 20:31 ` Shawn O. Pearce 2 siblings, 1 reply; 110+ messages in thread From: Krzysztof Halasa @ 2007-01-07 15:06 UTC (permalink / raw) To: H. Peter Anvin Cc: git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster "H. Peter Anvin" <hpa@zytor.com> writes: > During extremely high load, it appears that what slows kernel.org down > more than anything else is the time that each individual getdents() > call takes. When I've looked this I've observed times from 200 ms to > almost 2 seconds! Since an unpacked *OR* unpruned git tree adds 256 > directories to a cleanly packed tree, you can do the math yourself. Hmm... Perhaps it should be possible to push git updates as a pack file only? I mean, the pack file would stay packed = never individual files and never 256 directories? People aren't doing commit/etc. activity there, right? -- Krzysztof Halasa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: How git affects kernel.org performance 2007-01-07 15:06 ` Krzysztof Halasa @ 2007-01-07 20:31 ` Shawn O. Pearce 2007-01-08 14:46 ` Nicolas Pitre 0 siblings, 1 reply; 110+ messages in thread From: Shawn O. Pearce @ 2007-01-07 20:31 UTC (permalink / raw) To: Krzysztof Halasa Cc: H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Krzysztof Halasa <khc@pm.waw.pl> wrote: > Hmm... Perhaps it should be possible to push git updates as a pack > file only? I mean, the pack file would stay packed = never individual > files and never 256 directories? Latest Git does this. If the server is later than 1.4.3.3 then the receive-pack process can actually store the pack file rather than unpacking it into loose objects. The downside is that it will copy any missing base objects onto the end of a thin pack to make it not-thin. There's actually a limit that controls when to keep the pack and when not to (receive.unpackLimit). In 1.4.3.3 this defaulted to 5000 objects, which meant all but the largest pushes will be exploded into loose objects. In 1.5.0-rc0 that limit changed from 5000 to 100, though Nico did a lot of study and discovered that the optimum is likely 3. But that tends to create too many pack files so 100 was arbitrarily chosen. So if the user pushes <100 objects to a 1.5.0-rc0 server we unpack to loose; >= 100 we keep the pack file. Perhaps this would help kernel.org. -- Shawn. ^ permalink raw reply [flat|nested] 110+ messages in thread
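The knob Shawn mentions is set per repository. A minimal sketch, using a scratch directory and the modern `git init --bare` / `git config` spelling rather than the 1.4.x-era commands:

```shell
# Create a scratch bare repository and make its receive-pack keep any pushed
# pack containing 100 or more objects, instead of exploding it into loose
# objects (smaller pushes are still unpacked).
repo="$(mktemp -d)/example.git"
git init --bare "$repo"
git --git-dir="$repo" config receive.unpackLimit 100
git --git-dir="$repo" config receive.unpackLimit
```

The last command simply reads the value back, which is a quick way to confirm the setting took effect.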
* Re: How git affects kernel.org performance 2007-01-07 20:31 ` Shawn O. Pearce @ 2007-01-08 14:46 ` Nicolas Pitre 0 siblings, 0 replies; 110+ messages in thread From: Nicolas Pitre @ 2007-01-08 14:46 UTC (permalink / raw) To: Shawn O. Pearce Cc: Krzysztof Halasa, H. Peter Anvin, git, nigel, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Sun, 7 Jan 2007, Shawn O. Pearce wrote: > Krzysztof Halasa <khc@pm.waw.pl> wrote: > > Hmm... Perhaps it should be possible to push git updates as a pack > > file only? I mean, the pack file would stay packed = never individual > > files and never 256 directories? > > Latest Git does this. If the server is later than 1.4.3.3 then > the receive-pack process can actually store the pack file rather > than unpacking it into loose objects. The downside is that it will > copy any missing base objects onto the end of a thin pack to make > it not-thin. No. There are no thin packs for pushes. And IMHO it should stay that way exactly to avoid this little inconvenience on servers. The fetch case is a different story of course. Nicolas ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 5:17 ` H. Peter Anvin 2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin @ 2007-01-09 4:29 ` Nigel Cunningham 2007-01-09 5:09 ` Adrian Bunk 1 sibling, 1 reply; 110+ messages in thread From: Nigel Cunningham @ 2007-01-09 4:29 UTC (permalink / raw) To: H. Peter Anvin Cc: J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Hi again. On Sat, 2007-01-06 at 21:17 -0800, H. Peter Anvin wrote: > Nigel Cunningham wrote: > > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: > >> The two things git users can do to help is: > >> > >> 1. Make sure your alternatives file is set up correctly; > >> 2. Keep your trees packed and pruned, to keep the file count down. > >> > >> If you do this, the load imposed by a single git tree is fairly negible. > > > > Sorry for the slow reply, and the ignorance... what's an alternatives > > file? I've never heard of them before. > > > > Just a minor correction; it's the "alternates" file > (objects/info/alternates). I went looking for documentation on how to use the alternates feature, and found an email from September 2005 (http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2860.html) that says: <quote> /pub/scm/linux/kernel/git/$u/$tree Of course, you may have more than one such $tree. The suggestion by Linus was to do (please do not do this yet -- that is what this message is about): $ cd /pub/scm/linux/kernel/git/$u/$tree $ cat /pub/scm/linux/kernel/git/torvalds/linux-2.6/objects \ >objects/info/alternates $ GIT_DIR=. git prune </quote> Are these instructions still correct in the case of master.kernel.org? Regards, Nigel ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-09 4:29 ` [KORG] Re: kernel.org lies about latest -mm kernel Nigel Cunningham @ 2007-01-09 5:09 ` Adrian Bunk 2007-01-09 5:51 ` Nigel Cunningham 0 siblings, 1 reply; 110+ messages in thread From: Adrian Bunk @ 2007-01-09 5:09 UTC (permalink / raw) To: Nigel Cunningham Cc: H. Peter Anvin, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster On Tue, Jan 09, 2007 at 03:29:35PM +1100, Nigel Cunningham wrote: > Hi again. > > On Sat, 2007-01-06 at 21:17 -0800, H. Peter Anvin wrote: > > Nigel Cunningham wrote: > > > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: > > >> The two things git users can do to help is: > > >> > > >> 1. Make sure your alternatives file is set up correctly; > > >> 2. Keep your trees packed and pruned, to keep the file count down. > > >> > > >> If you do this, the load imposed by a single git tree is fairly negible. > > > > > > Sorry for the slow reply, and the ignorance... what's an alternatives > > > file? I've never heard of them before. > > > > > > > Just a minor correction; it's the "alternates" file > > (objects/info/alternates). > > I went looking for documentation on how to use the alternates feature, > and found an email from September 2005 > (http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2860.html) that > says: > > <quote> > /pub/scm/linux/kernel/git/$u/$tree > > Of course, you may have more than one such $tree. The > suggestion by Linus was to do (please do not do this yet -- that > is what this message is about): > > $ cd /pub/scm/linux/kernel/git/$u/$tree > $ cat /pub/scm/linux/kernel/git/torvalds/linux-2.6/objects \ > >objects/info/alternates > $ GIT_DIR=. git prune > </quote> > > Are these instructions still correct in the case of master.kernel.org? It works for me (instead of "git prune" I was using "git-repack -a -d -l -f"). > Regards, > > Nigel cu Adrian -- "Is there not promise of rain?" 
Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-09 5:09 ` Adrian Bunk @ 2007-01-09 5:51 ` Nigel Cunningham 0 siblings, 0 replies; 110+ messages in thread From: Nigel Cunningham @ 2007-01-09 5:51 UTC (permalink / raw) To: Adrian Bunk Cc: H. Peter Anvin, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Hi again. On Tue, 2007-01-09 at 06:09 +0100, Adrian Bunk wrote: > On Tue, Jan 09, 2007 at 03:29:35PM +1100, Nigel Cunningham wrote: > > Hi again. > > > > On Sat, 2007-01-06 at 21:17 -0800, H. Peter Anvin wrote: > > > Nigel Cunningham wrote: > > > > On Tue, 2006-12-26 at 08:49 -0800, H. Peter Anvin wrote: > > > >> The two things git users can do to help is: > > > >> > > > >> 1. Make sure your alternatives file is set up correctly; > > > >> 2. Keep your trees packed and pruned, to keep the file count down. > > > >> > > > >> If you do this, the load imposed by a single git tree is fairly negible. > > > > > > > > Sorry for the slow reply, and the ignorance... what's an alternatives > > > > file? I've never heard of them before. > > > > > > > > > > Just a minor correction; it's the "alternates" file > > > (objects/info/alternates). > > > > I went looking for documentation on how to use the alternates feature, > > and found an email from September 2005 > > (http://www.ussg.iu.edu/hypermail/linux/kernel/0509.1/2860.html) that > > says: > > > > <quote> > > /pub/scm/linux/kernel/git/$u/$tree > > > > Of course, you may have more than one such $tree. The > > suggestion by Linus was to do (please do not do this yet -- that > > is what this message is about): > > > > $ cd /pub/scm/linux/kernel/git/$u/$tree > > $ cat /pub/scm/linux/kernel/git/torvalds/linux-2.6/objects \ > > >objects/info/alternates > > $ GIT_DIR=. git prune > > </quote> > > > > Are these instructions still correct in the case of master.kernel.org? > > It works for me (instead of "git prune" I was using > "git-repack -a -d -l -f"). There's a typo in the above commands. 
It should be echo instead of cat. In addition, just typing "git prune" didn't save much. When I did git-prune -a -l -d, however, all of my trees shrunk to just a couple of megs (just my deltas, I assume). Maybe this will be helpful to someone else. Regards, Nigel ^ permalink raw reply [flat|nested] 110+ messages in thread
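A self-contained sketch of the corrected recipe, with `echo` rather than `cat` as Nigel notes, using scratch directories in place of the real /pub/scm paths from the quoted mail; the `-l` repack flag is what keeps objects reachable through the alternates file out of the local pack:

```shell
# Shared "upstream" object store and a derived tree; both are scratch
# bare repositories here, standing in for the kernel.org layout.
upstream="$(mktemp -d)/linux-2.6.git"
mine="$(mktemp -d)/my-tree.git"
git init --bare "$upstream"
git init --bare "$mine"
# echo, not cat: alternates holds the *path* to the borrowed objects dir,
# one per line, not the directory's contents.
echo "$upstream/objects" > "$mine/objects/info/alternates"
# Repack; -a -d collapses everything into one pack, and -l ("local")
# excludes objects that can be borrowed via alternates.
GIT_DIR="$mine" git repack -a -d -l
```

With the repositories empty this is a no-op repack, but on a real tree it is the step that shrinks a fork to little more than its own deltas.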
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 19:30 ` J.H. 2006-12-16 20:30 ` Russell King 2006-12-16 21:21 ` Nigel Cunningham @ 2006-12-17 12:32 ` Pavel Machek 2006-12-17 13:13 ` Jeff Garzik 2006-12-17 18:23 ` Randy Dunlap ` (3 subsequent siblings) 6 siblings, 1 reply; 110+ messages in thread From: Pavel Machek @ 2006-12-17 12:32 UTC (permalink / raw) To: J.H., vojtech; +Cc: Randy Dunlap, Andrew Morton, kernel list, hpa, webmaster Hi! > The problem has been hashed over quite a bit recently, and I would be > curious what you would consider the real problem after you see the > situation. > > The root cause boils down to with git, gitweb and the normal mirroring > on the frontend machines our basic working set no longer stays resident > in memory, which is forcing more and more to actively go to disk causing > a much higher I/O load. You have the added problem that one of the > frontend machines is getting hit harder than the other due to several > factors: various DNS servers not round robining, people explicitly > hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and > probably several other factors we aren't aware of. This has caused the > average load on that machine to hover around 150-200 and if for whatever > reason we have to take one of the machines down the load on the > remaining machine will skyrocket to 2000+. > > Since it's apparent not everyone is aware of what we are doing, I'll > mention briefly some of the bigger points. > > - We have contacted HP to see if we can get additional hardware, mind > you though this is a long term solution and will take time, but if our > request is approved it will double the number of machines kernel.org > runs. Would you accept help from someone else than HP? kernel.org is very important, and hardware is cheap these days... What are the requirements for machine to be interesting to kernel.org? I guess AMD/1GHz, 1GB ram, 100GB disk is not interesting to you.... 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-17 12:32 ` Pavel Machek @ 2006-12-17 13:13 ` Jeff Garzik 0 siblings, 0 replies; 110+ messages in thread From: Jeff Garzik @ 2006-12-17 13:13 UTC (permalink / raw) To: Pavel Machek Cc: J.H., vojtech, Randy Dunlap, Andrew Morton, kernel list, hpa, webmaster Pavel Machek wrote: > Would you accept help from someone else than HP? kernel.org is very > important, and hardware is cheap these days... What are the > requirements for machine to be interesting to kernel.org? I guess > AMD/1GHz, 1GB ram, 100GB disk is not interesting to you.... quoting http://www.kernel.org/ ... "We have put two new external servers, graciously donated by Hewlett-Packard into full production use. These servers are both ProLiant DL585 quad Opteron servers, each with 24 GB of RAM and 10 TB of disk. Huge thanks to HP!" ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 19:30 ` J.H. ` (2 preceding siblings ...) 2006-12-17 12:32 ` Pavel Machek @ 2006-12-17 18:23 ` Randy Dunlap 2006-12-17 22:37 ` Matti Aarnio 2007-01-08 20:10 ` Jean Delvare 2006-12-19 6:34 ` Willy Tarreau ` (2 subsequent siblings) 6 siblings, 2 replies; 110+ messages in thread From: Randy Dunlap @ 2006-12-17 18:23 UTC (permalink / raw) To: J.H.; +Cc: Andrew Morton, Pavel Machek, kernel list, hpa, webmaster J.H. wrote: > The problem has been hashed over quite a bit recently, and I would be > curious what you would consider the real problem after you see the > situation. OK, thanks for the summary. > The root cause boils down to with git, gitweb and the normal mirroring > on the frontend machines our basic working set no longer stays resident > in memory, which is forcing more and more to actively go to disk causing > a much higher I/O load. You have the added problem that one of the > frontend machines is getting hit harder than the other due to several > factors: various DNS servers not round robining, people explicitly > hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and > probably several other factors we aren't aware of. This has caused the > average load on that machine to hover around 150-200 and if for whatever > reason we have to take one of the machines down the load on the > remaining machine will skyrocket to 2000+. > > Since it's apparent not everyone is aware of what we are doing, I'll > mention briefly some of the bigger points. > > - We have contacted HP to see if we can get additional hardware, mind > you though this is a long term solution and will take time, but if our > request is approved it will double the number of machines kernel.org > runs. > > - Gitweb is causing us no end of headache, there are (known to me > anyway) two different things happening on that. 
I am looking at Jeff > Garzik's suggested caching mechanism as a temporary stop-gap, with an > eye more on doing a rather heavy re-write of gitweb itself to include > semi-intelligent caching. I've already started in on the later - and I > just about have the caching layer put in. But this is still at least a > week out before we could even remotely consider deploying it. > > - We've cut back on the number of ftp and rsync users to the machines. > Basically we are cutting back where we can in an attempt to keep the > load from spiraling out of control, this helped a bit when we recently > had to take one of the machines down and instead of loads spiking into > the 2000+ range we peaked at about 500-600 I believe. > > So we know the problem is there, and we are working on it - we are > getting e-mails about it if not daily than every other day or so. If > there are suggestions we are willing to hear them - but the general > feeling with the admins is that we are probably hitting the biggest > problems already. I have (or had) no insight into the problem analysis, just that there is a big problem. Fortunately you and others know that too and are working on it. You asked what I (or anyone) would consider the real problem. I can't really say since I have no performance/profile data to base it on. There has been some noise about (not) providing mirror services for distros. Is that a big cpu/memory consumer? If so, then is that something that kernel.org could shed over some N (6 ?) months? I understand not dropping it immediately, but it seems to be more of a convenience rather than something related to kernel development. > - John 'Warthog9' Hawley > Kernel.org Admin > > On Sat, 2006-12-16 at 10:02 -0800, Randy Dunlap wrote: >> Andrew Morton wrote: >>> On Sat, 16 Dec 2006 09:44:21 -0800 >>> Randy Dunlap <randy.dunlap@oracle.com> wrote: >>> >>>> On Thu, 14 Dec 2006 23:37:18 +0100 Pavel Machek wrote: >>>> >>>>> Hi! 
>>>>> >>>>> pavel@amd:/data/pavel$ finger @www.kernel.org >>>>> [zeus-pub.kernel.org] >>>>> ... >>>>> The latest -mm patch to the stable Linux kernels is: 2.6.19-rc6-mm2 >>>>> pavel@amd:/data/pavel$ head /data/l/linux-mm/Makefile >>>>> VERSION = 2 >>>>> PATCHLEVEL = 6 >>>>> SUBLEVEL = 19 >>>>> EXTRAVERSION = -mm1 >>>>> ... >>>>> pavel@amd:/data/pavel$ >>>>> >>>>> AFAICT 2.6.19-mm1 is newer than 2.6.19-rc6-mm2, but kernel.org does >>>>> not understand that. >>>> Still true (not listed) for 2.6.20-rc1-mm1 :( >>>> >>>> Could someone explain what the problem is and what it would >>>> take to correct it? >>> 2.6.20-rc1-mm1 still hasn't propagated out to the servers (it's been 36 >>> hours). Presumably the front page non-update is a consequence of that. >> Agreed on the latter part. Can someone address the real problem??? -- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-17 18:23 ` Randy Dunlap @ 2006-12-17 22:37 ` Matti Aarnio 2006-12-18 0:42 ` J.H. 2007-01-08 20:10 ` Jean Delvare 1 sibling, 1 reply; 110+ messages in thread From: Matti Aarnio @ 2006-12-17 22:37 UTC (permalink / raw) To: Randy Dunlap Cc: J.H., Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote: > J.H. wrote: ... > >The root cause boils down to with git, gitweb and the normal mirroring > >on the frontend machines our basic working set no longer stays resident > >in memory, which is forcing more and more to actively go to disk causing > >a much higher I/O load. You have the added problem that one of the > >frontend machines is getting hit harder than the other due to several > >factors: various DNS servers not round robining, people explicitly > >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and > >probably several other factors we aren't aware of. This has caused the > >average load on that machine to hover around 150-200 and if for whatever > >reason we have to take one of the machines down the load on the > >remaining machine will skyrocket to 2000+. Relying on DNS and clients doing round-robin load-balancing is doomed. You really, REALLY, need external L4 load-balancer switches. (And installation help from somebody who really knows how to do this kind of service on a cluster.) Basic config features include, of course: - number of parallel active connections with each protocol - availability of each served protocol (e.g. one can shut down rsync at one server, and new rsync connections get pushed elsewhere) - running load-balance of each served protocol separately - server load monitoring and letting it bias new connections to nodes not so utterly loaded - allowing direct access to each server in addition to the access via cluster service - some sort of connection persistence, only for HTTP access ? 
(ftp and rsync can do nicely without) > >Since it's apparent not everyone is aware of what we are doing, I'll > >mention briefly some of the bigger points. ... > >- We've cut back on the number of ftp and rsync users to the machines. > >Basically we are cutting back where we can in an attempt to keep the > >load from spiraling out of control, this helped a bit when we recently > >had to take one of the machines down and instead of loads spiking into > >the 2000+ range we peaked at about 500-600 I believe. How about having filesystems mounted with "noatime" ? Or do you already do that ? > >So we know the problem is there, and we are working on it - we are > >getting e-mails about it if not daily than every other day or so. If > >there are suggestions we are willing to hear them - but the general > >feeling with the admins is that we are probably hitting the biggest > >problems already. /Matti Aarnio ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-17 22:37 ` Matti Aarnio @ 2006-12-18 0:42 ` J.H. 2006-12-19 6:46 ` Willy Tarreau 0 siblings, 1 reply; 110+ messages in thread From: J.H. @ 2006-12-18 0:42 UTC (permalink / raw) To: Matti Aarnio Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Mon, 2006-12-18 at 00:37 +0200, Matti Aarnio wrote: > On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote: > > J.H. wrote: > ... > > >The root cause boils down to with git, gitweb and the normal mirroring > > >on the frontend machines our basic working set no longer stays resident > > >in memory, which is forcing more and more to actively go to disk causing > > >a much higher I/O load. You have the added problem that one of the > > >frontend machines is getting hit harder than the other due to several > > >factors: various DNS servers not round robining, people explicitly > > >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and > > >probably several other factors we aren't aware of. This has caused the > > >average load on that machine to hover around 150-200 and if for whatever > > >reason we have to take one of the machines down the load on the > > >remaining machine will skyrocket to 2000+. > > Relaying on DNS and clients doing round-robin load-balancing is doomed. > > You really, REALLY, need external L4 load-balancer switches. > (And installation help from somebody who really knows how to do this > kind of services on a cluster.) While this is a really good idea when you have systems that are all in a single location, with a single uplink and what not - this isn't the case with kernel.org. Our machines are currently in three separate facilities in the US (spanning two different states), with us working on a fourth in Europe. > > >Since it's apparent not everyone is aware of what we are doing, I'll > > >mention briefly some of the bigger points. > ... 
> > >- We've cut back on the number of ftp and rsync users to the machines. > > >Basically we are cutting back where we can in an attempt to keep the > > >load from spiraling out of control, this helped a bit when we recently > > >had to take one of the machines down and instead of loads spiking into > > >the 2000+ range we peaked at about 500-600 I believe. > > How about having filesystems mounted with "noatime" ? > Or do you already do that ? We've been doing that for over a year. - John ^ permalink raw reply [flat|nested] 110+ messages in thread
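For readers unfamiliar with the option being discussed: `noatime` stops the kernel from issuing an access-time write on every read, which matters a great deal for read-heavy mirror workloads. A generic fragment (device and mount point are placeholders, not kernel.org's actual layout):

```shell
# /etc/fstab entry (placeholder device and mount point):
#   /dev/sda3  /pub  ext3  defaults,noatime  0 2
#
# Or applied to an already-mounted filesystem (needs root):
#   mount -o remount,noatime /pub
```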
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-18 0:42 ` J.H. @ 2006-12-19 6:46 ` Willy Tarreau 2006-12-19 7:39 ` J.H. 0 siblings, 1 reply; 110+ messages in thread From: Willy Tarreau @ 2006-12-19 6:46 UTC (permalink / raw) To: J.H. Cc: Matti Aarnio, Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Sun, Dec 17, 2006 at 04:42:56PM -0800, J.H. wrote: > On Mon, 2006-12-18 at 00:37 +0200, Matti Aarnio wrote: > > On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote: > > > J.H. wrote: > > ... > > > >The root cause boils down to with git, gitweb and the normal mirroring > > > >on the frontend machines our basic working set no longer stays resident > > > >in memory, which is forcing more and more to actively go to disk causing > > > >a much higher I/O load. You have the added problem that one of the > > > >frontend machines is getting hit harder than the other due to several > > > >factors: various DNS servers not round robining, people explicitly > > > >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and > > > >probably several other factors we aren't aware of. This has caused the > > > >average load on that machine to hover around 150-200 and if for whatever > > > >reason we have to take one of the machines down the load on the > > > >remaining machine will skyrocket to 2000+. > > > > Relaying on DNS and clients doing round-robin load-balancing is doomed. > > > > You really, REALLY, need external L4 load-balancer switches. > > (And installation help from somebody who really knows how to do this > > kind of services on a cluster.) > > While this is a really good idea when you have systems that are all in a > single location, with a single uplink and what not - this isn't the case > with kernel.org. Our machines are currently in three separate > facilities in the US (spanning two different states), with us working on > a fourth in Europe. 
On multi-site setups, you have to rely on DNS, but the DNS should not announce the servers themselves, but the local load balancers, each of which knows other sites. While people often find it dirty, there's no problem forwarding a request from one site to another via the internet as long as there are big pipes. Generally, I play with weights to slightly smooth the load and reduce the bandwidth usage on the pipe (eg: 2/3 local, 1/3 remote). With LVS, you can even use the tunneling mode, with which the request comes to LB on site A, is forwarded to site B via the net, but the data returns from site B to the client. If the frontend machines are not taken off-line too often, it should be no big deal for them to handle something such as LVS, and it would help spread the load. > > > >Since it's apparent not everyone is aware of what we are doing, I'll > > > >mention briefly some of the bigger points. > > ... > > > >- We've cut back on the number of ftp and rsync users to the machines. 
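A sketch of what the LVS tunnel-mode setup Willy describes might look like with ipvsadm; the addresses are made up, `-i` selects IP-in-IP tunneling so the remote real server answers the client directly, and the weights reflect the 2/3 local, 1/3 remote split he mentions. This is an illustration, not a tested kernel.org configuration:

```shell
# Virtual service on the advertised address, weighted round-robin scheduler.
ipvsadm -A -t 198.51.100.10:80 -s wrr
# Local real server, weight 2 (roughly 2/3 of new connections).
ipvsadm -a -t 198.51.100.10:80 -r 10.0.0.2:80 -i -w 2
# Real server at the remote site, weight 1; with tunneling, its replies
# go straight back to the client rather than through site A.
ipvsadm -a -t 198.51.100.10:80 -r 203.0.113.5:80 -i -w 1
```

Only the inbound request traverses the inter-site pipe in this mode, which is why it costs less bandwidth than full proxying.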
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 6:46 ` Willy Tarreau @ 2006-12-19 7:39 ` J.H. 2006-12-19 13:32 ` Willy Tarreau 2006-12-19 14:36 ` Dave Jones 0 siblings, 2 replies; 110+ messages in thread From: J.H. @ 2006-12-19 7:39 UTC (permalink / raw) To: Willy Tarreau Cc: Matti Aarnio, Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Tue, 2006-12-19 at 07:46 +0100, Willy Tarreau wrote: > On Sun, Dec 17, 2006 at 04:42:56PM -0800, J.H. wrote: > > On Mon, 2006-12-18 at 00:37 +0200, Matti Aarnio wrote: > > > On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote: > > > > J.H. wrote: > > > ... > > > > >The root cause boils down to with git, gitweb and the normal mirroring > > > > >on the frontend machines our basic working set no longer stays resident > > > > >in memory, which is forcing more and more to actively go to disk causing > > > > >a much higher I/O load. You have the added problem that one of the > > > > >frontend machines is getting hit harder than the other due to several > > > > >factors: various DNS servers not round robining, people explicitly > > > > >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and > > > > >probably several other factors we aren't aware of. This has caused the > > > > >average load on that machine to hover around 150-200 and if for whatever > > > > >reason we have to take one of the machines down the load on the > > > > >remaining machine will skyrocket to 2000+. > > > > > > Relaying on DNS and clients doing round-robin load-balancing is doomed. > > > > > > You really, REALLY, need external L4 load-balancer switches. > > > (And installation help from somebody who really knows how to do this > > > kind of services on a cluster.) > > > > While this is a really good idea when you have systems that are all in a > > single location, with a single uplink and what not - this isn't the case > > with kernel.org. 
Our machines are currently in three separate > > facilities in the US (spanning two different states), with us working on > > a fourth in Europe. > > On multi-site setups, you have to rely on DNS, but the DNS should not > announce the servers themselves, but the local load balancers, each of > which knows other sites. > > While people often find it dirty, there's no problem forwarding a > request from one site to another via the internet as long as there > are big pipes. Generally, I play with weights to slightly smooth > the load and reduce the bandwidth usage on the pipe (eg: 2/3 local, > 1/3 remote). > > With LVS, you can even use the tunneling mode, with which the request > comes to LB on site A, is forwarded to site B via the net, but the data > returns from site B to the client. > > If the frontend machines are not taken off-line too often, it should > be no big deal for them to handle something such as LVS, and would > help spreding the load. I'll have to look into it - but by and large the round robining tends to work. Specifically as I am writing this the machines are both pushing right around 150mbps, however the load on zeus1 is 170 vs. zeus2's 4. Also when we peak the bandwidth we do use every last kb we can get our hands on, so doing any tunneling takes just that much bandwidth away from the total. Number of Processes running process #1 #2 ------------------------------------ rsync 162 69 http 734 642 ftp 353 190 as a quick snapshot. I would agree with HPA's recent statement - that people who are mirroring against kernel.org have probably hard coded the first machine into their scripts, combine that with a few dns servers that don't honor or deal with round robining and you have the extra load on the first machine vs. the second. > > > > > >Since it's apparent not everyone is aware of what we are doing, I'll > > > > >mention briefly some of the bigger points. > > > ... > > > > >- We've cut back on the number of ftp and rsync users to the machines. 
> > > > >Basically we are cutting back where we can in an attempt to keep the > > > > >load from spiraling out of control, this helped a bit when we recently > > > > >had to take one of the machines down and instead of loads spiking into > > > > >the 2000+ range we peaked at about 500-600 I believe. > > > > > > How about having filesystems mounted with "noatime" ? > > > Or do you already do that ? > > > > We've been doing that for over a year. > > Couldn't we temporarily *cut* the services one after the other on www1 > to find which ones are the most I/O consumming, and see which ones can > coexist without bad interaction ? > > Also, I see that keepalive is still enabled on apache, I guess there > are thousands of processes and that apache is eating gigs of RAM by > itself. I strongly suggest disabling keepalive there. > > > - John > > Just my 2 cents, > Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 7:39 ` J.H. @ 2006-12-19 13:32 ` Willy Tarreau 2006-12-19 14:36 ` Dave Jones 1 sibling, 0 replies; 110+ messages in thread From: Willy Tarreau @ 2006-12-19 13:32 UTC (permalink / raw) To: J.H. Cc: Matti Aarnio, Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Mon, Dec 18, 2006 at 11:39:51PM -0800, J.H. wrote: > > If the frontend machines are not taken off-line too often, it should > > be no big deal for them to handle something such as LVS, and would > > help spreding the load. > > I'll have to look into it - but by and large the round robining tends to > work. Specifically as I am writing this the machines are both pushing > right around 150mbps, however the load on zeus1 is 170 vs. zeus2's 4. > Also when we peak the bandwidth we do use every last kb we can get our > hands on, so doing any tunneling takes just that much bandwidth away > from the total. Indeed. > Number of Processes running > process #1 #2 > ------------------------------------ > rsync 162 69 > http 734 642 > ftp 353 190 > > as a quick snapshot. I would agree with HPA's recent statement - that > people who are mirroring against kernel.org have probably hard coded the > first machine into their scripts, combine that with a few dns servers > that don't honor or deal with round robining and you have the extra load > on the first machine vs. the second. I've also already experienced I/O loads due to rsync. The most annoying part certainly being that most of the connections see nothing new, but the disks are seeked anyway, and the cache always gets trashed. A dirty but probably efficient emergency workaround would be to randomly refuse a few rsync connections on www1. It would make the mirroring tools fail once in a while, and the data would be mirrored in larger batches, so all in all, it would reduce the rate of useless disk seeks. 
Since I suspect that the volume of data transferred by rsync is fairly moderate, it might be interesting to load-balance rsync between the two machines, even if that involves making the data transit via the net twice. I can help set up a reverse proxy if you want to give such a setup a try. Best regards, Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 7:39 ` J.H. 2006-12-19 13:32 ` Willy Tarreau @ 2006-12-19 14:36 ` Dave Jones 2006-12-19 14:38 ` Willy Tarreau 2006-12-26 16:14 ` H. Peter Anvin 1 sibling, 2 replies; 110+ messages in thread From: Dave Jones @ 2006-12-19 14:36 UTC (permalink / raw) To: J.H. Cc: Willy Tarreau, Matti Aarnio, Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Mon, Dec 18, 2006 at 11:39:51PM -0800, J.H. wrote: > I'll have to look into it - but by and large the round robining tends to > work. Specifically as I am writing this the machines are both pushing > right around 150mbps, however the load on zeus1 is 170 vs. zeus2's 4. > Also when we peak the bandwidth we do use every last kb we can get our > hands on, so doing any tunneling takes just that much bandwidth away > from the total. > > Number of Processes running > process #1 #2 > ------------------------------------ > rsync 162 69 > http 734 642 > ftp 353 190 A wild idea just occurred to me. You guys are running Fedora/RHEL kernels on the kernel.org boxes iirc, which have Ingo's 'tux' httpd accelerator. It might not make the problem go away, but it could make it more bearable under high load. Or it might do absolutely squat depending on the ratio of static/dynamic content. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 14:36 ` Dave Jones @ 2006-12-19 14:38 ` Willy Tarreau 2006-12-26 16:14 ` H. Peter Anvin 1 sibling, 0 replies; 110+ messages in thread From: Willy Tarreau @ 2006-12-19 14:38 UTC (permalink / raw) To: Dave Jones, J.H., Matti Aarnio, Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Tue, Dec 19, 2006 at 09:36:06AM -0500, Dave Jones wrote: > On Mon, Dec 18, 2006 at 11:39:51PM -0800, J.H. wrote: > > > I'll have to look into it - but by and large the round robining tends to > > work. Specifically as I am writing this the machines are both pushing > > right around 150mbps, however the load on zeus1 is 170 vs. zeus2's 4. > > Also when we peak the bandwidth we do use every last kb we can get our > > hands on, so doing any tunneling takes just that much bandwidth away > > from the total. > > > > Number of Processes running > > process #1 #2 > > ------------------------------------ > > rsync 162 69 > > http 734 642 > > ftp 353 190 > > A wild idea just occured to me. You guys are running Fedora/RHEL kernels > on the kernel.org boxes iirc, which have Ingo's 'tux' httpd accelerator. > It might not make the problem go away, but it could make it more > bearable under high load. Or it might do absolutely squat depending > on the ratio of static/dynamic content. I've already thought about this and never knew why it's not used. It supports both HTTP and FTP and does a wonderful job under high loads. In fact, it's what I use as an HTTP termination during benchmarks, because it's the absolute best performer I've ever found. > Dave Regards, Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 14:36 ` Dave Jones 2006-12-19 14:38 ` Willy Tarreau @ 2006-12-26 16:14 ` H. Peter Anvin 1 sibling, 0 replies; 110+ messages in thread From: H. Peter Anvin @ 2006-12-26 16:14 UTC (permalink / raw) To: Dave Jones, J.H., Willy Tarreau, Matti Aarnio, Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster Dave Jones wrote: > > A wild idea just occured to me. You guys are running Fedora/RHEL kernels > on the kernel.org boxes iirc, which have Ingo's 'tux' httpd accelerator. > It might not make the problem go away, but it could make it more > bearable under high load. Or it might do absolutely squat depending > on the ratio of static/dynamic content. > Almost the only dynamic content we have (that actually matters) is gitweb. Everything else is static, with Apache parked in sendfile(). Far too often in D state. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-17 18:23 ` Randy Dunlap 2006-12-17 22:37 ` Matti Aarnio @ 2007-01-08 20:10 ` Jean Delvare 1 sibling, 0 replies; 110+ messages in thread From: Jean Delvare @ 2007-01-08 20:10 UTC (permalink / raw) To: Randy Dunlap Cc: J.H., Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Sun, 17 Dec 2006 10:23:54 -0800, Randy Dunlap wrote: > I can't really say since I have no performance/profile data to base > it on. There has been some noise about (not) providing mirror services > for distros. Is that a big cpu/memory consumer? If so, then is that > something that kernel.org could shed over some N (6 ?) months? > I understand not dropping it immediately, but it seems to be more of > a convenience rather than something related to kernel development. I'd second that request. Mirroring out distros isn't the primary objective of kernel.org, and we should focus on giving the kernel developers what they need to do their job. Distro mirroring is best handled by other sites around the world, and by bittorrent. I doubt anyone out there will notice if kernel.org stops mirroring these distros, while kernel developers hopefully will. -- Jean Delvare ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 19:30 ` J.H. ` (3 preceding siblings ...) 2006-12-17 18:23 ` Randy Dunlap @ 2006-12-19 6:34 ` Willy Tarreau 2006-12-19 6:52 ` J.H. 2006-12-26 17:02 ` H. Peter Anvin 2006-12-19 15:37 ` Tim Schmielau 2007-01-08 21:20 ` Jean Delvare 6 siblings, 2 replies; 110+ messages in thread From: Willy Tarreau @ 2006-12-19 6:34 UTC (permalink / raw) To: J.H. Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote: (...) > Since it's apparent not everyone is aware of what we are doing, I'll > mention briefly some of the bigger points. > > - We have contacted HP to see if we can get additional hardware, mind > you though this is a long term solution and will take time, but if our > request is approved it will double the number of machines kernel.org > runs. Just an evil suggestion, but if you contact someone other than HP, they might be _very_ interested in taking HP's place and providing whatever you need to get their name on www.kernel.org. Sun and IBM make such monster machines too. That would not be very kind to HP, but it might help getting hardware faster. > - Gitweb is causing us no end of headache, there are (known to me > anyway) two different things happening on that. I am looking at Jeff > Garzik's suggested caching mechanism as a temporary stop-gap, with an > eye more on doing a rather heavy re-write of gitweb itself to include > semi-intelligent caching. I've already started in on the later - and I > just about have the caching layer put in. But this is still at least a > week out before we could even remotely consider deploying it. Couldn't we disable gitweb for as long as we don't get newer machines? I've been using it in the past, but it was just a convenience. If needed, we can explode all the recent patches with a "git-format-patch -k -m" in a directory. > - We've cut back on the number of ftp and rsync users to the machines. 
> Basically we are cutting back where we can in an attempt to keep the > load from spiraling out of control, this helped a bit when we recently > had to take one of the machines down and instead of loads spiking into > the 2000+ range we peaked at about 500-600 I believe. I did not imagine FTP and rsync being so heavily used! > So we know the problem is there, and we are working on it - we are > getting e-mails about it if not daily than every other day or so. If > there are suggestions we are willing to hear them - but the general > feeling with the admins is that we are probably hitting the biggest > problems already. BTW, yesterday my 2.4 patches were not published, but I noticed that they were neither signed nor bzipped on hera. At first I simply thought it was related, but right now I have a doubt. Maybe the automatic script has temporarily been disabled on hera too? > - John 'Warthog9' Hawley > Kernel.org Admin Thanks for keeping us informed! Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 6:34 ` Willy Tarreau @ 2006-12-19 6:52 ` J.H. 2007-01-06 18:33 ` Randy Dunlap 2006-12-26 17:02 ` H. Peter Anvin 1 sibling, 1 reply; 110+ messages in thread From: J.H. @ 2006-12-19 6:52 UTC (permalink / raw) To: Willy Tarreau Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Tue, 2006-12-19 at 07:34 +0100, Willy Tarreau wrote: > On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote: > (...) > > Since it's apparent not everyone is aware of what we are doing, I'll > > mention briefly some of the bigger points. > > > > - We have contacted HP to see if we can get additional hardware, mind > > you though this is a long term solution and will take time, but if our > > request is approved it will double the number of machines kernel.org > > runs. > > Just evil suggestion, but if you contact someone else than HP, they > might be _very_ interested in taking HP's place and providing whatever > you need to get their name on www.kernel.org. Sun and IBM do such > monter machines too. That would not be very kind to HP, but it might > help getting hardware faster. I leave the actual hardware acquisitions up to HPA, I just try to keep the machines up and running without too many problems. HP has been incredibly supportive of kernel.org in the past and I for one have been very appreciative of their hardware and would love to continue working with them. > > > - Gitweb is causing us no end of headache, there are (known to me > > anyway) two different things happening on that. I am looking at Jeff > > Garzik's suggested caching mechanism as a temporary stop-gap, with an > > eye more on doing a rather heavy re-write of gitweb itself to include > > semi-intelligent caching. I've already started in on the later - and I > > just about have the caching layer put in. But this is still at least a > > week out before we could even remotely consider deploying it. 
> > Couldn't we disable gitweb for as long as we don't get newer machines ? > I've been using it in the past, but it was just a convenience. If needed, > we can explode all the recent patches with a "git-format-patch -k -m" in a > directory. I've mentioned this to the other admins and the consensus was that there would be quite the outcry to suggest this - if the consensus is to disable gitweb until we can get it under control we would take doing that into consideration. > > > - We've cut back on the number of ftp and rsync users to the machines. > > Basically we are cutting back where we can in an attempt to keep the > > load from spiraling out of control, this helped a bit when we recently > > had to take one of the machines down and instead of loads spiking into > > the 2000+ range we peaked at about 500-600 I believe. > > I did not imagine FTP and rsync being so much used ! On average we are moving anywhere from 400-600mbps between the two machines, on release days we max both of the connections at 1gbps each and have seen that draw last for 48 hours. For instance, when FC6 was released we moved 13 TBytes of data in the first 12 hours or so. > > > So we know the problem is there, and we are working on it - we are > > getting e-mails about it if not daily than every other day or so. If > > there are suggestions we are willing to hear them - but the general > > feeling with the admins is that we are probably hitting the biggest > > problems already. > > BTW, yesterday my 2.4 patches were not published, but I noticed that > they were not even signed not bziped on hera. At first I simply thought > it was related, but right now I have a doubt. Maybe the automatic script > has been temporarily been disabled on hera too ? The script that deals with the uploads also deals with the packaging - so yes the problem is related. > > > - John 'Warthog9' Hawley > > Kernel.org Admin > > Thanks for keeping us informed ! 
> Willy Doing what I can :-) - John ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 6:52 ` J.H. @ 2007-01-06 18:33 ` Randy Dunlap 2007-01-06 19:18 ` H. Peter Anvin 2007-01-06 19:21 ` J.H. 0 siblings, 2 replies; 110+ messages in thread From: Randy Dunlap @ 2007-01-06 18:33 UTC (permalink / raw) To: J.H. Cc: Willy Tarreau, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Mon, 18 Dec 2006 22:52:51 -0800 J.H. wrote: > On Tue, 2006-12-19 at 07:34 +0100, Willy Tarreau wrote: > > On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote: > > (...) > > > > > So we know the problem is there, and we are working on it - we are > > > getting e-mails about it if not daily than every other day or so. If > > > there are suggestions we are willing to hear them - but the general > > > feeling with the admins is that we are probably hitting the biggest > > > problems already. > > > > BTW, yesterday my 2.4 patches were not published, but I noticed that > > they were not even signed not bziped on hera. At first I simply thought > > it was related, but right now I have a doubt. Maybe the automatic script > > has been temporarily been disabled on hera too ? > > The script that deals with the uploads also deals with the packaging - > so yes the problem is related. and with the finger_banner and version info on www.kernel.org page? They currently say: The latest stable version of the Linux kernel is: 2.6.19.1 The latest prepatch for the stable Linux kernel tree is: 2.6.20-rc3 The latest snapshot for the stable Linux kernel tree is: 2.6.20-rc3-git4 The latest 2.4 version of the Linux kernel is: 2.4.34 The latest 2.2 version of the Linux kernel is: 2.2.26 The latest prepatch for the 2.2 Linux kernel tree is: 2.2.27-rc2 The latest -mm patch to the stable Linux kernels is: 2.6.20-rc2-mm1 but there are 2.6.20-rc3-git[567] and 2.6.20-rc3-mm1 out there, so when is the finger version info updated? Thanks, --- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 18:33 ` Randy Dunlap @ 2007-01-06 19:18 ` H. Peter Anvin 2007-01-06 19:35 ` Willy Tarreau ` (2 more replies) 2007-01-06 19:21 ` J.H. 1 sibling, 3 replies; 110+ messages in thread From: H. Peter Anvin @ 2007-01-06 19:18 UTC (permalink / raw) To: Randy Dunlap Cc: J.H., Willy Tarreau, Andrew Morton, Pavel Machek, kernel list, webmaster Randy Dunlap wrote: > >>> BTW, yesterday my 2.4 patches were not published, but I noticed that >>> they were not even signed not bziped on hera. At first I simply thought >>> it was related, but right now I have a doubt. Maybe the automatic script >>> has been temporarily been disabled on hera too ? >> The script that deals with the uploads also deals with the packaging - >> so yes the problem is related. > > and with the finger_banner and version info on www.kernel.org page? Yes, they're all connected. The load on *both* machines was up above the 300s yesterday, probably due to the release of a new Knoppix DVD. The most fundamental problem seems to be that I can't tell current Linux kernels that the dcache/icache is precious, and that it's way too eager to dump dcache and icache in favour of data blocks. If I could do that, this problem would be much, much smaller. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 19:18 ` H. Peter Anvin @ 2007-01-06 19:35 ` Willy Tarreau 2007-01-06 19:37 ` Nicholas Miell 2007-01-06 20:13 ` Jeff Garzik 2 siblings, 0 replies; 110+ messages in thread From: Willy Tarreau @ 2007-01-06 19:35 UTC (permalink / raw) To: H. Peter Anvin Cc: Randy Dunlap, J.H., Andrew Morton, Pavel Machek, kernel list, webmaster On Sat, Jan 06, 2007 at 11:18:37AM -0800, H. Peter Anvin wrote: > Randy Dunlap wrote: > > > >>>BTW, yesterday my 2.4 patches were not published, but I noticed that > >>>they were not even signed not bziped on hera. At first I simply thought > >>>it was related, but right now I have a doubt. Maybe the automatic script > >>>has been temporarily been disabled on hera too ? > >>The script that deals with the uploads also deals with the packaging - > >>so yes the problem is related. > > > >and with the finger_banner and version info on www.kernel.org page? > > Yes, they're all connected. > > The load on *both* machines were up above the 300s yesterday, probably > due to the release of a new Knoppix DVD. I have one trivial idea: would it help to use 2 addresses to serve data, one for pure kernel usage (eg: git, rsync) and one with other stuff such as DVDs, but with a low limit on the number of concurrent connections? > The most fundamental problem seems to be that I can't tell currnt Linux > kernels that the dcache/icache is precious, and that it's way too eager > to dump dcache and icache in favour of data blocks. If I could do that, > this problem would be much, much smaller. I often have this problem on some of my machines after slocate runs. Everything is consumed in dcache/icache and no data blocks are cacheable anymore. I never found a way to tell the kernel to assign a higher prio to data than to [di]cache. To remedy this, I wrote this stupid program that I run when I need to free memory. 
It simply allocates the amount of memory I ask for, which forces a flush of the [di]caches, and when it exits, this memory is usable again for data blocks. I'm not sure it would be easy to run such a thing automatically, but maybe it could sometimes help when the [di]caches are too fat.

Willy

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	unsigned long int i, k = 0, max;
	char *p;

	max = (argc > 1) ? atol(argv[1]) : 102400;	/* default to 100 MB */
	printf("Allocating %lu kB...\n", max);
	while (((p = (char *)malloc(1048576)) != NULL) && (k + 1024 <= max)) {
		for (i = 0; i < 256; p[4096 * i++] = 0)
			;	/* touch each page to mark the block dirty */
		k += 1024;
		fprintf(stderr, "\r%lu kB allocated", k);
	}
	fprintf(stderr, "\nMemory freed.\n");
	return 0;
}

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 19:18 ` H. Peter Anvin 2007-01-06 19:35 ` Willy Tarreau @ 2007-01-06 19:37 ` Nicholas Miell 2007-01-06 20:13 ` Andrew Morton 2007-01-06 20:13 ` Jeff Garzik 2 siblings, 1 reply; 110+ messages in thread From: Nicholas Miell @ 2007-01-06 19:37 UTC (permalink / raw) To: H. Peter Anvin Cc: Randy Dunlap, J.H., Willy Tarreau, Andrew Morton, Pavel Machek, kernel list, webmaster On Sat, 2007-01-06 at 11:18 -0800, H. Peter Anvin wrote: > Randy Dunlap wrote: > > > >>> BTW, yesterday my 2.4 patches were not published, but I noticed that > >>> they were not even signed not bziped on hera. At first I simply thought > >>> it was related, but right now I have a doubt. Maybe the automatic script > >>> has been temporarily been disabled on hera too ? > >> The script that deals with the uploads also deals with the packaging - > >> so yes the problem is related. > > > > and with the finger_banner and version info on www.kernel.org page? > > Yes, they're all connected. > > The load on *both* machines were up above the 300s yesterday, probably > due to the release of a new Knoppix DVD. > > The most fundamental problem seems to be that I can't tell currnt Linux > kernels that the dcache/icache is precious, and that it's way too eager > to dump dcache and icache in favour of data blocks. If I could do that, > this problem would be much, much smaller. > > -hpa Isn't setting the vm.vfs_cache_pressure sysctl below 100 supposed to do this? -- Nicholas Miell <nmiell@comcast.net> ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 19:37 ` Nicholas Miell @ 2007-01-06 20:13 ` Andrew Morton 2007-01-06 20:18 ` H. Peter Anvin 2007-01-06 23:50 ` [KORG] Re: kernel.org lies about latest -mm kernel H. Peter Anvin 0 siblings, 2 replies; 110+ messages in thread From: Andrew Morton @ 2007-01-06 20:13 UTC (permalink / raw) To: Nicholas Miell Cc: H. Peter Anvin, Randy Dunlap, J.H., Willy Tarreau, Pavel Machek, kernel list, webmaster On Sat, 06 Jan 2007 11:37:46 -0800 Nicholas Miell <nmiell@comcast.net> wrote: > On Sat, 2007-01-06 at 11:18 -0800, H. Peter Anvin wrote: > > Randy Dunlap wrote: > > > > > >>> BTW, yesterday my 2.4 patches were not published, but I noticed that > > >>> they were not even signed not bziped on hera. At first I simply thought > > >>> it was related, but right now I have a doubt. Maybe the automatic script > > >>> has been temporarily been disabled on hera too ? > > >> The script that deals with the uploads also deals with the packaging - > > >> so yes the problem is related. > > > > > > and with the finger_banner and version info on www.kernel.org page? > > > > Yes, they're all connected. > > > > The load on *both* machines were up above the 300s yesterday, probably > > due to the release of a new Knoppix DVD. > > > > The most fundamental problem seems to be that I can't tell currnt Linux > > kernels that the dcache/icache is precious, and that it's way too eager > > to dump dcache and icache in favour of data blocks. If I could do that, > > this problem would be much, much smaller. Usually people complain about the exact opposite of this. > Isn't setting the vm.vfs_cache_pressure sysctl below 100 supposed to do > this? yup. ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 20:13 ` Andrew Morton @ 2007-01-06 20:18 ` H. Peter Anvin 2007-03-19 19:27 ` [PATCH] sysctl: vfs_cache_divisor Randy Dunlap 2007-01-06 23:50 ` [KORG] Re: kernel.org lies about latest -mm kernel H. Peter Anvin 1 sibling, 1 reply; 110+ messages in thread From: H. Peter Anvin @ 2007-01-06 20:18 UTC (permalink / raw) To: Andrew Morton Cc: Nicholas Miell, Randy Dunlap, J.H., Willy Tarreau, Pavel Machek, kernel list, webmaster Andrew Morton wrote: >>> >>> The most fundamental problem seems to be that I can't tell currnt Linux >>> kernels that the dcache/icache is precious, and that it's way too eager >>> to dump dcache and icache in favour of data blocks. If I could do that, >>> this problem would be much, much smaller. > > Usually people complain about the exact opposite of this. Yeah, but we constantly have all-filesystem sweeps, and being able to retain those in memory would be a key to performance, *especially* from the upload latency standpoint. >> Isn't setting the vm.vfs_cache_pressure sysctl below 100 supposed to do >> this? Just tweaked it (setting it to 1). There really should be another sysctl to set the denominator instead of hardcoding it at 100, since the granularity of this sysctl at the very low end is really much too coarse. I missed this sysctl since the name isn't really all that obvious. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* [PATCH] sysctl: vfs_cache_divisor 2007-01-06 20:18 ` H. Peter Anvin @ 2007-03-19 19:27 ` Randy Dunlap 2007-03-19 20:36 ` Andrew Morton 2007-03-20 19:53 ` Ingo Oeser 0 siblings, 2 replies; 110+ messages in thread From: Randy Dunlap @ 2007-03-19 19:27 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Andrew Morton, J.H., kernel list On Sat, 06 Jan 2007 12:18:39 -0800 H. Peter Anvin wrote: > Andrew Morton wrote: > >>> > >>> The most fundamental problem seems to be that I can't tell currnt Linux > >>> kernels that the dcache/icache is precious, and that it's way too eager > >>> to dump dcache and icache in favour of data blocks. If I could do that, > >>> this problem would be much, much smaller. > > > > Usually people complain about the exact opposite of this. > > Yeah, but we constantly have all-filesystem sweeps, and being able to > retain those in memory would be a key to performance, *especially* from > the upload latency standpoint. > > >> Isn't setting the vm.vfs_cache_pressure sysctl below 100 supposed to do > >> this? > > Just tweaked it (setting it to 1). There really should be another > sysctl to set the denominator instead of hardcoding it at 100, since the > granularity of this sysctl at the very low end is really much too coarse. > > I missed this sysctl since the name isn't really all that obvious. Peter, Were there any patches written after this? If so, I missed them. If not, does this patch help any? --- From: Randy Dunlap <randy.dunlap@oracle.com> Add sysctl_vfs_cache_divisor (default value 100), which is used as the divisor for sysctl_vfs_cache_pressure. This allows a system admin to make finer-grained pressure settings. 
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
---
 Documentation/filesystems/proc.txt |    7 +++++++
 Documentation/sysctl/vm.txt        |    4 ++--
 fs/dcache.c                        |    6 +++++-
 fs/dquot.c                         |    4 +++-
 fs/inode.c                         |    3 ++-
 fs/mbcache.c                       |    3 ++-
 fs/nfs/dir.c                       |    4 +++-
 include/linux/dcache.h             |    1 +
 include/linux/sysctl.h             |    1 +
 kernel/sysctl.c                    |   10 ++++++++++
 10 files changed, 36 insertions(+), 7 deletions(-)

--- linux-2621-rc4.orig/fs/dcache.c
+++ linux-2621-rc4/fs/dcache.c
@@ -17,6 +17,7 @@
 #include <linux/syscalls.h>
 #include <linux/string.h>
 #include <linux/mm.h>
+#include <linux/dcache.h>
 #include <linux/fs.h>
 #include <linux/fsnotify.h>
 #include <linux/slab.h>
@@ -37,6 +38,8 @@
 
 int sysctl_vfs_cache_pressure __read_mostly = 100;
 EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
+int sysctl_vfs_cache_divisor __read_mostly = 100;
+EXPORT_SYMBOL_GPL(sysctl_vfs_cache_divisor);
 
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
 static __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
@@ -851,7 +854,8 @@ static int shrink_dcache_memory(int nr,
 			return -1;
 		prune_dcache(nr, NULL);
 	}
-	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
+	return (dentry_stat.nr_unused / sysctl_vfs_cache_divisor)
+		* sysctl_vfs_cache_pressure;
 }
 
 /**
--- linux-2621-rc4.orig/fs/dquot.c
+++ linux-2621-rc4/fs/dquot.c
@@ -57,6 +57,7 @@
 #include <linux/errno.h>
 #include <linux/kernel.h>
+#include <linux/dcache.h>
 #include <linux/fs.h>
 #include <linux/mount.h>
 #include <linux/mm.h>
@@ -536,7 +537,8 @@ static int shrink_dqcache_memory(int nr,
 		prune_dqcache(nr);
 		spin_unlock(&dq_list_lock);
 	}
-	return (dqstats.free_dquots / 100) * sysctl_vfs_cache_pressure;
+	return (dqstats.free_dquots / sysctl_vfs_cache_divisor)
+		* sysctl_vfs_cache_pressure;
 }
 
 /*
--- linux-2621-rc4.orig/fs/inode.c
+++ linux-2621-rc4/fs/inode.c
@@ -461,7 +461,8 @@ static int shrink_icache_memory(int nr,
 			return -1;
 		prune_icache(nr);
 	}
-	return (inodes_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
+	return (inodes_stat.nr_unused / sysctl_vfs_cache_divisor)
+		* sysctl_vfs_cache_pressure;
 }
 
 static void __wait_on_freeing_inode(struct inode *inode);
--- linux-2621-rc4.orig/fs/mbcache.c
+++ linux-2621-rc4/fs/mbcache.c
@@ -30,6 +30,7 @@
 #include <linux/module.h>
 #include <linux/hash.h>
+#include <linux/dcache.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
@@ -226,7 +227,7 @@ mb_cache_shrink_fn(int nr_to_scan, gfp_t
 				   e_lru_list), gfp_mask);
 	}
 out:
-	return (count / 100) * sysctl_vfs_cache_pressure;
+	return (count / sysctl_vfs_cache_divisor) * sysctl_vfs_cache_pressure;
 }
 
--- linux-2621-rc4.orig/include/linux/dcache.h
+++ linux-2621-rc4/include/linux/dcache.h
@@ -355,6 +355,7 @@ extern struct vfsmount *__lookup_mnt(str
 extern struct dentry *lookup_create(struct nameidata *nd, int is_dir);
 
 extern int sysctl_vfs_cache_pressure;
+extern int sysctl_vfs_cache_divisor;
 
 #endif /* __KERNEL__ */
--- linux-2621-rc4.orig/include/linux/sysctl.h
+++ linux-2621-rc4/include/linux/sysctl.h
@@ -207,6 +207,7 @@ enum
 	VM_PANIC_ON_OOM=33,	/* panic at out-of-memory */
 	VM_VDSO_ENABLED=34,	/* map VDSO into new processes? */
 	VM_MIN_SLAB=35,		/* Percent pages ignored by zone reclaim */
+	VM_VFS_CACHE_DIVISOR=36, /* dcache/icache reclaim pressure divisor, def. 100 */
 
 	/* s390 vm cmm sysctls */
 	VM_CMM_PAGES=1111,
--- linux-2621-rc4.orig/fs/nfs/dir.c
+++ linux-2621-rc4/fs/nfs/dir.c
@@ -18,6 +18,7 @@
  */
 
 #include <linux/time.h>
+#include <linux/dcache.h>
 #include <linux/errno.h>
 #include <linux/stat.h>
 #include <linux/fcntl.h>
@@ -1773,7 +1774,8 @@ remove_lru_entry:
 		list_del(&cache->lru);
 		nfs_access_free_entry(cache);
 	}
-	return (atomic_long_read(&nfs_access_nr_entries) / 100) * sysctl_vfs_cache_pressure;
+	return (atomic_long_read(&nfs_access_nr_entries) /
+		sysctl_vfs_cache_divisor) * sysctl_vfs_cache_pressure;
 }
 
 static void __nfs_access_zap_cache(struct inode *inode)
--- linux-2621-rc4.orig/kernel/sysctl.c
+++ linux-2621-rc4/kernel/sysctl.c
@@ -800,6 +800,16 @@ static ctl_table vm_table[] = {
 		.strategy	= &sysctl_intvec,
 		.extra1		= &zero,
 	},
+	{
+		.ctl_name	= VM_VFS_CACHE_DIVISOR,
+		.procname	= "vfs_cache_divisor",
+		.data		= &sysctl_vfs_cache_divisor,
+		.maxlen		= sizeof(sysctl_vfs_cache_divisor),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+	},
 #ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
 	{
 		.ctl_name	= VM_LEGACY_VA_LAYOUT,
--- linux-2621-rc4.orig/Documentation/filesystems/proc.txt
+++ linux-2621-rc4/Documentation/filesystems/proc.txt
@@ -1156,6 +1156,13 @@ swapcache reclaim.  Decreasing vfs_cache
 to retain dentry and inode caches.  Increasing vfs_cache_pressure beyond 100
 causes the kernel to prefer to reclaim dentries and inodes.
 
+vfs_cache_divisor
+-----------------
+The default vfs_cache_divisor value is 100 (like percent).  However, for
+extremely large systems where a value of vfs_cache_pressure of less than
+1 percent is desirable, using a larger vfs_cache_divisor enables this wanted
+characteristic.
+
 dirty_background_ratio
 ----------------------
--- linux-2621-rc4.orig/Documentation/sysctl/vm.txt
+++ linux-2621-rc4/Documentation/sysctl/vm.txt
@@ -35,8 +35,8 @@ Currently, these files are in /proc/sys/
 ==============================================================
 
 dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
-dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
-block_dump, swap_token_timeout, drop-caches:
+dirty_writeback_centisecs, vfs_cache_pressure, vfs_cache_divisor,
+laptop_mode, block_dump, swap_token_timeout, drop-caches:
 
 See Documentation/filesystems/proc.txt

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor 2007-03-19 19:27 ` [PATCH] sysctl: vfs_cache_divisor Randy Dunlap @ 2007-03-19 20:36 ` Andrew Morton 2007-03-19 20:42 ` Randy Dunlap 2007-03-20 19:53 ` Ingo Oeser 1 sibling, 1 reply; 110+ messages in thread From: Andrew Morton @ 2007-03-19 20:36 UTC (permalink / raw) To: Randy Dunlap; +Cc: H. Peter Anvin, J.H., kernel list On Mon, 19 Mar 2007 12:27:40 -0700 Randy Dunlap <randy.dunlap@oracle.com> wrote: > +The default vfs_cache_divisor value is 100 (like percent). However, for > +extremely large systems where a value of vfs_cache_pressure of less than > +1 percent is desirable, using a larger vfs_cache_divisor enables this wanted > +characteristic. The one-percent-granularity problem also applies to /proc/sys/vm/*dirty* and possibly other things. So any fix we do should be applicable to those as well. And I'm not really sure how we should do this. I do think that we should change the kernel so these knobs are internally higher-resolution. So, for example, we switch all the logic so that instead of these variables representing 1/100th, they instead represent 1/1000000th. Then, we change the top-level /proc handler to do the 1/100th <-> 1/1000000th conversion. So the rest of the kernel doesn't have to know about it. Then we duplicate all the relevant /proc knobs: cat /proc/sys/vm/dirty_ratio 30 cat /proc/sys/vm/hires-dirty_ratio/ 300000 Or we do something else ;) ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor 2007-03-19 20:36 ` Andrew Morton @ 2007-03-19 20:42 ` Randy Dunlap 2007-03-20 4:22 ` H. Peter Anvin 0 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2007-03-19 20:42 UTC (permalink / raw) To: Andrew Morton; +Cc: H. Peter Anvin, J.H., kernel list Andrew Morton wrote: > On Mon, 19 Mar 2007 12:27:40 -0700 > Randy Dunlap <randy.dunlap@oracle.com> wrote: > >> +The default vfs_cache_divisor value is 100 (like percent). However, for >> +extremely large systems where a value of vfs_cache_pressure of less than >> +1 percent is desirable, using a larger vfs_cache_divisor enables this wanted >> +characteristic. > > The one-percent-granularity problem also applies to /proc/sys/vm/*dirty* > and possibly other things. So any fix we do should be applicable to those > as well. > > And I'm not really sure how we should do this. I do think that we should > change the kernel so these knobs are internally higher-resolution. So, for > example, we switch all the logic so that instead of these variables > representing 1/100th, they instead represent 1/1000000th, for example. > > Then, we change the top-level /proc handler to do the 1/100th <-> 1/1000000th > conversion. So the rest of the kernel doesn't have to know about it. > > The we duplicate all the relevant /proc knobs: > > cat /proc/sys/vm/dirty_ratio > 30 > cat /proc/sys/vm/hires-dirty_ratio/ > 300000 > > Or we do something else ;) Sounds better. I wasn't very keen on the userspace interface that this exposed. Will look at those. -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor
  2007-03-19 20:42     ` Randy Dunlap
@ 2007-03-20  4:22       ` H. Peter Anvin
  2007-03-21 23:01         ` Randy Dunlap
  0 siblings, 1 reply; 110+ messages in thread
From: H. Peter Anvin @ 2007-03-20  4:22 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Andrew Morton, J.H., kernel list

Randy Dunlap wrote:
>>
>> The we duplicate all the relevant /proc knobs:
>>
>> cat /proc/sys/vm/dirty_ratio
>> 30
>> cat /proc/sys/vm/hires-dirty_ratio/
>> 300000
>>
>> Or we do something else ;)
>
> Sounds better.  I wasn't very keen on the userspace interface that this
> exposed.  Will look at those.
>

Okay... maybe I could throw a spanner in the machinery and suggest
another option: perhaps we should add a way to do sysctl which can
handle fractional (fixed-point) values... more coherent/detailed message
tomorrow.

	-hpa

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor 2007-03-20 4:22 ` H. Peter Anvin @ 2007-03-21 23:01 ` Randy Dunlap 2007-03-21 23:11 ` Andrew Morton 0 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2007-03-21 23:01 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Andrew Morton, J.H., kernel list On Mon, 19 Mar 2007 21:22:33 -0700 H. Peter Anvin wrote: > Randy Dunlap wrote: > >> > >> The we duplicate all the relevant /proc knobs: > >> > >> cat /proc/sys/vm/dirty_ratio > >> 30 > >> cat /proc/sys/vm/hires-dirty_ratio/ > >> 300000 > >> > >> Or we do something else ;) > > > > Sounds better. I wasn't very keen on the userspace interface that this > > exposed. Will look at those. > > > > Okay... may be I could throw a spanner in the machinery, and suggest > another option: perhaps we should add a way to do sysctl which can > handle fractional (fixed-point) values... more coherent/detailed message > tomorrow. I prefer the fixed-point values for pressure and dirty* to having duplicated entries for each of them. I'll proceed with that idea. --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor 2007-03-21 23:01 ` Randy Dunlap @ 2007-03-21 23:11 ` Andrew Morton 2007-03-23 0:07 ` Kyle Moffett 0 siblings, 1 reply; 110+ messages in thread From: Andrew Morton @ 2007-03-21 23:11 UTC (permalink / raw) To: Randy Dunlap; +Cc: H. Peter Anvin, J.H., kernel list On Wed, 21 Mar 2007 16:01:32 -0700 Randy Dunlap <randy.dunlap@oracle.com> wrote: > On Mon, 19 Mar 2007 21:22:33 -0700 H. Peter Anvin wrote: > > > Randy Dunlap wrote: > > >> > > >> The we duplicate all the relevant /proc knobs: > > >> > > >> cat /proc/sys/vm/dirty_ratio > > >> 30 > > >> cat /proc/sys/vm/hires-dirty_ratio/ > > >> 300000 > > >> > > >> Or we do something else ;) > > > > > > Sounds better. I wasn't very keen on the userspace interface that this > > > exposed. Will look at those. > > > > > > > Okay... may be I could throw a spanner in the machinery, and suggest > > another option: perhaps we should add a way to do sysctl which can > > handle fractional (fixed-point) values... more coherent/detailed message > > tomorrow. > > I prefer the fixed-point values for pressure and dirty* to having > duplicated entries for each of them. I'll proceed with that idea. > Problem is, if a read of /proc/sys/vm/dirty_ratio is changed to return 7.457 then existing userspace might get confused. This might be acceptable if we are careful to ensure that reads of /proc/sys/vm/dirty_ratio will always return an integer if it was previously initialised with an integer. ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor 2007-03-21 23:11 ` Andrew Morton @ 2007-03-23 0:07 ` Kyle Moffett 2007-03-23 20:36 ` Randy Dunlap 0 siblings, 1 reply; 110+ messages in thread From: Kyle Moffett @ 2007-03-23 0:07 UTC (permalink / raw) To: Andrew Morton; +Cc: Randy Dunlap, H. Peter Anvin, J.H., kernel list On Mar 21, 2007, at 19:11:40, Andrew Morton wrote: > On Wed, 21 Mar 2007 16:01:32 -0700 Randy Dunlap > <randy.dunlap@oracle.com> wrote: >> I prefer the fixed-point values for pressure and dirty* to having >> duplicated entries for each of them. I'll proceed with that idea. > > Problem is, if a read of /proc/sys/vm/dirty_ratio is changed to > return 7.457 then existing userspace might get confused. > > This might be acceptable if we are careful to ensure that reads of / > proc/sys/vm/dirty_ratio will always return an integer if it was > previously initialised with an integer. What about instead adding support for fractions (IE: "1/1000") in / proc/sys/vm/dirty_ratio? If the denominator is 100, the default, then it prints in the form "$NUMERATOR", otherwise it prints answers of the form "$NUMERATOR/$DENOMINATOR". Input could be of either form, with the kernel auto-setting the denominator to 100 if none is specified. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor 2007-03-23 0:07 ` Kyle Moffett @ 2007-03-23 20:36 ` Randy Dunlap 2007-03-23 20:59 ` H. Peter Anvin 0 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2007-03-23 20:36 UTC (permalink / raw) To: Kyle Moffett; +Cc: Andrew Morton, H. Peter Anvin, J.H., kernel list Kyle Moffett wrote: > On Mar 21, 2007, at 19:11:40, Andrew Morton wrote: >> On Wed, 21 Mar 2007 16:01:32 -0700 Randy Dunlap >> <randy.dunlap@oracle.com> wrote: >>> I prefer the fixed-point values for pressure and dirty* to having >>> duplicated entries for each of them. I'll proceed with that idea. >> >> Problem is, if a read of /proc/sys/vm/dirty_ratio is changed to return >> 7.457 then existing userspace might get confused. >> >> This might be acceptable if we are careful to ensure that reads of >> /proc/sys/vm/dirty_ratio will always return an integer if it was >> previously initialised with an integer. > > What about instead adding support for fractions (IE: "1/1000") in > /proc/sys/vm/dirty_ratio? If the denominator is 100, the default, then > it prints in the form "$NUMERATOR", otherwise it prints answers of the > form "$NUMERATOR/$DENOMINATOR". Input could be of either form, with the > kernel auto-setting the denominator to 100 if none is specified. Hi, That makes a lot of sense to me. It gives us finer-grained control without having to support fixed-point data. I've been working on the fixed-point data patch, but I'm going to give this method some time also, to see how it looks in code (instead of just thinking about it). Thanks. -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor 2007-03-23 20:36 ` Randy Dunlap @ 2007-03-23 20:59 ` H. Peter Anvin 2007-03-24 0:45 ` Kyle Moffett 0 siblings, 1 reply; 110+ messages in thread From: H. Peter Anvin @ 2007-03-23 20:59 UTC (permalink / raw) To: Randy Dunlap; +Cc: Kyle Moffett, Andrew Morton, J.H., kernel list Randy Dunlap wrote: > > Hi, > That makes a lot of sense to me. It gives us finer-grained control > without having to support fixed-point data. > > I've been working on the fixed-point data patch, but I'm going to give > this method some time also, to see how it looks in code (instead of just > thinking about it). > Well, to be specific, it actually is the same thing with different syntax (38.55 or 3855/100). -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor
  2007-03-23 20:59             ` H. Peter Anvin
@ 2007-03-24  0:45               ` Kyle Moffett
  2007-03-24  1:17                 ` Kyle Moffett
  0 siblings, 1 reply; 110+ messages in thread
From: Kyle Moffett @ 2007-03-24  0:45 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Randy Dunlap, Andrew Morton, J.H., kernel list

On Mar 23, 2007, at 16:59:02, H. Peter Anvin wrote:
> Randy Dunlap wrote:
>> Hi,
>> That makes a lot of sense to me.  It gives us finer-grained
>> control without having to support fixed-point data.  I've been
>> working on the fixed-point data patch, but I'm going to give this
>> method some time also, to see how it looks in code (instead of
>> just thinking about it).
>
> Well, to be specific, it actually is the same thing with different
> syntax (38.55 or 3855/100).

Not quite; if you're dealing with numbers near the limit of the range
of "int" then IIRC fixed point is harder to get both good accuracy and
good overflow behavior compared to a user-set fraction.  Assuming
"fraction" is a reduced (num, den) pair, then to compute
"result = value * fraction" with optimal accuracy:

	/* On input: reduce num/den but retain old values for display purposes */
	gcd = euclid(&inputnum, &inputden);
	num = inputnum/gcd;
	den = inputden/gcd;
	valmax = UINT_MAX / num;

	/* For computation: */
	result = (val > valmax) ? (val/den)*num : (val*num)/den;

Doing that with fixed point is possible, but a bit more tricky, and you
lose accuracy for something like "1/3", which isn't quite equal to
"0.3333".  Note that if you aren't really careful with your fixed-point
then "0.3333 * 3" ends up being 0: (3333 * 3) / 10000 == 9999 / 10000 == 0.
Rounding is possible, but it's better to just let them specify the exact
fraction directly and avoid the confusion.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor
  2007-03-24  0:45               ` Kyle Moffett
@ 2007-03-24  1:17                 ` Kyle Moffett
  0 siblings, 0 replies; 110+ messages in thread
From: Kyle Moffett @ 2007-03-24  1:17 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Randy Dunlap, Andrew Morton, J.H., kernel list

On Mar 23, 2007, at 20:45:21, Kyle Moffett wrote:
> On Mar 23, 2007, at 16:59:02, H. Peter Anvin wrote:
>> Randy Dunlap wrote:
>>> That makes a lot of sense to me.  It gives us finer-grained
>>> control without having to support fixed-point data.  I've been
>>> working on the fixed-point data patch, but I'm going to give this
>>> method some time also, to see how it looks in code (instead of
>>> just thinking about it).
>>
>> Well, to be specific, it actually is the same thing with different
>> syntax (38.55 or 3855/100).
>
> /* On input: reduce num/den but retain old values for display
> purposes */
> gcd = euclid(&inputnum, &inputden);
> num = inputnum/gcd;
> den = inputden/gcd;

Whoops, if I divide inputnum and inputden by gcd I don't need to pass
by reference.  The trivial implementation of euclid (modulo the bogus
address-of operations above) is of course something like this:

	unsigned long euclid(unsigned long a, unsigned long b)
	{
		unsigned long q, r;

		if (!a || !b)
			return 1;
		do {
			q = a / b;
			r = a % b;
			a = b;
			b = r;
		} while (r);
		return a;
	}

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: [PATCH] sysctl: vfs_cache_divisor
  2007-03-19 19:27 ` [PATCH] sysctl: vfs_cache_divisor Randy Dunlap
  2007-03-19 20:36   ` Andrew Morton
@ 2007-03-20 19:53   ` Ingo Oeser
  1 sibling, 0 replies; 110+ messages in thread
From: Ingo Oeser @ 2007-03-20 19:53 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: H. Peter Anvin, Andrew Morton, J.H., kernel list

Hi Randy,

On Monday 19 March 2007, Randy Dunlap wrote:
> Were there any patches written after this?  If so, I missed them.
> If not, does this patch help any?

How is division by zero avoided?  Maybe one can avoid setting it to zero.

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 20:13 ` Andrew Morton 2007-01-06 20:18 ` H. Peter Anvin @ 2007-01-06 23:50 ` H. Peter Anvin 1 sibling, 0 replies; 110+ messages in thread From: H. Peter Anvin @ 2007-01-06 23:50 UTC (permalink / raw) To: Andrew Morton Cc: Nicholas Miell, Randy Dunlap, J.H., Willy Tarreau, Pavel Machek, kernel list, webmaster Andrew Morton wrote: >>> >>> The most fundamental problem seems to be that I can't tell currnt Linux >>> kernels that the dcache/icache is precious, and that it's way too eager >>> to dump dcache and icache in favour of data blocks. If I could do that, >>> this problem would be much, much smaller. > > Usually people complain about the exact opposite of this. > >> Isn't setting the vm.vfs_cache_pressure sysctl below 100 supposed to do >> this? > > yup. Well, it appears that even a setting of 1 is too aggressive for kernel.org. We're still ending up with each and every getdents() call taking anywhere from 200 ms to over a second. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 19:18 ` H. Peter Anvin 2007-01-06 19:35 ` Willy Tarreau 2007-01-06 19:37 ` Nicholas Miell @ 2007-01-06 20:13 ` Jeff Garzik 2007-01-06 20:17 ` Andrew Morton 2 siblings, 1 reply; 110+ messages in thread From: Jeff Garzik @ 2007-01-06 20:13 UTC (permalink / raw) To: H. Peter Anvin, Andrew Morton Cc: Randy Dunlap, J.H., Willy Tarreau, Pavel Machek, kernel list, webmaster H. Peter Anvin wrote: > Randy Dunlap wrote: >> >>>> BTW, yesterday my 2.4 patches were not published, but I noticed that >>>> they were not even signed not bziped on hera. At first I simply thought >>>> it was related, but right now I have a doubt. Maybe the automatic >>>> script >>>> has been temporarily been disabled on hera too ? >>> The script that deals with the uploads also deals with the packaging - >>> so yes the problem is related. >> >> and with the finger_banner and version info on www.kernel.org page? > > Yes, they're all connected. > > The load on *both* machines were up above the 300s yesterday, probably > due to the release of a new Knoppix DVD. > > The most fundamental problem seems to be that I can't tell currnt Linux > kernels that the dcache/icache is precious, and that it's way too eager > to dump dcache and icache in favour of data blocks. If I could do that, > this problem would be much, much smaller. Have you messed around with /proc/sys/vm/vfs_cache_pressure? Unfortunately that affects all three of: dcache, icache, and mbcache. Maybe we could split that sysctl in two (Andrew?), so that one sysctl affects dcache/icache and another affects mbcache. Jeff ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 20:13 ` Jeff Garzik @ 2007-01-06 20:17 ` Andrew Morton 2007-01-06 20:20 ` H. Peter Anvin 0 siblings, 1 reply; 110+ messages in thread From: Andrew Morton @ 2007-01-06 20:17 UTC (permalink / raw) To: Jeff Garzik Cc: H. Peter Anvin, Randy Dunlap, J.H., Willy Tarreau, Pavel Machek, kernel list, webmaster On Sat, 06 Jan 2007 15:13:50 -0500 Jeff Garzik <jeff@garzik.org> wrote: > H. Peter Anvin wrote: > > Randy Dunlap wrote: > >> > >>>> BTW, yesterday my 2.4 patches were not published, but I noticed that > >>>> they were not even signed not bziped on hera. At first I simply thought > >>>> it was related, but right now I have a doubt. Maybe the automatic > >>>> script > >>>> has been temporarily been disabled on hera too ? > >>> The script that deals with the uploads also deals with the packaging - > >>> so yes the problem is related. > >> > >> and with the finger_banner and version info on www.kernel.org page? > > > > Yes, they're all connected. > > > > The load on *both* machines were up above the 300s yesterday, probably > > due to the release of a new Knoppix DVD. > > > > The most fundamental problem seems to be that I can't tell currnt Linux > > kernels that the dcache/icache is precious, and that it's way too eager > > to dump dcache and icache in favour of data blocks. If I could do that, > > this problem would be much, much smaller. > > Have you messed around with /proc/sys/vm/vfs_cache_pressure? > > Unfortunately that affects all three of: dcache, icache, and mbcache. > Maybe we could split that sysctl in two (Andrew?), so that one sysctl > affects dcache/icache and another affects mbcache. > That would be simple enough to do, if someone can demonstrate a need. ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 20:17 ` Andrew Morton @ 2007-01-06 20:20 ` H. Peter Anvin 2007-01-06 20:36 ` Andrew Morton 0 siblings, 1 reply; 110+ messages in thread From: H. Peter Anvin @ 2007-01-06 20:20 UTC (permalink / raw) To: Andrew Morton Cc: Jeff Garzik, Randy Dunlap, J.H., Willy Tarreau, Pavel Machek, kernel list, webmaster Andrew Morton wrote: >> >> Unfortunately that affects all three of: dcache, icache, and mbcache. >> Maybe we could split that sysctl in two (Andrew?), so that one sysctl >> affects dcache/icache and another affects mbcache. >> > > That would be simple enough to do, if someone can demonstrate a > need. > Is there an easy way to find out how much memory is occupied by the various caches? If so I should be able to tell you in a couple of days. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 20:20 ` H. Peter Anvin @ 2007-01-06 20:36 ` Andrew Morton 0 siblings, 0 replies; 110+ messages in thread From: Andrew Morton @ 2007-01-06 20:36 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeff Garzik, Randy Dunlap, J.H., Willy Tarreau, Pavel Machek, kernel list, webmaster On Sat, 06 Jan 2007 12:20:27 -0800 "H. Peter Anvin" <hpa@zytor.com> wrote: > Andrew Morton wrote: > >> > >> Unfortunately that affects all three of: dcache, icache, and mbcache. > >> Maybe we could split that sysctl in two (Andrew?), so that one sysctl > >> affects dcache/icache and another affects mbcache. > >> > > > > That would be simple enough to do, if someone can demonstrate a > > need. > > > > Is there an easy way to find out how much memory is occupied by the > various caches? If so I should be able to tell you in a couple of days. /proc/meminfo:SReclaimable is a lumped sum. /proc/slabinfo has the most detail. ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 18:33 ` Randy Dunlap 2007-01-06 19:18 ` H. Peter Anvin @ 2007-01-06 19:21 ` J.H. 2007-01-07 19:52 ` Randy Dunlap 1 sibling, 1 reply; 110+ messages in thread From: J.H. @ 2007-01-06 19:21 UTC (permalink / raw) To: Randy Dunlap Cc: Willy Tarreau, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster It's an issue of load, and both machines are running 'hot' so to speak. When the loads on the machines climbs our update rsyncs take longer to complete (considering that our loads are completely based on I/O this isn't surprising). More or less nothing has changed since: http://lkml.org/lkml/2006/12/14/347 with the exception that git & gitweb are no longer the concern we have (the caching layer I put into kernel.org seems to be taking care of the worst problems we were seeing and I have a couple more to put up this weekend), right now it's getting loads between the two machines load evened out and lowering the number of allowed rsyncs on each machine to better bound the load problem. - John On Sat, 2007-01-06 at 10:33 -0800, Randy Dunlap wrote: > On Mon, 18 Dec 2006 22:52:51 -0800 J.H. wrote: > > > On Tue, 2006-12-19 at 07:34 +0100, Willy Tarreau wrote: > > > On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote: > > > (...) > > > > > > > So we know the problem is there, and we are working on it - we are > > > > getting e-mails about it if not daily than every other day or so. If > > > > there are suggestions we are willing to hear them - but the general > > > > feeling with the admins is that we are probably hitting the biggest > > > > problems already. > > > > > > BTW, yesterday my 2.4 patches were not published, but I noticed that > > > they were not even signed not bziped on hera. At first I simply thought > > > it was related, but right now I have a doubt. Maybe the automatic script > > > has been temporarily been disabled on hera too ? 
> > > > The script that deals with the uploads also deals with the packaging - > > so yes the problem is related. > > and with the finger_banner and version info on www.kernel.org page? > > They currently say: > > The latest stable version of the Linux kernel is: 2.6.19.1 > The latest prepatch for the stable Linux kernel tree is: 2.6.20-rc3 > The latest snapshot for the stable Linux kernel tree is: 2.6.20-rc3-git4 > The latest 2.4 version of the Linux kernel is: 2.4.34 > The latest 2.2 version of the Linux kernel is: 2.2.26 > The latest prepatch for the 2.2 Linux kernel tree is: 2.2.27-rc2 > The latest -mm patch to the stable Linux kernels is: 2.6.20-rc2-mm1 > > > but there are 2.6.20-rc3-git[567] and 2.6.20-rc3-mm1 out there, > so when is the finger version info updated? > > Thanks, > --- > ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-06 19:21 ` J.H. @ 2007-01-07 19:52 ` Randy Dunlap 2007-01-07 23:56 ` H. Peter Anvin 0 siblings, 1 reply; 110+ messages in thread From: Randy Dunlap @ 2007-01-07 19:52 UTC (permalink / raw) To: J.H. Cc: Willy Tarreau, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Sat, 06 Jan 2007 11:21:15 -0800 J.H. wrote: > It's an issue of load, and both machines are running 'hot' so to speak. > When the loads on the machines climbs our update rsyncs take longer to > complete (considering that our loads are completely based on I/O this > isn't surprising). More or less nothing has changed since: > http://lkml.org/lkml/2006/12/14/347 with the exception that git & gitweb > are no longer the concern we have (the caching layer I put into > kernel.org seems to be taking care of the worst problems we were seeing > and I have a couple more to put up this weekend), right now it's getting > loads between the two machines load evened out and lowering the number > of allowed rsyncs on each machine to better bound the load problem. Hi, I'm sure that all of this ext3fs etc. discussion is good, but let me clarify: I would be much happier if the kernel.org main page and the finger_banner info were updated at the same time that new tarballs were put onto kernel.org. Now someone may say that this is still the rsync/load problem, but from a customer perspective, it's bad. > On Sat, 2007-01-06 at 10:33 -0800, Randy Dunlap wrote: > > On Mon, 18 Dec 2006 22:52:51 -0800 J.H. wrote: > > > > > On Tue, 2006-12-19 at 07:34 +0100, Willy Tarreau wrote: > > > > On Sat, Dec 16, 2006 at 11:30:34AM -0800, J.H. wrote: > > > > (...) > > > > > > > > > So we know the problem is there, and we are working on it - we are > > > > > getting e-mails about it if not daily than every other day or so. 
If > > > > > there are suggestions we are willing to hear them - but the general > > > > > feeling with the admins is that we are probably hitting the biggest > > > > > problems already. > > > > > > > > BTW, yesterday my 2.4 patches were not published, but I noticed that > > > > they were not even signed not bziped on hera. At first I simply thought > > > > it was related, but right now I have a doubt. Maybe the automatic script > > > > has been temporarily been disabled on hera too ? > > > > > > The script that deals with the uploads also deals with the packaging - > > > so yes the problem is related. > > > > and with the finger_banner and version info on www.kernel.org page? > > > > They currently say: > > > > The latest stable version of the Linux kernel is: 2.6.19.1 > > The latest prepatch for the stable Linux kernel tree is: 2.6.20-rc3 > > The latest snapshot for the stable Linux kernel tree is: 2.6.20-rc3-git4 > > The latest 2.4 version of the Linux kernel is: 2.4.34 > > The latest 2.2 version of the Linux kernel is: 2.2.26 > > The latest prepatch for the 2.2 Linux kernel tree is: 2.2.27-rc2 > > The latest -mm patch to the stable Linux kernels is: 2.6.20-rc2-mm1 > > > > > > but there are 2.6.20-rc3-git[567] and 2.6.20-rc3-mm1 out there, > > so when is the finger version info updated? --- ~Randy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-07 19:52 ` Randy Dunlap @ 2007-01-07 23:56 ` H. Peter Anvin 0 siblings, 0 replies; 110+ messages in thread From: H. Peter Anvin @ 2007-01-07 23:56 UTC (permalink / raw) To: Randy Dunlap Cc: J.H., Willy Tarreau, Andrew Morton, Pavel Machek, kernel list, webmaster Randy Dunlap wrote: > > Hi, > > I'm sure that all of this ext3fs etc. discussion is good, > but let me clarify: I would be much happier if the kernel.org > main page and the finger_banner info were updated at the same > time that new tarballs were put onto kernel.org. > Tough sh*t. > Now someone may say that this is still the rsync/load problem, > but from a customer perspective, it's bad. If we did that, we'd get thousands of "this link doesn't work". ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-19 6:34 ` Willy Tarreau 2006-12-19 6:52 ` J.H. @ 2006-12-26 17:02 ` H. Peter Anvin 2007-01-08 19:31 ` Jean Delvare 1 sibling, 1 reply; 110+ messages in thread From: H. Peter Anvin @ 2006-12-26 17:02 UTC (permalink / raw) To: Willy Tarreau Cc: J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Willy Tarreau wrote: > > Just evil suggestion, but if you contact someone else than HP, they > might be _very_ interested in taking HP's place and providing whatever > you need to get their name on www.kernel.org. Sun and IBM do such > monter machines too. That would not be very kind to HP, but it might > help getting hardware faster. > No, it would not be kind to HP. They have supported us pretty well so far, and as a consequence I really want to give them the right of first refusal. We're asking for a lot of equipment, and these things take time. This is all further complicated right now by the fact that we probably won't get our 501(c)3 decision from IRS until mid-2007. -hpa ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-26 17:02 ` H. Peter Anvin @ 2007-01-08 19:31 ` Jean Delvare 2007-01-08 19:37 ` Willy Tarreau 0 siblings, 1 reply; 110+ messages in thread From: Jean Delvare @ 2007-01-08 19:31 UTC (permalink / raw) To: H. Peter Anvin, Willy Tarreau Cc: J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Hi Peter, On Tue, 26 Dec 2006 09:02:59 -0800, H. Peter Anvin wrote: > Willy Tarreau wrote: > > > Just evil suggestion, but if you contact someone else than HP, they > > might be _very_ interested in taking HP's place and providing whatever > > you need to get their name on www.kernel.org. Sun and IBM do such > > monter machines too. That would not be very kind to HP, but it might > > help getting hardware faster. > > No, it would not be kind to HP. They have supported us pretty well so > far, and as a consequence I really want to give them the right of first > refusal. We're asking for a lot of equipment, and these things take > time. This is all further complicated right now by the fact that we > probably won't get our 501(c)3 decision from IRS until mid-2007. What's wrong with having _several_ vendors contributing hardware to kernel.org? Hopefully HP is providing the hardware not just for the fame but also because they know how much it is needed and they have an interest in the Linux kernel development going well and smoothly. HP should actually enjoy the cost being shared with another company. Or was the HP hardware given in exhange of the promess that HP would be the only hardware provider of kernel.org and that this would be loudly advertised? I sure hope not, and I can't believe we're willing to wait for half a year just to let HP provide the hardware if another vendor is also willing to provide it and would be faster. It's free software we're doing here, it should be natural to accept the help of anyone who wants to help, and to share the costs where possible. 
-- Jean Delvare ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-08 19:31 ` Jean Delvare @ 2007-01-08 19:37 ` Willy Tarreau 2007-01-08 22:05 ` Jean Delvare 0 siblings, 1 reply; 110+ messages in thread From: Willy Tarreau @ 2007-01-08 19:37 UTC (permalink / raw) To: Jean Delvare Cc: H. Peter Anvin, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Hi Jean, On Mon, Jan 08, 2007 at 08:31:50PM +0100, Jean Delvare wrote: > Hi Peter, > > On Tue, 26 Dec 2006 09:02:59 -0800, H. Peter Anvin wrote: > > Willy Tarreau wrote: > > > > > Just evil suggestion, but if you contact someone else than HP, they > > > might be _very_ interested in taking HP's place and providing whatever > > > you need to get their name on www.kernel.org. Sun and IBM do such > > > monter machines too. That would not be very kind to HP, but it might > > > help getting hardware faster. > > > > No, it would not be kind to HP. They have supported us pretty well so > > far, and as a consequence I really want to give them the right of first > > refusal. We're asking for a lot of equipment, and these things take > > time. This is all further complicated right now by the fact that we > > probably won't get our 501(c)3 decision from IRS until mid-2007. > > What's wrong with having _several_ vendors contributing hardware to > kernel.org? Hopefully HP is providing the hardware not just for the > fame but also because they know how much it is needed and they have an > interest in the Linux kernel development going well and smoothly. > > HP should actually enjoy the cost being shared with another company. Or > was the HP hardware given in exhange of the promess that HP would be > the only hardware provider of kernel.org and that this would be loudly > advertised? I sure hope not, and I can't believe we're willing to wait > for half a year just to let HP provide the hardware if another vendor > is also willing to provide it and would be faster. 
> > It's free software we're doing here, it should be natural to accept the > help of anyone who wants to help, and to share the costs where possible. But it's also necessary to be fair with the donators and encourage them to give again. I agree that if we do not have any consideration for their implication, it might make them lose their motivations. If, as Peter said, they have been very supportive in the past, let's support them in turn. Regards, Willy ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-08 19:37 ` Willy Tarreau @ 2007-01-08 22:05 ` Jean Delvare 0 siblings, 0 replies; 110+ messages in thread From: Jean Delvare @ 2007-01-08 22:05 UTC (permalink / raw) To: Willy Tarreau Cc: H. Peter Anvin, J.H., Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, webmaster Salut Willy, On Mon, 8 Jan 2007 20:37:58 +0100, Willy Tarreau wrote: > Hi Jean, > > On Mon, Jan 08, 2007 at 08:31:50PM +0100, Jean Delvare wrote: > > Hi Peter, > > > > On Tue, 26 Dec 2006 09:02:59 -0800, H. Peter Anvin wrote: > > > Willy Tarreau wrote: > > > > > > > Just evil suggestion, but if you contact someone else than HP, they > > > > might be _very_ interested in taking HP's place and providing whatever > > > > you need to get their name on www.kernel.org. Sun and IBM do such > > > > monter machines too. That would not be very kind to HP, but it might > > > > help getting hardware faster. > > > > > > No, it would not be kind to HP. They have supported us pretty well so > > > far, and as a consequence I really want to give them the right of first > > > refusal. We're asking for a lot of equipment, and these things take > > > time. This is all further complicated right now by the fact that we > > > probably won't get our 501(c)3 decision from IRS until mid-2007. > > > > What's wrong with having _several_ vendors contributing hardware to > > kernel.org? Hopefully HP is providing the hardware not just for the > > fame but also because they know how much it is needed and they have an > > interest in the Linux kernel development going well and smoothly. > > > > HP should actually enjoy the cost being shared with another company. Or > > was the HP hardware given in exhange of the promess that HP would be > > the only hardware provider of kernel.org and that this would be loudly > > advertised? 
I sure hope not, and I can't believe we're willing to wait > > for half a year just to let HP provide the hardware if another vendor > > is also willing to provide it and would be faster. > > > > It's free software we're doing here, it should be natural to accept the > > help of anyone who wants to help, and to share the costs where possible. > > But it's also necessary to be fair with the donators and encourage them > to give again. I agree that if we do not have any consideration for their > implication, it might make them lose their motivations. If, as Peter said, > they have been very supportive in the past, let's support them in turn. I did _not_ suggest that we discourage HP from giving more hardware if they wish to. And I didn't mean to show any lack of consideration for HP's repeated donations to kernel.org in the past; sorry if it sounded that way. I am really happy, as I think everyone else here is, that HP supported kernel.org the way they did so far. That's not the point! I am simply wondering what's the benefit of kernel.org sticking to a single donator for their hardware, and quite frankly, I couldn't find a reasonable answer. We should be happy if kernel.org can get more hardware to solve the current bottlenecks. HP should be happy if they don't have to pay for all the kernel.org hardware. Other companies may be happy to help if their developers are affected by the performance issues that we are experiencing at the moment. Now, maybe no other company is actually interested in giving hardware to run kernel.org, or what they are ready to offer doesn't fit the needs, or maybe there are technical or administrative constraints that only HP is currently able to fulfill. Again that's not the point. All I say is that we shouldn't, IMVHO, set as a principle that only one vendor should be allowed to provide the hardware that runs kernel.org. It simply doesn't fit in the spirit of the software we are developing. 
Just to make it very clear: I am not suggesting that we should start a competition between various hardware vendors. It's cooperation I'm thinking about. Again, because that's exactly what free software is all about. -- Jean Delvare ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 19:30 ` J.H. ` (4 preceding siblings ...) 2006-12-19 6:34 ` Willy Tarreau @ 2006-12-19 15:37 ` Tim Schmielau 2007-01-08 21:20 ` Jean Delvare 6 siblings, 0 replies; 110+ messages in thread From: Tim Schmielau @ 2006-12-19 15:37 UTC (permalink / raw) To: J.H. Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Sat, 16 Dec 2006, J.H. wrote: > - Gitweb is causing us no end of headache, Is there any mirror for http://git.kernel.org/git/ other than git2.kernel.org? If there is, it would probably help to make it better known. thanks, Tim ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2006-12-16 19:30 ` J.H. ` (5 preceding siblings ...) 2006-12-19 15:37 ` Tim Schmielau @ 2007-01-08 21:20 ` Jean Delvare 2007-01-08 21:33 ` J.H. 6 siblings, 1 reply; 110+ messages in thread From: Jean Delvare @ 2007-01-08 21:20 UTC (permalink / raw) To: J.H. Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster Hi JH, On Sat, 16 Dec 2006 11:30:34 -0800, J.H. wrote: > The root cause boils down to with git, gitweb and the normal mirroring > on the frontend machines our basic working set no longer stays resident > in memory, which is forcing more and more to actively go to disk causing > a much higher I/O load. You have the added problem that one of the > frontend machines is getting hit harder than the other due to several > factors: various DNS servers not round robining, people explicitly > hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and I am trying to be a good citizen by explicitly asking for www2.kernel.org; unfortunately, I notice that many links on the main page point to www.kernel.org rather than www2.kernel.org. Check the location, patchtype, full source, patch, view patch, and changeset links for example. Fixing these links would let people really use www2 if they want to, which might help. BTW, I'm no DNS expert, but isn't it possible to favor one host in the round robin mechanism? E.g. by listing server 2 twice, so that it gets 2/3 of the load? This could also help if server 1 otherwise gets more load. > So we know the problem is there, and we are working on it - we are > getting e-mails about it if not daily then every other day or so. If > there are suggestions we are willing to hear them - but the general > feeling with the admins is that we are probably hitting the biggest > problems already. I have a few suggestions, although I realize that the other things you're working on are likely to be much more helpful: * Shorten the www.kernel.org main page.
I guess that 99% of the hits on this page are by people who just want to know the latest versions, and possibly download a patch or access Linus' git tree through gitweb. All the rest could be moved to a separate page, or if you think it's better to keep all the general info on the main page, move the array with the versions to a separate page, which developers can bookmark. Splitting the dynamic content (top) from the essentially static content (bottom) of this page should help with caching, BTW. * Drop the bandwidth graphs. Most visitors certainly do not care, and their presence generates traffic on all web servers regardless of the one the visitor is using, as each graph is generated by the respective server. If you really like these graphs, just move them to a separate page for people who want to watch them. As far as I am concerned, I find them rather confusing and uninformative - from a quick look you just can't tell if the servers are loaded or not, you have to look at the numbers, so what's the point of drawing a graph... Of course the interest of these proposals directly depends on how much the www.kernel.org/index page accounts for in the total load of the servers. -- Jean Delvare ^ permalink raw reply [flat|nested] 110+ messages in thread
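[Editor's note] Jean's round-robin weighting idea above — publish server 2's address twice so it is handed out for two of every three lookups — can be sketched with a quick simulation. The host names and query count below are illustrative only, not the real kernel.org zone data:

```python
from collections import Counter
from itertools import cycle, islice

def simulate(records, queries):
    """Count how often each host is handed out by a resolver that
    simply rotates through the published records (round robin)."""
    return Counter(islice(cycle(records), queries))

# Listing server 2 twice gives it two of every three answers.
records = ["www1.kernel.org", "www2.kernel.org", "www2.kernel.org"]
shares = simulate(records, 9000)

assert shares["www1.kernel.org"] == 3000
assert shares["www2.kernel.org"] == 6000
```

Note that real name servers may deduplicate or reorder identical A records, so in practice weighting like this is usually done by publishing distinct addresses that all reach the favored host rather than literal duplicate records.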
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-08 21:20 ` Jean Delvare @ 2007-01-08 21:33 ` J.H. 2007-01-09 7:01 ` Jean Delvare 0 siblings, 1 reply; 110+ messages in thread From: J.H. @ 2007-01-08 21:33 UTC (permalink / raw) To: Jean Delvare Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Mon, 2007-01-08 at 22:20 +0100, Jean Delvare wrote: > Hi JH, > > On Sat, 16 Dec 2006 11:30:34 -0800, J.H. wrote: > > The root cause boils down to with git, gitweb and the normal mirroring > > on the frontend machines our basic working set no longer stays resident > > in memory, which is forcing more and more to actively go to disk causing > > a much higher I/O load. You have the added problem that one of the > > frontend machines is getting hit harder than the other due to several > > factors: various DNS servers not round robining, people explicitly > > hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and > > I am trying to be a good citizen by explicitely asking for > www2.kernel.org, unfortunately I notice that many links on the main > page point to www.kernel.org rather than www2.kernel.org. Check the > location, patchtype, full source, patch, view patch, and changeset > links for example. Fixing these links would let people really use www2 > if they want to, that might help. True - however if you look at the underlying link for those you'll notice that most of the links will continue to use www2 instead of www. The ones that explicitly point to www probably have a good reason for doing so, but I'll have to check on that. Regardless the kernel.org webpages need some work and it's on my todo list (maybe I should post that somewhere...) > > BTW, I'm no DNS expert, but isn't it possible to favor one host in the > round robin mechanism? E.g. by listing the server 2 twice, so that it > gets 2/3 of the load? This could also help if server 1 otherwise gets > more load. 
Could, but the bigger problem seems to be people explicitly pointing rsync at 1 instead of the generic name or 2. Beyond that, traffic seems to distribute as we are expecting. > > > So we know the problem is there, and we are working on it - we are > > getting e-mails about it if not daily then every other day or so. If > > there are suggestions we are willing to hear them - but the general > > feeling with the admins is that we are probably hitting the biggest > > problems already. > > I have a few suggestions although I realize that the other things > you're working on are likely to be much more helpful: > > * Shorten the www.kernel.org main page. I guess that 99% of the hits on > this page are by people who just want to know the latest versions, and > possibly download a patch or access Linus' git tree through gitweb. All > the rest could be moved to a separate page, or if you think it's > better to keep all the general info on the main page, move the array > with the versions to a separate page, which developers can bookmark. > Splitting the dynamic content (top) from the essentially static content > (bottom) of this page should help with caching, BTW. The front page itself caches pretty nicely and the upper 'dynamic' content isn't constantly being generated on every page request, so by and large this caches and we don't have any real issue with it. > > * Drop the bandwidth graphs. Most visitors certainly do not care, and > their presence generates traffic on all web servers regardless of the > one the visitor is using, as each graph is generated by the respective > server. If you really like these graphs, just move them to a separate > page for people who want to watch them. As far as I am concerned, I > find them rather confusing and uninformative - from a quick look you > just can't tell if the servers are loaded or not, you have to look at > the numbers, so what's the point of drawing a graph... While I agree that most users don't care, they are useful.
If someone notices that 1 has an incredibly high load and is moving lots of traffic in comparison to 2, then they can manually redirect to 2 for better & faster service on their own. Since these images aren't particularly big they cache just fine and it's not that big of a deal, and there are much longer poles in the tent right now. > > Of course the interest of these proposals directly depends on how much > the www.kernel.org/index page accounts for in the total load of the servers. > Honestly - negligible at best. We have bigger issues from trying to service 200 separate rsync processes on top of http, ftp, git, gitweb, etc than worrying about a couple of small, 90% static pages. - John 'Warthog9' Hawley Kernel.org Admin ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-08 21:33 ` J.H. @ 2007-01-09 7:01 ` Jean Delvare 2007-01-09 7:25 ` J.H. 0 siblings, 1 reply; 110+ messages in thread From: Jean Delvare @ 2007-01-09 7:01 UTC (permalink / raw) To: J.H. Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster Hi JH, On Mon, 08 Jan 2007 13:33:04 -0800, J.H. wrote: > On Mon, 2007-01-08 at 22:20 +0100, Jean Delvare wrote: > > * Drop the bandwidth graphs. Most visitors certainly do not care, and > > their presence generates traffic on all web servers regardless of the > > one the visitor is using, as each graph is generated by the respective > > server. If you really like these graphs, just move them to a separate > > page for people who want to watch them. As far as I am concerned, I > > find them rather confusing and uninformative - from a quick look you > > just can't tell if the servers are loaded or not, you have to look at > > the numbers, so what's the point of drawing a graph... > > While I agree that most users don't care, they are useful. If someone So moving them to a separate page would make sense. > notices that 1 has an incredibly high load and moving lots of traffic in > comparison to 2, than they can manually redirect to 2 for better & > faster service on their own. Since these images aren't particularly big Unfortunately the images actually fail to present this information to the visitor clearly. One problem is the time range displayed. 17 minutes is either too much (hardly better than an instant value, but harder to read) or not enough (you can't really see the trend.) With stats on the last 24 hours, people could see the daily usage curve and schedule their rsyncs at times of lesser load, for example, if they see a daily pattern in the load. Another problem is the fact that the vertical scales are dynamically chosen, and thus different between both servers, making it impossible to quickly compare the bandwidth usage. 
If the bandwidth usage on both servers is stable, both images will look the same, even though one server might be overloaded and the other one underused. The user also can't compare from one visit to the next: the graphs look essentially the same each time, regardless of the actual bandwidth use. So, if you really want people to use these graphs to make decisions and help balance the load better, you have to use fixed scales. I also notice that the graphs show primarily the bandwidth, while what seems to matter is the server load. > they cache just fine and it's not that big of a deal, and there are much > longer poles in the tent right now. The images are being regenerated every other minute or so, so I doubt they can actually be cached. -- Jean Delvare ^ permalink raw reply [flat|nested] 110+ messages in thread
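[Editor's note] Jean's point about dynamically chosen scales can be shown numerically: normalizing each server's traffic curve by its own peak, which is effectively what an autoscaled graph does, makes a heavily loaded and a lightly loaded server plot identically. A small sketch with invented bandwidth figures:

```python
server1 = [800, 900, 850, 920]   # Mbit/s, made-up numbers
server2 = [80, 90, 85, 92]       # one tenth of the traffic

def autoscale(series):
    """Scale a series by its own peak, as a per-graph dynamic scale does."""
    peak = max(series)
    return [round(v / peak, 3) for v in series]

def fixed_scale(series, ceiling=1000):
    """Scale against one ceiling shared by every graph."""
    return [round(v / ceiling, 3) for v in series]

# Autoscaled, the two curves are indistinguishable...
assert autoscale(server1) == autoscale(server2)
# ...while a shared fixed scale preserves the 10x difference.
assert fixed_scale(server1) != fixed_scale(server2)
```

The same argument applies across visits: with per-image autoscaling, yesterday's quiet graph and today's release-day graph look alike unless the viewer reads the axis numbers.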
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-09 7:01 ` Jean Delvare @ 2007-01-09 7:25 ` J.H. 2007-01-09 13:36 ` Jean Delvare 0 siblings, 1 reply; 110+ messages in thread From: J.H. @ 2007-01-09 7:25 UTC (permalink / raw) To: Jean Delvare Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster On Tue, 2007-01-09 at 08:01 +0100, Jean Delvare wrote: > Hi JH, > > On Mon, 08 Jan 2007 13:33:04 -0800, J.H. wrote: > > On Mon, 2007-01-08 at 22:20 +0100, Jean Delvare wrote: > > > * Drop the bandwidth graphs. Most visitors certainly do not care, and > > > their presence generates traffic on all web servers regardless of the > > > one the visitor is using, as each graph is generated by the respective > > > server. If you really like these graphs, just move them to a separate > > > page for people who want to watch them. As far as I am concerned, I > > > find them rather confusing and uninformative - from a quick look you > > > just can't tell if the servers are loaded or not, you have to look at > > > the numbers, so what's the point of drawing a graph... > > > > While I agree that most users don't care, they are useful. If someone > > So moving them to a separate page would make sense. Not really. > > > notices that 1 has an incredibly high load and moving lots of traffic in > > comparison to 2, than they can manually redirect to 2 for better & > > faster service on their own. Since these images aren't particularly big > > Unfortunately the images actually fail to present this information to > the visitor clearly. One problem is the time range displayed. 17 > minutes is either too much (hardly better than an instant value, but > harder to read) or not enough (you can't really see the trend.) With > stats on the last 24 hours, people could see the daily usage curve and > schedule their rsyncs at times of lesser load, for example, if they see > a daily pattern in the load. 
They are useful, even if they are more or less a snapshot in time, as it at least gives people a vague idea of what's going on. So instead of having to e-mail the admins and ask 'my download is slow' they can at least glance at the graphs and possibly realize OHHHH it's release day and kernel.org is moving close to 2 Gbps between the two machines. > > Another problem is the fact that the vertical scales are dynamically > chosen, and thus different between both servers, making it impossible to > quickly compare the bandwidth usage. If the bandwidth usage on both > servers is stable, both images will look the same, even though one > server might be overloaded and the other one underused. The user also > can't compare from one visit to the next, the graphs look essentially > the same each time, regardless of the actual bandwidth use. So, if you > really want people to use these graphs to make decisions and help > balance the load better, you have to use fixed scales. > > I also notice that the graphs show primarily the bandwidth, while what > seems to matter is the server load. > > > they cache just fine and it's not that big of a deal, and there are much > > longer poles in the tent right now. > > The images are being regenerated every other minute or so, so I doubt > they can actually be cached. > Considering how many times the front page of kernel.org is viewed, yes they are cached and sitting in RAM on the kernel.org boxes. Realistically - we are arguing over something that barely even registers as a blip within the entirety of the load on kernel.org, and we have bigger things to worry about than a restructuring of our front page when it won't greatly affect our loads. - John 'Warthog9' ^ permalink raw reply [flat|nested] 110+ messages in thread
* Re: [KORG] Re: kernel.org lies about latest -mm kernel 2007-01-09 7:25 ` J.H. @ 2007-01-09 13:36 ` Jean Delvare 0 siblings, 0 replies; 110+ messages in thread From: Jean Delvare @ 2007-01-09 13:36 UTC (permalink / raw) To: J.H. Cc: Randy Dunlap, Andrew Morton, Pavel Machek, kernel list, hpa, webmaster Hi J.H., On Mon, 08 Jan 2007 23:25:18 -0800, J.H. wrote: > On Tue, 2007-01-09 at 08:01 +0100, Jean Delvare wrote: > > > they cache just fine and it's not that big of a deal, and there are much > > > longer poles in the tent right now. > > > > The images are being regenerated every other minute or so, so I doubt > > they can actually be cached. > > Considering how many times the front page of kernel.org is viewed, yes > they are cached and sitting in ram on the kernel.org boxes. It is client-side caching (or the lack thereof) that I was thinking about, to limit the number of requests received by the web server. > Realistically - we are arguing over something that barely even registers > as a blip within the entirety of the load on kernel.org, and we have > bigger things to worry about than a restructuring of our front page when > it won't greatly affect our loads. You will know better; I was only suggesting things based on my limited user experience. You have the actual numbers, you know where your time and energy will be best used. And thank you for taking care of it! :) -- Jean Delvare ^ permalink raw reply [flat|nested] 110+ messages in thread
end of thread, other threads:[~2007-03-24 1:18 UTC | newest] Thread overview: 110+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2006-12-14 22:37 kernel.org lies about latest -mm kernel Pavel Machek 2006-12-14 23:01 ` Randy Dunlap 2006-12-14 23:38 ` Sergio Monteiro Basto 2006-12-16 17:44 ` [KORG] " Randy Dunlap 2006-12-16 17:57 ` Andrew Morton 2006-12-16 18:02 ` Randy Dunlap 2006-12-16 19:30 ` J.H. 2006-12-16 20:30 ` Russell King 2006-12-26 16:47 ` H. Peter Anvin 2006-12-16 21:21 ` Nigel Cunningham 2006-12-26 16:49 ` H. Peter Anvin 2007-01-07 3:35 ` Nigel Cunningham 2007-01-07 4:10 ` Jeff Garzik 2007-01-07 4:47 ` Nigel Cunningham 2007-01-07 4:22 ` Jeff Garzik 2007-01-07 4:29 ` Linus Torvalds 2007-01-07 20:11 ` Greg KH 2007-01-07 21:30 ` H. Peter Anvin 2007-01-07 5:17 ` H. Peter Anvin 2007-01-07 5:24 ` How git affects kernel.org performance H. Peter Anvin 2007-01-07 5:39 ` Linus Torvalds 2007-01-07 8:55 ` Willy Tarreau 2007-01-07 8:58 ` H. 
Peter Anvin 2007-01-07 9:03 ` Willy Tarreau 2007-01-07 10:28 ` Christoph Hellwig 2007-01-07 10:52 ` Willy Tarreau 2007-01-07 18:17 ` Linus Torvalds 2007-01-07 19:13 ` Linus Torvalds [not found] ` <9e4733910701071126r7931042eldfb73060792f4f41@mail.gmail.com> 2007-01-07 19:35 ` Linus Torvalds 2007-01-07 10:50 ` Jan Engelhardt 2007-01-07 18:49 ` Randy Dunlap 2007-01-07 19:07 ` Jan Engelhardt 2007-01-07 19:28 ` Randy Dunlap 2007-01-07 19:37 ` Linus Torvalds 2007-01-07 9:15 ` Andrew Morton 2007-01-07 9:38 ` Rene Herman 2007-01-08 3:05 ` Suparna Bhattacharya 2007-01-08 12:58 ` Theodore Tso 2007-01-08 13:41 ` Johannes Stezenbach 2007-01-08 13:56 ` Theodore Tso 2007-01-08 13:59 ` Pavel Machek 2007-01-08 14:17 ` Theodore Tso 2007-01-08 13:43 ` Jeff Garzik 2007-01-09 1:09 ` Paul Jackson 2007-01-09 2:18 ` Jeremy Higdon [not found] ` <20070109075945.GA8799@mail.ustc.edu.cn> 2007-01-09 7:59 ` Fengguang Wu 2007-01-09 16:23 ` Linus Torvalds [not found] ` <20070110015739.GA26978@mail.ustc.edu.cn> 2007-01-10 1:57 ` Fengguang Wu 2007-01-10 3:20 ` Nigel Cunningham [not found] ` <20070110140730.GA986@mail.ustc.edu.cn> 2007-01-10 14:07 ` Fengguang Wu 2007-01-12 10:54 ` Nigel Cunningham 2007-01-07 14:57 ` Robert Fitzsimons 2007-01-07 19:12 ` J.H. 2007-01-08 1:51 ` Jakub Narebski 2007-01-07 15:06 ` Krzysztof Halasa 2007-01-07 20:31 ` Shawn O. Pearce 2007-01-08 14:46 ` Nicolas Pitre 2007-01-09 4:29 ` [KORG] Re: kernel.org lies about latest -mm kernel Nigel Cunningham 2007-01-09 5:09 ` Adrian Bunk 2007-01-09 5:51 ` Nigel Cunningham 2006-12-17 12:32 ` Pavel Machek 2006-12-17 13:13 ` Jeff Garzik 2006-12-17 18:23 ` Randy Dunlap 2006-12-17 22:37 ` Matti Aarnio 2006-12-18 0:42 ` J.H. 2006-12-19 6:46 ` Willy Tarreau 2006-12-19 7:39 ` J.H. 2006-12-19 13:32 ` Willy Tarreau 2006-12-19 14:36 ` Dave Jones 2006-12-19 14:38 ` Willy Tarreau 2006-12-26 16:14 ` H. Peter Anvin 2007-01-08 20:10 ` Jean Delvare 2006-12-19 6:34 ` Willy Tarreau 2006-12-19 6:52 ` J.H. 
2007-01-06 18:33 ` Randy Dunlap 2007-01-06 19:18 ` H. Peter Anvin 2007-01-06 19:35 ` Willy Tarreau 2007-01-06 19:37 ` Nicholas Miell 2007-01-06 20:13 ` Andrew Morton 2007-01-06 20:18 ` H. Peter Anvin 2007-03-19 19:27 ` [PATCH] sysctl: vfs_cache_divisor Randy Dunlap 2007-03-19 20:36 ` Andrew Morton 2007-03-19 20:42 ` Randy Dunlap 2007-03-20 4:22 ` H. Peter Anvin 2007-03-21 23:01 ` Randy Dunlap 2007-03-21 23:11 ` Andrew Morton 2007-03-23 0:07 ` Kyle Moffett 2007-03-23 20:36 ` Randy Dunlap 2007-03-23 20:59 ` H. Peter Anvin 2007-03-24 0:45 ` Kyle Moffett 2007-03-24 1:17 ` Kyle Moffett 2007-03-20 19:53 ` Ingo Oeser 2007-01-06 23:50 ` [KORG] Re: kernel.org lies about latest -mm kernel H. Peter Anvin 2007-01-06 20:13 ` Jeff Garzik 2007-01-06 20:17 ` Andrew Morton 2007-01-06 20:20 ` H. Peter Anvin 2007-01-06 20:36 ` Andrew Morton 2007-01-06 19:21 ` J.H. 2007-01-07 19:52 ` Randy Dunlap 2007-01-07 23:56 ` H. Peter Anvin 2006-12-26 17:02 ` H. Peter Anvin 2007-01-08 19:31 ` Jean Delvare 2007-01-08 19:37 ` Willy Tarreau 2007-01-08 22:05 ` Jean Delvare 2006-12-19 15:37 ` Tim Schmielau 2007-01-08 21:20 ` Jean Delvare 2007-01-08 21:33 ` J.H. 2007-01-09 7:01 ` Jean Delvare 2007-01-09 7:25 ` J.H. 2007-01-09 13:36 ` Jean Delvare
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).