linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: 2.4.14-pre6
@ 2001-10-31 16:15 Linus Torvalds
  2001-10-31 18:36 ` 2.4.14-pre6 Andrew Morton
  2001-11-01 10:20 ` 2.4.14-pre6 Neil Brown
  0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2001-10-31 16:15 UTC (permalink / raw)
  To: Kernel Mailing List; +Cc: Andrew Morton


In article <3BDFBFF5.9F54B938@zip.com.au>,
Andrew Morton  <akpm@zip.com.au> wrote:
>
>Appended here is a program which creates 100,000 small files.
>Using ext2 on -pre5.  We see how long it takes to run
>
>	(make-many-files ; sync)
>
>For several values of queue_nr_requests:
>
>queue_nr_requests:	128	8192	32768
>execution time:		4:43	3:25	3:20
>
>Almost all of the execution time is in the `sync'.

Hmm..  I don't consider "sync" to be a benchmark, and one of the things
that made me limit the queue size was in fact that Linux in the
timeframe before roughly 2.4.7 or so was _completely_ unresponsive when
you did a big "untar" followed by a "sync".

I'd rather have a machine where I don't even much notice the sync than
one where a made-up-load and a "sync" that servers no purpose shows
lower throughput.

Do you actually have any real load that cares?

>By restricting the number of requests in flight to 128 we're
>giving new requests only a very small chance of getting merged with
>an existing request.  More seeking.

If you can come up with alternatives that do not suck from a latency
standpoint, I'm open to ideas.

However, having tested the -ac approach, I know from personal experience
that it's just way too easy to find behaviour with so horrible latency
on a 2GB machine that it's not in the _least_ funny.

Making the elevator heavily favour reads over writes might be ok enough
to make the long queues even an option but:

>OK, not an interesting workload.  But I suspect that there are real
>workloads which will be bitten by this.
>
>Why is the queue length so tiny now?  Latency?  If so, couldn't this
>be addressed by giving reads higher priority versus writes?

It's a write-write latency thing too, but that's probably not as strong an
argument.

Trivial example: do the above thing at the same time you have a mail agent
open that does a "fsync()" on its mail store (and depending on your mail
agent and your mail folder layout, you may have quite a lot of small
fsyncs going on).

I don't know about you, but I start up mail agents a _lot_ more often
than I do "sync". And I'd rather do "sync &" than have bad interactive
performance from the mail agent.

I'm not against making the queues larger, but on the other hand I see so
many _better_ approaches that I would rather people spent some effort on,
for example, making the dirty list itself be more ordered.

We have actually talked about some higher-level ordering of the dirty list
for at least five years, but nobody has ever done it. And I bet you $5
that you'll get (a) better throughput than by making the queues longer and
(b) you'll have fine latency while you write and (c) that you want to
order the write-queue anyway for filesystems that care about ordering.

So yes, making the queue longer is an "easy" solution, but if it then
leads to complex problems like how to make an elevator that is guaranteed
to not have bad latency behaviour, I actually think that doing some (even
just fairly rudimentary) ordering of the write queue ends up being easier
_and_ more effective.

		Linus


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 16:15 2.4.14-pre6 Linus Torvalds
@ 2001-10-31 18:36 ` Andrew Morton
  2001-10-31 19:06   ` 2.4.14-pre6 Linus Torvalds
  2001-11-01 10:20 ` 2.4.14-pre6 Neil Brown
  1 sibling, 1 reply; 34+ messages in thread
From: Andrew Morton @ 2001-10-31 18:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Linus Torvalds wrote:
> 
> In article <3BDFBFF5.9F54B938@zip.com.au>,
> Andrew Morton  <akpm@zip.com.au> wrote:
> >
> >Appended here is a program which creates 100,000 small files.
> >Using ext2 on -pre5.  We see how long it takes to run
> >
> >       (make-many-files ; sync)
> >
> >For several values of queue_nr_requests:
> >
> >queue_nr_requests:     128     8192    32768
> >execution time:                4:43    3:25    3:20
> >
> >Almost all of the execution time is in the `sync'.
> 
> Hmm..  I don't consider "sync" to be a benchmark, and one of the things
> that made me limit the queue size was in fact that Linux in the
> timeframe before roughly 2.4.7 or so was _completely_ unresponsive when
> you did a big "untar" followed by a "sync".

Sure.  I chose `sync' because it's measurable.  That sync took
four minutes, so the machine will be locked up seeking for four
minutes whether the writeback was initiated by /bin/sync or by
kupdate/bdflush.

> I'd rather have a machine where I don't even much notice the sync than
> one where a made-up-load and a "sync" that servers no purpose shows
> lower throughput.
> 
> Do you actually have any real load that cares?

All I do is compile kernels :)

Actually, ext3 journal replay can sometimes take thirty seconds
or longer - it reads maybe ten megs from the journal and then
it has to splatter it all over the platter and wait on it.

> ...
> We have actually talked about some higher-level ordering of the dirty list
> for at least five years, but nobody has ever done it. And I bet you $5
> that you'll get (a) better throughput than by making the queues longer and
> (b) you'll have fine latency while you write and (c) that you want to
> order the write-queue anyway for filesystems that care about ordering.

I'll buy that.  It's not just the dirty list, either.  I've seen 
various incarnations of page_launder() and its successor which
were pretty suboptimal from a write clustering pov.

But it's actually quite seductive to take a huge amount of data and
just chuck it at the request layer and let Jens sort it out. This
usually works well and keeps the complexity in one place.

One does wonder whether everything is working as it should, though.
Creating those 100,000 4k files is going to require writeout of
how many blocks?  120,000?  And four minutes is enough time for
34,000 seven-millisecond seeks.  And ext2 is pretty good at laying
things out contiguously.  These numbers don't gel.

Ah-ha.  Look at the sync_inodes stuff:

	for (zillions of files) {
		filemap_fdatasync(file)
		filemap_fdatawait(file)
	}

If we turn this into

	for (zillions of files)
		filemap_fdatasync(file)
	for (zillions of files)
		filemap_fdatawait(file)

I suspect that interesting effects will be observed, yes?  Especially
if we have a nice long request queue, and the results of the
preceding sync_buffers() are still available for being merged with.

kupdate runs this code path as well. Why is there any need for
kupdate to wait on the writes?

Anyway.  I'll take a look....


-

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 18:36 ` 2.4.14-pre6 Andrew Morton
@ 2001-10-31 19:06   ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2001-10-31 19:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel Mailing List


On Wed, 31 Oct 2001, Andrew Morton wrote:
>
> But it's actually quite seductive to take a huge amount of data and
> just chuck it at the request layer and let Jens sort it out. This
> usually works well and keeps the complexity in one place.

Fair enough, I see your point. How would you suggest we handle the latency
thing, though?

I'm not against making the elevator more intelligent, and you have a
good argument. But I've very much against "allow the queues to grow with
no sense of latency".

> One does wonder whether everything is working as it should, though.
> Creating those 100,000 4k files is going to require writeout of
> how many blocks?  120,000?  And four minutes is enough time for
> 34,000 seven-millisecond seeks.  And ext2 is pretty good at laying
> things out contiguously.  These numbers don't gel.
>
> Ah-ha.  Look at the sync_inodes stuff:
>
> 	for (zillions of files) {
> 		filemap_fdatasync(file)
> 		filemap_fdatawait(file)
> 	}
>
> If we turn this into
>
> 	for (zillions of files)
> 		filemap_fdatasync(file)
> 	for (zillions of files)
> 		filemap_fdatawait(file)

Good catch, I bet you're right.

> kupdate runs this code path as well. Why is there any need for
> kupdate to wait on the writes?

At least historically (and I think it's still true in some cases),
kupdated was also in charge of trying to write out buffers under
low-memory circumstances. And without any throttling, blind writing can
make things worse.

However, the request throttle should be _plenty_ good enough, so I think
you're right.

Oh, one issue in case you're going to work on this: kupdated does need to
do the "wait_for_locked_buffers()" at some point, as that is also what
moves buffers from the locked list to the clean list. But that has nothing
to do with the fdatawait thing.

		Linus


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 16:15 2.4.14-pre6 Linus Torvalds
  2001-10-31 18:36 ` 2.4.14-pre6 Andrew Morton
@ 2001-11-01 10:20 ` Neil Brown
  2001-11-01 20:55   ` 2.4.14-pre6 Andrew Morton
  2001-11-01 21:28   ` 2.4.14-pre6 Chris Mason
  1 sibling, 2 replies; 34+ messages in thread
From: Neil Brown @ 2001-11-01 10:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Andrew Morton

On Wednesday October 31, torvalds@transmeta.com wrote:
> 
> We have actually talked about some higher-level ordering of the dirty list
> for at least five years, but nobody has ever done it. And I bet you $5
> that you'll get (a) better throughput than by making the queues longer and
> (b) you'll have fine latency while you write and (c) that you want to
> order the write-queue anyway for filesystems that care about ordering.
> 

But what is the "right" order. A raid5 array might well respond to a
different ordering that an JBOD.

I've thought a bit about how to best give blocks to RAID5 so that they
can be written efficiently.  I suspect the issues are similar for
normal disk io:

Currently the device (or block-device-layer) doesn't see a block until
the upper levels really want the IO to happen.  There is a little bit
of a grace period betwen the submit_bh and the run_task_queue(&tq_disk) 
when re-ordering can happen, but it isn't very long.  There is a bit
more grace time while waiting to get a turn on the device.  But it is
still a lot less time than the amount of time that most buffers are
sitting around in cache.

What I would like is that as soon as a buffer was marked "dirty", it 
would get passed down to the driver (or at least to the
block-device-layer) with something like 
    submit_bh(WRITEA, bh);
i.e. a write ahead. (or is it write-behind...)
The device handler (the elevator algorithm for normal disks, other
code for other devices) could keep them ordered in whatever way it
chooses, and feed them into the queues at some appropriate time.

The submit_bh(WRITE, bh) would then push the buffer out if it hadn't
gone already.

The elevator code could possibly keep two sorted lists: one of WRITEA
(or READA) requests and one of WRITE (or READ) requests.
It processes the second merging in some of the first as it goes.
Maybe capping it to 2 -ahead blocks for every immediate block.
Probably also allowing for larger numbers of -ahead blocks if they are
contiguous with an immediate block.

RAID5 would do something a bit different.  Possibly whenever it wanted
to write a stripe, it would hunt though the -ahead list (sort of like
the 2.2 code did) for other blocks that could be proactive added to
the stripe.


This would allow a nice ordering of write-behind (and read-ahead)
requests but give the driver control of latency by allowing it to
limit the extent to which write-behind/read-ahead blocks can usurp the
position of other blocks.

Does that make any sense?  Is it conceptually simple enough?

NeilBrown

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-01 10:20 ` 2.4.14-pre6 Neil Brown
@ 2001-11-01 20:55   ` Andrew Morton
  2001-11-02  8:00     ` 2.4.14-pre6 Helge Hafting
  2001-11-04 22:34     ` 2.4.14-pre6 Pavel Machek
  2001-11-01 21:28   ` 2.4.14-pre6 Chris Mason
  1 sibling, 2 replies; 34+ messages in thread
From: Andrew Morton @ 2001-11-01 20:55 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linus Torvalds, Kernel Mailing List, Andrea Arcangeli

Neil Brown wrote:
> 
> ...
> What I would like is that as soon as a buffer was marked "dirty", it
> would get passed down to the driver (or at least to the
> block-device-layer) with something like
>     submit_bh(WRITEA, bh);
> i.e. a write ahead. (or is it write-behind...)
> The device handler (the elevator algorithm for normal disks, other
> code for other devices) could keep them ordered in whatever way it
> chooses, and feed them into the queues at some appropriate time.
> 

Sounds sensible to me.

In many ways, it's similar to the current scheme when it's used
with an enormous request queue - all writeable blocks in the
system are candidates for request merging.  But your proposal
is a whole lot smarter.

In particular, the current kupdate scheme of writing the
dirty block list out in six chunks, five seconds apart
does potentially miss out on a very large number of merging
opportunities.  Your proposal would fix that.

Another potential microoptimisation would be to write out
clean blocks if that helps merging.  So if we see a write
for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
then write it out too.  I suspect this would be a win for
ATA but a loss for SCSI.  Not sure.

But I have a gut feel that all this is in the noisefloor,
compared to The Big Problem.  It's just a matter of identifying
and fixing TBP.  Fixing the fdatasync() thing didn't help,
because ext2_write_inode() for a new file has to read the
inode block from disk.  Fixing that, by doing an async preread
of the inode's block in ext2_new_inode() didn't help either,
I suspect because my working set was so large that the VM
tossed out my preread before I got to use it.  A few more days
poking is needed.



Oh.  I have a gripe concerning prune_icache().  The design
idea behind keventd is that it's a "process context bottom
half handler".  It's used for things like cardbus hotplug
interrupt handlers, handling tty hangups, etc.  It should
probably run SCHED_FIFO.

Using keventd to synchronously flush large amounts of 
data out to disk constitutes gross abuse - it's being blocked
from performing its designed duties for many seconds.  Can we
please not do that?  We already have kswapd, kupdate, bdflush,
which should be sufficient.

-

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-01 10:20 ` 2.4.14-pre6 Neil Brown
  2001-11-01 20:55   ` 2.4.14-pre6 Andrew Morton
@ 2001-11-01 21:28   ` Chris Mason
  1 sibling, 0 replies; 34+ messages in thread
From: Chris Mason @ 2001-11-01 21:28 UTC (permalink / raw)
  To: Andrew Morton, Neil Brown
  Cc: Linus Torvalds, Kernel Mailing List, Andrea Arcangeli



On Thursday, November 01, 2001 12:55:41 PM -0800 Andrew Morton
<akpm@zip.com.au> wrote:

> Oh.  I have a gripe concerning prune_icache().  The design
> idea behind keventd is that it's a "process context bottom
> half handler".  It's used for things like cardbus hotplug
> interrupt handlers, handling tty hangups, etc.  It should
> probably run SCHED_FIFO.
> 
> Using keventd to synchronously flush large amounts of 
> data out to disk constitutes gross abuse - it's being blocked
> from performing its designed duties for many seconds.  Can we
> please not do that?  We already have kswapd, kupdate, bdflush,
> which should be sufficient.

One of the worst parts of prune_icache was that if a journaled
FS needed to log dirty inodes, kswapd would wait on the log, who was
probably waiting on kswapd.  Thus the dirty_inode call, which I'd like to
get rid of.

I don't think kupdate or bdflush are suitable to flush the dirty inodes,
kupdate shouldn't do memory pressure and bdflush shouldn't wait on the log.
So how about a new kinoded?

-chris


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-01 20:55   ` 2.4.14-pre6 Andrew Morton
@ 2001-11-02  8:00     ` Helge Hafting
  2001-11-04 22:34     ` 2.4.14-pre6 Pavel Machek
  1 sibling, 0 replies; 34+ messages in thread
From: Helge Hafting @ 2001-11-02  8:00 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel

Andrew Morton wrote:

> Another potential microoptimisation would be to write out
> clean blocks if that helps merging.  So if we see a write
> for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
> then write it out too.  I suspect this would be a win for
> ATA but a loss for SCSI.  Not sure.
> 

A not to stupid disk would implement the seek to block 5
as waiting for block 4 to move past.  So
rewriting block 4 probably wouldn't help.  could be 
interesting to see a benchmark for that though, perhaps
some drives are really dumb.

The average "half rotation delay" when seeking does not apply
when the seek _isn't_ random.

Helge Hafting

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-01 20:55   ` 2.4.14-pre6 Andrew Morton
  2001-11-02  8:00     ` 2.4.14-pre6 Helge Hafting
@ 2001-11-04 22:34     ` Pavel Machek
  2001-11-04 23:16       ` 2.4.14-pre6 Daniel Phillips
  1 sibling, 1 reply; 34+ messages in thread
From: Pavel Machek @ 2001-11-04 22:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Neil Brown, Linus Torvalds, Kernel Mailing List, Andrea Arcangeli

Hi!

> > What I would like is that as soon as a buffer was marked "dirty", it
> > would get passed down to the driver (or at least to the
> > block-device-layer) with something like
> >     submit_bh(WRITEA, bh);
> > i.e. a write ahead. (or is it write-behind...)
> > The device handler (the elevator algorithm for normal disks, other
> > code for other devices) could keep them ordered in whatever way it
> > chooses, and feed them into the queues at some appropriate time.
> 
> Sounds sensible to me.
> 
> In many ways, it's similar to the current scheme when it's used
> with an enormous request queue - all writeable blocks in the
> system are candidates for request merging.  But your proposal
> is a whole lot smarter.
> 
> In particular, the current kupdate scheme of writing the
> dirty block list out in six chunks, five seconds apart
> does potentially miss out on a very large number of merging
> opportunities.  Your proposal would fix that.
> 
> Another potential microoptimisation would be to write out
> clean blocks if that helps merging.  So if we see a write
> for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
> then write it out too.  I suspect this would be a win for
> ATA but a loss for SCSI.  Not sure.

Please don't do this, it is bug.

If user did not ask writing somewhere, DO NOT WRITE THERE! If power
fails in the middle of the sector... Or if that is flashcard.... Just
don't do this.
								Pavel
-- 
STOP THE WAR! Someone killed innocent Americans. That does not give
U.S. right to kill people in Afganistan.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-04 22:34     ` 2.4.14-pre6 Pavel Machek
@ 2001-11-04 23:16       ` Daniel Phillips
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Phillips @ 2001-11-04 23:16 UTC (permalink / raw)
  To: Pavel Machek, Andrew Morton
  Cc: Neil Brown, Linus Torvalds, Kernel Mailing List, Andrea Arcangeli

On November 4, 2001 11:34 pm, Pavel Machek wrote:
> > Another potential microoptimisation would be to write out
> > clean blocks if that helps merging.  So if we see a write
> > for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
> > then write it out too.  I suspect this would be a win for
> > ATA but a loss for SCSI.  Not sure.
> 
> Please don't do this, it is bug.
> 
> If user did not ask writing somewhere, DO NOT WRITE THERE! If power
> fails in the middle of the sector... Or if that is flashcard....

or raid or nbd...

> Just don't do this.

--
Daniel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
                     ` (3 preceding siblings ...)
  2001-11-05 21:08   ` 2.4.14-pre6 Wilson
@ 2001-11-05 21:27   ` Josh Fryman
  2001-11-05 19:04     ` 2.4.14-pre6 Gérard Roudier
  4 siblings, 1 reply; 34+ messages in thread
From: Josh Fryman @ 2001-11-05 21:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: pavel, linux-kernel

> Basically, you get two virtual CPU's per die, and each CPU can run two
> threads at the same time. It slows some stuff down, because it makes for
> much more cache pressure, but Intel claims up to 30% improvement on some
> loads that scale well.
> 
> The 30% is probably a marketing number (ie it might be more like 10% on
> more normal loads), but you have to give them points for interesting
> technology <)

Specifically, the 30% comes in two places.  Using Intel proprietary
benchmarks (unreleased, according to the footnotes) they find that a
typical IA32 instruction mix uses some 35% of system resources in an
advanced device like the P4 with NetBurst.  the rest is idle.

by using the SMT model with two virtual systems - each with complete
register sets and independent APICs, sharing only the backend exec 
units - they claim you get a 30% improvement in wall-clock time.  This 
is supposed to be on their benchmarks *without* recompiling anything.  To 
get "additional" improvement, using code to take advantage of the dual 
virtual CPUs nature of the chip and recompiling should give some 
unquantified gain.

-josh

to help your searching if you want more details, Intel has called this:
  Jackson Technology aka Project Foster aka Hyper-Threading Technology 
  and is known in the rest of the world as SMT.

Intel has a whitepaper or two available for download.  If you can't find
them at developer.intel.com or via Google, let me know and I've got some
copies laying around.  Amusingly, they seem to be ultra scared of
releasing any real information about it. Alpha was working on a 4-way 
design that seemed a bit more clever for the 21464, which appears to be
destined for the bit bucket now :(

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
                     ` (2 preceding siblings ...)
  2001-11-05 21:04   ` 2.4.14-pre6 Johannes Erdfelt
@ 2001-11-05 21:08   ` Wilson
  2001-11-05 21:27   ` 2.4.14-pre6 Josh Fryman
  4 siblings, 0 replies; 34+ messages in thread
From: Wilson @ 2001-11-05 21:08 UTC (permalink / raw)
  To: linux-kernel

----- Original Message -----
From: "Pavel Machek" <pavel@suse.cz>
To: "Linus Torvalds" <torvalds@transmeta.com>
Cc: "Kernel Mailing List" <linux-kernel@vger.kernel.org>
Sent: Friday, November 02, 2001 7:01 AM
Subject: Re: 2.4.14-pre6


> Hi!
>
> > Oh, and the first funny patches for the upcoming SMT P4 cores are
starting
> > to show up. More to come.
>
> What is SMT P4?
>
> > Anybody out there with cerberus?
> >
> > Linus "128MB of RAM and 1GB into swap, and happy" Torvalds
>
> Someone go and steal 64MB from Linus....
>

SMT == Simultaneous Multi-Threading:
http://www.anandtech.com/showdoc.html?i=1525&p=4

They accused us of suppressing freedom of expression.
This was a lie and we could not let them publish it.
-- Nelba Blandon, Nicaraguan Interior Ministry Director of Censorship




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
  2001-11-05 20:43   ` 2.4.14-pre6 Charles Cazabon
  2001-11-05 20:49   ` 2.4.14-pre6 Linus Torvalds
@ 2001-11-05 21:04   ` Johannes Erdfelt
  2001-11-05 21:08   ` 2.4.14-pre6 Wilson
  2001-11-05 21:27   ` 2.4.14-pre6 Josh Fryman
  4 siblings, 0 replies; 34+ messages in thread
From: Johannes Erdfelt @ 2001-11-05 21:04 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Kernel Mailing List

On Fri, Nov 02, 2001, Pavel Machek <pavel@suse.cz> wrote:
> > Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> > to show up. More to come.
> 
> What is SMT P4?

Symmetric Multi Threading IIRC.

Essentially having a virtual dual CPU system on one die where you can
dispatch multiple programs to the differen execution units. For example
you can run a FP intensive program at the same time as an Integer
intensive program.

Nowhere close to true dual CPU performance because of resource
contention on the execution units, but better than single CPU
performance.

JE


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
  2001-11-05 20:43   ` 2.4.14-pre6 Charles Cazabon
@ 2001-11-05 20:49   ` Linus Torvalds
  2001-11-05 21:04   ` 2.4.14-pre6 Johannes Erdfelt
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2001-11-05 20:49 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Kernel Mailing List


On Fri, 2 Nov 2001, Pavel Machek wrote:
>
> > Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> > to show up. More to come.
>
> What is SMT P4?

It's the upcoming symmetric multi-threading on the P4 chips (disabled in
hardware in currently selling stuff, but apparently various Intel contacts
already have chips to test with).

Basically, you get two virtual CPU's per die, and each CPU can run two
threads at the same time. It slows some stuff down, because it makes for
much more cache pressure, but Intel claims up to 30% improvement on some
loads that scale well.

The 30% is probably a marketing number (ie it might be more like 10% on
more normal loads), but you have to give them points for interesting
technology <)

> > Anybody out there with cerberus?
> >
> > 		Linus "128MB of RAM and 1GB into swap, and happy" Torvalds
>
> Someone go and steal 64MB from Linus....

Hey, hey. I actually have spent a _lot_ of time with 40MB of RAM and KDE
over the last few weeks. And this is with DRI on a graphics card that also
seems to eat up 8MB just for the direct rendering stuff, _and_ with kernel
profiling enabled, so it actually had more like 30MB of "real" memory
available. In 1600x1200, 16-bit color.

Konqueror is a pig, but it's _usable_. I did real work, including kernel
compiles, with it.

Admittedly I do like the behaviour with 2GB a lot better. That way I can
cache every kernel tree I work on, and not ever think about "diff" times.

		Linus


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
@ 2001-11-05 20:43   ` Charles Cazabon
  2001-11-05 20:49   ` 2.4.14-pre6 Linus Torvalds
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 34+ messages in thread
From: Charles Cazabon @ 2001-11-05 20:43 UTC (permalink / raw)
  To: Kernel Mailing List

Pavel Machek <pavel@suse.cz> wrote:
> 
> > Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> > to show up. More to come.
> 
> What is SMT P4?

"Jackson" technology-enabled P4 core.  Also known as "hyperthreading".

Charles 
-- 
-----------------------------------------------------------------------
Charles Cazabon                            <linux@discworld.dyndns.org>
GPL'ed software available at:  http://www.qcc.sk.ca/~charlesc/software/
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-05 21:27   ` 2.4.14-pre6 Josh Fryman
@ 2001-11-05 19:04     ` Gérard Roudier
  0 siblings, 0 replies; 34+ messages in thread
From: Gérard Roudier @ 2001-11-05 19:04 UTC (permalink / raw)
  To: Josh Fryman; +Cc: Linus Torvalds, pavel, linux-kernel



On Mon, 5 Nov 2001, Josh Fryman wrote:

> > Basically, you get two virtual CPU's per die, and each CPU can run two
> > threads at the same time. It slows some stuff down, because it makes for
> > much more cache pressure, but Intel claims up to 30% improvement on some
> > loads that scale well.
> >
> > The 30% is probably a marketing number (ie it might be more like 10% on
> > more normal loads), but you have to give them points for interesting
> > technology <)
>
> Specifically, the 30% comes in two places.  Using Intel proprietary
> benchmarks (unreleased, according to the footnotes) they find that a
> typical IA32 instruction mix uses some 35% of system resources in an
> advanced device like the P4 with NetBurst.  the rest is idle.
>
> by using the SMT model with two virtual systems - each with complete
> register sets and independent APICs, sharing only the backend exec
> units - they claim you get a 30% improvement in wall-clock time.  This
> is supposed to be on their benchmarks *without* recompiling anything.  To
> get "additional" improvement, using code to take advantage of the dual
> virtual CPUs nature of the chip and recompiling should give some
> unquantified gain.

All things being equal, this probably will make a NxMHz P4 be as fast as a
NxMHz PIII. But the new complexity it may require in real life may just
turn the gain into just nil.

What a great improvement, indeed! :-)

  Gérard.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-03 18:01     ` 2.4.14-pre6 Linus Torvalds
@ 2001-11-03 19:07       ` Mike Galbraith
  0 siblings, 0 replies; 34+ messages in thread
From: Mike Galbraith @ 2001-11-03 19:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jogi, Kernel Mailing List

On Sat, 3 Nov 2001, Linus Torvalds wrote:

> On Sat, 3 Nov 2001, Mike Galbraith wrote:
> >
> > > Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.
> >
> > My box agrees.  Notice pre5aa1/ac IO numbers below.  I'm getting
> > ~good %user/wallclock with pre6/pre7 despite (thrash?) IO numbers.
>
> Well, pre7 gets the second-best numbers, and the reason I really don't
> like pre5aa1 is that since pre4, the virgin kernels have had all mapped
> pages in the LRU queue, and can use that knowledge to decide when to
> start swapping.
>
> So in those kernels, the balance between scanning the VM tables and
> scanning the regular unmapped caches is something that is strictly
> deterministic, which is something I _really_ want to have.
>
> We've had too much trouble with the "let's hope this works" approach.
> Which is why I want the anonymous pages to clearly show up in the
> scanning, and not have them be these virtual ghosts that only show up when
> you start swapping stuff out.
>
> Your array cut down to just the ones that made the benchmark in under 8
> minutes makes it easier to read, and clearly pre6+ seems to be a bit _too_
> swap-happy. I'm trying the "dynamic max_mapped" approach now.

Swap-happy doesn't bother this load too much.  What it's really sensitive
to is pagein.  Turning the cache knobs (vigorously:) in aa-latest...

2.4.14-pre6aa1
real    8m29.484s
user    6m38.650s
sys     0m27.940s

user  :       0:06:45.45  70.6%  page in :   641298
nice  :       0:00:00.00   0.0%  page out:   634494
system:       0:00:41.73   7.3%  swap in :   118869
idle  :       0:02:06.90  22.1%  swap out:   154141

echo 2 > /proc/sys/vm/vm_mapped_ratio
echo 128 > /proc/sys/vm/vm_balance_ratio

real    7m25.069s
user    6m37.390s
sys     0m27.540s

user  :       0:06:43.60  78.7%  page in :   588488
nice  :       0:00:00.00   0.0%  page out:   514865
system:       0:00:40.47   7.9%  swap in :   118738
idle  :       0:01:08.92  13.4%  swap out:   122340

..lowers the sleep time noticibly despite swapin remaining constant.

	-Mike


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-03 12:47   ` 2.4.14-pre6 Mike Galbraith
@ 2001-11-03 18:01     ` Linus Torvalds
  2001-11-03 19:07       ` 2.4.14-pre6 Mike Galbraith
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2001-11-03 18:01 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: jogi, Kernel Mailing List


On Sat, 3 Nov 2001, Mike Galbraith wrote:
>
> > Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.
>
> My box agrees.  Notice pre5aa1/ac IO numbers below.  I'm getting
> ~good %user/wallclock with pre6/pre7 despite (thrash?) IO numbers.

Well, pre7 gets the second-best numbers, and the reason I really don't
like pre5aa1 is that since pre4, the virgin kernels have had all mapped
pages in the LRU queue, and can use that knowledge to decide when to
start swapping.

So in those kernels, the balance between scanning the VM tables and
scanning the regular unmapped caches is something that is strictly
deterministic, which is something I _really_ want to have.

We've had too much trouble with the "let's hope this works" approach.
Which is why I want the anonymous pages to clearly show up in the
scanning, and not have them be these virtual ghosts that only show up when
you start swapping stuff out.

Your array cut down to just the ones that made the benchmark in under 8
minutes makes it easier to read, and clearly pre6+ seems to be a bit _too_
swap-happy. I'm trying the "dynamic max_mapped" approach now.

		Linus


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-11-02 16:48 ` 2.4.14-pre6 jogi
@ 2001-11-03 12:47   ` Mike Galbraith
  2001-11-03 18:01     ` 2.4.14-pre6 Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Mike Galbraith @ 2001-11-03 12:47 UTC (permalink / raw)
  To: jogi; +Cc: Linus Torvalds, Kernel Mailing List

On 2 Nov 2001 jogi@planetzork.ping.de wrote:

> I did my usual kernel compile testings and here are the resuls:
>
>                     j25       j50       j75      j100
>
> 2.4.13-pre5aa1:   5:02.63   5:09.18   5:26.27   5:34.36
> 2.4.13-pre5aa1:   4:58.80   5:12.30   5:26.23   5:32.14
> 2.4.13-pre5aa1:   4:57.66   5:11.29   5:45.90   6:03.53
> 2.4.13-pre5aa1:   4:58.39   5:13.10   5:29.32   5:44.49
> 2.4.13-pre5aa1:   4:57.93   5:09.76   5:24.76   5:26.79
>
> 2.4.14-pre6:      4:58.88   5:16.68   5:45.93   7:16.56
> 2.4.14-pre6:      4:55.72   5:34.65   5:57.94   6:50.58
> 2.4.14-pre6:      4:59.46   5:16.88   6:25.83   6:51.43
> 2.4.14-pre6:      4:56.38   5:18.88   6:15.97   6:31.72
> 2.4.14-pre6:      4:55.79   5:17.47   6:00.23   6:44.85
>
> 2.4.14-pre7:      4:56.39   5:22.84   6:09.05   9:56.59
> 2.4.14-pre7:      4:56.55   5:25.15   7:01.37   7:03.74
> 2.4.14-pre7:      4:59.44   5:15.10   6:06.78   12:51.39*
> 2.4.14-pre7:      4:58.07   5:30.55   6:15.37      *
> 2.4.14-pre7:      4:58.17   5:26.80   6:41.44      *

<snip>

> Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.

My box agrees.  Notice pre5aa1/ac IO numbers below.  I'm getting
~good %user/wallclock with pre6/pre7 despite (thrash?) IO numbers.

	-Mike

fresh boot -> time make -j30 bzImage && procinfo >> /stats

2.4.13-pre2.virgin
real    8m44.484s
user    6m37.800s
sys     0m27.040s

user  :       0:06:44.26  68.4%  page in :   653397
nice  :       0:00:00.00   0.0%  page out:   617078
system:       0:01:22.68  14.0%  swap in :   112202
idle  :       0:01:43.93  17.6%  swap out:   149382

2.4.13-pre2.aa1
real    8m5.204s
user    6m38.590s
sys     0m27.220s

user  :       0:06:44.90  74.8%  page in :   560202
nice  :       0:00:00.00   0.0%  page out:   568467
system:       0:01:09.70  12.9%  swap in :    97083
idle  :       0:01:06.55  12.3%  swap out:   137374

2.4.13-pre5.virgin
real    9m1.709s
user    6m37.310s
sys     0m53.880s

user  :       0:06:44.49  66.1%  page in :   519473
nice  :       0:00:00.00   0.0%  page out:   521926
system:       0:01:51.32  18.2%  swap in :    93794
idle  :       0:01:35.91  15.7%  swap out:   125145

2.4.13-pre5.aa1
real    7m30.261s
user    6m35.930s
sys     0m28.500s

user  :       0:06:42.74  76.8%  page in :   402421
nice  :       0:00:00.00   0.0%  page out:   390429
system:       0:01:21.20  15.5%  swap in :    70652
idle  :       0:00:40.51   7.7%  swap out:    90871

2.4.13.virgin
real    9m13.976s
user    6m36.910s
sys     0m27.510s

user  :       0:06:43.67  64.3%  page in :   523516
nice  :       0:00:00.00   0.0%  page out:   547148
system:       0:00:41.29   6.6%  swap in :    85945
idle  :       0:03:02.39  29.1%  swap out:   131574

2.4.14-pre2.virgin
real    8m0.051s
user    6m34.060s
sys     0m31.020s

user  :       0:06:40.77  72.9%  page in :   425768
nice  :       0:00:00.00   0.0%  page out:   494520
system:       0:00:44.65   8.1%  swap in :    82020
idle  :       0:01:44.23  19.0%  swap out:   117066

2.4.14-pre2.virgin+p2p3
real    8m0.094s
user    6m35.450s
sys     0m29.810s

user  :       0:06:41.38  73.2%  page in :   432894
nice  :       0:00:00.00   0.0%  page out:   483079
system:       0:00:43.71   8.0%  swap in :    82909
idle  :       0:01:42.92  18.8%  swap out:   113578

2.4.14-pre3.virgin
real    8m30.454s
user    6m35.760s
sys     0m29.770s

user  :       0:06:42.40  69.6%  page in :   430062
nice  :       0:00:00.00   0.0%  page out:   610021
system:       0:00:42.29   7.3%  swap in :    84529
idle  :       0:02:13.18  23.0%  swap out:   147283

2.4.14-pre6.virgin
real    7m58.841s
user    6m37.220s
sys     0m30.370s

user  :       0:06:43.37  73.6%  page in :   576081
nice  :       0:00:00.00   0.0%  page out:   704720
system:       0:00:42.87   7.8%  swap in :   120317
idle  :       0:01:41.45  18.5%  swap out:   170619

2.4.14-pre7.virgin
real    7m56.357s
user    6m36.580s
sys     0m30.600s

user  :       0:06:42.88  74.5%  page in :   646265
nice  :       0:00:00.00   0.0%  page out:   704490
system:       0:00:43.11   8.0%  swap in :   136957
idle  :       0:01:34.61  17.5%  swap out:   171134

2.4.14-pre6aa1
real    8m29.484s
user    6m38.650s
sys     0m27.940s

user  :       0:06:45.45  70.6%  page in :   641298
nice  :       0:00:00.00   0.0%  page out:   634494
system:       0:00:41.73   7.3%  swap in :   118869
idle  :       0:02:06.90  22.1%  swap out:   154141

2.4.12-ac1
real    8m12.184s
user    6m35.170s
sys     0m33.630s

user  :       0:06:41.35  71.8%  page in :   402144
nice  :       0:00:00.00   0.0%  page out:   382625
system:       0:01:44.76  18.7%  swap in :    65589
idle  :       0:00:53.25   9.5%  swap out:    89164

2.4.12-ac3
real    8m8.200s
user    6m36.230s
sys     0m32.340s

user  :       0:06:43.05  71.7%  page in :   419527
nice  :       0:00:00.00   0.0%  page out:   385711
system:       0:00:49.29   8.8%  swap in :    70491
idle  :       0:01:49.46  19.5%  swap out:    89771

2.4.13-ac6
real    8m15.366s
user    6m35.710s
sys     0m33.570s

user  :       0:06:42.25  71.6%  page in :   461270
nice  :       0:00:00.00   0.0%  page out:   494015
system:       0:00:49.03   8.7%  swap in :    82114
idle  :       0:01:50.74  19.7%  swap out:   117766


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
                   ` (5 preceding siblings ...)
  2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
@ 2001-11-02 16:48 ` jogi
  2001-11-03 12:47   ` 2.4.14-pre6 Mike Galbraith
  6 siblings, 1 reply; 34+ messages in thread
From: jogi @ 2001-11-02 16:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, Oct 31, 2001 at 12:00:00AM -0800, Linus Torvalds wrote:
> 
> Incredibly, I didn't get a _single_ bugreport about the fact that I had
> forgotten to change the version number in pre5. Usually that's everybody's
> favourite bug.. Is everybody asleep on the lists?

I noticed but I thought everybody else would complain :-)

[...]

> The MM has calmed down, but the OOM killer didn't use to work. Now it
> does, with heurstics that are so incredibly simple that it's almost
> embarrassing.
> 
> And I dare anybody to break those OOM heuristics - either by not
> triggering when they should, or by triggering too early. You'll get an
> honourable mention if you can break them and tell me how ("Honourable
> mention"? Yeah, I'm cheap. What else is new?)
> 
> In fact, I'd _really_ like to know of any VM loads that show bad
> behaviour. If you have a pet peeve about the VM, now is the time to speak
> up. Because otherwise I think I'm done.

I did my usual kernel compile testings and here are the resuls:

                    j25       j50       j75      j100

2.4.13-pre5aa1:   5:02.63   5:09.18   5:26.27   5:34.36
2.4.13-pre5aa1:   4:58.80   5:12.30   5:26.23   5:32.14
2.4.13-pre5aa1:   4:57.66   5:11.29   5:45.90   6:03.53
2.4.13-pre5aa1:   4:58.39   5:13.10   5:29.32   5:44.49
2.4.13-pre5aa1:   4:57.93   5:09.76   5:24.76   5:26.79

2.4.14-pre6:      4:58.88   5:16.68   5:45.93   7:16.56
2.4.14-pre6:      4:55.72   5:34.65   5:57.94   6:50.58
2.4.14-pre6:      4:59.46   5:16.88   6:25.83   6:51.43
2.4.14-pre6:      4:56.38   5:18.88   6:15.97   6:31.72
2.4.14-pre6:      4:55.79   5:17.47   6:00.23   6:44.85

2.4.14-pre7:      4:56.39   5:22.84   6:09.05   9:56.59
2.4.14-pre7:      4:56.55   5:25.15   7:01.37   7:03.74
2.4.14-pre7:      4:59.44   5:15.10   6:06.78   12:51.39*
2.4.14-pre7:      4:58.07   5:30.55   6:15.37      *
2.4.14-pre7:      4:58.17   5:26.80   6:41.44      *


The last three of the runs of make -j100 with -pre7 failed since some
processes (portmap and cc1) were killed. So the oom killer seems to
kill (in case of portmap) the wrong processes and might trigger a little
too early. I have no data about the swap / mem usage at that time since
the script runs unattended.

Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.

I have not checked about the interactivity issues so this might be a
*feature*.


Regards,

   Jogi

-- 

Well, yeah ... I suppose there's no point in getting greedy, is there?

    << Calvin & Hobbes >>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
                   ` (4 preceding siblings ...)
  2001-11-01 19:14 ` 2.4.14-pre6 Pozsar Balazs
@ 2001-11-02 12:01 ` Pavel Machek
  2001-11-05 20:43   ` 2.4.14-pre6 Charles Cazabon
                     ` (4 more replies)
  2001-11-02 16:48 ` 2.4.14-pre6 jogi
  6 siblings, 5 replies; 34+ messages in thread
From: Pavel Machek @ 2001-11-02 12:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Hi!

> Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> to show up. More to come.

What is SMT P4?

> Anybody out there with cerberus?
> 
> 		Linus "128MB of RAM and 1GB into swap, and happy" Torvalds

Someone go and steal 64MB from Linus....

		Pavel "12MB of RAM and no space left for swap" Machek
			(on my Velo thingie....)
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
                   ` (3 preceding siblings ...)
  2001-10-31 19:52 ` 2.4.14-pre6 Philipp Matthias Hahn
@ 2001-11-01 19:14 ` Pozsar Balazs
  2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
  2001-11-02 16:48 ` 2.4.14-pre6 jogi
  6 siblings, 0 replies; 34+ messages in thread
From: Pozsar Balazs @ 2001-11-01 19:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List


On Wed, 31 Oct 2001, Linus Torvalds wrote:

> Incredibly, I didn't get a _single_ bugreport about the fact that I had
> forgotten to change the version number in pre5. Usually that's everybody's
> favourite bug.. Is everybody asleep on the lists?

You should read lkml :))
Dave Airlie posted a mail with the subject "FYI: 2.4.14-pre5 issues.."
that noticed this bug on okt 30.

-- 
Balazs Pozsar.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 23:40       ` 2.4.14-pre6 Dax Kelson
@ 2001-10-31 23:57         ` Michael Peddemors
  0 siblings, 0 replies; 34+ messages in thread
From: Michael Peddemors @ 2001-10-31 23:57 UTC (permalink / raw)
  To: Dax Kelson; +Cc: Kernel Mailing List

On Wed, 2001-10-31 at 15:40, Dax Kelson wrote:
> On Wed, 31 Oct 2001, Erik Andersen wrote:
> 
> > How about ext3 for 2.4.14?
> 
> Seconded.
> 

As much as I would like to ext3 get in, NOT IN THIS RELEASE please...
Don't put anything else in, until what we got works.. Hit him up on
2.4.15 :)

-- 
"Catch the Magic of Linux..."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
LinuxAdministration - Internet Services
NetworkServices - Programming - Security
Wizard IT Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604)589-0037 Beautiful British Columbia, Canada


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 23:18     ` 2.4.14-pre6 Erik Andersen
@ 2001-10-31 23:40       ` Dax Kelson
  2001-10-31 23:57         ` 2.4.14-pre6 Michael Peddemors
  0 siblings, 1 reply; 34+ messages in thread
From: Dax Kelson @ 2001-10-31 23:40 UTC (permalink / raw)
  To: Erik Andersen; +Cc: Linus Torvalds, Kernel Mailing List

On Wed, 31 Oct 2001, Erik Andersen wrote:

> How about ext3 for 2.4.14?

Seconded.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 19:38   ` 2.4.14-pre6 Linus Torvalds
  2001-10-31 19:55     ` 2.4.14-pre6 Mike Castle
  2001-10-31 20:02     ` 2.4.14-pre6 Rik van Riel
@ 2001-10-31 23:18     ` Erik Andersen
  2001-10-31 23:40       ` 2.4.14-pre6 Dax Kelson
  2 siblings, 1 reply; 34+ messages in thread
From: Erik Andersen @ 2001-10-31 23:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed Oct 31, 2001 at 11:38:44AM -0800, Linus Torvalds wrote:
> 
> On 31 Oct 2001, Michael Peddemors wrote:
> >
> > Lets' let this testing cycle go a little longer before making any
> > changes.. Let developers catch up..
> 
> My not-so-cunning plan is actually to try to figure out the big problems
> now, then release a reasonable 2.4.14, and then just stop for a while,
> refusing to take new features.

How about ext3 for 2.4.14?

 -Erik

--
Erik B. Andersen             http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 19:52 ` 2.4.14-pre6 Philipp Matthias Hahn
@ 2001-10-31 21:05   ` H. Peter Anvin
  0 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2001-10-31 21:05 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <Pine.LNX.4.33.0110312032110.18881-100000@titan.lahn.de>
By author:    Philipp Matthias Hahn <pmhahn@titan.lahn.de>
In newsgroup: linux.dev.kernel
> 
> > Other changes:
> linux/zlib_fs.h is still missing in you tree and breaks compilation of
> fs/cramfs and other.
> 

I have submitted patches to Linus to make cramfs and zisofs work.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 19:38   ` 2.4.14-pre6 Linus Torvalds
  2001-10-31 19:55     ` 2.4.14-pre6 Mike Castle
@ 2001-10-31 20:02     ` Rik van Riel
  2001-10-31 23:18     ` 2.4.14-pre6 Erik Andersen
  2 siblings, 0 replies; 34+ messages in thread
From: Rik van Riel @ 2001-10-31 20:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Michael Peddemors, Kernel Mailing List

On Wed, 31 Oct 2001, Linus Torvalds wrote:

> (2.5.x will obviously use the new VM regardless, and I actually
> believe that the new VM simply is better. I think that Alan will see
> the light eventually, but at the same time I clearly admit that Alan
> was right on a stability front for the last month or two ;)

Will you document the new VM ?

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 19:38   ` 2.4.14-pre6 Linus Torvalds
@ 2001-10-31 19:55     ` Mike Castle
  2001-10-31 20:02     ` 2.4.14-pre6 Rik van Riel
  2001-10-31 23:18     ` 2.4.14-pre6 Erik Andersen
  2 siblings, 0 replies; 34+ messages in thread
From: Mike Castle @ 2001-10-31 19:55 UTC (permalink / raw)
  To: Kernel Mailing List

On Wed, Oct 31, 2001 at 11:38:44AM -0800, Linus Torvalds wrote:
> Then, 2.4.15 would be the point where I start 2.5.x, and where Alan gets
> to do whatever he wants to do with 2.4.x. Including, of course, just
> reverting all my and Andrea's VM changes ;)

There are a lot of patches applied to -ac that are not in the main line.
If many of those are applied to 2.4.16+, would they also be put into the
2.5.x line early in the process so that they will be fairly synced, plus
give you ample time to feel comfortable with their stability?

Especially patches that did not come directly from the maintainers.

mrc
-- 
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
                   ` (2 preceding siblings ...)
  2001-10-31 19:27 ` 2.4.14-pre6 Michael Peddemors
@ 2001-10-31 19:52 ` Philipp Matthias Hahn
  2001-10-31 21:05   ` 2.4.14-pre6 H. Peter Anvin
  2001-11-01 19:14 ` 2.4.14-pre6 Pozsar Balazs
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Philipp Matthias Hahn @ 2001-10-31 19:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 31 Oct 2001, Linus Torvalds wrote:

> Incredibly, I didn't get a _single_ bugreport about the fact that I had
> forgotten to change the version number in pre5. Usually that's everybody's
> favourite bug.. Is everybody asleep on the lists?
Message-ID: <Pine.LNX.4.32.0110302228010.17012-100000@skynet>

> Other changes:
linux/zlib_fs.h is still missing in you tree and breaks compilation of
fs/cramfs and other.

http://marc.theaimsgroup.com/?l=linux-kernel&m=100407670605760&q=raw

BYtE
Philipp
-- 
  / /  (_)__  __ ____  __ Philipp Hahn
 / /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ pmhahn@titan.lahn.de




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31 19:27 ` 2.4.14-pre6 Michael Peddemors
@ 2001-10-31 19:38   ` Linus Torvalds
  2001-10-31 19:55     ` 2.4.14-pre6 Mike Castle
                       ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2001-10-31 19:38 UTC (permalink / raw)
  To: Michael Peddemors; +Cc: Kernel Mailing List


On 31 Oct 2001, Michael Peddemors wrote:
>
> Lets' let this testing cycle go a little longer before making any
> changes.. Let developers catch up..

My not-so-cunning plan is actually to try to figure out the big problems
now, then release a reasonable 2.4.14, and then just stop for a while,
refusing to take new features.

Then, 2.4.15 would be the point where I start 2.5.x, and where Alan gets
to do whatever he wants to do with 2.4.x. Including, of course, just
reverting all my and Andrea's VM changes ;)

I'm personally convinced that my tree does the right thing VM-wise, but
Alan _will_ be the maintainer, and I'm not going to butt in on his
decisions. The last thing I want to be is a micromanaging pointy-haired
boss.

(2.5.x will obviously use the new VM regardless, and I actually believe
that the new VM simply is better. I think that Alan will see the light
eventually, but at the same time I clearly admit that Alan was right on a
stability front for the last month or two ;)

> My own kernel patches I had to stop because I couldn't keep up ....  Can
> we go a full month with you just hitting us over the head with a bat
> yelling 'test, dammit, test', until this is tested fully before
> releasing another production release?

I think we're really close.

[ I'd actually like to thank Gary Sandine from laclinux.com who made the
  "Ultimate Linux Box" for an article by Eric Raymond for Linux Journal.
  They sent me one too, and the 2GB box made it easier to test some real
  highmem loads. This has given me additional load environments to test,
  and made me able to see some of the problems people reported.. ]

But I do want to make a real 2.4.14, not just another "final" pre-kernel,
and let that be the base for a reasonably orderly switch-over at 2.4.15
(ie I'd still release 2.4.15, everything from then on is Alan).

		Linus


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
  2001-10-31  9:10 ` 2.4.14-pre6 Andrew Morton
  2001-10-31  9:30 ` 2.4.14-pre6 bert hubert
@ 2001-10-31 19:27 ` Michael Peddemors
  2001-10-31 19:38   ` 2.4.14-pre6 Linus Torvalds
  2001-10-31 19:52 ` 2.4.14-pre6 Philipp Matthias Hahn
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Michael Peddemors @ 2001-10-31 19:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Lets' let this testing cycle go a little longer before making any
changes.. Let developers catch up..
My own kernel patches I had to stop because I couldn't keep up ....  Can
we go a full month with you just hitting us over the head with a bat
yelling 'test, dammit, test', until this is tested fully before
releasing another production release?

I would like to get a chance to test this one on more than one hardware
platform :) And I want to test it under production loads as well..


On Wed, 2001-10-31 at 00:00, Linus Torvalds wrote:
> 

> In fact, I'd _really_ like to know of any VM loads that show bad
> behaviour. If you have a pet peeve about the VM, now is the time to speak
> up. Because otherwise I think I'm done.
> 
> Anybody out there with cerberus?
> 
> 		Linus "128MB of RAM and 1GB into swap, and happy" Torvalds
> 
-- 
"Catch the Magic of Linux..."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
LinuxAdministration - Internet Services
NetworkServices - Programming - Security
Wizard IT Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604)589-0037 Beautiful British Columbia, Canada


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
  2001-10-31  9:10 ` 2.4.14-pre6 Andrew Morton
@ 2001-10-31  9:30 ` bert hubert
  2001-10-31 19:27 ` 2.4.14-pre6 Michael Peddemors
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: bert hubert @ 2001-10-31  9:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, Oct 31, 2001 at 12:00:00AM -0800, Linus Torvalds wrote:


> In fact, I'd _really_ like to know of any VM loads that show bad
> behaviour. If you have a pet peeve about the VM, now is the time to speak
> up. Because otherwise I think I'm done.

The Google case comes to mind. And we should be good for google!

Regards,

bert

-- 
http://www.PowerDNS.com          Versatile DNS Software & Services
Trilab                                 The Technology People
Netherlabs BV / Rent-a-Nerd.nl           - Nerd Available -
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  9:10 ` 2.4.14-pre6 Andrew Morton
@ 2001-10-31  9:29   ` Jens Axboe
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Axboe @ 2001-10-31  9:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, Kernel Mailing List

On Wed, Oct 31 2001, Andrew Morton wrote:
> Linus Torvalds wrote:
> > 
> > If you have a pet peeve about the VM, now is the time to speak
> > up.
> >
> 
> I'm peeved by the request queue changes.

I was too. However it didn't seem to make too much of a difference in
real life, I guess your test cases shows a bit differently.

> Appended here is a program which creates 100,000 small files.
> Using ext2 on -pre5.  We see how long it takes to run
> 
> 	(make-many-files ; sync)
> 
> For several values of queue_nr_requests:
> 
> queue_nr_requests:	128	8192	32768
> execution time:		4:43	3:25	3:20
> 
> Almost all of the execution time is in the `sync'.
> 
> This is on a disk with a 2 meg cache which does pretty aggressive
> write-behind.  I expect the difference would be worse with a disk
> which doesn't help so much.
> 
> By restricting the number of requests in flight to 128 we're
> giving new requests only a very small chance of getting merged with
> an existing request.  More seeking.
> 
> OK, not an interesting workload.  But I suspect that there are real
> workloads which will be bitten by this.
> 
> Why is the queue length so tiny now?  Latency?  If so, couldn't this
> be addressed by giving reads higher priority versus writes?

Should be possible. Try for yourself. When you do your 100,000 small
file tes with 8k or more of requests, how is interactive feel of other
programs accessing the same spindle? Play around with the READ and WRITE
intial elevator sequence numbers, repeat :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: 2.4.14-pre6
  2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
@ 2001-10-31  9:10 ` Andrew Morton
  2001-10-31  9:29   ` 2.4.14-pre6 Jens Axboe
  2001-10-31  9:30 ` 2.4.14-pre6 bert hubert
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Andrew Morton @ 2001-10-31  9:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Linus Torvalds wrote:
> 
> If you have a pet peeve about the VM, now is the time to speak
> up.
>

I'm peeved by the request queue changes.

Appended here is a program which creates 100,000 small files.
Using ext2 on -pre5.  We see how long it takes to run

	(make-many-files ; sync)

For several values of queue_nr_requests:

queue_nr_requests:	128	8192	32768
execution time:		4:43	3:25	3:20

Almost all of the execution time is in the `sync'.

This is on a disk with a 2 meg cache which does pretty aggressive
write-behind.  I expect the difference would be worse with a disk
which doesn't help so much.

By restricting the number of requests in flight to 128 we're
giving new requests only a very small chance of getting merged with
an existing request.  More seeking.

OK, not an interesting workload.  But I suspect that there are real
workloads which will be bitten by this.

Why is the queue length so tiny now?  Latency?  If so, couldn't this
be addressed by giving reads higher priority versus writes?



#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

static void doit(char *name)
{
	static char stuff[4096];
	int fd;

	fd = creat(name, 0666);
	if (fd < 0) {
		perror(name);
		exit(1);
	}
	write(fd, stuff, sizeof(stuff));
	close(fd);
}

int main(void)
{
	int i, j, k, l, m;
	char buf[100];

	for (i = 0; i < 10; i++) {
		sprintf(buf, "%d", i);
		mkdir(buf, 0777);
		for (j = 0; j < 10; j++) {
			sprintf(buf, "%d/%d", i, j);
			mkdir(buf, 0777);
			printf("%s\n", buf);
			for (k = 0; k < 10; k++) {
				sprintf(buf, "%d/%d/%d", i, j, k);
				mkdir(buf, 0777);
				for (l = 0; l < 10; l++) {
					sprintf(buf, "%d/%d/%d/%d", i, j, k, l);
					mkdir(buf, 0777);
					for (m = 0; m < 10; m++) {
						sprintf(buf, "%d/%d/%d/%d/%d", i, j, k, l, m);
						doit(buf);
					}
				}
			}
		}
	}
	exit(0);
}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* 2.4.14-pre6
@ 2001-10-31  8:00 Linus Torvalds
  2001-10-31  9:10 ` 2.4.14-pre6 Andrew Morton
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Linus Torvalds @ 2001-10-31  8:00 UTC (permalink / raw)
  To: Kernel Mailing List


Incredibly, I didn't get a _single_ bugreport about the fact that I had
forgotten to change the version number in pre5. Usually that's everybody's
favourite bug.. Is everybody asleep on the lists?

Anyway, pre6 is out, and now it's too late. I updated the version number.

Other changes:

Bulk of pre5->pre6 differences: the sparc and net updates from David.

Oh, and the first funny patches for the upcoming SMT P4 cores are starting
to show up. More to come.

The MM has calmed down, but the OOM killer didn't use to work. Now it
does, with heurstics that are so incredibly simple that it's almost
embarrassing.

And I dare anybody to break those OOM heuristics - either by not
triggering when they should, or by triggering too early. You'll get an
honourable mention if you can break them and tell me how ("Honourable
mention"? Yeah, I'm cheap. What else is new?)

In fact, I'd _really_ like to know of any VM loads that show bad
behaviour. If you have a pet peeve about the VM, now is the time to speak
up. Because otherwise I think I'm done.

Anybody out there with cerberus?

		Linus "128MB of RAM and 1GB into swap, and happy" Torvalds

----

pre6:
 - me: remember to bump the version number ;)
 - Hugh Dickins: export "free_lru_page()" for modules
 - Jeff Garzik: don't change nopage arguments, just make the last a dummy one
 - David Miller: sparc and net updates (netfilter, VLAN etc)
 - Nikita Danilov: reiserfs cleanups
 - Jan Kara: quota initialization race
 - Tigran Aivazian: make the x86 microcode update driver happy about
   hyperthreaded P4's
 - me: shrink dcache/icache more aggressively
 - me: fix up oom-killer so that it actually works

pre5:
 - Andrew Morton: remove stale UnlockPage
 - me: swap cache page locking update

pre4:
 - Mikael Pettersson: fix P4 boot with APIC enabled
 - me: fix device queuing thinko, clean up VM locking

pre3:
 - René Scharfe: random bugfix
 - me: block device queuing low-water-marks, VM mapped tweaking.

pre2:
 - Alan Cox: more merging
 - Alexander Viro: block device module race fixes
 - Richard Henderson: mmap for 32-bit alpha personality
 - Jeff Garzik: 8139 and natsemi update

pre1:
 - Michael Warfield: computone serial driver update
 - Alexander Viro: cdrom module race fixes
 - David Miller: Acenic driver fix
 - Andrew Grover: ACPI update
 - Kai Germaschewski: ISDN update
 - Tim Waugh: parport update
 - David Woodhouse: JFFS garbage collect sleep


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2001-11-05 21:49 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-31 16:15 2.4.14-pre6 Linus Torvalds
2001-10-31 18:36 ` 2.4.14-pre6 Andrew Morton
2001-10-31 19:06   ` 2.4.14-pre6 Linus Torvalds
2001-11-01 10:20 ` 2.4.14-pre6 Neil Brown
2001-11-01 20:55   ` 2.4.14-pre6 Andrew Morton
2001-11-02  8:00     ` 2.4.14-pre6 Helge Hafting
2001-11-04 22:34     ` 2.4.14-pre6 Pavel Machek
2001-11-04 23:16       ` 2.4.14-pre6 Daniel Phillips
2001-11-01 21:28   ` 2.4.14-pre6 Chris Mason
  -- strict thread matches above, loose matches on Subject: below --
2001-10-31  8:00 2.4.14-pre6 Linus Torvalds
2001-10-31  9:10 ` 2.4.14-pre6 Andrew Morton
2001-10-31  9:29   ` 2.4.14-pre6 Jens Axboe
2001-10-31  9:30 ` 2.4.14-pre6 bert hubert
2001-10-31 19:27 ` 2.4.14-pre6 Michael Peddemors
2001-10-31 19:38   ` 2.4.14-pre6 Linus Torvalds
2001-10-31 19:55     ` 2.4.14-pre6 Mike Castle
2001-10-31 20:02     ` 2.4.14-pre6 Rik van Riel
2001-10-31 23:18     ` 2.4.14-pre6 Erik Andersen
2001-10-31 23:40       ` 2.4.14-pre6 Dax Kelson
2001-10-31 23:57         ` 2.4.14-pre6 Michael Peddemors
2001-10-31 19:52 ` 2.4.14-pre6 Philipp Matthias Hahn
2001-10-31 21:05   ` 2.4.14-pre6 H. Peter Anvin
2001-11-01 19:14 ` 2.4.14-pre6 Pozsar Balazs
2001-11-02 12:01 ` 2.4.14-pre6 Pavel Machek
2001-11-05 20:43   ` 2.4.14-pre6 Charles Cazabon
2001-11-05 20:49   ` 2.4.14-pre6 Linus Torvalds
2001-11-05 21:04   ` 2.4.14-pre6 Johannes Erdfelt
2001-11-05 21:08   ` 2.4.14-pre6 Wilson
2001-11-05 21:27   ` 2.4.14-pre6 Josh Fryman
2001-11-05 19:04     ` 2.4.14-pre6 Gérard Roudier
2001-11-02 16:48 ` 2.4.14-pre6 jogi
2001-11-03 12:47   ` 2.4.14-pre6 Mike Galbraith
2001-11-03 18:01     ` 2.4.14-pre6 Linus Torvalds
2001-11-03 19:07       ` 2.4.14-pre6 Mike Galbraith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).