linux-kernel.vger.kernel.org archive mirror
* [RFC] VM: I have a dream...
@ 2006-01-21 18:08 Al Boldi
  2006-01-21 18:42 ` Jamie Lokier
                   ` (4 more replies)
  0 siblings, 5 replies; 75+ messages in thread
From: Al Boldi @ 2006-01-21 18:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel

A long time ago, when I was a kid, I had a dream. It went like this:

I wake up in the twenty-first century and start my computer.
After completing the boot sequence, I start top to find that my memory is 
equal to the total disk capacity.  What's more, there is no more swap.
Apps are executed in place, as if already loaded.
Physical RAM is used to cache slower storage RAM, much the same as the CPU 
cache RAM caches slower physical RAM.

When I woke up, I was really looking forward to the new century.

Sadly, the current way of dealing with memory can at best be described 
as schizophrenic.  The reason, again, is that we are still running in 
last-century mode.

Wouldn't it be nice to take advantage of today's 64-bit archs and TB drives, 
and run a more modern way of life without this memory/storage split personality?

All comments, other than "dream on", are most welcome!

Thanks!

--
Al



* Re: [RFC] VM: I have a dream...
  2006-01-21 18:08 [RFC] VM: I have a dream Al Boldi
@ 2006-01-21 18:42 ` Jamie Lokier
  2006-01-21 18:46 ` Avi Kivity
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 75+ messages in thread
From: Jamie Lokier @ 2006-01-21 18:42 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel

Al Boldi wrote:
> Apps are executed in place, as if already loaded.
> Physical RAM is used to cache slower storage RAM, much the same as the CPU 
> cache RAM caches slower physical RAM.

Linux and most other OSes have done that for... oh, 20 years at least?

It's called "demand paging".  The RAM is simply a cache of the
executable file on disk.  The complicated-looking page fault mechanism
that you see is simply the cache management logic.  In what way is
your vision different from demand paging?
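
For illustration, a minimal user-space sketch of that "cache" in action
(the choice of /bin/ls is arbitrary and error handling is omitted):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/bin/ls", O_RDONLY);	/* any demand-paged file */
	struct stat st;
	fstat(fd, &st);

	/* No disk I/O happens here: the kernel only records the mapping. */
	char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

	/* The first touch of each page takes the page fault that fills
	 * the cache, i.e. reads that page of the file into RAM. */
	printf("first byte: %#x\n", p[0]);

	munmap(p, st.st_size);
	close(fd);
	return 0;
}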

> my memory is equal to the total disk capacity.  What's more, there is no
> more swap.  [...]  Physical RAM is used to cache slower storage RAM,
> much the same as the CPU cache RAM caches slower physical RAM.

Windows has had that since, oh, Windows 95?

It's called "on-demand swap space", or making all the disk's free
space be usable for paging.  The physical RAM is simply a cache of the
virtual "storage RAM".  In what way is your vision different from
on-demand swap?

> Sadly, the current way of dealing with memory can at best be described 
> as schizophrenic.  The reason, again, is that we are still running in 
> last-century mode.
>
> Wouldn't it be nice to take advantage of today's 64-bit archs and TB drives, 
> and run a more modern way of life without this memory/storage split personality?

In what way does your vision _behave_ any differently than what we have?

In my mind, "physical RAM is used to cache slower storage RAM" behaves
the same as demand paging, even if the terminology is different.  The
code I guess you're referring to in the kernel, to handle paging to
storage, is simply one kernel's method of implementing that kind of cache.

It's not clear from anything you said how the computer in your dream
would behave any differently to the ones we've got now.

Can you describe that difference, if there is one?

Is it just an implementation idea, where the kernel does less of the
page caching logic and some bit of hardware does more of it
automatically?  Given how little time is taken in kernel to do that,
and how complex the logic has to be for efficient caching decisions
between RAM and storage, it seems likely that any simple hardware
solution would behave the same, but slower.

-- Jamie


* Re: [RFC] VM: I have a dream...
  2006-01-21 18:08 [RFC] VM: I have a dream Al Boldi
  2006-01-21 18:42 ` Jamie Lokier
@ 2006-01-21 18:46 ` Avi Kivity
  2006-01-23 19:52   ` Bryan Henderson
  2006-01-22  8:16 ` Pavel Machek
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 75+ messages in thread
From: Avi Kivity @ 2006-01-21 18:46 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel

Al Boldi wrote:

>A long time ago, when I was a kid, I had a dream. It went like this:
>
>I wake up in the twenty-first century and start my computer.
>After completing the boot sequence, I start top to find that my memory is 
>equal to the total disk capacity.  What's more, there is no more swap.
>Apps are executed in place, as if already loaded.
>Physical RAM is used to cache slower storage RAM, much the same as the CPU 
>cache RAM caches slower physical RAM.
>
>  
>
I'm sure you can find a 4GB disk on ebay.

>When I woke up, I was really looking forward to the new century.
>
>Sadly, the current way of dealing with memory can at best be described 
>as schizophrenic.  The reason, again, is that we are still running in 
>last-century mode.
>
>Wouldn't it be nice to take advantage of today's 64-bit archs and TB drives, 
>and run a more modern way of life without this memory/storage split personality?
>  
>
Perhaps you'd be interested in single-level store architectures, where 
no distinction is made between memory and storage. IBM uses it in one 
(or maybe more) of their systems. A particularly interesting example is 
http://www.eros-os.org.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] VM: I have a dream...
  2006-01-21 18:08 [RFC] VM: I have a dream Al Boldi
  2006-01-21 18:42 ` Jamie Lokier
  2006-01-21 18:46 ` Avi Kivity
@ 2006-01-22  8:16 ` Pavel Machek
  2006-01-22 12:33 ` Robin Holt
  2006-01-22 19:55 ` Barry K. Nathan
  4 siblings, 0 replies; 75+ messages in thread
From: Pavel Machek @ 2006-01-22  8:16 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel

On So 21-01-06 21:08:41, Al Boldi wrote:
> A long time ago, when I was a kid, I had a dream. It went like this:
> 
> I wake up in the twenty-first century and start my computer.
> After completing the boot sequence, I start top to find that my memory is 
> equal to the total disk capacity.  What's more, there is no more swap.
> Apps are executed in place, as if already loaded.
> Physical RAM is used to cache slower storage RAM, much the same as the CPU 
> cache RAM caches slower physical RAM.

...and then you try to execute mozilla in place, and your dream slowly
turns into a nightmare, as letters start to appear, pixel by pixel...

[Swap is the backing store for anonymous memory. Think about it: you need
swap as long as you support malloc. You could always provide a filename
with malloc, but hey, that starts to look like an IBM mainframe. Plus, the
ability to powercycle the machine and *have* it boot (not continue where
it left off) is a lifesaver.]
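
To make the anonymous/file-backed distinction concrete, a sketch (the
/tmp path is made up): the first mapping below can only ever be evicted
to swap, while the second has a file behind it, so the kernel can drop a
clean page and re-read it later without any swap at all.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 1 << 20;			/* 1MB of each */

	/* Anonymous memory: no file behind it, so under pressure these
	 * pages can only ever be evicted to swap. */
	char *anon = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* File-backed memory: a clean page can simply be dropped and
	 * re-read from the file later - no swap involved. */
	int fd = open("/tmp/backing", O_RDWR | O_CREAT, 0600);
	ftruncate(fd, len);
	char *filebacked = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_SHARED, fd, 0);

	anon[0] = filebacked[0] = 1;		/* touch both mappings */

	munmap(anon, len);
	munmap(filebacked, len);
	close(fd);
	return 0;
}
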
								Pavel

-- 
Thanks, Sharp!


* Re: [RFC] VM: I have a dream...
  2006-01-21 18:08 [RFC] VM: I have a dream Al Boldi
                   ` (2 preceding siblings ...)
  2006-01-22  8:16 ` Pavel Machek
@ 2006-01-22 12:33 ` Robin Holt
  2006-01-23 18:03   ` Al Boldi
  2006-01-22 19:55 ` Barry K. Nathan
  4 siblings, 1 reply; 75+ messages in thread
From: Robin Holt @ 2006-01-22 12:33 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel

On Sat, Jan 21, 2006 at 09:08:41PM +0300, Al Boldi wrote:
> A long time ago, when I was a kid, I had a dream. It went like this:
> 
> I wake up in the twenty-first century and start my computer.
> After completing the boot sequence, I start top to find that my memory is 
> equal to the total disk capacity.  What's more, there is no more swap.
> Apps are executed in place, as if already loaded.
> Physical RAM is used to cache slower storage RAM, much the same as the CPU 
> cache RAM caches slower physical RAM.
> 
> When I woke up, I was really looking forward to the new century.
> 
> Sadly, the current way of dealing with memory can at best be described 
> as schizophrenic.  The reason, again, is that we are still running in 
> last-century mode.
> 
> Wouldn't it be nice to take advantage of today's 64-bit archs and TB drives, 
> and run a more modern way of life without this memory/storage split personality?
> 
> All comments, other than "dream on", are most welcome!

Unfortunately, with Linux/Unix you are only going to get a "dream on".
Look at IBM's AS/400 with OS/400.  It does that sort of thing.  I am
sure there are others.

How do you handle a hot-plug device that is being brought online for
something like a backup?  Assume it would then be removed when the backup
is complete.  In your dream world, anon pages could be written to the
device.  As the backup proceeds and the disk fills, would those pages
be migrated to a different backing store, any processes which wrote
to the device killed, or would the backup be forced to abort/move to a
different device?  When the administrator then goes to remove the volume,
do they need to wait for all those pages to be migrated to a different
backing store?

Now assume a flash device gets added to the system.  Would we allow
paging to happen to that device?  If so, how does the administrator or
user control the added wear-and-tear on the device?

Now consider a system that has 8K drives (don't laugh, I know of one
system with more).  How do we keep track of the pages on those devices?
Right now, there is enough information in the space of a pte (64 bits
on ia64, not sure about other archs) to locate that page on the device.
For your proposal to work, we need a better way to track pages when they
are on backing store.  It would need to be nearly unlimited in number
of devices and device offset.
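
For scale, a sketch of the kind of encoding meant here - purely
illustrative, not the kernel's actual swp_entry_t layout:

#include <stdint.h>
#include <stdio.h>

/* A hypothetical swapped-out pte: one present bit, a few bits to pick
 * the swap device, and the rest for the page offset within it. */
#define DEV_BITS	7			/* 128 devices max */
#define OFFSET_BITS	(64 - DEV_BITS - 1)

static uint64_t mk_swap_entry(unsigned dev, uint64_t off)
{
	return ((uint64_t)dev << OFFSET_BITS) | off;
}

int main(void)
{
	/* 7 device bits cannot name 8K drives: the pte runs out of
	 * room long before the hardware does. */
	printf("max devices: %u\n", 1u << DEV_BITS);
	printf("dev 5, page 42: %#llx\n",
	       (unsigned long long)mk_swap_entry(5, 42));
	return 0;
}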

On one large machine I know of, the MTBF for at least one of the drives
in the system is around 3 hours.  With your proposal, would we reboot
every time some part of disk fails to be available or would we need
to keep track of where those pages are and kill all user processes on
that device?  Imagine the amount of kernel memory required to track all
those pages of anonymous memory.  You would end up with a situation where
adding a disk to the system would force you to consume some substantial
portion of kernel memory.  Would we allow kernel pages to be migrated to
backing store as well?  If so, how would we handle a failure of backing
devices with kernel pages?  Would users accept that the longest one of
their jobs can run without being terminated is around 3 hours?

Your simple world introduces a level of complexity to the kernel which
is nearly unmanageable.  Basically, you are asking the system to intuit
your desires.  The swap device/file scheme allows an administrator to
control some aspects of their system while giving the kernel developer
a reasonable number of variables to work with.  That, at least to me,
does not sound schizophrenic, but rather very reasonable.

Sorry for raining on your parade,
Robin Holt


* Re: [RFC] VM: I have a dream...
  2006-01-21 18:08 [RFC] VM: I have a dream Al Boldi
                   ` (3 preceding siblings ...)
  2006-01-22 12:33 ` Robin Holt
@ 2006-01-22 19:55 ` Barry K. Nathan
  2006-01-23  5:23   ` Michael Loftis
  4 siblings, 1 reply; 75+ messages in thread
From: Barry K. Nathan @ 2006-01-22 19:55 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel

On 1/21/06, Al Boldi <a1426z@gawab.com> wrote:
> A long time ago, when I was a kid, I had a dream. It went like this:
[snip]

FWIW, Mac OS X is one step closer to your vision than the typical
Linux distribution: It has a directory for swapfiles -- /var/vm -- and
it creates new swapfiles there as needed. (It used to be that each
swapfile would be 80MB, but the iMac next to me just has a single 64MB
swapfile, so maybe Mac OS 10.4 does something different now.)


* Re: [RFC] VM: I have a dream...
  2006-01-22 19:55 ` Barry K. Nathan
@ 2006-01-23  5:23   ` Michael Loftis
  2006-01-23  5:46     ` Chase Venters
  2006-01-23 15:05     ` Ram Gupta
  0 siblings, 2 replies; 75+ messages in thread
From: Michael Loftis @ 2006-01-23  5:23 UTC (permalink / raw)
  To: Barry K. Nathan, Al Boldi; +Cc: linux-kernel, linux-fsdevel



--On January 22, 2006 11:55:37 AM -0800 "Barry K. Nathan" 
<barryn@pobox.com> wrote:

> On 1/21/06, Al Boldi <a1426z@gawab.com> wrote:
>> A long time ago, when I was a kid, I had a dream. It went like this:
> [snip]
>
> FWIW, Mac OS X is one step closer to your vision than the typical
> Linux distribution: It has a directory for swapfiles -- /var/vm -- and
> it creates new swapfiles there as needed. (It used to be that each
> swapfile would be 80MB, but the iMac next to me just has a single 64MB
> swapfile, so maybe Mac OS 10.4 does something different now.)
/var/vm/swap*
 64M    swapfile0
 64M    swapfile1
128M    swapfile2
256M    swapfile3
512M    swapfile4
512M    swapfile5
1.5G    total

However, only the first 5 are in use; the 6th just represents the peak swap 
usage on this machine.  This is on 10.4.


* Re: [RFC] VM: I have a dream...
  2006-01-23  5:23   ` Michael Loftis
@ 2006-01-23  5:46     ` Chase Venters
  2006-01-23  8:20       ` Barry K. Nathan
  2006-01-23 13:17       ` Jamie Lokier
  2006-01-23 15:05     ` Ram Gupta
  1 sibling, 2 replies; 75+ messages in thread
From: Chase Venters @ 2006-01-23  5:46 UTC (permalink / raw)
  To: Michael Loftis; +Cc: Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel

On Sunday 22 January 2006 23:23, Michael Loftis wrote:
> --On January 22, 2006 11:55:37 AM -0800 "Barry K. Nathan"
>
> <barryn@pobox.com> wrote:
> > On 1/21/06, Al Boldi <a1426z@gawab.com> wrote:
> >> A long time ago, when I was a kid, I had a dream. It went like this:
> >
> > [snip]
> >
> > FWIW, Mac OS X is one step closer to your vision than the typical
> > Linux distribution: It has a directory for swapfiles -- /var/vm -- and
> > it creates new swapfiles there as needed. (It used to be that each
> > swapfile would be 80MB, but the iMac next to me just has a single 64MB
> > swapfile, so maybe Mac OS 10.4 does something different now.)

Just as a curiosity... does anyone have any guesses as to the runtime 
performance cost of hosting one or more swap files (which thanks to on demand 
creation and growth are presumably built of blocks scattered around the disk) 
versus having one or more simple contiguous swap partitions?

I think it's probably a given that swap partitions are better; I'm just 
curious how much better they might actually be.

Cheers,
Chase


* Re: [RFC] VM: I have a dream...
  2006-01-23  5:46     ` Chase Venters
@ 2006-01-23  8:20       ` Barry K. Nathan
  2006-01-23 13:17       ` Jamie Lokier
  1 sibling, 0 replies; 75+ messages in thread
From: Barry K. Nathan @ 2006-01-23  8:20 UTC (permalink / raw)
  To: Chase Venters; +Cc: Michael Loftis, Al Boldi, linux-kernel, linux-fsdevel

On 1/22/06, Chase Venters <chase.venters@clientec.com> wrote:
> Just as a curiosity... does anyone have any guesses as to the runtime
> performance cost of hosting one or more swap files (which thanks to on demand
> creation and growth are presumably built of blocks scattered around the disk)
> versus having one or more simple contiguous swap partitions?
>
> I think it's probably a given that swap partitions are better; I'm just
> curious how much better they might actually be.

If you google "mac os x swap partition", you'll find benchmarks from
several years ago. (Although, those benchmarks are with a partition
dedicated to the dynamically created swap files. It does more or less
ensure that the files are contiguous though.) Mac OS X was *much* more
of a dog back then, in terms of performance, so I don't know how
relevant those benchmarks are nowadays, but it might be a starting
point for answering your question.

--
-Barry K. Nathan <barryn@pobox.com>


* Re: [RFC] VM: I have a dream...
  2006-01-23  5:46     ` Chase Venters
  2006-01-23  8:20       ` Barry K. Nathan
@ 2006-01-23 13:17       ` Jamie Lokier
  2006-01-23 20:21         ` Peter Chubb
  1 sibling, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2006-01-23 13:17 UTC (permalink / raw)
  To: Chase Venters
  Cc: Michael Loftis, Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel

Chase Venters wrote:
> Just as a curiosity... does anyone have any guesses as to the
> runtime performance cost of hosting one or more swap files (which
> thanks to on demand creation and growth are presumably built of
> blocks scattered around the disk) versus having one or more simple
> contiguous swap partitions?

> I think it's probably a given that swap partitions are better; I'm just 
> curious how much better they might actually be.

When programs must access files in addition to swapping - and that
includes demand-paged executable files - swap files have the
_potential_ to be faster, because they provide opportunities to use the
disk nearer the files which are being accessed.  This is even more so if
all the filesystem's free space is available for swapping.  A swap
partition in this scenario forces the disk head to move back and forth
between the swap partition and the filesystem.

-- Jamie


* Re: [RFC] VM: I have a dream...
  2006-01-23  5:23   ` Michael Loftis
  2006-01-23  5:46     ` Chase Venters
@ 2006-01-23 15:05     ` Ram Gupta
  2006-01-23 15:26       ` Diego Calleja
  2006-01-23 20:43       ` Michael Loftis
  1 sibling, 2 replies; 75+ messages in thread
From: Ram Gupta @ 2006-01-23 15:05 UTC (permalink / raw)
  To: Michael Loftis; +Cc: Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel

On 1/22/06, Michael Loftis <mloftis@wgops.com> wrote:
>
> > FWIW, Mac OS X is one step closer to your vision than the typical
> > Linux distribution: It has a directory for swapfiles -- /var/vm -- and
> > it creates new swapfiles there as needed. (It used to be that each
> > swapfile would be 80MB, but the iMac next to me just has a single 64MB
> > swapfile, so maybe Mac OS 10.4 does something different now.)
> /var/vm/swap*
>  64M    swapfile0
>  64M    swapfile1
> 128M    swapfile2
> 256M    swapfile3
> 512M    swapfile4
> 512M    swapfile5
> 1.5G    total
>

Linux also supports multiple swap files.  But these are more
beneficial if there is more than one disk in the system so that I/O
can be done in parallel.  These swap files may be activated at run time
based on some criteria.

Regards
Ram Gupta


* Re: [RFC] VM: I have a dream...
  2006-01-23 15:05     ` Ram Gupta
@ 2006-01-23 15:26       ` Diego Calleja
  2006-01-23 16:11         ` linux-os (Dick Johnson)
  2006-01-25 22:27         ` Nix
  2006-01-23 20:43       ` Michael Loftis
  1 sibling, 2 replies; 75+ messages in thread
From: Diego Calleja @ 2006-01-23 15:26 UTC (permalink / raw)
  To: Ram Gupta; +Cc: mloftis, barryn, a1426z, linux-kernel, linux-fsdevel

El Mon, 23 Jan 2006 09:05:41 -0600,
Ram Gupta <ram.gupta5@gmail.com> escribió:

> Linux also supports multiple swap files.  But these are more

There's in fact a "dynamic swap" tool which apparently
does what Mac OS X does: http://dynswapd.sourceforge.net/

However, I doubt the approach is really useful. If you need that much
swap space, you're going well beyond the capabilities of the machine.
In fact, I bet that most cases of machines needing that much
memory will be because of bugs in the programs, and OOM'ing would be
a better solution.


* Re: [RFC] VM: I have a dream...
  2006-01-23 15:26       ` Diego Calleja
@ 2006-01-23 16:11         ` linux-os (Dick Johnson)
  2006-01-23 16:50           ` Jamie Lokier
                             ` (2 more replies)
  2006-01-25 22:27         ` Nix
  1 sibling, 3 replies; 75+ messages in thread
From: linux-os (Dick Johnson) @ 2006-01-23 16:11 UTC (permalink / raw)
  To: Diego Calleja
  Cc: Ram Gupta, mloftis, barryn, a1426z, linux-kernel, linux-fsdevel


On Mon, 23 Jan 2006, Diego Calleja wrote:

> El Mon, 23 Jan 2006 09:05:41 -0600,
> Ram Gupta <ram.gupta5@gmail.com> escribió:
>
>> Linux also supports multiple swap files.  But these are more
>
> There's in fact a "dynamic swap" tool which apparently
> does what Mac OS X does: http://dynswapd.sourceforge.net/
>
> However, I doubt the approach is really useful. If you need that much
> swap space, you're going well beyond the capabilities of the machine.
> In fact, I bet that most cases of machines needing that much
> memory will be because of bugs in the programs, and OOM'ing would be
> a better solution.

You have roughly 2 GB of dynamic address-space available to each
task (stuff that's not the kernel and not the runtime libraries).
You can easily have 500 tasks, even RedHat out-of-the-box creates
about 60 tasks. That's 1,000 GB of potential swap-space required
to support this. This is not beyond the capabilities of a 32-bit
machine with a fast front-side bus and fast I/O (like wide SCSI).
Some persons tend to forget that 32-bit address space is available
to every user, some is shared, some is not. A reasonable rule-of-
thumb is to provide enough swap-space to duplicate the address-
space of every potential task.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.54 BogoMips).
Warning : 98.36% of all statistics are fiction.



* Re: [RFC] VM: I have a dream...
  2006-01-23 16:11         ` linux-os (Dick Johnson)
@ 2006-01-23 16:50           ` Jamie Lokier
  2006-01-24  2:08           ` Horst von Brand
  2006-01-24  2:10           ` Horst von Brand
  2 siblings, 0 replies; 75+ messages in thread
From: Jamie Lokier @ 2006-01-23 16:50 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

linux-os (Dick Johnson) wrote:
> On Mon, 23 Jan 2006, Diego Calleja wrote:
> > However, I doubt the approach is really useful. If you need that much
> > swap space, you're going well beyond the capabilities of the machine.
> > In fact, I bet that most cases of machines needing that much
> > memory will be because of bugs in the programs, and OOM'ing would be
> > a better solution.
> 
> You have roughly 2 GB of dynamic address-space available to each
> task (stuff that's not the kernel and not the runtime libraries).
> You can easily have 500 tasks, even RedHat out-of-the-box creates
> about 60 tasks. That's 1,000 GB of potential swap-space required
> to support this.

And how many machines is it useful to use that much swap-space on?

> This is not beyond the capabilities of a 32-bit
> machine with a fast front-side bus and fast I/O (like wide SCSI).

Anything but the most expensively RAM-equipped machine would be stuck
in a useless swap-storm if it's got 1000GB of active swap space
and only a relatively tiny amount of physical RAM (e.g. 16GB).  The
same is true if only, say, 10% of the swap space is in active use.

Wide SCSI isn't fast enough to make that useful.

I think that was the point Diego was making: you can use that much
swap space, but by the time you do, whatever task you hoped to
accomplish won't get anywhere due to the swap-storm.

> Some persons tend to forget that 32-bit address space is available
> to every user, some is shared, some is not. A reasonable rule-of-
> thumb is to provide enough swap-space to duplicate the address-
> space of every potential task.

I think that's a ridiculous rule of thumb.  Not least because (a) even
the biggest drive available (e.g. 1TB) doesn't provide that much
swap-space, and (b) if you're actively using only a tiny fraction of
that, your machine has already become uselessly slow - even root
logins and command prompts don't work under those conditions.

-- Jamie


* Re: [RFC] VM: I have a dream...
  2006-01-22 12:33 ` Robin Holt
@ 2006-01-23 18:03   ` Al Boldi
  2006-01-23 18:40     ` Valdis.Kletnieks
  2006-01-23 22:26     ` Pavel Machek
  0 siblings, 2 replies; 75+ messages in thread
From: Al Boldi @ 2006-01-23 18:03 UTC (permalink / raw)
  To: Robin Holt; +Cc: linux-kernel, linux-fsdevel

Robin Holt wrote:
> On Sat, Jan 21, 2006 at 09:08:41PM +0300, Al Boldi wrote:
> >
> > Wouldn't it be nice to take advantage of today's 64-bit archs and TB
> > drives, and run a more modern way of life without this memory/storage split
> > personality?
>
> Your simple world introduces a level of complexity to the kernel which
> is nearly unmanageable.  Basically, you are asking the system to intuit
> your desires.  The swap device/file scheme allows an administrator to
> control some aspects of their system while giving the kernel developer
> a reasonable number of variables to work with.  That, at least to me,
> does not sound schizophrenic, but rather very reasonable.
>
> Sorry for raining on your parade,

Thanks for your detailed response, it rather felt like a fresh breeze.

Really, I was thinking of a step-by-step rather than an all-or-none 
approach: something that would involve tmpfs merged with swap, mapped into 
a linear address space limited by the arch's bits, and everything else 
connected as an archive.

The idea here is to run inside swap instead of using it as an add-on.
In effect, running inside memory cached by physical RAM.

Wouldn't something like this at least represent a simple starting point?

Thanks!

--
Al



* Re: [RFC] VM: I have a dream...
  2006-01-23 18:03   ` Al Boldi
@ 2006-01-23 18:40     ` Valdis.Kletnieks
  2006-01-23 19:26       ` Benjamin LaHaise
  2006-01-23 22:26     ` Pavel Machek
  1 sibling, 1 reply; 75+ messages in thread
From: Valdis.Kletnieks @ 2006-01-23 18:40 UTC (permalink / raw)
  To: Al Boldi; +Cc: Robin Holt, linux-kernel, linux-fsdevel


On Mon, 23 Jan 2006 21:03:06 +0300, Al Boldi said:

> The idea here is to run inside swap instead of using it as an add-on.
> In effect, running inside memory cached by physical RAM.
> 
> Wouldn't something like this at least represent a simple starting point?

We *already* treat RAM as a cache for the swap space and other backing store
(for instance, paging in executable code from a file), if you're looking at
it from the 30,000 foot fly-over...

However, it quickly digresses from a "simple starting point" when you try to
get decent performance out of it, even when people are doing things that tend
to make your algorithm fold up.  A machine with a gigabyte of memory has on the
order of a quarter million 4K pages - which page are you going to move out to
swap to make room?  And if you guess wrong, multiple processes will stall as
the system starts to thrash. (In fact, "thrashing" is just a short way of
saying "consistently guessing wrong as to which pages will be needed soon"....)

But hey, if you've got a new page replacement algorithm that performs better,
feel free to post the code.. ;)

Example of why it's a pain in the butt:

A process does a "read(foo, &buffer, 65536);".  buffer is declared as 16
contiguous 4K pages, none of which are currently in memory.  How many pages do
you have to read in, and at what point do you issue the I/O? (hint - work this
problem for a device that's likely to return 64K of data, and again for a
device that has a high chance of only returning 2K of data.....)
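
Spelled out as code (a sketch: /dev/zero stands in for the device being
read, and the fresh anonymous mapping stands in for 16 pages that are
not currently resident):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int foo = open("/dev/zero", O_RDONLY);

	/* 16 contiguous 4K pages, none of them materialized yet. */
	char *buffer = mmap(NULL, 65536, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* copy_to_user() inside read() now faults in each page as the
	 * copy crosses it - the kernel has to decide how many pages to
	 * set up and when to issue the device I/O. */
	read(foo, buffer, 65536);

	munmap(buffer, 65536);
	close(foo);
	return 0;
}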

But yeah, other than all the cruft like that, it's simple. :)




* Re: [RFC] VM: I have a dream...
  2006-01-23 18:40     ` Valdis.Kletnieks
@ 2006-01-23 19:26       ` Benjamin LaHaise
  2006-01-23 19:40         ` Valdis.Kletnieks
  0 siblings, 1 reply; 75+ messages in thread
From: Benjamin LaHaise @ 2006-01-23 19:26 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Al Boldi, Robin Holt, linux-kernel, linux-fsdevel

On Mon, Jan 23, 2006 at 01:40:46PM -0500, Valdis.Kletnieks@vt.edu wrote:
> A process does a "read(foo, &buffer, 65536);".  buffer is declared as 16
> contiguous 4K pages, none of which are currently in memory.  How many pages do
> you have to read in, and at what point do you issue the I/O? (hint - work this
> problem for a device that's likely to return 64K of data, and again for a
> device that has a high chance of only returning 2K of data.....)

Actually, that is something that the vm could optimize out of the picture 
entirely -- it is a question of whether it is worth the added complexity 
to handle such a case.  copy_to_user already takes a slow path when it hits 
the page fault (we do a lookup on the exception handler already) and could 
test if an entire page is being overwritten, and if so proceed to destroy 
the old mapping and use a fresh page from ram.

That said, for the swap case, it probably happens so rarely that the extra 
code isn't worth it.  glibc is already using mmap() in place of read() for 
quite a few apps, so I'm not sure how much low hanging fruit there is left.  
If someone has an app that's read() heavy, it is probably easier to convert 
it to mmap() -- the exception being pipes and sockets which can't.  We need 
numbers. =-)
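
For a plain file the read()-to-mmap() conversion is mechanical - a
sketch, with error handling omitted and /bin/ls as an arbitrary example:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Before: read() copies from the page cache into our buffer. */
static ssize_t slurp_read(int fd, char *buf, size_t len)
{
	return read(fd, buf, len);
}

/* After: mmap() exposes the page cache directly - no copy at all,
 * so the whole-page-overwrite question never comes up. */
static const char *slurp_mmap(int fd, size_t *len)
{
	struct stat st;

	fstat(fd, &st);
	*len = st.st_size;
	return mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
}

int main(void)
{
	int fd = open("/bin/ls", O_RDONLY);
	size_t len;
	const char *p = slurp_mmap(fd, &len);

	write(STDOUT_FILENO, p, 4);	/* prove the mapping works */
	(void)slurp_read;
	return 0;
}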

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.


* Re: [RFC] VM: I have a dream...
  2006-01-23 19:26       ` Benjamin LaHaise
@ 2006-01-23 19:40         ` Valdis.Kletnieks
  0 siblings, 0 replies; 75+ messages in thread
From: Valdis.Kletnieks @ 2006-01-23 19:40 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Al Boldi, Robin Holt, linux-kernel, linux-fsdevel


On Mon, 23 Jan 2006 14:26:06 EST, Benjamin LaHaise said:
> Actually, that is something that the vm could optimize out of the picture 
> entirely -- it is a question of whether it is worth the added complexity 
> to handle such a case.  copy_to_user already takes a slow path when it hits 
> the page fault (we do a lookup on the exception handler already) and could 
> test if an entire page is being overwritten, and if so proceed to destroy 
> the old mapping and use a fresh page from ram.

That was my point - it's easy till you start trying to get actual performance
out of it by optimizing stuff like that. ;)



* Re: [RFC] VM: I have a dream...
  2006-01-21 18:46 ` Avi Kivity
@ 2006-01-23 19:52   ` Bryan Henderson
  2006-01-25 22:04     ` Al Boldi
  2006-01-26  0:03     ` Jon Smirl
  0 siblings, 2 replies; 75+ messages in thread
From: Bryan Henderson @ 2006-01-23 19:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Al Boldi, linux-fsdevel, linux-kernel

>Perhaps you'd be interested in single-level store architectures, where 
>no distinction is made between memory and storage. IBM uses it in one 
>(or maybe more) of their systems.

It's the IBM eServer iSeries, née System/38 (A.D. 1980), aka AS/400.

It was expected at one time to be the next generation of computer 
architecture, but it turned out that the computing world had matured to 
the point that it was more important to be backward compatible than to 
push frontiers.

The single 128-bit address space addresses every byte of information in 
the system.  The underlying system keeps the majority of it on disk, and 
the logic that loads stuff into electronic memory when it has to be there 
is below the level that any ordinary program would see, much like the 
logic in an IA32 CPU that loads stuff into processor cache.  It's worth 
noting that nowhere in an iSeries machine is a layer that looks like a 
CPU Linux runs on; it's designed for single level storage from the gates 
on up through the operating system.

I found Al's dream rather vague, which explains why several people 
inferred different ideas from it (and then beat them down).  It sort of 
sounds like single level storage, but also like virtual memory and like 
mmap.  I assume it's actually supposed to be something different from all 
those.

I personally have set my sights further down the road: I want an address 
space that addresses every byte of information in the universe, not just 
"in" a computer system.  And the infrastructure should move it around 
among various media for optimal access without me worrying about it.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems



* Re: [RFC] VM: I have a dream...
  2006-01-23 13:17       ` Jamie Lokier
@ 2006-01-23 20:21         ` Peter Chubb
  0 siblings, 0 replies; 75+ messages in thread
From: Peter Chubb @ 2006-01-23 20:21 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Chase Venters, Michael Loftis, Barry K. Nathan, Al Boldi,
	linux-kernel, linux-fsdevel

>>>>> "Jamie" == Jamie Lokier <jamie@shareable.org> writes:

Jamie> Chase Venters wrote:
>> Just as a curiosity... does anyone have any guesses as to the
>> runtime performance cost of hosting one or more swap files (which
>> thanks to on demand creation and growth are presumably built of
>> blocks scattered around the disk) versus having one or more simple
>> contiguous swap partitions?

>> I think it's probably a given that swap partitions are better; I'm
>> just curious how much better they might actually be.

Jamie> When programs must access files in addition to swapping, and
Jamie> that includes demand-paged executable files, swap files have
Jamie> the _potential_ to be faster because they provide opportunities
Jamie> to use the disk nearer the files which are being accessed.

If you can, put your swap on a different spindle...


Actually, the original poster's `dream' looked a lot like a
single-address-space operating system, such as Mungi (
http://www.cse.unsw.edu.au/~disy/Mungi/ )
-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au           ERTOS within National ICT Australia


* Re: [RFC] VM: I have a dream...
  2006-01-23 15:05     ` Ram Gupta
  2006-01-23 15:26       ` Diego Calleja
@ 2006-01-23 20:43       ` Michael Loftis
  2006-01-23 22:42         ` Nikita Danilov
                           ` (2 more replies)
  1 sibling, 3 replies; 75+ messages in thread
From: Michael Loftis @ 2006-01-23 20:43 UTC (permalink / raw)
  To: Ram Gupta; +Cc: Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel



--On January 23, 2006 9:05:41 AM -0600 Ram Gupta <ram.gupta5@gmail.com> 
wrote:

>
> Linux also supports multiple swap files . But these are more
> beneficial if there are more than one disk in the system so that i/o
> can be done in parallel. These swap files may be activated at run time
> based on some criteria.

You missed the point.  The kernel in OS X maintains creation and use of 
these files automatically.  The point wasn't 'oh wow, multiple files'; it was 
that it creates them on the fly.  I just posted back with the apparent new 
method that's being used.  I'm not sure if the 512MB number continues or if 
the next file will be 1GB or another 512MB, or whether memory size affects 
it or not.

I'm sure developer.apple.com or the Apple Darwin pages have the information 
somewhere.


* Re: [RFC] VM: I have a dream...
  2006-01-23 18:03   ` Al Boldi
  2006-01-23 18:40     ` Valdis.Kletnieks
@ 2006-01-23 22:26     ` Pavel Machek
  1 sibling, 0 replies; 75+ messages in thread
From: Pavel Machek @ 2006-01-23 22:26 UTC (permalink / raw)
  To: Al Boldi; +Cc: Robin Holt, linux-kernel, linux-fsdevel

On Po 23-01-06 21:03:06, Al Boldi wrote:
> Robin Holt wrote:
> > On Sat, Jan 21, 2006 at 09:08:41PM +0300, Al Boldi wrote:
> > >
> > > Wouldn't it be nice to take advantage of today's 64-bit archs and TB
> > > drives, and run a more modern way of life without this memory/storage split
> > > personality?
> >
> > Your simple world introduces a level of complexity to the kernel which
> > is nearly unmanageable.  Basically, you are asking the system to intuit
> > your desires.  The swap device/file scheme allows an administrator to
> > control some aspects of their system while giving the kernel developer
> > a reasonable number of variables to work with.  That, at least to me,
> > does not sound schizophrenic, but rather very reasonable.
> >
> > Sorry for raining on your parade,
> 
> Thanks for your detailed response, it rather felt like a fresh breeze.
> 
> Really, I was thinking of a step-by-step rather than an all-or-none 
> approach: something that would involve tmpfs merged with swap, mapped into 
> a linear address space limited by the arch's bits, and everything else 
> connected as an archive.
> 
> The idea here is to run inside swap instead of using it as an add-on.
> In effect, running inside memory cached by physical RAM.

And if you do not want to run inside swap? For example because your
machine has only RAM? This will not fly.

Having dreams is nice, but please avoid sharing them unless they come
with patches attached.
								Pavel 
-- 
Thanks, Sharp!


* Re: [RFC] VM: I have a dream...
  2006-01-23 20:43       ` Michael Loftis
@ 2006-01-23 22:42         ` Nikita Danilov
  2006-01-24 14:36           ` Ram Gupta
  2006-01-23 22:57         ` Ram Gupta
  2006-01-24 10:08         ` Meelis Roos
  2 siblings, 1 reply; 75+ messages in thread
From: Nikita Danilov @ 2006-01-23 22:42 UTC (permalink / raw)
  To: Michael Loftis; +Cc: Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel

Michael Loftis writes:
 > 
 > 
 > --On January 23, 2006 9:05:41 AM -0600 Ram Gupta <ram.gupta5@gmail.com> 
 > wrote:
 > 
 > >
 > > Linux also supports multiple swap files.  But these are more
 > > beneficial if there is more than one disk in the system so that I/O
 > > can be done in parallel.  These swap files may be activated at run time
 > > based on some criteria.
 > 
 > You missed the point.  The kernel in OS X maintains creation and use of 
 > these files automatically.  The point wasn't 'oh wow, multiple files'; it was 
 > that it creates them on the fly.  I just posted back with the apparent new 

This can be done in Linux from user-space: write a script that monitors
free swap space (grep SwapFree /proc/meminfo), and adds/removes new swap
files err... on-the-fly, or --even better-- just-in-time.
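
A minimal sketch of such a monitor in C (the 64MB threshold, the
/var/swapN paths and the 256MB increment are all made up; a real daemon
would also want error handling, hysteresis, and removal of idle files):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long swap_free_kb(void)
{
	char line[128];
	long kb = -1;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof line, f))
		if (sscanf(line, "SwapFree: %ld kB", &kb) == 1)
			break;
	fclose(f);
	return kb;
}

int main(void)
{
	int n = 0;

	for (;;) {
		long kb = swap_free_kb();

		if (kb >= 0 && kb < 64 * 1024) {	/* < 64MB free: grow */
			char cmd[256];

			snprintf(cmd, sizeof cmd,
				 "dd if=/dev/zero of=/var/swap%d bs=1M count=256"
				 " && mkswap /var/swap%d && swapon /var/swap%d",
				 n, n, n);
			system(cmd);
			n++;
		}
		sleep(10);
	}
}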

The unique feature that Mac OS X VM does have, on the other hand, is
that it keeps profiles of access patterns of applications, and stores
them in files associated with the executables. This allows it to quickly
pre-fault the necessary pages during application startup (and this makes
OSX boot so fast).

Nikita.


* Re: [RFC] VM: I have a dream...
  2006-01-23 20:43       ` Michael Loftis
  2006-01-23 22:42         ` Nikita Danilov
@ 2006-01-23 22:57         ` Ram Gupta
  2006-01-24 10:08         ` Meelis Roos
  2 siblings, 0 replies; 75+ messages in thread
From: Ram Gupta @ 2006-01-23 22:57 UTC (permalink / raw)
  To: Michael Loftis; +Cc: Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel

On 1/23/06, Michael Loftis <mloftis@wgops.com> wrote:

> You missed the point.  The kernel in OS X maintains creation and use of
> these files automatically.  The point wasn't 'oh wow, multiple files'; it was
> that it creates them on the fly.  I just posted back with the apparent new
> method that's being used.  I'm not sure if the 512MB number continues or if
> the next file will be 1GB or another 512MB, or whether memory size affects
> it or not.
>
> I'm sure developer.apple.com or the Apple Darwin pages have the information
> somewhere.
>

What do you mean by automatically?  As I understand it, there is no such
thing.  If there is a task, it has to be done by someone.  Something is
done "automatically" from the application's point of view because the kernel
takes care of it.  So if creation and use of swap files is done automatically,
then who does it?  Is it done by hardware?


* Re: [RFC] VM: I have a dream...
  2006-01-23 16:11         ` linux-os (Dick Johnson)
  2006-01-23 16:50           ` Jamie Lokier
@ 2006-01-24  2:08           ` Horst von Brand
  2006-01-25  6:13             ` Jamie Lokier
  2006-01-25  9:23             ` Bernd Petrovitsch
  2006-01-24  2:10           ` Horst von Brand
  2 siblings, 2 replies; 75+ messages in thread
From: Horst von Brand @ 2006-01-24  2:08 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

linux-os (Dick Johnson) <linux-os@analogic.com> wrote:
> On Mon, 23 Jan 2006, Diego Calleja wrote:

[...]

> > However, I doubt the approach is really useful. If you need that much
> > swap space, you're going well beyond the capabilities of the machine.
> > In fact, I bet that most cases of machines needing that much
> > memory will be because of bugs in the programs, and OOM'ing would be
> > a better solution.

Good rule of thumb: If you run into swap, add RAM. Swap is /extremely/ slow
memory, however fast you make it go. RAM is not expensive anymore...

> You have roughly 2 GB of dynamic address-space available to each
> task (stuff that's not the kernel and not the runtime libraries).

Right. But your average task is far from that size, and most of it resides
in shared libraries and (perhaps shared) executables, and is perhaps even
COW shared with other tasks.

> You can easily have 500 tasks,

Even thousands.

>                                even RedHat out-of-the-box creates
> about 60 tasks. That's 1,000 GB of potential swap-space required
> to support this.

But you really never do. That is the point.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513



* Re: [RFC] VM: I have a dream...
  2006-01-23 20:43       ` Michael Loftis
  2006-01-23 22:42         ` Nikita Danilov
  2006-01-23 22:57         ` Ram Gupta
@ 2006-01-24 10:08         ` Meelis Roos
  2 siblings, 0 replies; 75+ messages in thread
From: Meelis Roos @ 2006-01-24 10:08 UTC (permalink / raw)
  To: mloftis, linux-kernel

ML> You missed the point.  The kernel in OS X maintains creation and use of 
ML> these files automatically.  The point wasn't 'oh wow, multiple files'; it was 
ML> that it creates them on the fly.  I just posted back with the apparent new 
ML> method that's being used.  I'm not sure if the 512MB number continues or if 
ML> the next file will be 1GB or another 512MB, or whether memory size affects 
ML> it or not.

Not in the kernel but in userspace - much like Linux:

http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/dynamic_pager.8.html

     The dynamic_pager daemon manages a pool of external swap files which the
     kernel uses to support demand paging.  This pool is expanded with new
     swap files as load on the system increases, and contracted when the
     swapping resources are no longer needed.  The dynamic_pager daemon also
     provides a notification service for those applications which wish to
     receive notices when the external paging pool expands or contracts.

-- 
Meelis Roos


* Re: [RFC] VM: I have a dream...
  2006-01-23 22:42         ` Nikita Danilov
@ 2006-01-24 14:36           ` Ram Gupta
  2006-01-24 15:04             ` Diego Calleja
  2006-01-24 15:11             ` Nikita Danilov
  0 siblings, 2 replies; 75+ messages in thread
From: Ram Gupta @ 2006-01-24 14:36 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: Michael Loftis, Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel

On 1/23/06, Nikita Danilov <nikita@clusterfs.com> wrote:

>
> The unique feature that Mac OS X VM does have, on the other hand, is
> that it keeps profiles of access patterns of applications, and stores
> them in files associated with the executables. This allows it to quickly
> pre-fault the necessary pages during application startup (and this makes
> OSX boot so fast).

This feature is interesting, though I am not sure about the fast-boot
part of OSX, as at boot time these applications are all being started
for the first time, so there is no access pattern yet.  They still have
to be demand paged.  But yes, later accesses may be faster.

Thanks
Ram gupta


* Re: [RFC] VM: I have a dream...
  2006-01-24 14:36           ` Ram Gupta
@ 2006-01-24 15:04             ` Diego Calleja
  2006-01-24 20:59               ` Bryan Henderson
  2006-01-24 15:11             ` Nikita Danilov
  1 sibling, 1 reply; 75+ messages in thread
From: Diego Calleja @ 2006-01-24 15:04 UTC (permalink / raw)
  To: Ram Gupta; +Cc: nikita, mloftis, barryn, a1426z, linux-kernel, linux-fsdevel

El Tue, 24 Jan 2006 08:36:50 -0600,
Ram Gupta <ram.gupta5@gmail.com> escribió:

> This feature is interesting, though I am not sure about the fast-boot
> part of OSX, as at boot time these applications are all being started
> for the first time, so there is no access pattern yet.  They still have
> to be demand paged.  But yes, later accesses may be faster.


The stats are saved on disk (at least on Windows).  You don't really
care about "later accesses" when everything is already in cache;
this is supposed to speed up cold-cache startup.  I don't know
whether Mac OS X does it for every app - the Darwin code I saw was
only for the startup of the system, not for every app, but maybe that
part was in another module.

Linux is the one desktop lacking something like this; both Windows
and Mac OS X have it.  I've wondered for a long time whether
it's worth it and whether it could improve things in Linux.  The
prefault part is easy once you get the data.  The hard part is getting
the statistics: I wonder if mincore(), /proc/$PID/maps
and the recently posted /proc/$PID/pmap and all the statistics
the kernel can provide today are enough, or whether something
more complex is necessary.
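
mincore() already covers the residency half of those statistics - a
sketch that counts the resident pages of one mapping (/bin/ls standing
in for a region parsed out of /proc/$PID/maps):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/bin/ls", O_RDONLY);
	struct stat st;
	fstat(fd, &st);

	char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t pages = (st.st_size + pagesz - 1) / pagesz;
	unsigned char vec[pages];

	/* One byte per page; bit 0 is set if the page is resident. */
	mincore(p, st.st_size, vec);

	size_t resident = 0;
	for (size_t i = 0; i < pages; i++)
		resident += vec[i] & 1;
	printf("%zu of %zu pages resident\n", resident, pages);
	return 0;
}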


* Re: [RFC] VM: I have a dream...
  2006-01-24 14:36           ` Ram Gupta
  2006-01-24 15:04             ` Diego Calleja
@ 2006-01-24 15:11             ` Nikita Danilov
  1 sibling, 0 replies; 75+ messages in thread
From: Nikita Danilov @ 2006-01-24 15:11 UTC (permalink / raw)
  To: Ram Gupta
  Cc: Michael Loftis, Barry K. Nathan, Al Boldi, linux-kernel, linux-fsdevel

Ram Gupta writes:
 > On 1/23/06, Nikita Danilov <nikita@clusterfs.com> wrote:
 > 
 > >
 > > The unique feature that Mac OS X VM does have, on the other hand, is
 > > that it keeps profiles of access patterns of applications, and stores
 > > them in files associated with the executables. This allows it to quickly
 > > pre-fault the necessary pages during application startup (and this makes
 > > OSX boot so fast).
 > 
 > This feature is interesting, though I am not sure about the fast-boot
 > part of OSX, as at boot time these applications are all being started
 > for the first time, so there is no access pattern yet.  They still have
 > to be demand

That's the point: the information about access patterns is stored in the
file. So the next time the application is started (e.g., during boot), the
kernel reads that file and pre-faults the pages.

 > paged.  But yes, later accesses may be faster.
 > 
 > Thanks
 > Ram gupta

Nikita.


* Re: [RFC] VM: I have a dream...
  2006-01-24 15:04             ` Diego Calleja
@ 2006-01-24 20:59               ` Bryan Henderson
  0 siblings, 0 replies; 75+ messages in thread
From: Bryan Henderson @ 2006-01-24 20:59 UTC (permalink / raw)
  To: Diego Calleja
  Cc: a1426z, barryn, linux-fsdevel, linux-kernel, mloftis, nikita, Ram Gupta

>Linux is the one desktop lacking something like this; both Windows
>and Mac OS X have it.  I've wondered for a long time whether
>it's worth it and whether it could improve things in Linux.  The
>prefault part is easy once you get the data.  The hard part is getting
>the statistics:

If you focus on the system startup speed problem, the stats are quite a 
bit simpler.  If you can take a snapshot of every mmap page in memory 
shortly after startup (and verify that no page frames were stolen during 
startup) and save that, you could just prefault all those pages in, in a 
single sweep, at the next boot.
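
The replay half of that is straightforward - a sketch, assuming the
snapshot has already been reduced to page-aligned (file, offset, length)
extents:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Replay one saved extent: map it and ask for asynchronous readahead,
 * so the boot's first real faults hit an already-warm page cache. */
static void prefault(const char *path, off_t off, size_t len)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return;
	void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
	if (p != MAP_FAILED)
		madvise(p, len, MADV_WILLNEED);	/* non-blocking */
	close(fd);
}

int main(void)
{
	/* Entries would come from the saved post-startup snapshot. */
	prefault("/bin/ls", 0, 65536);
	return 0;
}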

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems




* Re: [RFC] VM: I have a dream...
  2006-01-24  2:08           ` Horst von Brand
@ 2006-01-25  6:13             ` Jamie Lokier
  2006-01-25  9:23             ` Bernd Petrovitsch
  1 sibling, 0 replies; 75+ messages in thread
From: Jamie Lokier @ 2006-01-25  6:13 UTC (permalink / raw)
  To: Horst von Brand
  Cc: linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

Horst von Brand wrote:
> Good rule of thumb: If you run into swap, add RAM. Swap is /extremely/ slow
> memory, however fast you make it go. RAM is not expensive anymore...

Actually, RAM is expensive if you've reached the limits of your
machine and have to buy a new machine to get more RAM.

That's exactly the situation I've reached with my laptop.  It's
extremely annoying.

-- Jamie


* Re: [RFC] VM: I have a dream...
  2006-01-24  2:08           ` Horst von Brand
  2006-01-25  6:13             ` Jamie Lokier
@ 2006-01-25  9:23             ` Bernd Petrovitsch
  2006-01-25  9:42               ` Lee Revell
  2006-01-25 15:05               ` Jamie Lokier
  1 sibling, 2 replies; 75+ messages in thread
From: Bernd Petrovitsch @ 2006-01-25  9:23 UTC (permalink / raw)
  To: Horst von Brand
  Cc: linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Mon, 2006-01-23 at 23:08 -0300, Horst von Brand wrote:
[...]
> Good rule of thumb: If you run into swap, add RAM. Swap is /extremely/ slow
> memory, however fast you make it go. RAM is not expensive anymore...

- Except on laptops where you usually can't add *any* RAM. And if you
  can, it is *much much* more expensive than on "normal" PCs.
- Except if you - for whatever reason - have to throw out smaller RAMs
  to get larger (and much more expensive) RAMs into it.
- Except (as someone else mentioned) you have already equipped your main
  board to the max.

> > You have roughly 2 GB of dynamic address-space avaliable to each
> > task (stuff that's not the kernel and not the runtime libraries).
> 
> Right. But your average task is far from that size, and most of it resides
> in shared libraries and (perhaps shared) executables, and is perhaps even
> COW shared with other tasks.
> 
> > You can easily have 500 tasks,
> 
> Even thousands.
> 
> >                                even RedHat out-of-the-box creates
> > about 60 tasks. That's 1,000 GB of potential swap-space required
> > to support this.

And after login (on XFCE + a few standard tools in my case) > 200.

> But you really never do. That is the point.

ACK. X, evolution and Mozilla family (to name standard apps) are the
exceptions to this rule.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services




* Re: [RFC] VM: I have a dream...
  2006-01-25  9:23             ` Bernd Petrovitsch
@ 2006-01-25  9:42               ` Lee Revell
  2006-01-25 15:02                 ` Jamie Lokier
  2006-01-25 15:05               ` Jamie Lokier
  1 sibling, 1 reply; 75+ messages in thread
From: Lee Revell @ 2006-01-25  9:42 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Wed, 2006-01-25 at 10:23 +0100, Bernd Petrovitsch wrote:
> 
> ACK. X, evolution and Mozilla family (to name standard apps) are the
> exceptions to this rule. 

If you decrease RLIMIT_STACK from the default 8MB to 256KB or 512KB you
will reduce the footprint of multithreaded apps like evolution by tens
or hundreds of MB, as glibc sets the thread stack size to RLIMIT_STACK
by default.
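
The same effect is available per thread, without touching the limits at
all - a sketch:

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
	return arg;
}

int main(void)
{
	/* The explicit equivalent of running under "ulimit -s 256":
	 * give the thread a 256KB stack instead of the
	 * RLIMIT_STACK-derived default (often 8MB). */
	pthread_attr_t attr;
	pthread_attr_init(&attr);
	pthread_attr_setstacksize(&attr, 256 * 1024);

	pthread_t t;
	pthread_create(&t, &attr, worker, NULL);
	pthread_join(t, NULL);
	puts("thread ran on a 256KB stack");
	return 0;
}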

Lee



* Re: [RFC] VM: I have a dream...
  2006-01-25  9:42               ` Lee Revell
@ 2006-01-25 15:02                 ` Jamie Lokier
  2006-01-25 23:24                   ` Lee Revell
  0 siblings, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2006-01-25 15:02 UTC (permalink / raw)
  To: Lee Revell
  Cc: Bernd Petrovitsch, Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

Lee Revell wrote:
> On Wed, 2006-01-25 at 10:23 +0100, Bernd Petrovitsch wrote:
> > 
> > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > exceptions to this rule. 
> 
> If you decrease RLIMIT_STACK from the default 8MB to 256KB or 512KB you
> will reduce the footprint of multithreaded apps like evolution by tens
> or hundreds of MB, as glibc sets the thread stack size to RLIMIT_STACK
> by default.

That should make no difference to the real memory usage.  Stack pages
which aren't used don't take up RAM, and don't count in RSS.
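
A minimal C sketch of why, assuming Linux's /proc/self/statm format
(total size then resident set, in pages):

/* Compare RSS before and after faulting in a 64 MiB anonymous
 * mapping; the mapping alone barely moves the resident figure. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long rss_kb(void)
{
	long size = 0, resident = 0;
	FILE *f = fopen("/proc/self/statm", "r");

	if (f) {
		fscanf(f, "%ld %ld", &size, &resident);
		fclose(f);
	}
	return resident * (sysconf(_SC_PAGESIZE) / 1024);
}

int main(void)
{
	size_t len = 64UL << 20;	/* like eight 8 MiB thread stacks */
	char *p;

	printf("before mmap: %ld KiB\n", rss_kb());
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	printf("after mmap:  %ld KiB\n", rss_kb());	/* barely moves */
	memset(p, 0xaa, len);				/* fault pages in */
	printf("after touch: %ld KiB\n", rss_kb());	/* ~64 MiB higher */
	return 0;
}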

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25  9:23             ` Bernd Petrovitsch
  2006-01-25  9:42               ` Lee Revell
@ 2006-01-25 15:05               ` Jamie Lokier
  2006-01-25 15:47                 ` Bernd Petrovitsch
                                   ` (2 more replies)
  1 sibling, 3 replies; 75+ messages in thread
From: Jamie Lokier @ 2006-01-25 15:05 UTC (permalink / raw)
  To: Bernd Petrovitsch
  Cc: Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

Bernd Petrovitsch wrote:
> ACK. X, evolution and Mozilla family (to name standard apps) are the
> exceptions to this rule.

Mozilla / Firefox / Opera in particular.  300MB is not funny on a
laptop which cannot be expanded beyond 192MB.  Are there any usable
graphical _small_ web browsers around?  Usable meaning actually works
on real web sites with fancy features.

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 15:05               ` Jamie Lokier
@ 2006-01-25 15:47                 ` Bernd Petrovitsch
  2006-01-25 16:09                 ` Diego Calleja
  2006-01-25 23:28                 ` Lee Revell
  2 siblings, 0 replies; 75+ messages in thread
From: Bernd Petrovitsch @ 2006-01-25 15:47 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Wed, 2006-01-25 at 15:05 +0000, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > exceptions to this rule.
> 
> Mozilla / Firefox / Opera in particular.  300MB is not funny on a
> laptop which cannot be expanded beyond 192MB.  Are there any usable

It is also not funny on 512M if you have other apps running.

> graphical _small_ web browsers around?  Usable meaning actually works
> on real web sites with fancy features.

None that I'm aware of:
- dillo doesn't know CSS and/or Javascript.
- epiphany is the Gnome standard browser - so it probably plays in the
  memory hog league.
- konqueror is KDE's default browser. I've never really used it.
- ____________________________________________________

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 15:05               ` Jamie Lokier
  2006-01-25 15:47                 ` Bernd Petrovitsch
@ 2006-01-25 16:09                 ` Diego Calleja
  2006-01-25 17:26                   ` Jamie Lokier
  2006-01-25 23:28                 ` Lee Revell
  2 siblings, 1 reply; 75+ messages in thread
From: Diego Calleja @ 2006-01-25 16:09 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: bernd, vonbrand, linux-os, ram.gupta5, mloftis, barryn, a1426z,
	linux-kernel, linux-fsdevel

On Wed, 25 Jan 2006 15:05:16 +0000,
Jamie Lokier <jamie@shareable.org> wrote:

> Mozilla / Firefox / Opera in particular.  300MB is not funny on a
> laptop which cannot be expanded beyond 192MB.  Are there any usable
> graphical _small_ web browsers around?  Usable meaning actually works
> on real web sites with fancy features.

Opera is probably the best browser when it comes to "features per byte
of memory used", so if that isn't useful enough... there's the Minimo web
browser (http://www.mozilla.org/projects/minimo/). It's designed for
mobile devices, but it may be usable on normal computers.

The X server itself doesn't eat too much memory. On my box (Radeon
9200SE graphics card) the X server only eats 11 MB of RAM - not too
much, in my opinion, for a 20-year-old codebase which, according
to the X developers, has many areas that could be cleaned up.

The X server will grow in size because applications store their
images in the X server. And since the X server is supposed to be
network-transparent, apps send the X server the data itself, not
a "reference to the data" (i.e. a path to a file), so (I think) the
file cannot be mmap'ed to share it in memory. There are still
some apps (or so I've heard) which send an image to the server
and keep a private copy in their own address space, so the memory
needed to store those images is *doubled* (GNOME used to keep
*three* copies of the background image: one in Nautilus, another
in gnome-settings-daemon and another in the X server, and
gnome-terminal keeps yet another copy when transparency is enabled).
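
A minimal C sketch of the "reference to the data" alternative, assuming
a placeholder file name: every process mapping the same file MAP_SHARED
shares one set of page-cache pages instead of holding private copies.

/* Map an image file read-only; N processes doing this cost one copy. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/usr/share/backgrounds/default.png"; /* placeholder */
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;
	void *img = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (img == MAP_FAILED)
		return 1;
	printf("mapped %ld bytes at %p\n", (long)st.st_size, img);
	munmap(img, st.st_size);
	close(fd);
	return 0;
}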


Also, fontconfig allocates ~100 KB of memory per program launched.
There are patches to fix that by creating an mmap'able cache shared
between all applications, which have been merged in the development
version. I think there's a lot of low-hanging fruit at all levels;
the problem is not just Mozilla & friends.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 16:09                 ` Diego Calleja
@ 2006-01-25 17:26                   ` Jamie Lokier
  2006-01-26 19:13                     ` Bryan Henderson
  0 siblings, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2006-01-25 17:26 UTC (permalink / raw)
  To: Diego Calleja
  Cc: bernd, vonbrand, linux-os, ram.gupta5, mloftis, barryn, a1426z,
	linux-kernel, linux-fsdevel

Diego Calleja wrote:
> Opera is probably the best browser when it comes to "features per byte
> of memory used"

Really?  If I'm making heavy use of it, maybe visiting a few hundred pages a
day and opening 20 tabs, I find I have to kill it every few days to
reclaim the memory it's hogging, when its resident size exceeds my RAM
size and it starts chugging.

> Also, fontconfig allocates ~100 KB of memory per program launched.
> There're patches to fix that by creating a mmap'able cache which is
> shared between all the applications which has been merged in the
> development version. I think there're many low-hanging fruits at
> all levels, the problem is not just mozilla & friends

100kB per program, even for 100 programs, is nothing compared to a
browser's 300MB footprint.  Now, some of that 300MB is permanently
swapped out for the first few days of running.  Libraries and such.
Which is relevant to this thread: swap is useful, just so you can swap
out completely unused parts of programs.  (The parts which could be
optimised away in principle).

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-23 19:52   ` Bryan Henderson
@ 2006-01-25 22:04     ` Al Boldi
  2006-01-26 19:18       ` Bryan Henderson
  2006-01-26  0:03     ` Jon Smirl
  1 sibling, 1 reply; 75+ messages in thread
From: Al Boldi @ 2006-01-25 22:04 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, linux-kernel

Bryan Henderson wrote:
> >Perhaps you'd be interested in single-level store architectures, where
> >no distinction is made between memory and storage. IBM uses it in one
> >(or maybe more) of their systems.
>
> It's the IBM Eserver I Series, nee System/38 (A.D. 1980), aka AS/400.
>
> It was expected at one time to be the next generation of computer
> architecture, but it turned out that the computing world had matured to
> the point that it was more important to be backward compatible than to
> push frontiers.
>
> The single 128 bit address space addresses every byte of information in
> the system.  The underlying system keeps the majority of it on disk, and
> the logic that loads stuff into electronic memory when it has to be there
> is below the level that any ordinary program would see, much like the
> logic in an IA32 CPU that loads stuff into processor cache.  It's worth
> noting that nowhere in an I Series machine is a layer that looks like a
> CPU Linux runs on; it's designed for single level storage from the gates
> on up through the operating system.
>
> I found Al's dream rather vague, which explains why several people
> inferred different ideas from it (and then beat them down).  It sort of
> sounds like single level storage, but also like virtual memory and like
> mmap.  I assume it's actually supposed to be something different from all
> those.

Not really different, but rather an attempt to use hardware in a 
native/direct fashion w/o running in circles.  But first let's look at the 
reasons that led the industry to this mem/disk personality split.

Consider these archs:
	bits	space
	8	256
	16	64K
	32	4G
	64	16GG=16MT
	128	256GGGG=256TTT
	
It follows that with 
	8 and 16 bits you are forced to split
	32 is in between
	64 is more than enough for most purposes
	128 is astronomical for most purposes
	 :
	 :
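
(Unpacking the shorthand: K=2^10, M=2^20, G=2^30, T=2^40, so 16GG = 2^64
and 256GGGG = 2^128.  A throwaway C sketch of the same arithmetic:)

/* Print the size of an n-bit address space; build with gcc -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
	int bits[] = { 8, 16, 32, 64, 128 };

	for (unsigned i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		double bytes = ldexp(1.0, bits[i]);	/* 2^bits */
		printf("%3d bits -> %.0f bytes (%.3g GiB)\n",
		       bits[i], bytes, bytes / (1UL << 30));
	}
	return 0;
}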

So we have a situation right now that imposes a legacy solution on hardware 
that is really screaming (64+) to be taken advantage of.  This does not mean 
that we have to blow things out of proportion and reinvent the wheel, but 
instead revert the workaround that was necessary in the past (-32).  

If reverted properly, things should be completely transparent to user-space 
and definitely faster, lots faster, especially under load.  Think about it.

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-23 15:26       ` Diego Calleja
  2006-01-23 16:11         ` linux-os (Dick Johnson)
@ 2006-01-25 22:27         ` Nix
  2006-01-26 15:13           ` Denis Vlasenko
  1 sibling, 1 reply; 75+ messages in thread
From: Nix @ 2006-01-25 22:27 UTC (permalink / raw)
  To: Diego Calleja
  Cc: Ram Gupta, mloftis, barryn, a1426z, linux-kernel, linux-fsdevel

On 23 Jan 2006, Diego Calleja wrote:
> On Mon, 23 Jan 2006 09:05:41 -0600,
> Ram Gupta <ram.gupta5@gmail.com> wrote:
> 
>> Linux also supports multiple swap files. But these are more
> 
> There's in fact a "dynamic swap" tool which apparently
> does what Mac OS X does: http://dynswapd.sourceforge.net/
> 
> However, I doubt the approach is really useful. If you need that much
> swap space, you're going well beyond the capabilities of the machine.

Well, to some extent it depends on your access patterns. The backup
program I use (`dar') is an enormous memory hog: it happily eats 5Gb on
my main fileserver (an UltraSPARC, so compiling it 64-bit does away with
address space sizing problems). That machine has only 512Mb RAM, so
you'd expect the thing would be swapping to death; but the backup
program's locality of reference is sufficiently good that it doesn't
swap much at all (and that in one tight lump at the end).

-- 
`Everyone has skeletons in the closet.  The US has the skeletons
 driving living folks into the closet.' --- Rebecca Ore

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 15:02                 ` Jamie Lokier
@ 2006-01-25 23:24                   ` Lee Revell
  0 siblings, 0 replies; 75+ messages in thread
From: Lee Revell @ 2006-01-25 23:24 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Bernd Petrovitsch, Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Wed, 2006-01-25 at 15:02 +0000, Jamie Lokier wrote:
> Lee Revell wrote:
> > On Wed, 2006-01-25 at 10:23 +0100, Bernd Petrovitsch wrote:
> > > 
> > > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > > exceptions to this rule. 
> > 
> > If you decrease RLIMIT_STACK from the default 8MB to 256KB or 512KB you
> > will reduce the footprint of multithreaded apps like evolution by tens
> > or hundreds of MB, as glibc sets the thread stack size to RLIMIT_STACK
> > by default.
> 
> That should make no difference to the real memory usage.  Stack pages
> which aren't used don't take up RAM, and don't count in RSS.

It still seems like not allocating memory that the application will
never use could enable the VM to make better decisions.  Also not
wasting 7.5MB per thread for the stack should make tracking down actual
bloat in the libraries easier.

Lee


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 15:05               ` Jamie Lokier
  2006-01-25 15:47                 ` Bernd Petrovitsch
  2006-01-25 16:09                 ` Diego Calleja
@ 2006-01-25 23:28                 ` Lee Revell
  2006-01-26  1:29                   ` Diego Calleja
  2006-01-26  5:01                   ` Jamie Lokier
  2 siblings, 2 replies; 75+ messages in thread
From: Lee Revell @ 2006-01-25 23:28 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Bernd Petrovitsch, Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Wed, 2006-01-25 at 15:05 +0000, Jamie Lokier wrote:
> Bernd Petrovitsch wrote:
> > ACK. X, evolution and Mozilla family (to name standard apps) are the
> > exceptions to this rule.
> 
> Mozilla / Firefox / Opera in particular.  300MB is not funny on a
> laptop which cannot be expanded beyond 192MB.  Are there any usable
> graphical _small_ web browsers around?  Usable meaning actually works
> on real web sites with fancy features.

"Small" and "fancy features" are not compatible.

That's the problem with the term "usable" - to developers it means
"supports the basic core functionality of a web browser" while to users
it means "supports every bell and whistle that I get on Windows".

Lee


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-23 19:52   ` Bryan Henderson
  2006-01-25 22:04     ` Al Boldi
@ 2006-01-26  0:03     ` Jon Smirl
  2006-01-26 19:48       ` Bryan Henderson
  1 sibling, 1 reply; 75+ messages in thread
From: Jon Smirl @ 2006-01-26  0:03 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, linux-kernel

On 1/23/06, Bryan Henderson <hbryan@us.ibm.com> wrote:
> >Perhaps you'd be interested in single-level store architectures, where
> >no distinction is made between memory and storage. IBM uses it in one
> >(or maybe more) of their systems.

Are there any Linux file systems that work by mmapping the entire
drive and using the paging system to do the read/writes? With 64 bits
there's enough address space to do that now. How does this perform
compared to a traditional block based scheme?
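
A minimal user-space C sketch of the idea, assuming a placeholder device
name and a 64-bit VA space (a 32-bit one obviously can't map a big disk):

/* Map an entire block device and let page faults do the reads. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	uint64_t size = 0;
	int fd = open("/dev/sdb", O_RDONLY);	/* placeholder device */

	if (fd < 0 || ioctl(fd, BLKGETSIZE64, &size) < 0)
		return 1;
	unsigned char *disk = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	if (disk == MAP_FAILED)
		return 1;
	/* Touching disk[off] faults the containing page in from the disk. */
	printf("first byte: %02x of %llu total\n",
	       disk[0], (unsigned long long)size);
	munmap(disk, size);
	return 0;
}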

With the IBM 128b address space aren't the devices vulnerable to an
errant program spraying garbage into the address space? Is it better
to map each device into it's own address space?

--
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 23:28                 ` Lee Revell
@ 2006-01-26  1:29                   ` Diego Calleja
  2006-01-26  5:01                   ` Jamie Lokier
  1 sibling, 0 replies; 75+ messages in thread
From: Diego Calleja @ 2006-01-26  1:29 UTC (permalink / raw)
  To: Lee Revell
  Cc: jamie, bernd, vonbrand, linux-os, diegocg, ram.gupta5, mloftis,
	barryn, a1426z, linux-kernel, linux-fsdevel

On Wed, 25 Jan 2006 18:28:34 -0500,
Lee Revell <rlrevell@joe-job.com> wrote:

> > Mozilla / Firefox / Opera in particular.  300MB is not funny on a
> > laptop which cannot be expanded beyond 192MB.  Are there any usable
> > graphical _small_ web browsers around?  Usable meaning actually works
> > on real web sites with fancy features.
> 
> "Small" and "fancy features" are not compatible.
> 
> That's the problem with the term "usable" - to developers it means
> "supports the basic core functionality of a web browser" while to users
> it means "supports every bell and whistle that I get on Windows".


That'd be an interesting philosophical (and somewhat offtopic) flamewar:
is it theoretically possible to write an operating system with bells and
whistles for a computer with 200 MB of RAM? 200 MB is really a lot of
RAM... I'm really surprised at how easy it is to write a program that eats
a dozen MB of RAM just by showing a window and a few buttons.

In my perfect world, a superhero (say, Linus ;) would analyze and
redesign the whole software stack and fix it. IMO some parts
of a complete GNU/Linux system have been accumulating fat over
time, e.g.: Plan 9's network abstraction could make it possible to
kill tons of networking code from lots of apps...

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 23:28                 ` Lee Revell
  2006-01-26  1:29                   ` Diego Calleja
@ 2006-01-26  5:01                   ` Jamie Lokier
  2006-01-26  5:11                     ` Lee Revell
  1 sibling, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2006-01-26  5:01 UTC (permalink / raw)
  To: Lee Revell
  Cc: Bernd Petrovitsch, Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

Lee Revell wrote:
> > Mozilla / Firefox / Opera in particular.  300MB is not funny on a
> > laptop which cannot be expanded beyond 192MB.  Are there any usable
> > graphical _small_ web browsers around?  Usable meaning actually works
> > on real web sites with fancy features.
> 
> "Small" and "fancy features" are not compatible.
> 
> That's the problem with the term "usable" - to developers it means
> "supports the basic core functionality of a web browser" while to users
> it means "supports every bell and whistle that I get on Windows".

As both a developer and user, all I want is a web browser that works
with the sites I visit, and performs reasonably well on my laptop.

I know there are fast algorithms for layout, for running scripts and
updating trees, and the memory usage doesn't have to be anywhere near
as much as it is.

So it's reasonable to ask if anyone has written a fast browser that
works with current popular sites and fits in under 256MB after a few
days' use.

Unfortunately, the response seems to be no, nobody has.  I guess it's
a big job and there isn't the interest and resourcing to do it.

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-26  5:01                   ` Jamie Lokier
@ 2006-01-26  5:11                     ` Lee Revell
  2006-01-26 14:46                       ` Dave Kleikamp
  0 siblings, 1 reply; 75+ messages in thread
From: Lee Revell @ 2006-01-26  5:11 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Bernd Petrovitsch, Horst von Brand, linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Thu, 2006-01-26 at 05:01 +0000, Jamie Lokier wrote:
> Lee Revell wrote:
> > > Mozilla / Firefox / Opera in particular.  300MB is not funny on a
> > > laptop which cannot be expanded beyond 192MB.  Are there any usable
> > > graphical _small_ web browsers around?  Usable meaning actually works
> > > on real web sites with fancy features.
> > 
> > "Small" and "fancy features" are not compatible.
> > 
> > That's the problem with the term "usable" - to developers it means
> > "supports the basic core functionality of a web browser" while to users
> > it means "supports every bell and whistle that I get on Windows".
> 
> As both a developer and user, all I want is a web browser that works
> with the sites I visit, and performs reasonably well on my laptop.
> 
> I know there are fast algorithms for layout, for running scripts and
> updating trees, and the memory usage doesn't have to be anywhere near
> as much as it is.
> 
> So it's reasonable to ask if anyone has written a fast browser that
> works with current popular sites and fits in under 256MB after a few
> days' use.
> 
> Unfortunately, the response seems to be no, nobody has.  I guess it's
> a big job and there isn't the interest and resourcing to do it.
> 

What's wrong with Firefox?

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
rlrevell  6423  6.7 16.7 167676 73804 ?        Sl   Jan25  79:41 /usr/lib/firefox/firefox-bin -a firefox

73MB is not bad.

Obviously if you open 20 tabs, it will take a lot more memory, as it's
going to have to cache all the rendered pages.

Lee


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-26  5:11                     ` Lee Revell
@ 2006-01-26 14:46                       ` Dave Kleikamp
  0 siblings, 0 replies; 75+ messages in thread
From: Dave Kleikamp @ 2006-01-26 14:46 UTC (permalink / raw)
  To: Lee Revell
  Cc: Jamie Lokier, Bernd Petrovitsch, Horst von Brand,
	linux-os (Dick Johnson),
	Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Thu, 2006-01-26 at 00:11 -0500, Lee Revell wrote:
> What's wrong with Firefox?
> 
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> rlrevell  6423  6.7 16.7 167676 73804 ?        Sl   Jan25  79:41 /usr/lib/firefox/firefox-bin -a firefox
> 
> 73MB is not bad.
> 
> Obviously if you open 20 tabs, it will take a lot more memory, as it's
> going to have to cache all the rendered pages.

I had a recent bad experience that I believe was due to a bug in
adblock.  Upgrading to the most recent version of adblock fixed a memory
leak that made firefox unusable after a while.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 22:27         ` Nix
@ 2006-01-26 15:13           ` Denis Vlasenko
  2006-01-26 16:23             ` Nix
  0 siblings, 1 reply; 75+ messages in thread
From: Denis Vlasenko @ 2006-01-26 15:13 UTC (permalink / raw)
  To: Nix
  Cc: Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Thursday 26 January 2006 00:27, Nix wrote:
> On 23 Jan 2006, Diego Calleja wrote:
> > On Mon, 23 Jan 2006 09:05:41 -0600,
> > Ram Gupta <ram.gupta5@gmail.com> wrote:
> > 
> >> Linux also supports multiple swap files. But these are more
> > 
> > There's in fact a "dynamic swap" tool which apparently
> > does what Mac OS X does: http://dynswapd.sourceforge.net/
> > 
> > However, I doubt the approach is really useful. If you need that much
> > swap space, you're going well beyond the capabilities of the machine.
> 
> Well, to some extent it depends on your access patterns. The backup
> program I use (`dar') is an enormous memory hog: it happily eats 5Gb on
> my main fileserver (an UltraSPARC, so compiling it 64-bit does away with
> address space sizing problems). That machine has only 512Mb RAM, so
> you'd expect the thing would be swapping to death; but the backup
> program's locality of reference is sufficiently good that it doesn't
> swap much at all (and that in one tight lump at the end).

Totally insane proggie.
--
vda

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-26 15:13           ` Denis Vlasenko
@ 2006-01-26 16:23             ` Nix
  0 siblings, 0 replies; 75+ messages in thread
From: Nix @ 2006-01-26 16:23 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: Diego Calleja, Ram Gupta, mloftis, barryn, a1426z, linux-kernel,
	linux-fsdevel

On Thu, 26 Jan 2006, Denis Vlasenko announced authoritatively:
> On Thursday 26 January 2006 00:27, Nix wrote:
>> Well, to some extent it depends on your access patterns. The backup
>> program I use (`dar') is an enormous memory hog: it happily eats 5Gb on
>> my main fileserver (an UltraSPARC, so compiling it 64-bit does away with
>> address space sizing problems). That machine has only 512Mb RAM, so
>> you'd expect the thing would be swapping to death; but the backup
>> program's locality of reference is sufficiently good that it doesn't
>> swap much at all (and that in one tight lump at the end).
> 
> Totally insane proggie.

For incremental backups, it has to work out which files have been added
or removed across the whole disk; whether it stores this in temporary
files or in memory, if there's more file metadata than fits in physical
RAM, it'll be disk-bound working that out at the end no matter what you
do. And avoiding temporary files means you don't have problems with
those (growing) files landing in the backup.

(Now some of its design decisions, like the decision to represent things
like the sizes of files with a custom `infinint' class with a size of
something like 64 bytes, probably were insane. At least you can change
it at configure-time to use long longs instead, vastly reducing memory
usage to the mere 5Gb mentioned in that post...)

(Lovely feature set, shame about the memory hit.)

-- 
`Everyone has skeletons in the closet.  The US has the skeletons
 driving living folks into the closet.' --- Rebecca Ore

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 17:26                   ` Jamie Lokier
@ 2006-01-26 19:13                     ` Bryan Henderson
  0 siblings, 0 replies; 75+ messages in thread
From: Bryan Henderson @ 2006-01-26 19:13 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: a1426z, barryn, bernd, Diego Calleja, linux-fsdevel,
	linux-kernel, linux-os, mloftis, ram.gupta5, vonbrand

>> Opera is probably the best browser when it comes to "features per byte
>> of memory used"
>
>Really?  If I'm making heavy use of it, maybe visiting a few hundred pages a
>day and opening 20 tabs, I find I have to kill it every few days to
>reclaim the memory it's hogging, when its resident size exceeds my RAM
>size and it starts chugging.

That matches my experience, though it does crash enough on its own that I 
often don't have to kill it.  I also use an rlimit (64MiB) to make the 
system kill it automatically before it gets too big, and an automatic 
restarter.  Opera is, thankfully, very good at bouncing back to exactly 
where it was when it died (minus the leaked memory).
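
A hypothetical C sketch of such a wrapper, assuming RLIMIT_AS for the
64MiB cap (the mail doesn't say which limit is actually used) and "opera"
as the command name:

/* Cap the browser's address space and re-exec it whenever it dies. */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		pid_t pid = fork();

		if (pid == 0) {
			struct rlimit rl = { 64UL << 20, 64UL << 20 };

			setrlimit(RLIMIT_AS, &rl);	/* 64 MiB cap */
			execlp("opera", "opera", (char *)NULL);
			_exit(127);			/* exec failed */
		}
		if (pid < 0 || waitpid(pid, NULL, 0) < 0)
			return 1;
		/* Browser hit the cap or crashed; loop restarts it. */
	}
}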

But allowing for that extra operational procedure, I'd still say it has 
the most features per byte, and if you don't count the ability to work with 
certain websites as a feature, I think it probably has the most features 
absolutely as well.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-25 22:04     ` Al Boldi
@ 2006-01-26 19:18       ` Bryan Henderson
  2006-01-27 16:12         ` Al Boldi
  0 siblings, 1 reply; 75+ messages in thread
From: Bryan Henderson @ 2006-01-26 19:18 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-fsdevel, linux-kernel

>[explanation of memory/disk split]
>...
>So we have a situation right now that imposes a legacy solution on hardware
>that is really screaming (64+) to be taken advantage of.

Put that way, you seem to be describing exactly single level storage as 
seen in an IBM Eserver I Series (fka AS/400, nee System/38).

So we know it works, but also that people don't seem to care much for it 
(because in 25 years, it hasn't taken over the world - we got to today's 
machines with 64 bit address spaces for other reasons).

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-26  0:03     ` Jon Smirl
@ 2006-01-26 19:48       ` Bryan Henderson
  0 siblings, 0 replies; 75+ messages in thread
From: Bryan Henderson @ 2006-01-26 19:48 UTC (permalink / raw)
  To: Jon Smirl; +Cc: linux-fsdevel, linux-kernel

>Are there any Linux file systems that work by mmapping the entire
>drive and using the paging system to do the read/writes? With 64 bits
>there's enough address space to do that now. How does this perform
>compared to a traditional block based scheme?

They pretty much all do that.  A filesystem driver doesn't actually map 
the whole drive into memory addresses all at once and generate page faults 
by referencing memory -- instead, it generates the page faults explicitly, 
which it can do more efficiently, and sets up the mappings in smaller 
pieces as needed (also more efficient).  But the code that reads the pages 
into the file cache and cleans dirty file cache pages out to the disk is 
the same paging code that responds to page faults on malloc'ed pages and 
writes such pages out to swap space when their page frames are needed for 
other things.
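
A minimal C sketch of that cache at work, using mincore() to ask which
pages of a mapped file are already resident in RAM:

/* One residency byte per page; fine for small files. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	int fd;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;
	if (fstat(fd, &st) < 0 || st.st_size == 0)
		return 1;
	long psz = sysconf(_SC_PAGESIZE);
	size_t pages = (st.st_size + psz - 1) / psz;
	void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	unsigned char vec[pages];

	if (map == MAP_FAILED || mincore(map, st.st_size, vec) < 0)
		return 1;
	size_t resident = 0;
	for (size_t i = 0; i < pages; i++)
		resident += vec[i] & 1;
	printf("%zu of %zu pages already cached in RAM\n", resident, pages);
	return 0;
}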

>With the IBM 128b address space aren't the devices vulnerable to an
>errant program spraying garbage into the address space? Is it better
>to map each device into it's own address space?

Partitioning your storage space along device lines and making someone who 
wants to store something identify a device for it is a pretty primitive 
way of limiting errant programs.  Something like Linux disk quota and 
rlimit (ulimit) is more appropriate to the task, and systems that gather 
all their disk storage (even if separate from main memory) into a single 
automated pool do have such quota systems.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-26 19:18       ` Bryan Henderson
@ 2006-01-27 16:12         ` Al Boldi
  2006-01-27 19:17           ` Bryan Henderson
  0 siblings, 1 reply; 75+ messages in thread
From: Al Boldi @ 2006-01-27 16:12 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, linux-kernel

Bryan Henderson wrote:
> >[explanation of memory/disk split]
> >...
> >So we have a situation right now that imposes a legacy solution on
> >hardware that is really screaming (64+) to be taken advantage of.
>
> Put that way, you seem to be describing exactly single level storage as
> seen in an IBM Eserver I Series (fka AS/400, nee System/38).

To some extent.

> So we know it works, but also that people don't seem to care much for it

People didn't care, because the AS/400 was based on a proprietary solution.  
I remember a client being forced to dump an AS/400 due to astronomical 
maintenance costs.

With today's generically mass-produced 64bit archs, what's not to care about a 
cost-effective system that provides direct mapped access into linear address 
space?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-27 16:12         ` Al Boldi
@ 2006-01-27 19:17           ` Bryan Henderson
  2006-01-30 13:21             ` Al Boldi
  0 siblings, 1 reply; 75+ messages in thread
From: Bryan Henderson @ 2006-01-27 19:17 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-fsdevel, linux-kernel

>> So we know it [single level storage] works, but also that people don't
>> seem to care much for it
>
>People didn't care, because the AS/400 was based on a proprietary
>solution.

I don't know what a "proprietary solution" is, but what we had was a 
complete demonstration of the value of single level storage, in commercial 
use and everything,  and other computer makers (and other business units 
of IBM) stuck with their memory/disk split personality.  For 25 years, 
lots of computer makers developed lots of new computer architectures and 
they all (practically speaking) had the memory/disk split.  There has to 
be a lesson in that.

>With today's generically mass-produced 64bit archs, what's not to care
>about a cost-effective system that provides direct mapped access into
>linear address space?

I don't know; I'm sure it's complicated.  But unless the stumbling block 
since 1980 has been that it was too hard to get/make a CPU with a 64 bit 
address space, I don't see what's different today.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-27 19:17           ` Bryan Henderson
@ 2006-01-30 13:21             ` Al Boldi
  2006-01-30 13:35               ` Kyle Moffett
  2006-01-30 16:49               ` Bryan Henderson
  0 siblings, 2 replies; 75+ messages in thread
From: Al Boldi @ 2006-01-30 13:21 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel, linux-kernel

Bryan Henderson wrote:
> >> So we know it [single level storage] works, but also that people don't
> >> seem to care much for it
>
> > People didn't care, because the AS/400 was based on a proprietary
> > solution.
>
> I don't know what a "proprietary solution" is, but what we had was a
> complete demonstration of the value of single level storage, in commercial
> use and everything,  and other computer makers (and other business units
> of IBM) stuck with their memory/disk split personality.  For 25 years,
> lots of computer makers developed lots of new computer architectures and
> they all (practically speaking) had the memory/disk split.  There has to
> be a lesson in that.

Sure there is a lesson here.  People have a tendency to resist change, even 
though they know the current way is faulty.

> > With today's generically mass-produced 64bit archs, what's not to care
> > about a cost-effective system that provides direct mapped access into 
> > linear address space?
>
> I don't know; I'm sure it's complicated.

Why would you think that the shortest path between two points is complicated, 
when you have the ability to fly?

> But unless the stumbling block
> since 1980 has been that it was too hard to get/make a CPU with a 64 bit
> address space, I don't see what's different today.

You are hitting the nail right on its head here.
Nothing moves the masses like mass-production.

So with 64bits widely available now, and to let Linux spread its wings and 
really fly, how could tmpfs merged w/ swap be tweaked to provide direct 
mapped access into this linear address space?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-30 13:21             ` Al Boldi
@ 2006-01-30 13:35               ` Kyle Moffett
  2006-01-31 15:56                 ` Al Boldi
  2006-01-30 16:49               ` Bryan Henderson
  1 sibling, 1 reply; 75+ messages in thread
From: Kyle Moffett @ 2006-01-30 13:35 UTC (permalink / raw)
  To: Al Boldi; +Cc: Bryan Henderson, linux-fsdevel, linux-kernel

On Jan 30, 2006, at 08:21, Al Boldi wrote:
> Bryan Henderson wrote:
>>>> So we know it [single level storage] works, but also that people  
>>>> don't seem to care much for it
>>
>>> People didn't care, because the AS/400 was based on a proprietary  
>>> solution.
>>
>> I don't know what a "proprietary solution" is, but what we had was  
>> a complete demonstration of the value of single level storage, in  
>> commercial use and everything,  and other computer makers (and  
>> other business units of IBM) stuck with their memory/disk split  
>> personality.  For 25 years, lots of computer makers developed lots  
>> of new computer architectures and they all (practically speaking)  
>> had the memory/disk split.  There has to be a lesson in that.
>
> > Sure there is a lesson here.  People have a tendency to resist  
> change, even though they know the current way is faulty.

Is it necessarily faulty?  It seems to me that the current way works  
pretty well so far, and unless you can prove a really strong point  
the other way, there's no point in changing.  You have to remember  
that change introduces bugs which then have to be located and removed  
again, so change is not necessarily cheap.

>>> With today's generically mass-produced 64bit archs, what's not to  
>>> care about a cost-effective system that provides direct mapped  
>>> access into  linear address space?
>>
>> I don't know; I'm sure it's complicated.
>
> Why would you think that the shortest path between two points is  
> complicated, when you have the ability to fly?

Bad analogy.  This is totally irrelevant to the rest of the discussion.

>> But unless the stumbling block since 1980 has been that it was too  
>> hard to get/make a CPU with a 64 bit address space, I don't see  
>> what's different today.
>
> You are hitting the nail right on its head here. Nothing moves the  
> masses like mass-production.

Uhh, no, you misread his argument: if there were reasons other than the  
lack of 64-bit CPUs that this was not done in the past, then this is  
probably still not practical/feasible/desirable.

Cheers,
Kyle Moffett

--
There is no way to make Linux robust with unreliable memory  
subsystems, sorry.  It would be like trying to make a human more  
robust with an unreliable O2 supply. Memory just has to work.
   -- Andi Kleen



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-30 13:21             ` Al Boldi
  2006-01-30 13:35               ` Kyle Moffett
@ 2006-01-30 16:49               ` Bryan Henderson
  1 sibling, 0 replies; 75+ messages in thread
From: Bryan Henderson @ 2006-01-30 16:49 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-fsdevel, linux-kernel

>> > With today's generically mass-produced 64bit archs, what's not to care
>> > about a cost-effective system that provides direct mapped access into
>> > linear address space?
>>
>> I don't know; I'm sure it's complicated.
>
>Why would you think that the shortest path between two points is 
complicated, 

I can see that my statement could be read a different way from what I 
meant.  I meant I'm sure that the reason people don't care about single 
level storage is complicated.  (Ergo I haven't tried, so far, to argue for 
or against it but just to point out some history).


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-30 13:35               ` Kyle Moffett
@ 2006-01-31 15:56                 ` Al Boldi
  2006-01-31 16:34                   ` Kyle Moffett
                                     ` (4 more replies)
  0 siblings, 5 replies; 75+ messages in thread
From: Al Boldi @ 2006-01-31 15:56 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Bryan Henderson, linux-fsdevel, linux-kernel

Kyle Moffett wrote:
> On Jan 30, 2006, at 08:21, Al Boldi wrote:
> > Bryan Henderson wrote:
> >>>> So we know it [single level storage] works, but also that people
> >>>> don't seem to care much for it
> >>>
> >>> People didn't care, because the AS/400 was based on a proprietary
> >>> solution.
> >>
> >> I don't know what a "proprietary solution" is, but what we had was
> >> a complete demonstration of the value of single level storage, in
> >> commercial use and everything,  and other computer makers (and
> >> other business units of IBM) stuck with their memory/disk split
> >> personality.  For 25 years, lots of computer makers developed lots
> >> of new computer architectures and they all (practically speaking)
> >> had the memory/disk split.  There has to be a lesson in that.
> >
> > Sure there is a lesson here.  People have a tendency to resist
> > change, even though they know the current way is faulty.
>
> Is it necessarily faulty?  It seems to me that the current way works
> pretty well so far, and unless you can prove a really strong point
> the other way, there's no point in changing.  You have to remember
> that change introduces bugs which then have to be located and removed
> again, so change is not necessarily cheap.

Faulty, because we are currently running a legacy solution to work around an 
8-, 16- and (32-)bit address space limitation, which does not exist in 
64bits+ archs for most purposes.

Trying to defend the current way would be similar to rejecting the move from 
16bit to 32bit. Do you remember that time?  One of the arguments used was:  
the current way works pretty well so far.

The advice here would be:  wake up and smell the coffee.

There is a lot to gain, for one there is no more swapping w/ all its related 
side-effects.  You're dealing with memory only.  You can also run your fs 
inside memory, like tmpfs, which is definitely faster.  And there may be 
lots of other advantages, due to the simplified architecture applied.

> >>> With today's generically mass-produced 64bit archs, what's not to
> >>> care about a cost-effective system that provides direct mapped
> >>> access into  linear address space?
> >>
> >> I don't know; I'm sure it's complicated.
> >
> > Why would you think that the shortest path between two points is
> > complicated, when you have the ability to fly?
>
> Bad analogy.

If you didn't understand its meaning: the shortest path means accessing 
hw w/o running workarounds; using 64bits+ to fly over past limitations.

> >> But unless the stumbling block since 1980 has been that it was too
> >> hard to get/make a CPU with a 64 bit address space, I don't see
> >> what's different today.
> >
> > You are hitting the nail right on its head here. Nothing moves the
> > masses like mass-production.
>
> Uhh, no, you misread his argument: If there were other reasons that
> this was not done in the past than lack of 64-bit CPUS, then this is
> probably still not practical/feasible/desirable.

Uhh?
The point here is: Even if there were 64bit archs available in the past, this 
did not mean that moving into native 64bits would be commercially viable, 
due to its unavailability on the mass-market.

So with 64bits widely available now, and to let Linux spread its wings and 
really fly, how could tmpfs merged w/ swap be tweaked to provide direct 
mapped access into this linear address space?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-31 15:56                 ` Al Boldi
@ 2006-01-31 16:34                   ` Kyle Moffett
  2006-01-31 23:14                     ` Bryan Henderson
  2006-01-31 16:34                   ` Lennart Sorensen
                                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 75+ messages in thread
From: Kyle Moffett @ 2006-01-31 16:34 UTC (permalink / raw)
  To: Al Boldi; +Cc: Bryan Henderson, linux-fsdevel, linux-kernel

BTW, unless you have a patch or something to propose, let's take this  
off-list, it's getting kind of OT now.

On Jan 31, 2006, at 10:56, Al Boldi wrote:
> Kyle Moffett wrote:
>> Is it necessarily faulty?  It seems to me that the current way  
>> works pretty well so far, and unless you can prove a really strong  
>> point the other way, there's no point in changing.  You have to  
>> remember that change introduces bugs which then have to be located  
>> and removed again, so change is not necessarily cheap.
>
> Faulty, because we are currently running a legacy solution to  
> work around an 8-, 16- and (32-)bit address space limitation, which  
> does not exist in 64bits+ archs for most purposes.

There are a lot of reasons for paging, only _one_ of them is/was to  
deal with too-small address spaces.  Other reasons are that sometimes  
you really _do_ want a nonlinear mapping of data/files/libs/etc.  It  
also allows easy remapping of IO space or video RAM into application  
address spaces, etc.  If you have a direct linear mapping from  
storage into RAM, common non-linear mappings become _extremely_  
complex and CPU-intensive.
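
A minimal C sketch of the video-RAM case, assuming the Linux framebuffer
device and a placeholder 1 MiB length (a real program would query the
actual size with the FBIOGET_FSCREENINFO ioctl):

/* Map device memory (not page-cache pages) into a process. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 1 << 20;			/* placeholder length */
	int fd = open("/dev/fb0", O_RDWR);

	if (fd < 0)
		return 1;
	unsigned char *vram = mmap(NULL, len, PROT_READ | PROT_WRITE,
				   MAP_SHARED, fd, 0);
	if (vram == MAP_FAILED)
		return 1;
	memset(vram, 0x00, len);		/* blank part of the screen */
	munmap(vram, len);
	close(fd);
	return 0;
}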

Besides, you never did address the issue of large changes causing  
large bugs.  Any large change needs to have advantages proportional  
to the bugs it will cause, and you have not yet proven this case.

> Trying to defend the current way would be similar to rejecting the  
> move from  16bit to 32bit. Do you remember that time?  One of the  
> arguments used was:  the current way works pretty well so far.

Arbitrary analogies do not prove things.  Can you cite examples that  
clearly indicate how paged-memory is to direct-linear-mapping as 16- 
bit processors are to 32-bit processors?

> There is a lot to gain, for one there is no more swapping w/ all  
> its related side-effects.

This is *NOT* true.  When you have more data than RAM, you have to  
put data on disk, which means swapping, regardless of the method in  
which it is done.

> You're dealing with memory only.  You can also run your fs inside  
> memory, like tmpfs, which is definitely faster.

Not on Linux.  We have a whole unique dcache system precisely so that  
a frequently accessed filesystem _is_ as fast as tmpfs (Unless you're  
writing and syncing a lot, in which case you still need to wait for  
disk hardware to commit data).

> And there may be lots of other advantages, due to the simplified  
> architecture applied.

Can you describe in detail your "simplified architecture"?? I can't  
see any significant complexity advantages over the standard paging  
model that Linux has.

>>> Why would you think that the shortest path between two points is  
>>> complicated, when you have the ability to fly?
>>
>> Bad analogy.
>
> If you didn't understand its meaning: the shortest path means  
> accessing hw w/o running workarounds; using 64bits+ to fly over  
> past limitations.

This makes *NO* technical sense and is uselessly vague.  Applying  
vague indirect analogies to technical topics is a fruitless  
endeavor.  Please provide technical points and reasons why it _is_  
indeed shorter/better/faster, and then you can still leave out the  
analogy because the technical argument is sufficient.

>>>> But unless the stumbling block since 1980 has been that it was too
>>>> hard to get/make a CPU with a 64 bit address space, I don't see
>>>> what's different today.
>>>
>>> You are hitting the nail right on its head here. Nothing moves the
>>> masses like mass-production.
>>
>> Uhh, no, you misread his argument: If there were other reasons that
>> this was not done in the past than lack of 64-bit CPUS, then this is
>> probably still not practical/feasible/desirable.
>
> Uhh?
> The point here is: Even if there were 64bit archs available in the  
> past, this did not mean that moving into native 64bits would be  
> commercially viable, due to its unavailability on the mass-market.

Are you even reading these messages?

1) IF the ONLY reason this was not done before is that 64-bit archs  
were hard to get, then you are right.

2) IF there were OTHER reasons, then you are not correct.

This is the argument.  You keep discussing how 64-bit archs were not  
easily available before and are now, and I AGREE, but that is NOT  
RELEVANT to the point he made.  Can you prove that there are no other  
disadvantages to a linear-mapped model?

Cheers,
Kyle Moffett

--
Q: Why do programmers confuse Halloween and Christmas?
A: Because OCT 31 == DEC 25.




^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-31 15:56                 ` Al Boldi
  2006-01-31 16:34                   ` Kyle Moffett
@ 2006-01-31 16:34                   ` Lennart Sorensen
  2006-01-31 19:23                   ` Jamie Lokier
                                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 75+ messages in thread
From: Lennart Sorensen @ 2006-01-31 16:34 UTC (permalink / raw)
  To: Al Boldi; +Cc: Kyle Moffett, Bryan Henderson, linux-fsdevel, linux-kernel

On Tue, Jan 31, 2006 at 06:56:17PM +0300, Al Boldi wrote:
> Faulty, because we are currently running a legacy solution to work around an 
> 8-, 16- and (32-)bit address space limitation, which does not exist in 
> 64bits+ archs for most purposes.
> 
> Trying to defend the current way would be similar to rejecting the move from 
> 16bit to 32bit. Do you remember that time?  One of the arguments used was:  
> the current way works pretty well so far.
> 
> The advice here would be:  wake up and smell the coffee.
> 
> There is a lot to gain, for one there is no more swapping w/ all its related 
> side-effects.  You're dealing with memory only.  You can also run your fs 
> inside memory, like tmpfs, which is definitely faster.  And there may be 
> lots of other advantages, due to the simplified architecture applied.

Of course there is swapping.  The CPU only executes things from physical
memory, so at some point you have to load stuff from disk to physical
memory.  That seems amazingly much like the definition of swapping too.
Sometimes you call it loading.  Not much difference really.  If
something else is occupying physical memory so there isn't room, it has
to be put somewhere: if it is just caching some physical disk space, you
just dump it, but if it is some giant chunk of data you are currently
generating, then it needs to go to some other place that handles
temporary data that doesn't already have a place in the filesystem.
Unless you have infinite physical memory, at some point you will have to
move temporary data from physical memory to somewhere else.  That is
swapping no matter how you view the system's address space; calling it
something else doesn't change the facts.  Applications don't currently
care if they are swapped to disk or in physical memory.  That is handled
by the OS and is transparent to the application.

> If you didn't understand its meaning: the shortest path means accessing 
> hw w/o running workarounds; using 64bits+ to fly over past limitations.

The OS still has to map the address space to where it physically exists.
Mapping all disk space into the address space may actually be a lot less
efficient than using the filesystem interface for a block device.

> Uhh?
> The point here is: Even if there were 64bit archs available in the past, this 
> did not mean that moving into native 64bits would be commercially viable, 
> due to its unavailability on the mass-market.
> 
> So with 64bits widely available now, and to let Linux spread its wings and 
> really fly, how could tmpfs merged w/ swap be tweaked to provide direct 
> mapped access into this linear address space?

Applications can mmap files if they want to.  Your idea seems likely to
make the OS much more complex, and waste a lot of resources on mapping
disk space to the address space, and from the application's point of view
it doesn't seem to make any difference at all.  It might be a fun idea
for some academic research OS somewhere to go work out the kinks and see
if it has any efficiency at all in real use.  Given Linux runs on lots
of architectures, trying to make it work completely differently on 64bit
systems doesn't make that much sense really, especially when there is no
apparent benefit to the change.

Len Sorensen

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-31 15:56                 ` Al Boldi
  2006-01-31 16:34                   ` Kyle Moffett
  2006-01-31 16:34                   ` Lennart Sorensen
@ 2006-01-31 19:23                   ` Jamie Lokier
  2006-02-01  4:06                   ` Barry K. Nathan
  2006-02-02 15:11                   ` Alan Cox
  4 siblings, 0 replies; 75+ messages in thread
From: Jamie Lokier @ 2006-01-31 19:23 UTC (permalink / raw)
  To: Al Boldi; +Cc: Kyle Moffett, Bryan Henderson, linux-fsdevel, linux-kernel

Al Boldi wrote:
> There is a lot to gain, for one there is no more swapping w/ all its related 
> side-effects.  You're dealing with memory only.

I'm sorry, I think I don't understand.  My weakness.  Can you please explain?

Presumably you will want access to more data than you have RAM,
because RAM is still limited to a few GB these days, whereas a typical
personal data store is a few 100s of GB.

64-bit architecture doesn't change this mismatch.  So how do you
propose to avoid swapping to/from a disk, with all the time delays and
I/O scheduling algorithms that needs?

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-31 16:34                   ` Kyle Moffett
@ 2006-01-31 23:14                     ` Bryan Henderson
  0 siblings, 0 replies; 75+ messages in thread
From: Bryan Henderson @ 2006-01-31 23:14 UTC (permalink / raw)
  To: Kyle Moffett; +Cc: Al Boldi, linux-fsdevel, linux-kernel

>1) IF the ONLY reason this was not done before is that 64-bit archs 
>were hard to get, then you are right.
>
>2) IF there were OTHER reasons, then you are not correct.
>
>This is the argument.  You keep discussing how 64-bit archs were not 
>easily available before and are now, and I AGREE, but that is NOT 
>RELEVANT to the point he made. 

As I remember it, my argument was that single level storage was known and 
practical for 25 years and people did not flock to it, therefore they must 
not see it as useful.  So if 64 bit processors were not available enough 
during that time, that blows away my argument, because people might have 
liked the idea but just couldn't afford the necessary address width.  It 
doesn't matter if there were other reasons to shun the technology; all it 
takes is one.  And if 64 bit processors are more available today, that 
might tip the balance in favor of making the change away from multilevel 
storage.

But I don't really buy that 64 bit processors weren't available until 
recently.  I think they weren't produced in commodity fashion because 
people didn't have a need for them.  They saw what you can do with 128 bit 
addresses (i.e. single level storage) in the IBM I Series line, but 
weren't impressed.  People added lots of other new technology to the 
mainstream CPU lines, but not additional address bits.  Not until they 
wanted to address more than 4G of main memory at a time did they see any 
reason to make 64 bit processors in volume.

Ergo, I do think it was something bigger that made the industry stick with 
traditional multilevel storage all these years.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems




^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-31 15:56                 ` Al Boldi
                                     ` (2 preceding siblings ...)
  2006-01-31 19:23                   ` Jamie Lokier
@ 2006-02-01  4:06                   ` Barry K. Nathan
  2006-02-01  9:51                     ` Andrew Walrond
  2006-02-02 15:11                   ` Alan Cox
  4 siblings, 1 reply; 75+ messages in thread
From: Barry K. Nathan @ 2006-02-01  4:06 UTC (permalink / raw)
  To: Al Boldi; +Cc: Kyle Moffett, Bryan Henderson, linux-fsdevel, linux-kernel

On 1/31/06, Al Boldi <a1426z@gawab.com> wrote:
> Faulty, because we are currently running a legacy solution to work around an
> 8-, 16- and (32-)bit address space limitation, which does not exist in
> 64bits+ archs for most purposes.

In the early 1990's (and maybe even the mid 90's), the typical hard
disk's storage could theoretically be byte-addressed using 32-bit
addresses -- just as (if I understand you correctly) you are arguing
that today's hard disks can be byte-addressed using 64-bit addresses.

If this was ever going to be practical (on commodity hardware anyway),
I would have expected someone to try it on a 32-bit PC or Mac when
hard drives were in the 100MB-3GB range... That suggests to me that
there's a more fundamental reason (i.e. other than lack of address
space) that caused people to stick with the current scheme.

[snip]
> There is a lot to gain, for one there is no more swapping w/ all its related
> side-effects.  You're dealing with memory only.  You can also run your fs
> inside memory, like tmpfs, which is definitely faster.  And there may be
> lots of other advantages, due to the simplified architecture applied.

tmpfs isn't "definitely faster". Remember those benchmarks where Linux
ext2 beat Solaris tmpfs?
http://www.tux.org/lkml/#s9-12

Also, the only way I see where "there is no more swapping" and
"[y]ou're dealing with memory only" is if the disk *becomes* main
memory, and main memory becomes an L3 (or L4) cache for the CPU [and
as a consequence, main memory also becomes the main form of long-term
storage]. Is that what you're proposing?

If so, then it actually makes *less* sense to me than before -- with
your scheme, you've reduced the speed of main memory by 100x or more,
then you try to compensate with a huge cache. IOW, you've reduced the
speed of *main* memory to (more or less) the speed of today's swap!
Suddenly it doesn't sound so good anymore...

--
-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-01  4:06                   ` Barry K. Nathan
@ 2006-02-01  9:51                     ` Andrew Walrond
  2006-02-01 17:51                       ` Lennart Sorensen
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Walrond @ 2006-02-01  9:51 UTC (permalink / raw)
  To: linux-kernel

On Wednesday 01 February 2006 04:06, Barry K. Nathan wrote:
>
> Also, the only way I see where "there is no more swapping" and
> "[y]ou're dealing with memory only" is if the disk *becomes* main
> memory, and main memory becomes an L3 (or L4) cache for the CPU [and
> as a consequence, main memory also becomes the main form of long-term
> storage]. Is that what you're proposing?
>

In the not-too-distant future, there is likely to be a RAM/disk price 
inversion; RAM becomes cheaper per MB than disk. At that point, we'll be buying 
hardware based on "how much disk can I afford to provide power-off backup of 
my RAM?" rather than "how much RAM can I afford?"

At that point, things will change.

Maybe, then, everything _will_ be in RAM (with the kernel intelligently 
writing out pages to the disk in the background, in case of power failure and 
ready for a shutdown). Disk reads would only ever occur during a power-on 
population of RAM.

Blue skies...

Andrew Walrond

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-01  9:51                     ` Andrew Walrond
@ 2006-02-01 17:51                       ` Lennart Sorensen
  2006-02-01 18:21                         ` Andrew Walrond
  0 siblings, 1 reply; 75+ messages in thread
From: Lennart Sorensen @ 2006-02-01 17:51 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel

On Wed, Feb 01, 2006 at 09:51:08AM +0000, Andrew Walrond wrote:
> In the not-too-distant future, there is likely to be a RAM/disk price 
> inversion; RAM becomes cheaper per MB than disk. At that point, we'll be buying 
> hardware based on "how much disk can I afford to provide power-off backup of 
> my RAM?" rather than "how much RAM can I afford?"

Hmm...

I recently bought a 250GB HD for my machine for $112, which is $0.50/GB
or $0.0005/MB.  I bought 512MB RAM for $55, which is $0.10/MB.  The RAM
cost 200 times more per MB than the disk space.

In 1992 I got a 245MB HD for a new machine for $500 as far as I recall,
which was $2/MB.  I got 16MB RAM for $800, which was $50/MB.  The RAM
cost 25 times more than the disk space.

So just what kind of price trend are you looking at that will let you
get RAM cheaper than disk space any time soon?  There has never been
such a trend as far as I know.  Maybe you have better data than I do.
My experience shows the opposite: both memory and disk space are
much cheaper than they used to be, but disk space has fallen in
price much faster than memory.

> At that point, things will change.

Sure, except I don't believe it will ever happen.

> Maybe, then, everything _will_ be in RAM (with the kernel intelligently 
> writing out pages to the disk in the background, in case of power failure and 
> ready for a shutdown). Disk reads would only ever occur during a power-on 
> population of RAM.

Len Sorensen

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-01 17:51                       ` Lennart Sorensen
@ 2006-02-01 18:21                         ` Andrew Walrond
  2006-02-01 18:25                           ` Lennart Sorensen
  0 siblings, 1 reply; 75+ messages in thread
From: Andrew Walrond @ 2006-02-01 18:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: Lennart Sorensen

On Wednesday 01 February 2006 17:51, Lennart Sorensen wrote:
>
> So just what kind of price trend are you looking at that will let you
> get RAM cheaper than disk space any time soon?  There has never been
> such a trend as far as I know.  Maybe you have better data than I do.
> My experience shows the opposite: both memory and disk space are
> much cheaper than they used to be, but disk space has fallen in
> price much faster than memory.
>

I cannot disagree with the obvious trend to date, but rather than argue the 
many reasons why RAM prices are artificially high right now, just grab a 
stick of RAM in your left hand and the heavy lump of precision-engineered 
metal that is a hard drive in your right, and see if you can convince 
yourself that the one on the right will still be ahead of the curve in 
another 14 years.

Maybe it will. Drop me a mail in 2020 and I'll shout you dinner if you're 
right ;)

Andrew

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-01 18:21                         ` Andrew Walrond
@ 2006-02-01 18:25                           ` Lennart Sorensen
  0 siblings, 0 replies; 75+ messages in thread
From: Lennart Sorensen @ 2006-02-01 18:25 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel

On Wed, Feb 01, 2006 at 06:21:12PM +0000, Andrew Walrond wrote:
> I cannot disagree with the obvious trend to date, but rather than argue the 
> many reasons why RAM prices are artificially high right now, just grab a 
> stick of RAM in your left hand and the heavy lump of precision-engineered 
> metal that is a hard drive in your right, and see if you can convince 
> yourself that the one on the right will still be ahead of the curve in 
> another 14 years.

A metal case with a small circuit board, and some magnetic material
spattered (very precisely) on a disk, doesn't seem like as much work as
trying to fit over 10^12 transistors onto dies fitting in the same space.
Making wafers for memory isn't free, and higher densities take work to
develop.  I am not sure what the current density for RAM is in terms of
bits per area.  I am sure it is a lot less than what a hard disk manages
with magnetic material.  I am amazed either one works.

> Maybe it will. Drop me a mail in 2020 and I'll shout you dinner if you're 
> right ;)

We will see. :)

Len Sorensen

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-01-31 15:56                 ` Al Boldi
                                     ` (3 preceding siblings ...)
  2006-02-01  4:06                   ` Barry K. Nathan
@ 2006-02-02 15:11                   ` Alan Cox
  2006-02-02 18:59                     ` Al Boldi
  4 siblings, 1 reply; 75+ messages in thread
From: Alan Cox @ 2006-02-02 15:11 UTC (permalink / raw)
  To: Al Boldi; +Cc: Kyle Moffett, Bryan Henderson, linux-fsdevel, linux-kernel

On Maw, 2006-01-31 at 18:56 +0300, Al Boldi wrote:
> So with 64bits widely available now, and to let Linux spread its wings and 
> really fly, how could tmpfs merged w/ swap be tweaked to provide direct 
> mapped access into this linear address space?

Why bother?  You can already create a large private file and mmap it if
you want to do this, and you will get better performance than being
smeared around swap with everyone else.

Currently swap means your data is mixed in with other stuff.  Swap could
do preallocation of each vma when running in limited overcommit modes,
and it would run a lot faster if it did, but you would pay a lot in
flexibility and efficiency, as well as needing a lot more swap.

Far better to let applications wanting to work this way do it
themselves. Just mmap and the cache balancing and pager will do the rest
for you.
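
A rough sketch of that approach (untested; the path and size below are
arbitrary examples):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1UL << 30;    /* 1GB of file-backed "memory" */
        char *p;
        int fd = open("/var/tmp/backing", O_CREAT | O_RDWR, 0600);

        if (fd < 0 || ftruncate(fd, len) < 0) {
            perror("backing file");
            return 1;
        }
        /* MAP_SHARED: the pager writes dirty pages back to your own
           file, instead of smearing them around swap. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        p[0] = 1;    /* pages fault in (and out) on demand */
        munmap(p, len);
        close(fd);
        return 0;
    }

From the application's point of view p then behaves like ordinary memory;
the cache balancing and pager decide when pages actually hit the disk.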


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-02 15:11                   ` Alan Cox
@ 2006-02-02 18:59                     ` Al Boldi
  2006-02-02 22:33                       ` Bryan Henderson
  2006-02-03 14:46                       ` Alan Cox
  0 siblings, 2 replies; 75+ messages in thread
From: Al Boldi @ 2006-02-02 18:59 UTC (permalink / raw)
  To: Alan Cox; +Cc: Kyle Moffett, Bryan Henderson, linux-fsdevel, linux-kernel

Alan Cox wrote:
> On Maw, 2006-01-31 at 18:56 +0300, Al Boldi wrote:
> > So with 64bits widely available now, and to let Linux spread its wings
> > and really fly, how could tmpfs merged w/ swap be tweaked to provide
> > direct mapped access into this linear address space?
>
> Why bother?  You can already create a large private file and mmap it if
> you want to do this, and you will get better performance than being
> smeared around swap with everyone else.
>
> Currently swap means your data is mixed in with other stuff.  Swap could
> do preallocation of each vma when running in limited overcommit modes,
> and it would run a lot faster if it did, but you would pay a lot in
> flexibility and efficiency, as well as needing a lot more swap.
>
> Far better to let applications wanting to work this way do it
> themselves. Just mmap and the cache balancing and pager will do the rest
> for you.

So w/ 1GB RAM, no swap, and 1TB disk mmap'd, could this mmap'd space be added 
to the total memory available to the OS, as is done w/ swap?

And if that's possible, why not replace swap w/ mmap'd disk-space?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-02 18:59                     ` Al Boldi
@ 2006-02-02 22:33                       ` Bryan Henderson
  2006-02-03 14:46                       ` Alan Cox
  1 sibling, 0 replies; 75+ messages in thread
From: Bryan Henderson @ 2006-02-02 22:33 UTC (permalink / raw)
  To: Al Boldi; +Cc: Alan Cox, linux-fsdevel, linux-kernel, Kyle Moffett

>So w/ 1GB RAM, no swap, and 1TB disk mmap'd, could this mmap'd space be added
>to the total memory available to the OS, as is done w/ swap?

Yes.

>And if that's possible, why not replace swap w/ mmap'd disk-space?

Because mmapped disk space has a permanent mapping of address to disk 
location.  That's how the earliest virtual memory systems worked, but we 
moved beyond that to what we have now (what we've been calling swapping), 
where the mapping gets established at the last possible moment, which 
means we can go a lot faster.  E.g. when the OS needs to steal 10 page 
frames used for malloc pages which are scattered across the virtual 
address space, it could write all those pages out in a single cluster 
wherever a disk head happens to be at the moment.
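
As a toy illustration of the difference (ordinary user code, not kernel
code; the slot numbers and counts are made up):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        long fixed_slot[10], pos = 50000, moved_fixed = 0;
        int i;

        /* Fixed mapping: each page's disk slot is dictated by its
           virtual address, so the 10 slots land all over the disk. */
        srand(1);
        for (i = 0; i < 10; i++)
            fixed_slot[i] = rand() % 1000000;
        for (i = 0; i < 10; i++) {
            moved_fixed += labs(fixed_slot[i] - pos);
            pos = fixed_slot[i];
        }

        /* Late binding: assign 10 consecutive slots wherever the head
           already is, i.e. one sequential run with no real seeks. */
        printf("fixed mapping: ~%ld tracks of head movement\n", moved_fixed);
        printf("late binding:  ~10 tracks of head movement\n");
        return 0;
    }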

Also, given that we use multiple address spaces (my shell and your shell 
both have an Address 0, but they're different pages), there'd be a giant 
allocation problem in assigning a contiguous area of disk to each address 
space.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-02 18:59                     ` Al Boldi
  2006-02-02 22:33                       ` Bryan Henderson
@ 2006-02-03 14:46                       ` Alan Cox
  1 sibling, 0 replies; 75+ messages in thread
From: Alan Cox @ 2006-02-03 14:46 UTC (permalink / raw)
  To: Al Boldi; +Cc: Kyle Moffett, Bryan Henderson, linux-fsdevel, linux-kernel

On Iau, 2006-02-02 at 21:59 +0300, Al Boldi wrote:
> So w/ 1GB RAM, no swap, and 1TB disk mmap'd, could this mmap'd space be added 
> to the total memory available to the OS, as is done w/ swap?

Yes in theory. It would be harder to manage.

> And if that's possible, why not replace swap w/ mmap'd disk-space?

Swap is just somewhere to stick data that isn't file backed.  You could
build a swapless mmap-based OS, but it wouldn't be quite the same as
Unix/Linux.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-01 14:38 ` Jamie Lokier
@ 2006-02-02 12:26   ` Al Boldi
  0 siblings, 0 replies; 75+ messages in thread
From: Al Boldi @ 2006-02-02 12:26 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: linux-kernel, linux-fsdevel

Jamie Lokier wrote:
> If I understand your scheme, you're suggesting the kernel accesses
> disks, filesystems, etc. by simply reading and writing somewhere in
> the 64-bit address space.
>
> At some level, that will involve page faults to move data between RAM and
> disk.
>
> Those page faults are relatively slow - governed by the CPU's page
> fault mechanism.  Probably slower than what the kernel does now:
> testing flags and indirecting through "struct page *".

Is there a way to benchmark this difference?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
  2006-02-01 13:58 Al Boldi
@ 2006-02-01 14:38 ` Jamie Lokier
  2006-02-02 12:26   ` Al Boldi
  0 siblings, 1 reply; 75+ messages in thread
From: Jamie Lokier @ 2006-02-01 14:38 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, linux-fsdevel

Al Boldi wrote:
> > Presumably you will want access to more data than you have RAM,
> > because RAM is still limited to a few GB these days, whereas a typical
> > personal data store is a few 100s of GB.
> >
> > 64-bit architecture doesn't change this mismatch.  So how do you
> > propose to avoid swapping to/from a disk, with all the time delays and
> > I/O scheduling algorithms that needs?
> 
> This is exactly what a linear-mapped memory model avoids.
> Everything is already mapped into memory/disk.

Having everything mapped to memory/disk *does not* avoid time delays
and I/O scheduling.  At some level, whether it's software or hardware,
something has to schedule the I/O to disk because there isn't enough RAM.

How do you propose to avoid those delays?

In my terminology, I/O of pages between disk and memory is called
swapping.  (Or paging, or loading, or virtual memory I/O...)

Perhaps you have a different terminology?

> Would you call reading and writing to memory/disk swapping?

Yes, if it involves the disk and heuristic paging decisions.  Whether
that's handled by software or hardware.

> > Applications don't currently care if they are swapped to disk or in
> > physical memory.  That is handled by the OS and is transparent to the
> > application.
> 
> Yes, a linear-mapped memory model extends this transparency to the OS.

Yes, that is possible.  It's slow in practice because that
transparency comes at the cost of page faults (when the OS accesses
that linear-mapped memory), which are slow on the kinds of CPU we are
talking about - i.e. commodity 64-bit chips.

> > > If you didn't understand its meaning.  The shortest path meaning
> > > accessing hw w/o running workarounds; using 64bits+ to fly over past
> > > limitations.
> >
> > The OS still has to map the address space to where it physically exists.
> > Mapping all disk space into the address space may actually be a lot less
> > efficient than using the filesystem interface for a block device.
> 
> Did you try tmpfs?

Actually, mmap() to read a tmpfs file can be slower than just calling
read(), for some access patterns.  It's because page faults, which are
used to map the file, can be slower than copying data.  However,
copying uses more memory.  Today we leave it to the application to
decide which method to use.
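
A crude way to measure this (untested sketch; it assumes /dev/shm is tmpfs
and that the file was created beforehand, e.g. with
dd if=/dev/zero of=/dev/shm/bench bs=1M count=64):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define LEN (64UL << 20)    /* must match the file's size */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        static char buf[1 << 16];
        volatile long sum = 0;    /* stops the compiler eliding the loops */
        long i;
        ssize_t n;
        double t;
        char *p;
        int fd = open("/dev/shm/bench", O_RDONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }

        t = now();    /* method 1: copy the data out via read() */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            for (i = 0; i < n; i++)
                sum += buf[i];
        printf("read: %.3fs\n", now() - t);

        t = now();    /* method 2: fault the pages in via mmap() */
        p = mmap(NULL, LEN, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        for (i = 0; i < (long)LEN; i++)
            sum += p[i];
        printf("mmap: %.3fs\n", now() - t);
        return 0;
    }

(Run each method separately, or swap the order, to keep cache warmth from
skewing the comparison.)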

> > If so, then it actually makes *less* sense to me than before -- with
> > your scheme, you've reduced the speed of main memory by 100x or more,
> > then you try to compensate with a huge cache. IOW, you've reduced the
> > speed of *main* memory to (more or less) the speed of today's swap!
> > Suddenly it doesn't sound so good anymore...
> 
> There really isn't anything new here; we do swap and access the fs on disk 
> and compensate with a huge dcache now.  All this idea implies is removing 
> certain barriers that could not easily be passed before, thus moving swap 
> and the fs into main memory.
> 
> Can you see how removing barriers would aid performance?

I suspect that, despite possibly simplifying code, removing those
barriers would make it run slower.

If I understand your scheme, you're suggesting the kernel accesses
disks, filesystems, etc. by simply reading and writing somewhere in
the 64-bit address space.

At some level, that will involve page faults to move data between RAM and disk.

Those page faults are relatively slow - governed by the CPU's page
fault mechanism.  Probably slower than what the kernel does now:
testing flags and indirecting through "struct page *".

However, do feel free to try out your idea.  If it is actually notably
faster, or if it makes no difference to speed but makes a lot of code
simpler, well then surely it will be interesting.

-- Jamie

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [RFC] VM: I have a dream...
@ 2006-02-01 13:58 Al Boldi
  2006-02-01 14:38 ` Jamie Lokier
  0 siblings, 1 reply; 75+ messages in thread
From: Al Boldi @ 2006-02-01 13:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel

Thanks for your detailed responses!

Kyle Moffett wrote:
> BTW, unless you have a patch or something to propose, let's take this
> off-list, it's getting kind of OT now.

No patches yet, but even if there were, would they get accepted?

> On Jan 31, 2006, at 10:56, Al Boldi wrote:
> > Kyle Moffett wrote:
> >> Is it necessarily faulty?  It seems to me that the current way
> >> works pretty well so far, and unless you can prove a really strong
> >> point the other way, there's no point in changing.  You have to
> >> remember that change introduces bugs which then have to be located
> >> and removed again, so change is not necessarily cheap.
> >
> > Faulty, because we are currently running a legacy solution to
> > workaround an 8,16,(32) arch bits address space limitation, which
> > does not exist in 64bits+ archs for most purposes.
>
> There are a lot of reasons for paging, only _one_ of them is/was to
> deal with too-small address spaces.  Other reasons are that sometimes
> you really _do_ want a nonlinear mapping of data/files/libs/etc.  It
> also allows easy remapping of IO space or video RAM into application
> address spaces, etc.  If you have a direct linear mapping from
> storage into RAM, common non-linear mappings become _extremely_
> complex and CPU-intensive.
>
> Besides, you never did address the issue of large changes causing
> large bugs.  Any large change needs to have advantages proportional
> to the bugs it will cause, and you have not yet proven this case.

How could reverting a workaround introduce large bugs?

> > Trying to defend the current way would be similar to rejecting the
> > move from  16bit to 32bit. Do you remember that time?  One of the
> > arguments used was:  the current way works pretty well so far.
>
> Arbitrary analogies do not prove things.

Analogies are there to make a long story short.

> Can you cite examples that
> clearly indicate how paged-memory is to direct-linear-mapping as 16-
> bit processors are to 32-bit processors?

I mentioned this in a previous message.

> > There is a lot to gain, for one there is no more swapping w/ all
> > its related side-effects.
>
> This is *NOT* true.  When you have more data than RAM, you have to
> put data on disk, which means swapping, regardless of the method in
> which it is done.
>
> > You're dealing with memory only.  You can also run your fs inside
> > memory, like tmpfs, which is definitely faster.
>
> Not on Linux.  We have a whole unique dcache system precisely so that
> a frequently accessed filesystem _is_ as fast as tmpfs (Unless you're
> writing and syncing a lot, in which case you still need to wait for
> disk hardware to commit data).

This is true, and may very well explain why dcache is so CPU intensive.

> > And there may be lots of other advantages, due to the simplified
> > architecture applied.
>
> Can you describe in detail your "simplified architecture"?? I can't
> see any significant complexity advantages over the standard paging
> model that Linux has.
>
> >>> Why would you think that the shortest path between two points is
> >>> complicated, when you have the ability to fly?
> >>
> >> Bad analogy.
> >
> > If you didn't understand its meaning.  The shortest path meaning
> > accessing hw w/o running workarounds; using 64bits+ to fly over
> > past limitations.
>
> This makes *NO* technical sense and is uselessly vague.  Applying
> vague indirect analogies to technical topics is a fruitless
> endeavor.  Please provide technical points and reasons why it _is_
> indeed shorter/better/faster, and then you can still leave out the
> analogy because the technical argument is sufficient.
>
> >>>> But unless the stumbling block since 1980 has been that it was too
> >>>> hard to get/make a CPU with a 64 bit address space, I don't see
> >>>> what's different today.
> >>>
> >>> You are hitting the nail right on its head here. Nothing moves the
> >>> masses like mass-production.
> >>
> >> Uhh, no, you misread his argument: If there were other reasons that
> >> this was not done in the past than lack of 64-bit CPUS, then this is
> >> probably still not practical/feasible/desirable.
> >
> > Uhh?
> > The point here is: Even if there were 64bit archs available in the
> > past, this did not mean that moving into native 64bits would be
> > commercially viable, due to its unavailability on the mass-market.
>
> Are you even reading these messages?

Bryan Henderson wrote:
> >1) IF the ONLY reason this was not done before is that 64-bit archs
> >were hard to get, then you are right.
> >
> >2) IF there were OTHER reasons, then you are not correct.
> >
> >This is the argument.  You keep discussing how 64-bit archs were not
> >easily available before and are now, and I AGREE, but that is NOT
> >RELEVANT to the point he made.
>
> As I remember it, my argument was that single level storage was known and
> practical for 25 years and people did not flock to it, therefore they must
> not have seen it as useful.  So if 64 bit processors were not available enough
> during that time, that blows away my argument, because people might have
> liked the idea but just couldn't afford the necessary address width.  It
> doesn't matter if there were other reasons to shun the technology; all it
> takes is one.  And if 64 bit processors are more available today, that
> might tip the balance in favor of making the change away from multilevel
> storage.

Thanks for clarifying this!

> But I don't really buy that 64 bit processors weren't available until
> recently.  I think they weren't produced in commodity fashion because
> people didn't have a need for them.  They saw what you can do with 128 bit
> addresses (i.e. single level storage) in the IBM iSeries line, but
> weren't impressed.  People added lots of other new technology to the
> mainstream CPU lines, but not additional address bits.  Not until they
> wanted to address more than 4G of main memory at a time did they see any
> reason to make 64 bit processors in volume.

True, so with 64 bits giving 2^64 bytes = 16M TB (16 exabytes) of address 
space, what reason would there be to stick with a swapped memory model?

Jamie Lokier wrote:
> Al Boldi wrote:
> > There is a lot to gain, for one there is no more swapping w/ all its
> > related side-effects.  You're dealing with memory only.
>
> I'm sorry, I think I don't understand.  My weakness.  Can you please
> explain?
>
> Presumably you will want access to more data than you have RAM,
> because RAM is still limited to a few GB these days, whereas a typical
> personal data store is a few 100s of GB.
>
> 64-bit architecture doesn't change this mismatch.  So how do you
> propose to avoid swapping to/from a disk, with all the time delays and
> I/O scheduling algorithms that needs?

This is exactly what a linear-mapped memory model avoids.
Everything is already mapped into memory/disk.

Lennart Sorensen wrote:
> Of course there is swapping.  The CPU only executes things from physical
> memory, so at some point you have to load stuff from disk to physical
> memory.  That seems very much like the definition of swapping too.
> Sometimes you call it loading.  Not much difference really.  If
> something else is occupying physical memory so there isn't room, it has
> to be put somewhere: if it is just caching some physical disk
> space, you just dump it, but if it is some giant chunk of data you are
> currently generating, then it needs to go to some other place that
> handles temporary data that doesn't already have a place in the
> filesystem.  Unless you have infinite physical memory, at some point you
> will have to move temporary data from physical memory to somewhere else.
> That is swapping no matter how you view the system's address space.
> Calling it something else doesn't change the facts.

Would you call reading and writing to memory/disk swapping?

> Applications don't currently care if they are swapped to disk or in
> physical memory.  That is handled by the OS and is transparent to the
> application.

Yes, a linear-mapped memory model extends this transparency to the OS.

> > If you didn't understand its meaning.  The shortest path meaning
> > accessing hw w/o running workarounds; using 64bits+ to fly over past
> > limitations.
>
> The OS still has to map the address space to where it physically exists.
> Mapping all disk space into the address space may actually be a lot less
> efficient than using the filesystem interface for a block device.

Did you try tmpfs?

> > Uhh?
> > The point here is: Even if there were 64bit archs available in the past,
> > this did not mean that moving into native 64bits would be commercially
> > viable, due to its unavailability on the mass-market.
> >
> > So with 64bits widely available now, and to let Linux spread its wings
> > and really fly, how could tmpfs merged w/ swap be tweaked to provide
> > direct mapped access into this linear address space?
>
> Applications can mmap files if they want to.  Your idea seems likely to
> make the OS much more complex, and waste a lot of resources on mapping
> disk space to the address space, and from the applications point of view
> it doesn't seem to make any difference at all.  It might be a fun idea
> for some academic research OS somewhere to go work out the kinks and see
> if it has any efficiency at all in real use.  Given Linux runs on lots
> of architectures, trying to make it work completely differently on 64bit
> systems doesn't make that much sense really, especially when there is no
> apparent benefit to the change.

Arch bits have nothing to do with a linear-mapped memory model; they only 
limit its usefulness.  So with 8, 16, (32) bits this linear-mapped model isn't 
really viable because of its address-space limit.  But with a 64bit+ arch 
the limits are wide enough to make a linear-mapped model viable.  A 32bit 
arch is in between, so for some a 4GB limit may be acceptable.

Barry K. Nathan wrote:
> On 1/31/06, Al Boldi <a1426z@gawab.com> wrote:
> > Faulty, because we are currently running a legacy solution to workaround
> > an 8,16,(32) arch bits address space limitation, which does not exist in
> > 64bits+ archs for most purposes.
>
> In the early 1990's (and maybe even the mid 90's), the typical hard
> disk's storage could theoretically be byte-addressed using 32-bit
> addresses -- just as (if I understand you correctly) you are arguing
> that today's hard disks can be byte-addressed using 64-bit addresses.
>
> If this was ever going to be practical (on commodity hardware, anyway),
> I would have expected someone to try it on a 32-bit PC or Mac when
> hard drives were in the 100MB-3GB range... That suggests to me that
> there's a more fundamental reason (i.e. other than lack of address
> space) that caused people to stick with the current scheme.

32bits is in brackets - 8,16,(32) - to highlight that it's an in-between case.

> tmpfs isn't "definitely faster". Remember those benchmarks where Linux
> ext2 beat Solaris tmpfs?

Linux tmpfs is faster because it can short-circuit the dcache, in effect 
doing an O_SYNC.  It slows down when swapping kicks in.

> Also, the only way I see where "there is no more swapping" and
> "[y]ou're dealing with memory only" is if the disk *becomes* main
> memory, and main memory becomes an L3 (or L4) cache for the CPU [and
> as a consequence, main memory also becomes the main form of long-term
> storage]. Is that what you're proposing?

In the long term, yes; maybe even move it into hardware.  But for the 
short term there is no need to blow things out of proportion; a simple 
tweaking of tmpfs merged w/ swap may do the trick quickly and easily.

> If so, then it actually makes *less* sense to me than before -- with
> your scheme, you've reduced the speed of main memory by 100x or more,
> then you try to compensate with a huge cache. IOW, you've reduced the
> speed of *main* memory to (more or less) the speed of today's swap!
> Suddenly it doesn't sound so good anymore...

There really isn't anything new here; we do swap and access the fs on disk 
and compensate with a huge dcache now.  All this idea implies is removing 
certain barriers that could not easily be passed before, thus moving swap 
and the fs into main memory.

Can you see how removing barriers would aid performance?

Thanks!

--
Al


^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2006-02-03 17:41 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-21 18:08 [RFC] VM: I have a dream Al Boldi
2006-01-21 18:42 ` Jamie Lokier
2006-01-21 18:46 ` Avi Kivity
2006-01-23 19:52   ` Bryan Henderson
2006-01-25 22:04     ` Al Boldi
2006-01-26 19:18       ` Bryan Henderson
2006-01-27 16:12         ` Al Boldi
2006-01-27 19:17           ` Bryan Henderson
2006-01-30 13:21             ` Al Boldi
2006-01-30 13:35               ` Kyle Moffett
2006-01-31 15:56                 ` Al Boldi
2006-01-31 16:34                   ` Kyle Moffett
2006-01-31 23:14                     ` Bryan Henderson
2006-01-31 16:34                   ` Lennart Sorensen
2006-01-31 19:23                   ` Jamie Lokier
2006-02-01  4:06                   ` Barry K. Nathan
2006-02-01  9:51                     ` Andrew Walrond
2006-02-01 17:51                       ` Lennart Sorensen
2006-02-01 18:21                         ` Andrew Walrond
2006-02-01 18:25                           ` Lennart Sorensen
2006-02-02 15:11                   ` Alan Cox
2006-02-02 18:59                     ` Al Boldi
2006-02-02 22:33                       ` Bryan Henderson
2006-02-03 14:46                       ` Alan Cox
2006-01-30 16:49               ` Bryan Henderson
2006-01-26  0:03     ` Jon Smirl
2006-01-26 19:48       ` Bryan Henderson
2006-01-22  8:16 ` Pavel Machek
2006-01-22 12:33 ` Robin Holt
2006-01-23 18:03   ` Al Boldi
2006-01-23 18:40     ` Valdis.Kletnieks
2006-01-23 19:26       ` Benjamin LaHaise
2006-01-23 19:40         ` Valdis.Kletnieks
2006-01-23 22:26     ` Pavel Machek
2006-01-22 19:55 ` Barry K. Nathan
2006-01-23  5:23   ` Michael Loftis
2006-01-23  5:46     ` Chase Venters
2006-01-23  8:20       ` Barry K. Nathan
2006-01-23 13:17       ` Jamie Lokier
2006-01-23 20:21         ` Peter Chubb
2006-01-23 15:05     ` Ram Gupta
2006-01-23 15:26       ` Diego Calleja
2006-01-23 16:11         ` linux-os (Dick Johnson)
2006-01-23 16:50           ` Jamie Lokier
2006-01-24  2:08           ` Horst von Brand
2006-01-25  6:13             ` Jamie Lokier
2006-01-25  9:23             ` Bernd Petrovitsch
2006-01-25  9:42               ` Lee Revell
2006-01-25 15:02                 ` Jamie Lokier
2006-01-25 23:24                   ` Lee Revell
2006-01-25 15:05               ` Jamie Lokier
2006-01-25 15:47                 ` Bernd Petrovitsch
2006-01-25 16:09                 ` Diego Calleja
2006-01-25 17:26                   ` Jamie Lokier
2006-01-26 19:13                     ` Bryan Henderson
2006-01-25 23:28                 ` Lee Revell
2006-01-26  1:29                   ` Diego Calleja
2006-01-26  5:01                   ` Jamie Lokier
2006-01-26  5:11                     ` Lee Revell
2006-01-26 14:46                       ` Dave Kleikamp
2006-01-24  2:10           ` Horst von Brand
2006-01-25 22:27         ` Nix
2006-01-26 15:13           ` Denis Vlasenko
2006-01-26 16:23             ` Nix
2006-01-23 20:43       ` Michael Loftis
2006-01-23 22:42         ` Nikita Danilov
2006-01-24 14:36           ` Ram Gupta
2006-01-24 15:04             ` Diego Calleja
2006-01-24 20:59               ` Bryan Henderson
2006-01-24 15:11             ` Nikita Danilov
2006-01-23 22:57         ` Ram Gupta
2006-01-24 10:08         ` Meelis Roos
2006-02-01 13:58 Al Boldi
2006-02-01 14:38 ` Jamie Lokier
2006-02-02 12:26   ` Al Boldi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).