* mem=16MB laptop testing
@ 2003-10-14 10:55 William Lee Irwin III
  2003-10-14 11:01 ` John Bradford
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: William Lee Irwin III @ 2003-10-14 10:55 UTC (permalink / raw)
  To: linux-kernel

So I tried mem=16m on my laptop (stinkpad T21). I made the following
potentially useless observations:

MemTotal:        12424 kB
MemFree:           352 kB
Buffers:           180 kB
Cached:           1328 kB
SwapCached:       3548 kB
Active:           4576 kB
Inactive:          664 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        12424 kB
LowFree:           352 kB
SwapTotal:      997880 kB
SwapFree:       969112 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:           4320 kB
Slab:             4884 kB
Committed_AS:    45776 kB
PageTables:        656 kB
VmallocTotal:  1015752 kB
VmallocUsed:       732 kB
VmallocChunk:  1014368 kB

(a) The profile buffer requires about a 5MB bootmem allocation;
	this near halves MemTotal when used. I refrained from using it,
	as otherwise it's a test of mem=8m instead of mem=16m.

(b) bootmem allocations aren't adding up; after kernel text, data,
	and tracing __alloc_bootmem_core(), there is still about 0.5MB
	missing from MemTotal. I still haven't found where it's
	gone. mem_map's bootmem allocation also didn't show up in the
	logs, but it should only be 160KB for 16MB of RAM, not 512KB.
	Matt Mackall spotted this, too.

(c) mem= no longer bounds the highest physical address, but rather
	the sum of memory in e820 entries post-sanitization. This
	means a ZONE_NORMAL with about 384KB showed up, with duly
	perverse heuristic consequences for page_alloc.c

(d) The system thrashed heavily on boot, allowing the largest mm
	to acquire an RSS no larger than about 100KB. This needed
	turning /proc/sys/vm/min_free_kbytes down to 128 to make the
	system behave closer to normally. Matt Mackall spotted this.

(e) About 4.8MB are consumed by slab allocations at runtime.
	The top 10 slab abusers are:

inode_cache               840K           840K     100.00%   
dentry_cache              746K           753K      99.07%   
ext3_inode_cache          591K           592K      99.84%   
size-4096                 504K           504K     100.00%   
size-512                  203K           204K      99.75%   
size-2048                 182K           204K      89.22%   
pgd                       188K           188K     100.00%   
task_struct               100K           108K      92.86%   
vm_area_struct             93K           101K      92.28%   
blkdev_requests           101K           101K     100.00%   

The inode_cache culprit is the obvious butt of many complaints:
# find /sys | wc -l
2656

... which accounts for 100% of the 840KB. TANSTAAFL. OTOH, maybe we
need to learn to do better than pinning dentries and inodes in-core...

(f) the VM appeared to favor processes that burn cpu and take many faults:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  nFLT COMMAND      
  486 wli       16   0  4196 2072  456 S  1.7 16.7   0:19.41 312k slabtop      
  413 wli       15   0  4360 1064  188 S  0.0  8.6   0:20.33 757k VMTop        
  420 wli       15   0  2004  456  320 R  0.3  3.7   0:15.41 229k top          
  416 wli       16   0  5964  184  116 S  0.0  1.5   0:01.09  13k sshd         
  435 root      15   0 22304  184   88 S  0.0  1.5   0:06.60  85k XFree86      
  409 wli       15   0  5964  180  112 S  0.0  1.4   0:00.21 1646 sshd         
  466 wli       16   0  5964  180  112 S  0.0  1.4   0:00.34 4598 sshd         
  373 root      15   0  1724  152  108 S  0.0  1.2   0:00.07 2126 cron         
  207 root      16   0  1520   96   48 S  0.0  0.8   0:00.14 4342 syslogd      
  417 wli       16   0  3088   88   68 S  0.0  0.7   0:00.08 2289 zsh          

The top 3 RSS consumers were statistics reporting programs that (of
course) burn immense amounts of cpu, and in what is probably no
coincidence, also dominate the nflt category. There are also a bunch
of mostly useless processes holding bits of RAM. Load control, anyone?

(g) X isn't terribly swift; it's slower than I remember old Sun IPC's
	being, though they had 24MB RAM. OTOH luserspace is much more
	bloated these days. zsh alone is at least 3 times the size of
	ksh, which I used back then. fvwm2 is a lot bigger than fvwm1.
	And so on and so forth. I guess the upshot is "unbloating" the
	kernel wouldn't do much good anyway, since luserspace isn't in
	any kind of shape to run in this kind of environment anymore either.
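The figures in (b) and (e) can be cross-checked with a little arithmetic. The sketch below is not from the original post: it assumes 4KB pages (i386), and the helper names are made up for illustration. It shows that a 160KB mem_map for 16MB of RAM implies 40 bytes per struct page, and that 840KB of inode_cache spread over the 2656 sysfs entries comes to roughly 323 bytes per inode.

```c
/* Assumption: 4KB pages, as on the i386 laptop in this report. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of page frames needed to cover mem_bytes of RAM. */
unsigned long pages_for(unsigned long mem_bytes)
{
	return mem_bytes / SKETCH_PAGE_SIZE;
}

/* Bytes per struct page implied by a mem_map of mem_map_bytes. */
unsigned long bytes_per_struct_page(unsigned long mem_bytes,
				    unsigned long mem_map_bytes)
{
	return mem_map_bytes / pages_for(mem_bytes);
}

/* Average bytes per object in a slab cache of cache_kb, rounded down. */
unsigned long bytes_per_object(unsigned long cache_kb, unsigned long objects)
{
	return cache_kb * 1024UL / objects;
}
```

Calling bytes_per_struct_page(16UL << 20, 160UL << 10) gives 40, and bytes_per_object(840, 2656) gives 323.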


-- wli


* Re: mem=16MB laptop testing
  2003-10-14 10:55 mem=16MB laptop testing William Lee Irwin III
@ 2003-10-14 11:01 ` John Bradford
  2003-10-14 11:08   ` William Lee Irwin III
  2003-10-14 11:56 ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: John Bradford @ 2003-10-14 11:01 UTC (permalink / raw)
  To: William Lee Irwin III, linux-kernel

> (g) X isn't terribly swift; it's slower than I remember old Sun IPC's
> 	being, though they had 24MB RAM. OTOH luserspace is much more
> 	bloated these days. zsh alone is at least 3 times the size of
> 	ksh, which I used back then. fvwm2 is a lot bigger than fvwm1.
> 	And so on and so forth. I guess the upshot is "unbloating" the
> 	kernel wouldn't do much good anyway, since luserspace isn't in
> 	any kind of shape to run in this kind of environment anymore either.

Depends on what you consider usable.  I thought X worked pretty well
in swapless 8MB last time I tried it, (last year, around 2.5.40).
Admittedly that was only running a few xterms locally.  A 4MB + 20MB
swap box was surprisingly usable for fairly intense remote applications
over a compressed 9600 bps serial link.

John.


* Re: mem=16MB laptop testing
  2003-10-14 11:01 ` John Bradford
@ 2003-10-14 11:08   ` William Lee Irwin III
  2003-10-14 13:20     ` John Bradford
  0 siblings, 1 reply; 29+ messages in thread
From: William Lee Irwin III @ 2003-10-14 11:08 UTC (permalink / raw)
  To: John Bradford; +Cc: linux-kernel

At some point in the past, I wrote:
>> (g) X isn't terribly swift; it's slower than I remember old Sun IPC's
>> 	being, though they had 24MB RAM. OTOH luserspace is much more
>> 	bloated these days. zsh alone is at least 3 times the size of
>> 	ksh, which I used back then. fvwm2 is a lot bigger than fvwm1.
>> 	And so on and so forth. I guess the upshot is "unbloating" the
>> 	kernel wouldn't do much good anyway, since luserspace isn't in
>> 	any kind of shape to run in this kind of environment anymore either.

On Tue, Oct 14, 2003 at 12:01:00PM +0100, John Bradford wrote:
> Depends on what you consider usable.  I thought X worked pretty well
> in swapless 8MB last time I tried it, (last year, around 2.5.40).
> Admittedly that was only running a few xterms locally.  A 4MB + 20MB
> swap box was surprisingly usable for fairly intense remote applications
> over a compressed 9600 bps serial link.

It's not that it's particularly unusable, it was merely substantially
slower than vaguely comparable machines I remember from way back when.


-- wli


* Re: mem=16MB laptop testing
  2003-10-14 10:55 mem=16MB laptop testing William Lee Irwin III
  2003-10-14 11:01 ` John Bradford
@ 2003-10-14 11:56 ` Andrew Morton
  2003-10-14 11:58   ` Russell King
                     ` (3 more replies)
  2003-10-15  0:35 ` Nick Piggin
  2003-10-15  4:31 ` Andrew Morton
  3 siblings, 4 replies; 29+ messages in thread
From: Andrew Morton @ 2003-10-14 11:56 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> So I tried mem=16m on my laptop (stinkpad T21).

Thanks for doing this.  We should try to not suck in this situation.

> (a) The profile buffer requires about a 5MB bootmem allocation;
> 	this near halves MemTotal when used. I refrained from using it,
> 	as otherwise it's a test of mem=8m instead of mem=16m.

OK, so don't boot with `profile=N', yes?

> (b) bootmem allocations aren't adding up; after kernel text, data,
> 	and tracing __alloc_bootmem_core(), there is still about 0.5MB
> 	missing from MemTotal. I still haven't found where it's
> 	gone. mem_map's bootmem allocation also didn't show up in the
> 	logs, but it should only be 160KB for 16MB of RAM, not 512KB.
> 	Matt Mackall spotted this, too.

Perhaps drop a printk(size) and a dump_stack() into the bootmem allocator,
then postprocess the dmesg output after it's booted?
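A rough illustration of the postprocessing step: the sketch below sums the sizes found in a captured log. The tag "bootmem_alloc size=" is a hypothetical format for the suggested printk, not something the kernel emits, and the function name is likewise invented.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum the byte counts on every log line carrying the (hypothetical)
 * marker "bootmem_alloc size=<decimal>".  `log` is the whole dmesg
 * capture as one NUL-terminated string.
 */
unsigned long sum_bootmem_sizes(const char *log)
{
	static const char tag[] = "bootmem_alloc size=";
	unsigned long total = 0;
	const char *p = log;

	while ((p = strstr(p, tag)) != NULL) {
		unsigned long size;

		p += sizeof(tag) - 1;		/* skip past the marker */
		if (sscanf(p, "%lu", &size) == 1)
			total += size;
	}
	return total;
}
```

Comparing the returned total with the gap between installed RAM and MemTotal shows how much of the loss the traced allocations explain.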

> (c) mem= no longer bounds the highest physical address, but rather
> 	the sum of memory in e820 entries post-sanitization. This
> 	means a ZONE_NORMAL with about 384KB showed up, with duly
> 	perverse heuristic consequences for page_alloc.c

I don't understand this.  You mean almost all memory was in ZONE_DMA?

"mem=" does not accurately emulate having that much memory.  So a 512M box
booted with "mem=256M" has a different amount of memory from a 256M box
booted with no "mem=" option.  It would be nice to fix that, but I've never
looked into it.

> (d) The system thrashed heavily on boot, allowing the largest mm
> 	to acquire an RSS no larger than about 100KB. This needed
> 	turning /proc/sys/vm/min_free_kbytes down to 128 to make the
> 	system behave closer to normally. Matt Mackall spotted this.

hrm.  min_free_kbytes is normally 1024.  I'm surprised that the additional
900k made so much difference.  We must be on the hairy edge.

It looks like we need to precalculate/scale min_free_kbytes, yes?

> (e) About 4.8MB are consumed by slab allocations at runtime.
> 	The top 10 slab abusers are:
> 
> inode_cache               840K           840K     100.00%   
> dentry_cache              746K           753K      99.07%   
> ext3_inode_cache          591K           592K      99.84%   
> size-4096                 504K           504K     100.00%   
> size-512                  203K           204K      99.75%   
> size-2048                 182K           204K      89.22%   
> pgd                       188K           188K     100.00%   
> task_struct               100K           108K      92.86%   
> vm_area_struct             93K           101K      92.28%   
> blkdev_requests           101K           101K     100.00%   
> 
> The inode_cache culprit is the obvious butt of many complaints:
> # find /sys | wc -l
> 2656
> 
> ... which accounts for 100% of the 840KB. TANSTAAFL. OTOH, maybe we
> need to learn to do better than pinning dentries and inodes in-core...

I guess not mounting /sys doesn't help here.  It would be nice.  Maybe with
a CONFIG_I_WILL_NEVER_MOUNT_SYSFS we could avoid all those allocations.

> Load control, anyone?

Roger Luethi is working on it; I need to pay some attention to his patch. 
I expect we'll have something for post-2.6.0.




* Re: mem=16MB laptop testing
  2003-10-14 11:56 ` Andrew Morton
@ 2003-10-14 11:58   ` Russell King
  2003-10-14 12:10     ` Andrew Morton
  2003-10-14 12:17   ` Anton Blanchard
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Russell King @ 2003-10-14 11:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: William Lee Irwin III, linux-kernel

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> I guess not mounting /sys doesn't help here.  It would be nice.  Maybe with
> a CONFIG_I_WILL_NEVER_MOUNT_SYSFS we could avoid all those allocations.

I believe sysfs is required for mounting the root filesystem - see
name_to_dev_t in init/do_mounts.c.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* Re: mem=16MB laptop testing
  2003-10-14 11:58   ` Russell King
@ 2003-10-14 12:10     ` Andrew Morton
  2003-10-14 12:18       ` Russell King
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2003-10-14 12:10 UTC (permalink / raw)
  To: Russell King; +Cc: wli, linux-kernel

Russell King <rmk+lkml@arm.linux.org.uk> wrote:
>
> On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> > I guess not mounting /sys doesn't help here.  It would be nice.  Maybe with
> > a CONFIG_I_WILL_NEVER_MOUNT_SYSFS we could avoid all those allocations.
> 
> I believe sysfs is required for mounting the root filesystem - see
> name_to_dev_t in init/do_mounts.c.

OK.  But it looks like if /sys is empty and you provide "root=03:02" then
things will still work.  It's a matter of trying it...



* Re: mem=16MB laptop testing
  2003-10-14 11:56 ` Andrew Morton
  2003-10-14 11:58   ` Russell King
@ 2003-10-14 12:17   ` Anton Blanchard
  2003-10-14 12:31     ` Andrew Morton
  2003-10-14 12:28   ` William Lee Irwin III
  2003-10-15 12:12   ` Pavel Machek
  3 siblings, 1 reply; 29+ messages in thread
From: Anton Blanchard @ 2003-10-14 12:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: William Lee Irwin III, linux-kernel

 
> hrm.  min_free_kbytes is normally 1024.  I'm surprised that the additional
> 900k made so much difference.  We must be on the hairy edge.
> 
> It looks like we need to precalculate/scale min_free_kbytes, yes?

That would be good for both the low and high end. I'd like to see it
default to something larger on my 16GB+ machines.

Anton


* Re: mem=16MB laptop testing
  2003-10-14 12:10     ` Andrew Morton
@ 2003-10-14 12:18       ` Russell King
  2003-10-14 12:30         ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: Russell King @ 2003-10-14 12:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: wli, linux-kernel

On Tue, Oct 14, 2003 at 05:10:31AM -0700, Andrew Morton wrote:
> Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> > On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> > > I guess not mounting /sys doesn't help here.  It would be nice.  Maybe with
> > > a CONFIG_I_WILL_NEVER_MOUNT_SYSFS we could avoid all those allocations.
> > 
> > I believe sysfs is required for mounting the root filesystem - see
> > name_to_dev_t in init/do_mounts.c.
> 
> OK.  But it looks like if /sys is empty and you provide "root=03:02" then
> things will still work.  It's a matter of trying it...

Uhh?

dev_t name_to_dev_t(char *name)
{
        dev_t res = 0;

        sys_mkdir("/sys", 0700);
        if (sys_mount("sysfs", "/sys", "sysfs", 0, NULL) < 0)
                goto out;

	...

out:
        sys_rmdir("/sys");
        return res;
}

If sysfs can't be mounted, then it looks like we can't even decode a
numeric major:minor root device specification.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core


* Re: mem=16MB laptop testing
  2003-10-14 11:56 ` Andrew Morton
  2003-10-14 11:58   ` Russell King
  2003-10-14 12:17   ` Anton Blanchard
@ 2003-10-14 12:28   ` William Lee Irwin III
  2003-10-15 12:12   ` Pavel Machek
  3 siblings, 0 replies; 29+ messages in thread
From: William Lee Irwin III @ 2003-10-14 12:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

William Lee Irwin III <wli@holomorphy.com> wrote:
>> So I tried mem=16m on my laptop (stinkpad T21).

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> Thanks for doing this.  We should try to not suck in this situation.

William Lee Irwin III <wli@holomorphy.com> wrote:
>> (a) The profile buffer requires about a 5MB bootmem allocation;
>> 	this near halves MemTotal when used. I refrained from using it,
>> 	as otherwise it's a test of mem=8m instead of mem=16m.

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> OK, so don't boot with `profile=N', yes?

That's pretty much my take on it, though it did rob me of profiles.
The next time I sleep I'll probably let it boot and bring it up with
mem=24m or something where I expect additional mem= to balance profile=


William Lee Irwin III <wli@holomorphy.com> wrote:
>> (b) bootmem allocations aren't adding up; after kernel text, data,
>> 	and tracing __alloc_bootmem_core(), there is still about 0.5MB
>> 	missing from MemTotal. I still haven't found where it's
>> 	gone. mem_map's bootmem allocation also didn't show up in the
>> 	logs, but it should only be 160KB for 16MB of RAM, not 512KB.
>> 	Matt Mackall spotted this, too.

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> Perhaps drop a printk(size) and a dump_stack() into the bootmem allocator,
> then postprocess the dmesg output after it's booted?

That's actually exactly how I traced it. Possibly the log buffer
dropped messages or some such nonsense. It's single-threaded in early
boot so it's all mindless drivel anyway.


William Lee Irwin III <wli@holomorphy.com> wrote:
>> (c) mem= no longer bounds the highest physical address, but rather
>> 	the sum of memory in e820 entries post-sanitization. This
>> 	means a ZONE_NORMAL with about 384KB showed up, with duly
>> 	perverse heuristic consequences for page_alloc.c

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> I don't understand this.  You mean almost all memory was in ZONE_DMA?
> "mem=" does not accurately emulate having that much memory.  So a 512M box
> booted with "mem=256M" has a different amount of memory from a 256M box
> booted with no "mem=" option.  It would be nice to fix that, but I've never
> looked into it.

Linux version 2.6.0-test6-wli-6 (wli@megeira) (gcc version 3.3 (Debian)) #1 Wed Oct 8 14:45:07 PDT 2003
Video mode to be used for restore is f00
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
 BIOS-e820: 000000000fff0000 - 000000000fffec00 (ACPI data)
 BIOS-e820: 000000000fffec00 - 0000000010000000 (ACPI NVS)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
user-defined physical RAM map:
 user: 0000000000000000 - 000000000009f800 (usable)
 user: 000000000009f800 - 00000000000a0000 (reserved)
 user: 00000000000e0000 - 0000000000100000 (reserved)
 user: 0000000000100000 - 0000000001060800 (usable)
16MB LOWMEM available.
On node 0 totalpages: 4192
  DMA zone: 4096 pages, LIFO batch:16
  Normal zone: 96 pages, LIFO batch:1
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.3 present.
IBM machine detected. Enabling interrupts during APM calls.
IBM machine detected. Disabling SMBus accesses.
Building zonelist for node : 0
Kernel command line: root=/dev/hda2 mem=16m console=ttyS0,115200n8 init=/bin/sh

limit_regions() cuts e820 RAM regions short when the total size of
all the regions is seen to exceed the limit passed as an argument.
limit_regions() is how mem=${N}m does its dirty work.
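The effect described above can be reproduced in user space. What follows is a simplified sketch, not the kernel's limit_regions(): the struct and function names are invented, and it only trims usable regions in place. Fed the BIOS map from the log with a 16MB limit, it ends the last usable region at 0x1060800, which leaves 96 whole 4KB pages above the 16MB ZONE_DMA boundary, matching the "Normal zone: 96 pages" line.

```c
#include <stddef.h>

/* One e820-style entry; `end` is exclusive.  Hypothetical layout. */
struct region {
	unsigned long long start;
	unsigned long long end;
	int usable;
};

/*
 * Trim usable regions so their sizes sum to at most `limit` bytes.
 * Returns the end address of the highest usable region left non-empty.
 */
unsigned long long limit_usable(struct region *map, size_t n,
				unsigned long long limit)
{
	unsigned long long remaining = limit, top = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long long size;

		if (!map[i].usable)
			continue;
		size = map[i].end - map[i].start;
		if (size > remaining) {		/* cut this region short */
			size = remaining;
			map[i].end = map[i].start + size;
		}
		remaining -= size;
		if (size > 0)
			top = map[i].end;
	}
	return top;
}

/* Whole 4KB pages between `boundary` and `top` (0 if top is below it). */
unsigned long long pages_above(unsigned long long top,
			       unsigned long long boundary)
{
	return top > boundary ? (top - boundary) / 4096ULL : 0;
}
```

The 640KB-1MB hole is what pushes the top address past 16MB: the limit counts bytes of usable RAM, not the highest physical address.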


William Lee Irwin III <wli@holomorphy.com> wrote:
>> (d) The system thrashed heavily on boot, allowing the largest mm
>> 	to acquire an RSS no larger than about 100KB. This needed
>> 	turning /proc/sys/vm/min_free_kbytes down to 128 to make the
>> 	system behave closer to normally. Matt Mackall spotted this.

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> hrm.  min_free_kbytes is normally 1024.  I'm surprised that the additional
> 900k made so much difference.  We must be on the hairy edge.
> It looks like we need to precalculate/scale min_free_kbytes, yes?

Well, ->pages_low and ->pages_high are twice it and thrice it
respectively, so we have a significant fraction of RAM involved in
the heuristics when the smoke clears. Now, exactly how these end up
influencing decisions I don't have decent enough logs for (the io for
logging got hard to keep up with in the presence of paging io). I can
probably arrange remote logging later on.
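To put rough numbers on "a significant fraction of RAM": the sketch below treats the whole machine as a single zone, which is a simplification (the kernel splits the reserve across zones), and its names are made up for illustration. It uses only the 2x and 3x relationship stated above.

```c
/* Watermarks in kB for a given min_free_kbytes, single-zone view. */
struct watermarks {
	unsigned long min_kb;
	unsigned long low_kb;	/* twice min, per the description above */
	unsigned long high_kb;	/* thrice min */
};

struct watermarks watermarks_from_min(unsigned long min_free_kb)
{
	struct watermarks w;

	w.min_kb = min_free_kb;
	w.low_kb = 2 * min_free_kb;
	w.high_kb = 3 * min_free_kb;
	return w;
}

/* Percentage of total_kb sitting below the high watermark, rounded down. */
unsigned long high_percent(unsigned long min_free_kb, unsigned long total_kb)
{
	return watermarks_from_min(min_free_kb).high_kb * 100UL / total_kb;
}
```

With MemTotal at 12424 kB, high_percent(1024, 12424) is 24, while high_percent(128, 12424) is 3, which fits the difference seen after lowering the sysctl.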


William Lee Irwin III <wli@holomorphy.com> wrote:
>> (e) About 4.8MB are consumed by slab allocations at runtime.
>> 	The top 10 slab abusers are:
>> inode_cache               840K           840K     100.00%   
>> dentry_cache              746K           753K      99.07%   
>> ext3_inode_cache          591K           592K      99.84%   
>> size-4096                 504K           504K     100.00%   
>> size-512                  203K           204K      99.75%   
>> size-2048                 182K           204K      89.22%   
>> pgd                       188K           188K     100.00%   
>> task_struct               100K           108K      92.86%   
>> vm_area_struct             93K           101K      92.28%   
>> blkdev_requests           101K           101K     100.00%   
>> The inode_cache culprit is the obvious butt of many complaints:
>> # find /sys | wc -l
>> 2656
>> ... which accounts for 100% of the 840KB. TANSTAAFL. OTOH, maybe we
>> need to learn to do better than pinning dentries and inodes in-core...

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> I guess not mounting /sys doesn't help here.  It would be nice.  Maybe with
> a CONFIG_I_WILL_NEVER_MOUNT_SYSFS we could avoid all those allocations.

I have a vague notion the treatment hugh gave tmpfs earlier in 2.6
would be useful for sysfs, though I've at least heard the observation
that quite a bit can be reconstructed from kobjects on the fly.


William Lee Irwin III <wli@holomorphy.com> wrote:
>> Load control, anyone?

On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> Roger Luethi is working on it; I need to pay some attention to his patch. 
> I expect we'll have something for post-2.6.0.

The name changes, but the presence of a helper is the same. I've felt
spurred to take it on myself, but am discouraged by other tasks and
all that. Well, that, and I should probably let someone else do
something (dammit, hugh, if you didn't have good ideas all the time I
wouldn't be compelled to make sure I have the stuff at my disposal).
I'll flag Roger down and see if I can say anything helpful about the
patch and so on if I don't get pegged too badly by interrupts this week.


-- wli


* Re: mem=16MB laptop testing
  2003-10-14 12:18       ` Russell King
@ 2003-10-14 12:30         ` Andrew Morton
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2003-10-14 12:30 UTC (permalink / raw)
  To: Russell King; +Cc: wli, linux-kernel

Russell King <rmk+lkml@arm.linux.org.uk> wrote:
>
> On Tue, Oct 14, 2003 at 05:10:31AM -0700, Andrew Morton wrote:
> > Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> > > On Tue, Oct 14, 2003 at 04:56:14AM -0700, Andrew Morton wrote:
> > > > I guess not mounting /sys doesn't help here.  It would be nice.  Maybe with
> > > > a CONFIG_I_WILL_NEVER_MOUNT_SYSFS we could avoid all those allocations.
> > > 
> > > I believe sysfs is required for mounting the root filesystem - see
> > > name_to_dev_t in init/do_mounts.c.
> > 
> > OK.  But it looks like if /sys is empty and you provide "root=03:02" then
> > things will still work.  It's a matter of trying it...
> 
> Uhh?
> 
> dev_t name_to_dev_t(char *name)
> {
>         dev_t res = 0;
> 
>         sys_mkdir("/sys", 0700);
>         if (sys_mount("sysfs", "/sys", "sysfs", 0, NULL) < 0)
>                 goto out;
> 
> 	...
> 
> out:
>         sys_rmdir("/sys");
>         return res;
> }
> 
> If sysfs can't be mounted, then it looks like we can't even decode a
> numeric major:minor root device specification.

Well I was proposing that sysfs be present and mountable, but empty.  ie:
make sysfs_create() a no-op.  Something like that.  Additional touchups may
be needed of course.


* Re: mem=16MB laptop testing
  2003-10-14 12:17   ` Anton Blanchard
@ 2003-10-14 12:31     ` Andrew Morton
  2003-10-14 12:44       ` Anton Blanchard
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2003-10-14 12:31 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: wli, linux-kernel

Anton Blanchard <anton@samba.org> wrote:
>
>
> > hrm.  min_free_kbytes is normally 1024.  I'm surprised that the additional
> > 900k made so much difference.  We must be on the hairy edge.
> > 
> > It looks like we need to precalculate/scale min_free_kbytes, yes?
> 
> That would be good for both the low and high end. I'd like to see it
> default to something larger on my 16GB+ machines.
> 

How big?

I guess it should be scaled by ZONE_DMA+ZONE_NORMAL, skipping ZONE_HIGHMEM.


* Re: mem=16MB laptop testing
  2003-10-14 12:31     ` Andrew Morton
@ 2003-10-14 12:44       ` Anton Blanchard
  2003-10-14 23:40         ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: Anton Blanchard @ 2003-10-14 12:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: wli, linux-kernel


> How big?

Taking a complete guess, 16MB on a 16GB machine wouldn't be missed.

> I guess it should be scaled by ZONE_DMA+ZONE_NORMAL, skipping ZONE_HIGHMEM.

Works for me.

Anton


* Re: mem=16MB laptop testing
  2003-10-14 11:08   ` William Lee Irwin III
@ 2003-10-14 13:20     ` John Bradford
  0 siblings, 0 replies; 29+ messages in thread
From: John Bradford @ 2003-10-14 13:20 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

Quote from William Lee Irwin III <wli@holomorphy.com>:
> At some point in the past, I wrote:
> >> (g) X isn't terribly swift; it's slower than I remember old Sun IPC's
> >> 	being, though they had 24MB RAM. OTOH luserspace is much more
> >> 	bloated these days. zsh alone is at least 3 times the size of
> >> 	ksh, which I used back then. fvwm2 is a lot bigger than fvwm1.
> >> 	And so on and so forth. I guess the upshot is "unbloating" the
> >> 	kernel wouldn't do much good anyway, since luserspace isn't in
> >> 	any kind of shape to run in this kind of environment anymore either.
> 
> On Tue, Oct 14, 2003 at 12:01:00PM +0100, John Bradford wrote:
> > Depends on what you consider usable.  I thought X worked pretty well
> > in swapless 8MB last time I tried it, (last year, around 2.5.40).
> > Admittedly that was only running a few xterms locally.  A 4MB + 20MB
> > swap box was surprisingly usable for fairly intense remote applications
> > over a compressed 9600 bps serial link.
> 
> It's not that it's particularly unusable, it was merely substantially
> slower than vaguely comparable machines I remember from way back when.

Ah, OK.  Quite possibly my subjective observation was biased, simply
because I was expecting it to perform badly :-).

John.


* Re: mem=16MB laptop testing
  2003-10-14 12:44       ` Anton Blanchard
@ 2003-10-14 23:40         ` Andrew Morton
  2003-10-15 13:32           ` Martin Waitz
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2003-10-14 23:40 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: wli, linux-kernel

Anton Blanchard <anton@samba.org> wrote:
>
> 
> > How big?
> 
> Taking a complete guess, 16MB on a 16GB machine wouldn't be missed.
> 
> > I guess it should be scaled by ZONE_DMA+ZONE_NORMAL, skipping ZONE_HIGHMEM.
> 
> Works for me.
> 

OK, I'm testing the below.


 25-akpm/include/linux/mm.h     |    2 ++
 25-akpm/include/linux/mmzone.h |    2 +-
 25-akpm/init/main.c            |    2 +-
 25-akpm/mm/oom_kill.c          |   14 --------------
 25-akpm/mm/page_alloc.c        |   40 +++++++++++++++++++++++++++++++++++++++-
 25-akpm/mm/swap.c              |   20 ++++++++++++++++++++
 6 files changed, 63 insertions(+), 17 deletions(-)

diff -puN mm/page_alloc.c~scale-min_free_kbytes mm/page_alloc.c
--- 25/mm/page_alloc.c~scale-min_free_kbytes	Tue Oct 14 16:25:08 2003
+++ 25-akpm/mm/page_alloc.c	Tue Oct 14 16:32:24 2003
@@ -1589,7 +1589,7 @@ void __init page_alloc_init(void)
  *	that the pages_{min,low,high} values for each zone are set correctly 
  *	with respect to min_free_kbytes.
  */
-void setup_per_zone_pages_min(void)
+static void setup_per_zone_pages_min(void)
 {
 	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
 	unsigned long lowmem_pages = 0;
@@ -1633,6 +1633,44 @@ void setup_per_zone_pages_min(void)
 }
 
 /*
+ * Initialise min_free_kbytes.
+ *
+ * For small machines we want it small (128k min).  For large machines
+ * we want it large (16MB max).  But it is not linear, because network
+ * bandwidth does not increase linearly with machine size.  We use
+ *
+ *	min_free_kbytes = lowmem_kbytes / sqrt(lowmem_kbytes)
+ *
+ * which yields
+ *
+ *     8MB:		128k
+ *    16MB:		170k
+ *    32MB:		256k
+ *    64MB:		341k
+ *   128MB:		512k
+ *   256MB:		682k
+ *   512MB:		1024k
+ *  1024MB:		1365k
+ *  2048MB:		2048k
+ *  4096MB:		2730k
+ *  8192MB:		4096k
+ * 16384MB:		5461k
+ */
+void __init init_per_zone_pages_min(void)
+{
+	unsigned long lowmem_kbytes;
+
+	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);
+
+	min_free_kbytes = lowmem_kbytes / int_sqrt(lowmem_kbytes);
+	if (min_free_kbytes < 128)
+		min_free_kbytes = 128;
+	if (min_free_kbytes > 16384)
+		min_free_kbytes = 16384;
+	setup_per_zone_pages_min();
+}
+
+/*
  * min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so 
  *	that we can call setup_per_zone_pages_min() whenever min_free_kbytes 
  *	changes.
diff -puN init/main.c~scale-min_free_kbytes init/main.c
--- 25/init/main.c~scale-min_free_kbytes	Tue Oct 14 16:25:08 2003
+++ 25-akpm/init/main.c	Tue Oct 14 16:38:58 2003
@@ -396,7 +396,7 @@ asmlinkage void __init start_kernel(void
 	lock_kernel();
 	printk(linux_banner);
 	setup_arch(&command_line);
-	setup_per_zone_pages_min();
+	init_per_zone_pages_min();
 	setup_per_cpu_areas();
 
 	/*
diff -puN include/linux/mmzone.h~scale-min_free_kbytes include/linux/mmzone.h
--- 25/include/linux/mmzone.h~scale-min_free_kbytes	Tue Oct 14 16:25:08 2003
+++ 25-akpm/include/linux/mmzone.h	Tue Oct 14 16:25:08 2003
@@ -284,7 +284,7 @@ struct ctl_table;
 struct file;
 int min_free_kbytes_sysctl_handler(struct ctl_table *, int, struct file *, 
 					  void *, size_t *);
-extern void setup_per_zone_pages_min(void);
+extern void init_per_zone_pages_min(void);
 
 
 #ifdef CONFIG_NUMA
diff -puN mm/oom_kill.c~scale-min_free_kbytes mm/oom_kill.c
--- 25/mm/oom_kill.c~scale-min_free_kbytes	Tue Oct 14 16:25:08 2003
+++ 25-akpm/mm/oom_kill.c	Tue Oct 14 16:25:08 2003
@@ -24,20 +24,6 @@
 /* #define DEBUG */
 
 /**
- * int_sqrt - oom_kill.c internal function, rough approximation to sqrt
- * @x: integer of which to calculate the sqrt
- * 
- * A very rough approximation to the sqrt() function.
- */
-static unsigned int int_sqrt(unsigned int x)
-{
-	unsigned int out = x;
-	while (x & ~(unsigned int)1) x >>=2, out >>=1;
-	if (x) out -= out >> 2;
-	return (out ? out : 1);
-}	
-
-/**
  * oom_badness - calculate a numeric value for how bad this task has been
  * @p: task struct of which task we should calculate
  *
diff -puN mm/swap.c~scale-min_free_kbytes mm/swap.c
--- 25/mm/swap.c~scale-min_free_kbytes	Tue Oct 14 16:25:08 2003
+++ 25-akpm/mm/swap.c	Tue Oct 14 16:32:36 2003
@@ -380,6 +380,26 @@ void vm_acct_memory(long pages)
 EXPORT_SYMBOL(vm_acct_memory);
 #endif
 
+/**
+ * int_sqrt - rough approximation to sqrt
+ * @x: integer of which to calculate the sqrt
+ *
+ * A very rough approximation to the sqrt() function.
+ */
+unsigned long int_sqrt(unsigned long x)
+{
+	unsigned long out = x;
+
+	while (x & ~(unsigned long)1) {
+		x >>= 2;
+		out >>= 1;
+	}
+
+	if (x)
+		out -= out >> 2;
+
+	return (out ? out : 1);
+}
 
 /*
  * Perform any setup for the swap system
diff -puN include/linux/mm.h~scale-min_free_kbytes include/linux/mm.h
--- 25/include/linux/mm.h~scale-min_free_kbytes	Tue Oct 14 16:25:08 2003
+++ 25-akpm/include/linux/mm.h	Tue Oct 14 16:25:57 2003
@@ -615,6 +615,8 @@ extern struct page * follow_page(struct 
 extern int remap_page_range(struct vm_area_struct *vma, unsigned long from,
 		unsigned long to, unsigned long size, pgprot_t prot);
 
+unsigned long int_sqrt(unsigned long x);
+
 #ifndef CONFIG_DEBUG_PAGEALLOC
 static inline void
 kernel_map_pages(struct page *page, int numpages, int enable)

_


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-14 10:55 mem=16MB laptop testing William Lee Irwin III
  2003-10-14 11:01 ` John Bradford
  2003-10-14 11:56 ` Andrew Morton
@ 2003-10-15  0:35 ` Nick Piggin
  2003-10-15  4:31 ` Andrew Morton
  3 siblings, 0 replies; 29+ messages in thread
From: Nick Piggin @ 2003-10-15  0:35 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel



William Lee Irwin III wrote:

>So I tried mem=16m on my laptop (stinkpad T21). I made the following
>potentially useless observations:
>

snip


>
>inode_cache               840K           840K     100.00%   
>dentry_cache              746K           753K      99.07%   
>ext3_inode_cache          591K           592K      99.84%   
>size-4096                 504K           504K     100.00%   
>size-512                  203K           204K      99.75%   
>size-2048                 182K           204K      89.22%   
>pgd                       188K           188K     100.00%   
>task_struct               100K           108K      92.86%   
>vm_area_struct             93K           101K      92.28%   
>blkdev_requests           101K           101K     100.00%   
>

Hmm, blkdev_requests looks big. 4 struct requests are allocated for every
queue, which totals about 600 bytes. What do /sys/block and the
blkdev_requests line from /proc/slabinfo look like?



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-14 10:55 mem=16MB laptop testing William Lee Irwin III
                   ` (2 preceding siblings ...)
  2003-10-15  0:35 ` Nick Piggin
@ 2003-10-15  4:31 ` Andrew Morton
  3 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2003-10-15  4:31 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III <wli@holomorphy.com> wrote:
>
> (e) About 4.8MB are consumed by slab allocations at runtime.
>  	The top 10 slab abusers are:
> 
>  inode_cache               840K           840K     100.00%   
>  dentry_cache              746K           753K      99.07%   
>  ext3_inode_cache          591K           592K      99.84%   
>  size-4096                 504K           504K     100.00%   
>  size-512                  203K           204K      99.75%   
>  size-2048                 182K           204K      89.22%   
>  pgd                       188K           188K     100.00%   
>  task_struct               100K           108K      92.86%   
>  vm_area_struct             93K           101K      92.28%   
>  blkdev_requests           101K           101K     100.00%   
> 
>  The inode_cache culprit is the obvious butt of many complaints:
>  # find /sys | wc -l
>  2656
> 

hmm, you have a lot more sysfs entries than I do:

vmm:/home/akpm> find /sys|wc -l
    849

The below patch nukes them all; saves around half meg here.  You need to
add "nosysfs" and "root=NN:MM" to the kernel boot commandline.  Please let
me know how much space you save.



 fs/sysfs/bin.c        |    6 ++++++
 fs/sysfs/dir.c        |   16 +++++++++++++++-
 fs/sysfs/file.c       |    9 +++++++++
 fs/sysfs/group.c      |    6 ++++++
 fs/sysfs/inode.c      |   18 ++++++++++++++++--
 fs/sysfs/symlink.c    |    3 +++
 fs/sysfs/sysfs.h      |    3 +++
 include/linux/sysfs.h |    0 
 8 files changed, 58 insertions(+), 3 deletions(-)

diff -puN fs/sysfs/inode.c~nosysfs fs/sysfs/inode.c
--- 25/fs/sysfs/inode.c~nosysfs	2003-10-14 18:24:42.000000000 -0700
+++ 25-akpm/fs/sysfs/inode.c	2003-10-14 18:40:16.000000000 -0700
@@ -11,7 +11,8 @@
 #include <linux/pagemap.h>
 #include <linux/namei.h>
 #include <linux/backing-dev.h>
-extern struct super_block * sysfs_sb;
+#include <linux/init.h>
+#include "sysfs.h"
 
 static struct address_space_operations sysfs_aops = {
 	.readpage	= simple_readpage,
@@ -24,6 +25,8 @@ static struct backing_dev_info sysfs_bac
 	.memory_backed	= 1,	/* Does not contribute to dirty memory */
 };
 
+int nosysfs;
+
 struct inode * sysfs_new_inode(mode_t mode)
 {
 	struct inode * inode = new_inode(sysfs_sb);
@@ -44,6 +47,10 @@ int sysfs_create(struct dentry * dentry,
 {
 	int error = 0;
 	struct inode * inode = NULL;
+
+	if (nosysfs)
+		return 0;
+
 	if (dentry) {
 		if (!dentry->d_inode) {
 			if ((inode = sysfs_new_inode(mode)))
@@ -87,6 +94,8 @@ void sysfs_hash_and_remove(struct dentry
 {
 	struct dentry * victim;
 
+	if (nosysfs)
+		return;
 	down(&dir->d_inode->i_sem);
 	victim = sysfs_get_dentry(dir,name);
 	if (!IS_ERR(victim)) {
@@ -107,4 +116,9 @@ void sysfs_hash_and_remove(struct dentry
 	up(&dir->d_inode->i_sem);
 }
 
-
+static int __init nosysfs_setup(char *str)
+{
+	nosysfs = 1;
+	return 1;
+}
+__setup("nosysfs", nosysfs_setup);
diff -puN fs/sysfs/dir.c~nosysfs fs/sysfs/dir.c
--- 25/fs/sysfs/dir.c~nosysfs	2003-10-14 18:25:40.000000000 -0700
+++ 25-akpm/fs/sysfs/dir.c	2003-10-14 18:29:50.000000000 -0700
@@ -46,6 +46,8 @@ static int create_dir(struct kobject * k
 
 int sysfs_create_subdir(struct kobject * k, const char * n, struct dentry ** d)
 {
+	if (nosysfs)
+		return 0;
 	return create_dir(k,k->dentry,n,d);
 }
 
@@ -61,6 +63,9 @@ int sysfs_create_dir(struct kobject * ko
 	struct dentry * parent;
 	int error = 0;
 
+	if (nosysfs)
+		return 0;
+
 	if (!kobj)
 		return -EINVAL;
 
@@ -102,6 +107,8 @@ static void remove_dir(struct dentry * d
 
 void sysfs_remove_subdir(struct dentry * d)
 {
+	if (nosysfs)
+		return;
 	remove_dir(d);
 }
 
@@ -118,8 +125,12 @@ void sysfs_remove_subdir(struct dentry *
 void sysfs_remove_dir(struct kobject * kobj)
 {
 	struct list_head * node;
-	struct dentry * dentry = dget(kobj->dentry);
+	struct dentry *dentry;
+
+	if (nosysfs)
+		return;
 
+	dentry = dget(kobj->dentry);
 	if (!dentry)
 		return;
 
@@ -164,6 +175,9 @@ void sysfs_rename_dir(struct kobject * k
 {
 	struct dentry * new_dentry, * parent;
 
+	if (nosysfs)
+		return;
+
 	if (!strcmp(kobject_name(kobj), new_name))
 		return;
 
diff -puN include/linux/sysfs.h~nosysfs include/linux/sysfs.h
diff -puN fs/sysfs/sysfs.h~nosysfs fs/sysfs/sysfs.h
--- 25/fs/sysfs/sysfs.h~nosysfs	2003-10-14 18:27:36.000000000 -0700
+++ 25-akpm/fs/sysfs/sysfs.h	2003-10-14 18:28:22.000000000 -0700
@@ -1,4 +1,7 @@
+struct super_block;
 
+extern struct super_block *sysfs_sb;
+extern int nosysfs;
 extern struct vfsmount * sysfs_mount;
 
 extern struct inode * sysfs_new_inode(mode_t mode);
diff -puN fs/sysfs/file.c~nosysfs fs/sysfs/file.c
--- 25/fs/sysfs/file.c~nosysfs	2003-10-14 18:38:15.000000000 -0700
+++ 25-akpm/fs/sysfs/file.c	2003-10-14 18:40:06.000000000 -0700
@@ -350,6 +350,9 @@ int sysfs_add_file(struct dentry * dir, 
 	struct dentry * dentry;
 	int error;
 
+	if (nosysfs)
+		return 0;
+
 	down(&dir->d_inode->i_sem);
 	dentry = sysfs_get_dentry(dir,attr->name);
 	if (!IS_ERR(dentry)) {
@@ -374,6 +377,9 @@ int sysfs_add_file(struct dentry * dir, 
 
 int sysfs_create_file(struct kobject * kobj, const struct attribute * attr)
 {
+	if (nosysfs)
+		return 0;
+
 	if (kobj && attr)
 		return sysfs_add_file(kobj->dentry,attr);
 	return -EINVAL;
@@ -394,6 +400,9 @@ int sysfs_update_file(struct kobject * k
 	struct dentry * victim;
 	int res = -ENOENT;
 
+	if (nosysfs)
+		return 0;
+
 	down(&dir->d_inode->i_sem);
 	victim = sysfs_get_dentry(dir, attr->name);
 	if (!IS_ERR(victim)) {
diff -puN fs/sysfs/symlink.c~nosysfs fs/sysfs/symlink.c
--- 25/fs/sysfs/symlink.c~nosysfs	2003-10-14 18:40:32.000000000 -0700
+++ 25-akpm/fs/sysfs/symlink.c	2003-10-14 18:40:57.000000000 -0700
@@ -79,6 +79,9 @@ int sysfs_create_link(struct kobject * k
 	char * path;
 	char * s;
 
+	if (nosysfs)
+		return 0;
+
 	depth = object_depth(kobj);
 	size = object_path_length(target) + depth * 3 - 1;
 	if (size > PATH_MAX)
diff -puN fs/sysfs/bin.c~nosysfs fs/sysfs/bin.c
--- 25/fs/sysfs/bin.c~nosysfs	2003-10-14 18:51:48.000000000 -0700
+++ 25-akpm/fs/sysfs/bin.c	2003-10-14 18:52:20.000000000 -0700
@@ -152,6 +152,9 @@ int sysfs_create_bin_file(struct kobject
 	struct dentry * parent;
 	int error = 0;
 
+	if (nosysfs)
+		return 0;
+
 	if (!kobj || !attr)
 		return -EINVAL;
 
@@ -185,6 +188,9 @@ int sysfs_create_bin_file(struct kobject
 
 int sysfs_remove_bin_file(struct kobject * kobj, struct bin_attribute * attr)
 {
+	if (nosysfs)
+		return 0;
+
 	sysfs_hash_and_remove(kobj->dentry,attr->attr.name);
 	return 0;
 }
diff -puN fs/sysfs/group.c~nosysfs fs/sysfs/group.c
--- 25/fs/sysfs/group.c~nosysfs	2003-10-14 19:32:58.000000000 -0700
+++ 25-akpm/fs/sysfs/group.c	2003-10-14 19:33:16.000000000 -0700
@@ -45,6 +45,9 @@ int sysfs_create_group(struct kobject * 
 	struct dentry * dir;
 	int error;
 
+	if (nosysfs)
+		return 0;
+
 	if (grp->name) {
 		error = sysfs_create_subdir(kobj,grp->name,&dir);
 		if (error)
@@ -65,6 +68,9 @@ void sysfs_remove_group(struct kobject *
 {
 	struct dentry * dir;
 
+	if (nosysfs)
+		return;
+
 	if (grp->name)
 		dir = sysfs_get_dentry(kobj->dentry,grp->name);
 	else

_


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-14 11:56 ` Andrew Morton
                     ` (2 preceding siblings ...)
  2003-10-14 12:28   ` William Lee Irwin III
@ 2003-10-15 12:12   ` Pavel Machek
  2003-10-15 12:51     ` William Lee Irwin III
  3 siblings, 1 reply; 29+ messages in thread
From: Pavel Machek @ 2003-10-15 12:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: William Lee Irwin III, linux-kernel

Hi!

> > (c) mem= no longer bounds the highest physical address, but rather
> > 	the sum of memory in e820 entries post-sanitization. This
> > 	means a ZONE_NORMAL with about 384KB showed up, with duly
> > 	perverse heuristic consequences for page_alloc.c
> 
> I don't understand this.  You mean almost all memory was in ZONE_DMA?
> 
> "mem=" does not accurately emulate having that much memory.  So a 512M box
> booted with "mem=256M" has a different amount of memory from a 256M box
> booted with no "mem=" option.  It would be nice to fix that, but I've never
> looked into it.

I do not think this wants to be fixed. It should remain compatible
with 2.4.X, and if it is not that's a bug [and pretty dangerous & hard
to debug one -- if you mark something as ram which is not, you get
real bad data corruption].

									Pavel

-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 12:12   ` Pavel Machek
@ 2003-10-15 12:51     ` William Lee Irwin III
  2003-10-15 13:20       ` Pavel Machek
  0 siblings, 1 reply; 29+ messages in thread
From: William Lee Irwin III @ 2003-10-15 12:51 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrew Morton, linux-kernel

On Wed, Oct 15, 2003 at 02:12:08PM +0200, Pavel Machek wrote:
> I do not think this wants to be fixed. It should remain compatible
> with 2.4.X, and if it is not that's a bug [and pretty dangerous & hard
> to debug one -- if you mark something as ram which is not, you get
> real bad data corruption].

2.4:
static void __init limit_regions (unsigned long long size)
{
	unsigned long long current_addr = 0;
	int i;

	for (i = 0; i < e820.nr_map; i++) {
		if (e820.map[i].type == E820_RAM) {
			current_addr = e820.map[i].addr + e820.map[i].size;
			if (current_addr >= size) {
				e820.map[i].size -= current_addr-size;
				e820.nr_map = i + 1;
				return;
			}
		}
	}
}

2.5:
static void __init limit_regions (unsigned long long size)
{
	int i;
	unsigned long long current_size = 0;

	for (i = 0; i < e820.nr_map; i++) {
		if (e820.map[i].type == E820_RAM) {
			current_size += e820.map[i].size;
			if (current_size >= size) {
				e820.map[i].size -= current_size-size;
				e820.nr_map = i + 1;
				return;
			}
		}
	}
}
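
To make the difference between the two versions concrete, here is a
minimal user-space sketch. The two-entry map (640KB of low RAM, a hole
up to 1MB, then RAM up to 512MB) is a hypothetical layout chosen for
illustration, not the T21's actual e820 table, and the names
limit_regions_by_addr, limit_regions_by_size, ram_end and sample_map
are made up for this sketch.

```c
#define E820_RAM 1
#define E820MAX  8

struct e820entry {
	unsigned long long addr;
	unsigned long long size;
	unsigned long type;
};

struct e820map {
	int nr_map;
	struct e820entry map[E820MAX];
};

/* 2.4 logic: "size" bounds the highest physical address of RAM. */
void limit_regions_by_addr(struct e820map *e, unsigned long long size)
{
	unsigned long long current_addr = 0;
	int i;

	for (i = 0; i < e->nr_map; i++) {
		if (e->map[i].type == E820_RAM) {
			current_addr = e->map[i].addr + e->map[i].size;
			if (current_addr >= size) {
				e->map[i].size -= current_addr - size;
				e->nr_map = i + 1;
				return;
			}
		}
	}
}

/* 2.5 logic: "size" bounds the sum of the RAM entry sizes, holes excluded. */
void limit_regions_by_size(struct e820map *e, unsigned long long size)
{
	unsigned long long current_size = 0;
	int i;

	for (i = 0; i < e->nr_map; i++) {
		if (e->map[i].type == E820_RAM) {
			current_size += e->map[i].size;
			if (current_size >= size) {
				e->map[i].size -= current_size - size;
				e->nr_map = i + 1;
				return;
			}
		}
	}
}

/* Highest end address of any remaining RAM entry. */
unsigned long long ram_end(const struct e820map *e)
{
	unsigned long long end = 0;
	int i;

	for (i = 0; i < e->nr_map; i++) {
		unsigned long long entry_end = e->map[i].addr + e->map[i].size;

		if (e->map[i].type == E820_RAM && entry_end > end)
			end = entry_end;
	}
	return end;
}

/* Hypothetical map: 0-640KB RAM, hole up to 1MB, then RAM up to 512MB. */
struct e820map sample_map(void)
{
	struct e820map m = {
		.nr_map = 2,
		.map = {
			{ 0x0ULL,      0xA0000ULL,    E820_RAM },
			{ 0x100000ULL, 0x1FF00000ULL, E820_RAM },
		},
	};
	return m;
}
```

With mem=16M (16ULL << 20) on the sample map, the address-bounded
version leaves RAM ending at exactly 0x1000000, while the size-summing
version leaves it ending at 0x1060000, i.e. 384KB past the 16MB line.
That would be consistent with the small ZONE_NORMAL reported earlier.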

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 12:51     ` William Lee Irwin III
@ 2003-10-15 13:20       ` Pavel Machek
  2003-10-15 13:28         ` William Lee Irwin III
  2003-10-15 15:32         ` Dave Jones
  0 siblings, 2 replies; 29+ messages in thread
From: Pavel Machek @ 2003-10-15 13:20 UTC (permalink / raw)
  To: William Lee Irwin III, Andrew Morton, linux-kernel

Hi!

> > I do not think this wants to be fixed. It should remain compatible
> > with 2.4.X, and if it is not that's a bug [and pretty dangerous & hard
> > to debug one -- if you mark something as ram which is not, you get
> > real bad data corruption].
> 
> 2.4:
> static void __init limit_regions (unsigned long long size)
> {
> 	unsigned long long current_addr = 0;
> 	int i;
> 
> 	for (i = 0; i < e820.nr_map; i++) {
> 		if (e820.map[i].type == E820_RAM) {
> 			current_addr = e820.map[i].addr + e820.map[i].size;
> 			if (current_addr >= size) {
> 				e820.map[i].size -= current_addr-size;
> 				e820.nr_map = i + 1;
> 				return;
> 			}
> 		}
> 	}
> }
> 
> 2.5:
> static void __init limit_regions (unsigned long long size)
> {
> 	int i;
> 	unsigned long long current_size = 0;
> 
> 	for (i = 0; i < e820.nr_map; i++) {
> 		if (e820.map[i].type == E820_RAM) {
> 			current_size += e820.map[i].size;
> 			if (current_size >= size) {
> 				e820.map[i].size -= current_size-size;
> 				e820.nr_map = i + 1;
> 				return;
> 			}
> 		}
> 	}
> }

Do you want to say that calculation is different, already? We should
probably make 2.5 version match 2.4 version, that's what users
expect. Who changed it and why?

							Pavel

-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 13:20       ` Pavel Machek
@ 2003-10-15 13:28         ` William Lee Irwin III
  2003-10-15 13:59           ` Larry Sendlosky
  2003-10-15 15:32         ` Dave Jones
  1 sibling, 1 reply; 29+ messages in thread
From: William Lee Irwin III @ 2003-10-15 13:28 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrew Morton, linux-kernel

On Wed, Oct 15, 2003 at 03:20:54PM +0200, Pavel Machek wrote:
> Do you want to say that calculation is different, already? We should
> probably make 2.5 version match 2.4 version, that's what users
> expect. Who changed it and why?

No idea when it changed, but I was at least duly disturbed by the tiny
384KB ZONE_NORMAL materializing out of thin air when I booted mem=16m.


-- wli

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-14 23:40         ` Andrew Morton
@ 2003-10-15 13:32           ` Martin Waitz
  2003-10-15 17:34             ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: Martin Waitz @ 2003-10-15 13:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Anton Blanchard, wli, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

hi :)

On Tue, Oct 14, 2003 at 04:40:04PM -0700, Andrew Morton wrote:
> + *	min_free_kbytes = lowmem_kbytes / sqrt(lowmem_kbytes)

you do have a strange sqrt here ;)

if you do a 'x*=2' at the start of your int_sqrt, your results
are closer to a real sqrt.

then, to get similar min_free_kbytes results, you could do
	min_free_kbytes = int_sqrt(2*lowmem_kbytes);

which is easier to understand, me thinks ;)

-- 
CU,		  / Friedrich-Alexander University Erlangen, Germany
Martin Waitz	//  Department of Computer Science 3       _________
______________/// - - - - - - - - - - - - - - - - - - - - ///
this is a manually generated mail, it contains           //
typos and is valid even without capital letters.        /

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 13:28         ` William Lee Irwin III
@ 2003-10-15 13:59           ` Larry Sendlosky
  2003-10-15 15:34             ` Dave Jones
  2003-10-15 15:38             ` Thomas Schlichter
  0 siblings, 2 replies; 29+ messages in thread
From: Larry Sendlosky @ 2003-10-15 13:59 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Pavel Machek, Andrew Morton, linux-kernel

Changeset 1.403.15.8 2002/6/05 davej@suse.de
[PATCH] large x86 setup cleanup.

Patrick Mochel did a great job here at splitting up some of the larger
messy parts of arch/i386/kernel/setup.c, and introduced a nice abstraction
which gives us a much nicer way to ensure we can add workarounds for vendor
specific bugs / features without polluting other vendor code paths.

Mark Haverkamp also brought this up to date for merging in my tree circa
2.5.14, and asides from 1-2 now fixed small thinkos, there haven't been
any problems.

This also features a workaround for an errata item on stepping C0 of
the Intel Pentium 4 Xeon, which isn't in your tree yet, where we must
disable the hardware prefetcher to ensure sane operation.

arch/i386/kernel/setup.c 1.41.1.16 2002/06/03 10:10:19 davej@suse.de
large x86 setup cleanup.




William Lee Irwin III wrote:

>On Wed, Oct 15, 2003 at 03:20:54PM +0200, Pavel Machek wrote:
>  
>
>>Do you want to say that calculation is different, already? We should
>>probably make 2.5 version match 2.4 version, that's what users
>>expect. Who changed it and why?
>>    
>>
>
>No idea when it changed, but I was at least duly disturbed by the tiny
>384KB ZONE_NORMAL materializing out of thin air when I booted mem=16m.
>
>
>-- wli


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 13:20       ` Pavel Machek
  2003-10-15 13:28         ` William Lee Irwin III
@ 2003-10-15 15:32         ` Dave Jones
  2003-10-15 17:20           ` Andrew Morton
  1 sibling, 1 reply; 29+ messages in thread
From: Dave Jones @ 2003-10-15 15:32 UTC (permalink / raw)
  To: Pavel Machek; +Cc: William Lee Irwin III, Andrew Morton, linux-kernel

On Wed, Oct 15, 2003 at 03:20:54PM +0200, Pavel Machek wrote:

 > Do you want to say that calculation is different, already? We should
 > probably make 2.5 version match 2.4 version, that's what users
 > expect. Who changed it and why?

More a case of who didn't change it (in 2.6 at least).
This routine was identical until rev 1.42 of 2.4 when hch changed it to how
it stands today, with the comment...

[PATCH] memsetup fixes (again)

The mem= fixes from Red Hat's tree had a small bug:
if mem= was not actually used with the additional features, but
in the plain old way, it used the value as the size of memory it
wants, not the upper limit.  The problem with this is that there
is a small difference due to memory holes.

I had one report of a person using mem= to reduce memory size for
a broken i386 chipset that only supports 64MB cached and the rest
as mtd/slram device for swap.  It got broken as the boundaries changed.


Assuming this patch is correct, it needs forward porting to 2.6

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 13:59           ` Larry Sendlosky
@ 2003-10-15 15:34             ` Dave Jones
  2003-10-15 15:38             ` Thomas Schlichter
  1 sibling, 0 replies; 29+ messages in thread
From: Dave Jones @ 2003-10-15 15:34 UTC (permalink / raw)
  Cc: William Lee Irwin III, Pavel Machek, Andrew Morton, linux-kernel

On Wed, Oct 15, 2003 at 09:59:41AM -0400, Larry Sendlosky wrote:
 > Changeset 1.403.15.8 2002/6/05 davej@suse.de
 > [PATCH] large x86 setup cleanup.

Completely unrelated changeset.

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 13:59           ` Larry Sendlosky
  2003-10-15 15:34             ` Dave Jones
@ 2003-10-15 15:38             ` Thomas Schlichter
  2003-10-15 16:06               ` Dave Jones
  2003-10-15 17:45               ` Mike Dresser
  1 sibling, 2 replies; 29+ messages in thread
From: Thomas Schlichter @ 2003-10-15 15:38 UTC (permalink / raw)
  To: William Lee Irwin III, Pavel Machek
  Cc: Larry Sendlosky, Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2401 bytes --]

Well, as davej submitted his patch he seems to just have missed a Changeset 
applied to the 2.4 tree...:

ChangeSet@1.404.2.2  2002-05-06 21:30:10-03:00  hch@infradead.org
[PATCH] memsetup fixes (again)

The mem= fixes from Red Hat's tree had a small bug:
if mem= was not actually used with the additional features, but
in the plain old way, it used the value as the size of memory it
wants, not the upper limit.  The problem with this is that there
is a small difference due to memory holes.

I had one report of a person using mem= to reduce memory size for
a broken i386 chipset that only supports 64MB cached and the rest
as mtd/slram device for swap.  It got broken as the boundaries changed.

arch/i386/kernel/setup.c@1.42  2002-04-23 18:52:12-03:00  hch@infradead.org

So obviously this should be fixed in the 2.6 tree too!

Regards
   Thomas

P.S.: How can it be assured that fixes for the 2.4 tree get into the 2.6 tree
when they are needed there, too? I'd wonder if this missed CS is the only
one...

On Wednesday 15 October 2003 15:59, Larry Sendlosky wrote:
> Changeset 1.403.15.8 2002/6/05 davej@suse.de
> [PATCH] large x86 setup cleanup.
>
> Patrick Mochel did a great job here at splitting up some of the larger
> messy parts of arch/i386/kernel/setup.c, and introduced a nice abstraction
> which gives us a much nicer way to ensure we can add workarounds for vendor
> specific bugs / features without polluting other vendor code paths.
>
> Mark Haverkamp also brought this up to date for merging in my tree circa
> 2.5.14, and asides from 1-2 now fixed small thinkos, there haven't been
> any problems.
>
> This also features a workaround for an errata item on stepping C0 of
> the Intel Pentium 4 Xeon, which isn't in your tree yet, where we must
> disable the hardware prefetcher to ensure sane operation.
>
> arch/i386/kernel/setup.c 1.41.1.16 2002/06/03 10:10:19 davej@suse.de
> large x86 setup cleanup.
>
> William Lee Irwin III wrote:
> >On Wed, Oct 15, 2003 at 03:20:54PM +0200, Pavel Machek wrote:
> >>Do you want to say that calculation is different, already? We should
> >>probably make 2.5 version match 2.4 version, that's what users
> >>expect. Who changed it and why?
> >
> >No idea when it changed, but I was at least duly disturbed by the tiny
> >384KB ZONE_NORMAL materializing out of thin air when I booted mem=16m.

[-- Attachment #2: signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 15:38             ` Thomas Schlichter
@ 2003-10-15 16:06               ` Dave Jones
  2003-10-15 17:45               ` Mike Dresser
  1 sibling, 0 replies; 29+ messages in thread
From: Dave Jones @ 2003-10-15 16:06 UTC (permalink / raw)
  To: Thomas Schlichter
  Cc: William Lee Irwin III, Pavel Machek, Larry Sendlosky,
	Andrew Morton, linux-kernel

On Wed, Oct 15, 2003 at 05:38:45PM +0200, Thomas Schlichter wrote:
 > Well, as davej submitted his patch he seems to just have missed a Changeset 
 > applied to the 2.4 tree...:
 > 
 > ChangeSet@1.404.2.2  2002-05-06 21:30:10-03:00  hch@infradead.org

Around June last year was when I didn't have enough spare time to
continue doing the 2.4 -> 2.5 pushes as frequently as I had.
As a result, lots of bits fell through the cracks, and no-one else
really bothered to step up and take over this much needed role.

 > So obviously this should be fixed in the 2.6 tree too!

Looks that way.

 > P.S.: How can it be assured that fixes for the 2.4 tree get into the 2.6 tree 
 > when they are needed there, too?

There's a number of possible approaches.

The "realtime" method.
 As a cset gets merged in 2.4, push it to 2.6 if needed.
This is what I was doing in early 2.5.x in the -dj branch.
It does however require you watch the trees constantly, which is a time sink,
but if you're like me, and you do that anyway...
This method can easily be a full time job depending on the rate of
change happening in both trees. It is however a good way to learn about
many different parts of the tree, and get a good "global" view of what's
going on in the trees.
This method falls apart if you have to take time off for any reason.
Coming back after a month with around a gig of Changesets to sift
through (and more constantly coming in whilst you're sifting) is a
real energy drain.  I was probably one of the few people who was glad
when Linus/Marcelo went on vacation, as it was good 'catch up' time.

The "painful" method.
 Finding someone to go through every changeset committed since ~2.4.18/.19
 which was when I stopped doing the sync and making sure that they got into
 2.6 too.

The "less effort" approach.
 Do it on an as-needed basis, when issues crop up like this one, check out
 the same code in 2.4, see what changed, see if there's anything that got
 missed.

The "distributed" method.
 Many monkeys make light work. Pick a random driver.
 diff -u linux-2.4/drivers/$foo linux-2.6/drivers/$foo
 review diff. Make patches if needed.


There are also other problems not covered above.
- Some csets don't make sense to forward port even though they
  may look like "obvious fixes".
- Linus has been known to reject patches that made it into 2.4.
  This is a *real pain* when the cset isn't a 1-2 liner.
- Nothing remains still very long.
  Fixes that went into 2.4 may need considerable rewriting and
  bending to fit in 2.6. This was incredibly painful for some parts
  of the kernel, but again.. is a good learning process.
- Some kernel hacker prima donnas really hate you touching "their" code
  despite the fact you're submitting the same patch they pushed in 2.4
  Be prepared to deal with assholes, and have to retransmit patches
  dozens of times before they make it in via them, (or be prepared to
  put up with the abuse when you shortcut them and send to Linus).

It is a very time consuming, laborious, largely thankless process with few
rewards other than satisfaction that you've "done something worthwhile",
and the good learning experience.

 > I'd wonder if this missed CS is the only one...

I wish..

		Dave

-- 
 Dave Jones     http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 15:32         ` Dave Jones
@ 2003-10-15 17:20           ` Andrew Morton
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2003-10-15 17:20 UTC (permalink / raw)
  To: Dave Jones; +Cc: pavel, wli, linux-kernel

Dave Jones <davej@redhat.com> wrote:
>
> [PATCH] memsetup fixes (again)
> 
>  The mem= fixes from Red Hat's tree had a small bug:
>  if mem= was not actually used with the additional features, but
>  in the plain old way, it used the value as the size of memory it
>  wants, not the upper limit.  The problem with this is that there
>  is a small difference due to memory holes.
>  
>  I had one report of a person using mem= to reduce memory size for
>  a broken i386 chipset that only supports 64MB cached and the rest
>  as mtd/slram device for swap.  It got broken as the boundaries changed.
> 
> 
>  Assuming this patch is correct, it needs forward porting to 2.6

I'll queue up a patch to do that.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 13:32           ` Martin Waitz
@ 2003-10-15 17:34             ` Andrew Morton
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Morton @ 2003-10-15 17:34 UTC (permalink / raw)
  To: Martin Waitz; +Cc: anton, wli, linux-kernel

Martin Waitz <tali@admingilde.org> wrote:
>
> On Tue, Oct 14, 2003 at 04:40:04PM -0700, Andrew Morton wrote:
>  > + *	min_free_kbytes = lowmem_kbytes / sqrt(lowmem_kbytes)
> 
>  you do have a strange sqrt here ;)

You're the fifth person to tell me that.  Is this linux-kernel or linux-math?

Turns out that the int_sqrt() I stole from oom-kill.c appears to get wrong
numbers anyway.  I'll probably steal fb_sqrt(), which appears to get
correct numbers and consolidate it all...
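
For reference, a small user-space sketch of the discrepancy.
rough_int_sqrt copies the logic of the int_sqrt() in the patch earlier
in this thread; exact_int_sqrt is a standard bit-by-bit integer square
root added here only for comparison. It is an assumption of this
sketch, not fb_sqrt(), whose code is not shown in this thread. Both
function names are hypothetical.

```c
#include <limits.h>

/* Rough approximation, same logic as the int_sqrt() in the patch. */
unsigned long rough_int_sqrt(unsigned long x)
{
	unsigned long out = x;

	/* Each pass drops two bits of x and halves the estimate. */
	while (x & ~(unsigned long)1) {
		x >>= 2;
		out >>= 1;
	}

	/* One bit left over: scale the estimate down by a quarter. */
	if (x)
		out -= out >> 2;

	return out ? out : 1;
}

/* Exact floor(sqrt(x)), computed one result bit at a time. */
unsigned long exact_int_sqrt(unsigned long x)
{
	unsigned long op = x;
	unsigned long res = 0;
	/* Highest power of four representable in an unsigned long. */
	unsigned long one = 1UL << (sizeof(unsigned long) * CHAR_BIT - 2);

	/* Start from the largest power of four that is <= x. */
	while (one > op)
		one >>= 2;

	while (one != 0) {
		if (op >= res + one) {
			op -= res + one;
			res = (res >> 1) + one;
		} else {
			res >>= 1;
		}
		one >>= 2;
	}
	return res;
}
```

For 16384 (16MB expressed in kB) the rough version returns 96, where
the exact square root is 128; for 16 it returns 3 rather than 4.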


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mem=16MB laptop testing
  2003-10-15 15:38             ` Thomas Schlichter
  2003-10-15 16:06               ` Dave Jones
@ 2003-10-15 17:45               ` Mike Dresser
  1 sibling, 0 replies; 29+ messages in thread
From: Mike Dresser @ 2003-10-15 17:45 UTC (permalink / raw)
  To: linux-kernel

On Wed, 15 Oct 2003, Thomas Schlichter wrote:

> I had one report of a person using mem= to reduce memory size for
> a broken i386 chipset that only supports 64MB cached and the rest
> as mtd/slram device for swap.  It got broken as the boundaries changed.

The 430FX, HX, VX, and TX ones?

There's also some VIA/Ali/etc chipsets of that same era that have cache
ram limits as well.

Found a good page listing all the limits while looking up info
yesterday, from when PC Chips was pirating BIOS code, for a discussion
going on over at one of the storagereview.com forums

Mike

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2003-10-15 17:45 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-10-14 10:55 mem=16MB laptop testing William Lee Irwin III
2003-10-14 11:01 ` John Bradford
2003-10-14 11:08   ` William Lee Irwin III
2003-10-14 13:20     ` John Bradford
2003-10-14 11:56 ` Andrew Morton
2003-10-14 11:58   ` Russell King
2003-10-14 12:10     ` Andrew Morton
2003-10-14 12:18       ` Russell King
2003-10-14 12:30         ` Andrew Morton
2003-10-14 12:17   ` Anton Blanchard
2003-10-14 12:31     ` Andrew Morton
2003-10-14 12:44       ` Anton Blanchard
2003-10-14 23:40         ` Andrew Morton
2003-10-15 13:32           ` Martin Waitz
2003-10-15 17:34             ` Andrew Morton
2003-10-14 12:28   ` William Lee Irwin III
2003-10-15 12:12   ` Pavel Machek
2003-10-15 12:51     ` William Lee Irwin III
2003-10-15 13:20       ` Pavel Machek
2003-10-15 13:28         ` William Lee Irwin III
2003-10-15 13:59           ` Larry Sendlosky
2003-10-15 15:34             ` Dave Jones
2003-10-15 15:38             ` Thomas Schlichter
2003-10-15 16:06               ` Dave Jones
2003-10-15 17:45               ` Mike Dresser
2003-10-15 15:32         ` Dave Jones
2003-10-15 17:20           ` Andrew Morton
2003-10-15  0:35 ` Nick Piggin
2003-10-15  4:31 ` Andrew Morton
