linux-kernel.vger.kernel.org archive mirror
* 2.5.63-mm1
@ 2003-02-27 10:59 Andrew Morton
  2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-27 10:59 UTC (permalink / raw)
  To: linux-kernel, linux-mm


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm1/

. Tons of changes to the anticipatory scheduler.  It may not be working
  very well at present.  Please use "elevator=deadline" if it causes
  problems.

. Updated smalldevfs patch.

. A fix for the VMA-based reverse mapping patch.

. Added Ingo's latest CPU scheduler update.

. Lots of random fixes.



 linus.patch

 Latest from Linus

-initial-jiffies.patch
-user-times-jiffies-wrap-fix.patch
-put_page-speedup.patch
-slab-batchcount-limit-fix.patch
-crc32-speedup-2.patch
-flush-tlb-all-2.patch
-linux-2.5.62-early_ioremap_A0.patch
-linux-2.5.62-x440disco_A0.patch
-use-find_get_page.patch
-irda-interruptible-sleep.patch
-dget-BUG.patch
-disk-accounting-fix.patch
-hugh-inode-pruning-race-fix.patch
-kill-bogus-wakeup-messge.patch
-dont-sync-with-stopped-pdflush.patch
-irq-balance-disable-fix.patch
-oom-killer-dont-spin-on-same-task.patch
-add-missing-global_flush_tlb-calls.patch
-ext3-O_SYNC-speedup.patch
-remove-MAX_BLKDEV-from-genhd.patch

 Merged

+separate.patch

 My contribution to the spelling bee.

+rpc_rmdir-fix.patch

 Fix the NFS oops

+ppc64-scruffiness.patch

 Fix some warnings

-reiserfs_file_write-4.patch
+reiserfs_file_write-5.patch

 Updated (I don't think it changed)

+limit-write-latency.patch

 Fix potential source of write-vs-write latency in VFS

+lockd-lockup-fix-2.patch

 Updated patch from Neil for an NFS server deadlock

+loop-hack.patch

 Fix an OOM and oops in loop

+flock-fix.patch

 File locking fix from Matthew

+sysfs-dget-fix-2.patch

 Fix a sysfs dentry race (this isn't right)

+irq-sharing-fix.patch

 Fix SA_INTERRUPT for shared interrupts

+anticipation_is_killing_me.patch
+as-fix-hughs-problem.patch
+as-cleanup.patch
+as-start-stop-anticipation-helpers.patch
+as-cleanup-2.patch
+as-cleanup-3.patch
+as-cleanup-3-write-latency-fix.patch
+as-handle-exitted-tasks.patch
+as-handle-exitted-tasks-fix.patch
+as-no-plugging-and-cleanups.patch
+as-remove-debug.patch
+as-track-queued-reads.patch
+as-accounting-fix.patch
+as-nr_reads-fix.patch
+as-tuning.patch
+as-disable-nr_reads.patch

 Anticipatory scheduler work

 smalldevfs.patch

 Updated

-smalldevfs-dcache_rcu-fix.patch

 Folded into smalldevfs.patch

+objrmap-X-fix.patch

 Fix VMA-based reverse mapping

+per-cpu-disk-stats.patch

 Use per-cpu data for disk accounting

+presto_get_sb-fix.patch

 Fix an intermezzo oops

+on_each_cpu.patch
+on_each_cpu-ldt-cleanup.patch

 preempt-safety for smp_call_function()

+notsc-panic.patch

 x86 TSC cleanup

+alloc_pages_cleanup.patch

 Code consolidation

+ext2-handle-htree-flag.patch

 ext2 htree back-compatibility

+sched-a3.patch

 CPU scheduler update

+mpparse-typo-fix.patch

 Fix a printk bug

+i386-no-swap-fix.patch

 Fix ia32 CONFIG_SWAP=n

+remove-hugetlb_key.patch
+hugetlbpage-doc-update.patch
+hugetlb-valid-page-ranges.patch

 Hugetlbpage work




All 88 patches:

linus.patch
  Latest from Linus

separate.patch

mm.patch
  add -mmN to EXTRAVERSION

rpc_rmdir-fix.patch
  Fix nfs oops during mount

ppc64-reloc_hide.patch

ppc64-pci-patch.patch
  Subject: pci patch

ppc64-e100-fix.patch
  fix e100 for big-endian machines

ppc64-aio-32bit-emulation.patch
  32/64bit emulation for aio

ppc64-64-bit-exec-fix.patch
  Subject: 64bit exec

ppc64-scruffiness.patch
  Fix some PPC64 compile warnings

sym-do-160.patch
  make the SYM driver do 160 MB/sec

kgdb.patch

nfsd-disable-softirq.patch
  Fix race in svcsock.c in 2.5.61

report-lost-ticks.patch
  make lost-tick detection more informative

devfs-fix.patch

ptrace-flush.patch
  cache flushing in the ptrace code

buffer-debug.patch
  buffer.c debugging

warn-null-wakeup.patch

ext3-truncate-ordered-pages.patch
  ext3: explicitly free truncated pages

deadline-dispatching-fix.patch
  deadline IO scheduler dispatching fix

nfs-unstable-pages.patch
  "unstable" page accounting for NFS.

limit-write-latency.patch

reiserfs_file_write-5.patch

tcp-wakeups.patch
  Use fast wakeups in TCP/IPV4

lockd-lockup-fix-2.patch
  Subject: Re: Fw: Re: 2.4.20 NFS server lock-up (SMP)

rcu-stats.patch
  RCU statistics reporting

ext3-journalled-data-assertion-fix.patch
  Remove incorrect assertion from ext3

nfs-speedup.patch

nfs-oom-fix.patch
  nfs oom fix

sk-allocation.patch
  Subject: Re: nfs oom

nfs-more-oom-fix.patch

nfs-sendfile.patch
  Implement sendfile() for NFS

rpciod-atomic-allocations.patch
  Make rpciod use atomic allocations

linux-isp.patch

isp-update-1.patch

remove-unused-congestion-stuff.patch
  Subject: [PATCH] remove unused congestion stuff

aic-makefile-fix.patch
  aicasm Makefile fix

loop-hack.patch
  loop: Fix OOM and oops

atm_dev_sem.patch
  convert atm_dev_lock from spinlock to semaphore

flock-fix.patch
  flock fixes for 2.5.62

sysfs-dget-fix-2.patch

irq-sharing-fix.patch
  fix irq sharing and SA_INTERRUPT on x86

as-iosched.patch
  anticipatory I/O scheduler

as-comments-and-tweaks.patch
  antsched: commentary and

as-hz-1000-fix.patch
  Fix anticipatory scheduler for HZ=100

as-tidy-up-rename.patch
  tidy up AS rename

anticipation_is_killing_me.patch

as-update-1.patch
  AS update

as-break-anticipation-on-write.patch
  AS break on write

as-break-if-readahead.patch
  detect overlapping reads and writes

as-fix-hughs-problem.patch
  Add a pointer to the queue into struct as_data

as-cleanup.patch
  anticipatory scheduler cleanups

as-start-stop-anticipation-helpers.patch
  AS: add anticipation stop/start helper functions

as-cleanup-2.patch
  Subject: [PATCH] some cleanups 2

as-cleanup-3.patch
  AS: more cleanups

as-cleanup-3-write-latency-fix.patch
  Fix as-cleanup-3

as-handle-exitted-tasks.patch

as-handle-exitted-tasks-fix.patch
  fix for as IO contexts

as-no-plugging-and-cleanups.patch
  AS no plugging + cleanups

as-remove-debug.patch

as-track-queued-reads.patch
  AS: track queued reads

as-accounting-fix.patch
  AS: track queued reads (fix)

as-nr_reads-fix.patch
  AS: read accounting fix

as-tuning.patch
  AS: tuning

as-disable-nr_reads.patch
  AS: disable per-process in-flight read logic

readahead-shrink-to-zero.patch
  Allow VFS readahead to fall to zero

cfq-2.patch
  CFQ scheduler, #2

smalldevfs.patch
  smalldevfs

objrmap-2.5.62-5.patch
  object-based rmap

objrmap-X-fix.patch
  objrmap fix for X

oprofile-up-fix.patch
  fix oprofile on UP (lockless sync)

update_atime-speedup.patch
  speed up update_atime()

ext2-update_atime_speedup.patch
  Use one_sec_update_atime in ext2

ext3-update_atime_speedup.patch
  Use one_sec_update_atime in ext3

UPDATE_ATIME-to-update_atime.patch
  Rename UPDATE_ATIME to update_atime

per-cpu-disk-stats.patch
  Make diskstats per-cpu using kmalloc_percpu

presto_get_sb-fix.patch
  fix presto_get_sb() return value and oops.

on_each_cpu.patch
  fix preempt-issues with smp_call_function()

on_each_cpu-ldt-cleanup.patch

notsc-panic.patch
  Don't panic if TSC is enabled and notsc is used

alloc_pages_cleanup.patch
  clean up redundant code for alloc_pages

ext2-handle-htree-flag.patch
  ext2: clear ext3 htree flag on directories

sched-a3.patch
  "HT scheduler", sched-2.5.63-A3

mpparse-typo-fix.patch
  fix typo in arch/i386/kernel/mpparse.c in printk

i386-no-swap-fix.patch
  allow CONFIG_SWAP=n for i386

remove-hugetlb_key.patch
  remove dead hugetlb_key forward decl

hugetlbpage-doc-update.patch
  hugetlbpage documentation update

hugetlb-valid-page-ranges.patch
  hugetlb: fix MAP_FIXED handling




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Rising io_load results Re: 2.5.63-mm1
  2003-02-27 10:59 2.5.63-mm1 Andrew Morton
@ 2003-02-27 21:22 ` Con Kolivas
  2003-02-27 21:44   ` Andrew Morton
  2003-02-28  0:17 ` 2.5.63-mm1 Ed Tomlinson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2003-02-27 21:22 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm


I mentioned this previously; it's still happening.

This started some time around 2.5.62-mm3: the io_load results in contest
benchmarking (http://contest.kolivas.org) rise with each run. It still
happens with 2.5.63-mm1, regardless of which elevator is specified. These are
the io_load kernel compile times (seconds) for 6 consecutive runs:

111
147
221
284
334
358

/proc/meminfo after 6 runs and mem flushing:

MemTotal:       256156 kB
MemFree:        238708 kB
Buffers:          2320 kB
Cached:           1552 kB
SwapCached:       1780 kB
Active:           5876 kB
Inactive:         2120 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       256156 kB
LowFree:        238708 kB
SwapTotal:     4194272 kB
SwapFree:      4192416 kB
Dirty:              28 kB
Writeback:           0 kB
Mapped:       4294923652 kB
Slab:             4872 kB
Committed_AS:     7032 kB
PageTables:        200 kB
ReverseMaps:       631

I am refraining from publishing any benchmark results while this is happening.
It doesn't seem to occur on 2.5.63.

Con


* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
@ 2003-02-27 21:44   ` Andrew Morton
  2003-02-27 22:01     ` Dave McCracken
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-02-27 21:44 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> 
> This started some time around 2.5.62-mm3 with the io_load results on contest 
> benchmarking (http://contest.kolivas.org) rising with each run.
> ...
> Mapped:       4294923652 kB

Well that's gotta hurt.  This metric is used in making writeback decisions. 
Probably the objrmap patch.
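The bogus figure is consistent with a page counter that went below zero and was then printed as an unsigned number. A minimal C sketch, assuming a 32-bit unsigned long, 4 KiB pages, and a pages-to-kB conversion done by shifting left by 2 (PAGE_SHIFT - 10); the helper names are hypothetical, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: convert a page count to kB the way a 32-bit kernel with
 * 4 KiB pages would, by shifting left by PAGE_SHIFT - 10 = 2.  The result
 * wraps modulo 2^32, just like a 32-bit unsigned long. */
static uint32_t pages_to_kb(uint32_t pages)
{
    return (uint32_t)(pages << 2);
}

/* Hypothetical: reinterpret a 32-bit kB figure as a signed quantity and
 * convert it back to pages, to see how far below zero the counter was. */
static int64_t kb_to_signed_pages(uint32_t kb)
{
    int64_t signed_kb = kb > INT32_MAX ? (int64_t)kb - 4294967296LL
                                       : (int64_t)kb;
    return signed_kb / 4;
}
```

Under these assumptions, a counter sitting at -10911 pages prints as exactly 4294923652 kB, and reading that value back gives -10911; a healthy 64039-page count prints as 256156 kB, the MemTotal in the report.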



* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-27 21:44   ` Andrew Morton
@ 2003-02-27 22:01     ` Dave McCracken
  2003-02-27 22:24       ` Andrew Morton
  2003-02-27 23:56       ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
  0 siblings, 2 replies; 24+ messages in thread
From: Dave McCracken @ 2003-02-27 22:01 UTC (permalink / raw)
  To: Andrew Morton, Con Kolivas; +Cc: linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 517 bytes --]


--On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
<akpm@digeo.com> wrote:

>> ...
>> Mapped:       4294923652 kB
> 
> Well that's gotta hurt.  This metric is used in making writeback
> decisions.  Probably the objrmap patch.

Oops.  You're right.  Here's a patch to fix it.

Dave McCracken

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmccr@us.ibm.com                                        T/L   678-3059

[-- Attachment #2: objmapped-2.5.63-1.diff --]
[-- Type: text/plain, Size: 337 bytes --]

--- 2.5.63-objrmap/mm/rmap.c	2003-02-27 15:58:34.000000000 -0600
+++ 2.5.63-objfix/mm/rmap.c	2003-02-27 15:56:56.000000000 -0600
@@ -248,6 +248,8 @@
 			BUG();
 		if (PageSwapCache(page))
 			BUG();
+		if (atomic_read(&page->pte.mapcount) == 0)
+			inc_page_state(nr_mapped);
 		atomic_inc(&page->pte.mapcount);
 		return pte_chain;
 	}
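The invariant the patch restores is that nr_mapped counts pages with at least one mapping, so it should move only when a page's mapcount crosses between 0 and 1, in either direction. If the unmap side decrements on the 1 to 0 transition but one map path never does the matching increment (as the patch suggests was the case for this objrmap branch), nr_mapped drifts downward and eventually wraps. The userspace model below uses C11 atomics and hypothetical names (toy_page, toy_page_add_rmap, toy_page_remove_rmap), not kernel API. Unlike the patch, which does an atomic_read() followed by atomic_inc() and so presumably relies on the caller's locking, the sketch detects the transition from the return value of a single atomic_fetch_add():

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for page->pte.mapcount and the global
 * nr_mapped page-state counter. */
struct toy_page {
    atomic_int mapcount;
};

static atomic_long nr_mapped;   /* static storage: starts at 0 */

/* A pte starts mapping the page. */
static void toy_page_add_rmap(struct toy_page *page)
{
    /* atomic_fetch_add returns the old value, so the 0 -> 1 transition
     * is detected by the same operation that performs the increment. */
    if (atomic_fetch_add(&page->mapcount, 1) == 0)
        atomic_fetch_add(&nr_mapped, 1);
}

/* A pte stops mapping the page. */
static void toy_page_remove_rmap(struct toy_page *page)
{
    int old = atomic_fetch_sub(&page->mapcount, 1);

    assert(old > 0);            /* unmapping something never mapped */
    if (old == 1)               /* 1 -> 0: last mapping has gone */
        atomic_fetch_sub(&nr_mapped, 1);
}
```

Mapping one page twice and then unmapping it twice moves nr_mapped to 1 and back to 0; dropping the increment in toy_page_add_rmap would instead leave it at -1.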


* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-27 22:01     ` Dave McCracken
@ 2003-02-27 22:24       ` Andrew Morton
  2003-03-03 21:06         ` [PATCH 2.5.63] Teach page_mapped about the anon flag Dave McCracken
  2003-02-27 23:56       ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-02-27 22:24 UTC (permalink / raw)
  To: Dave McCracken; +Cc: kernel, linux-kernel, linux-mm

Dave McCracken <dmccr@us.ibm.com> wrote:
>
> 
> --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> <akpm@digeo.com> wrote:
> 
> >> ...
> >> Mapped:       4294923652 kB
> > 
> > Well that's gotta hurt.  This metric is used in making writeback
> > decisions.  Probably the objrmap patch.
> 
> Oops.  You're right.  Here's a patch to fix it.
> 

Thanks.

I'm just looking at page_mapped().  It is now implicitly assuming that the
architecture's representation of a zero-count atomic_t is all-bits-zero.

This is not true on sparc32 if some other CPU is in the middle of an
atomic_foo() against that counter.  Maybe the assumption is false on other
architectures too.

So page_mapped() really should be performing an atomic_read() if that is
appropriate to the particular page.  I guess this involves testing
page->mapping.  Which is stable only when the page is locked or
mapping->page_lock is held.

It appears that all page_mapped() callers are inside lock_page() at present,
so a quick audit and addition of a comment would be appropriate there please.
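Andrew's suggestion can be sketched as a page_mapped() that chooses the union member according to the kind of page, keying off page->mapping as he proposes. This is a userspace model with hypothetical names (toy_page, toy_page_mapped). It assumes, as in the objrmap patch discussed here, that file-backed pages keep an atomic mapcount while anonymous pages keep a pte_chain pointer in the same storage, and that the caller holds the page lock so ->mapping is stable. It cannot reproduce sparc32's atomic_t layout; the point is only that the mapcount case goes through an atomic accessor instead of testing the raw bits of a different union member.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_pte_chain { int unused; };   /* hypothetical, opaque here */

/* Hypothetical model of the page fields involved. */
struct toy_page {
    void *mapping;                      /* assumed non-NULL: file-backed */
    union {
        struct toy_pte_chain *chain;    /* anonymous pages */
        atomic_int mapcount;            /* file-backed (objrmap) pages */
    } pte;
};

/* Caller must hold the page lock so that ->mapping cannot change. */
static int toy_page_mapped(struct toy_page *page)
{
    if (page->mapping != NULL)
        /* Read the counter through the atomic accessor; do not assume
         * a zero atomic_t is all-bits-zero in memory. */
        return atomic_load(&page->pte.mapcount) != 0;
    return page->pte.chain != NULL;
}
```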




* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-27 22:01     ` Dave McCracken
  2003-02-27 22:24       ` Andrew Morton
@ 2003-02-27 23:56       ` Con Kolivas
  2003-02-28  0:06         ` Andrew Morton
  1 sibling, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2003-02-27 23:56 UTC (permalink / raw)
  To: Dave McCracken, Andrew Morton; +Cc: linux-kernel, linux-mm

On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
>
> <akpm@digeo.com> wrote:
> >> ...
> >> Mapped:       4294923652 kB
> >
> > Well that's gotta hurt.  This metric is used in making writeback
> > decisions.  Probably the objrmap patch.
>
> Oops.  You're right.  Here's a patch to fix it.

Thanks. 

This looks better after a run:

MemTotal:       256156 kB
MemFree:        189448 kB
Buffers:         46744 kB
Cached:           4176 kB
SwapCached:          0 kB
Active:          51840 kB
Inactive:         1768 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       256156 kB
LowFree:        189448 kB
SwapTotal:     4194272 kB
SwapFree:      4194272 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:        4546752 kB
Slab:             8468 kB
Committed_AS:     7032 kB
PageTables:        200 kB
ReverseMaps:       662

Con


* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-27 23:56       ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
@ 2003-02-28  0:06         ` Andrew Morton
  2003-02-28  0:28           ` Con Kolivas
  2003-02-28 12:48           ` Hugh Dickins
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28  0:06 UTC (permalink / raw)
  To: Con Kolivas; +Cc: dmccr, linux-kernel, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> > --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> >
> > <akpm@digeo.com> wrote:
> > >> ...
> > >> Mapped:       4294923652 kB
> > >
> > > Well that's gotta hurt.  This metric is used in making writeback
> > > decisions.  Probably the objrmap patch.
> >
> > Oops.  You're right.  Here's a patch to fix it.
> 
> Thanks. 
> 
> This looks better after a run:
> 
> MemTotal:       256156 kB
> ...
> Mapped:        4546752 kB

No, it is still wrong.  Mapped cannot exceed MemTotal.
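Since mapped pages are a subset of RAM, Andrew's rule gives a cheap sanity check that a tester could run on each /proc/meminfo snapshot. A minimal sketch over meminfo-style text; the helper names are hypothetical, and on a live system the text would come from reading /proc/meminfo:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find "key:" at the start of a line and parse the number after it.
 * Returns 0 on success, -1 if the key is absent. */
static int meminfo_kb(const char *text, const char *key,
                      unsigned long long *out)
{
    size_t len = strlen(key);
    const char *p = text;

    while ((p = strstr(p, key)) != NULL) {
        /* Anchor at line start and require ':' so that "Cached" does
         * not match inside "SwapCached". */
        if ((p == text || p[-1] == '\n') && p[len] == ':') {
            *out = strtoull(p + len + 1, NULL, 10);  /* skips blanks */
            return 0;
        }
        p += len;
    }
    return -1;
}

/* 1 if Mapped <= MemTotal, 0 if not (counter is corrupt),
 * -1 if either field is missing. */
static int meminfo_mapped_sane(const char *text)
{
    unsigned long long total, mapped;

    if (meminfo_kb(text, "MemTotal", &total) != 0 ||
        meminfo_kb(text, "Mapped", &mapped) != 0)
        return -1;
    return mapped <= total;
}
```

Fed the figures from Con's report (MemTotal 256156 kB, Mapped 4546752 kB), the check returns 0.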




* Re: 2.5.63-mm1
  2003-02-27 10:59 2.5.63-mm1 Andrew Morton
  2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
@ 2003-02-28  0:17 ` Ed Tomlinson
  2003-02-28  0:46   ` 2.5.63-mm1 Andrew Morton
  2003-02-28 12:16 ` 2.5.63-mm1 steven roemen
       [not found] ` <3E5F7DAD.2080306@cyberone.com.au>
  3 siblings, 1 reply; 24+ messages in thread
From: Ed Tomlinson @ 2003-02-28  0:17 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm; +Cc: Nick Piggin

On February 27, 2003 05:59 am, Andrew Morton wrote:
> . Tons of changes to the anticipatory scheduler.  It may not be working
>   very well at present.  Please use "elevator=deadline" if it causes
>   problems.

The anticipatory scheduler hangs here at the same place it did in 62-mm2;
cfq continues to work fine.  A sysrq+T of the hang follows:

Hope this helps,
Ed Tomlinson

SysRq : Show State

                         free                        sibling
  task             PC    stack   pid father child younger older
swapper       D DFF8FB20 11876     1      0     2               (L-TLB)
Call Trace:
 [<c01143aa>] io_schedule+0xe/0x18
 [<c012a105>] __lock_page+0x8d/0xac
 [<c0114ba8>] autoremove_wake_function+0x0/0x38
 [<c0114ba8>] autoremove_wake_function+0x0/0x38
 [<c012a58e>] do_generic_mapping_read+0x13a/0x340
 [<c012aa5a>] __generic_file_aio_read+0x1c6/0x1e4
 [<c012a794>] file_read_actor+0x0/0x100
 [<c012ab3f>] generic_file_read+0x7f/0x9c
 [<c015400c>] dput+0x1c/0x1a0
 [<c015400c>] dput+0x1c/0x1a0
 [<c012ff37>] kmem_cache_alloc+0x23/0x60
 [<c0140e57>] vfs_read+0xab/0x150
 [<c01498c4>] kernel_read+0x3c/0x48
 [<c0161f82>] load_elf_binary+0x2f2/0xbbc
 [<c012ab3f>] generic_file_read+0x7f/0x9c
 [<c012f91c>] cache_init_objs+0x34/0x60
 [<c012d2af>] buffered_rmqueue+0xfb/0x108
 [<c012d33c>] __alloc_pages+0x80/0x264
 [<c014a4ad>] search_binary_handler+0xad/0x23c
 [<c0161c90>] load_elf_binary+0x0/0xbbc
 [<c014a786>] do_execve+0x14a/0x1a8
 [<c0107750>] sys_execve+0x2c/0x60
 [<c0108c47>] syscall_call+0x7/0xb
 [<c0105175>] init+0x109/0x174
 [<c010506c>] init+0x0/0x174
 [<c0107019>] kernel_thread_helper+0x5/0xc

ksoftirqd/0   S DFF8A000 4294963836     2      1             3       (L-TLB)
Call Trace:
 [<c011a1fc>] ksoftirqd+0x24/0xa4
 [<c011a23e>] ksoftirqd+0x66/0xa4
 [<c011a1d8>] ksoftirqd+0x0/0xa4
 [<c0107019>] kernel_thread_helper+0x5/0xc

events/0      D DFF89ED4 4294953708     3      1    12       4     2 (L-TLB)
Call Trace:
 [<c0113985>] wait_for_completion+0x9d/0xe0
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0116363>] do_fork+0x113/0x14c
 [<c010708e>] kernel_thread+0x6e/0x84
 [<c0122b50>] __call_usermodehelper+0x0/0x58
 [<c0122a70>] ____call_usermodehelper+0x0/0x94
 [<c0107014>] kernel_thread_helper+0x0/0xc
 [<c0122b80>] __call_usermodehelper+0x30/0x58
 [<c0122a70>] ____call_usermodehelper+0x0/0x94
 [<c012304f>] worker_thread+0x1a3/0x274
 [<c0122eac>] worker_thread+0x0/0x274
 [<c0122b50>] __call_usermodehelper+0x0/0x58
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0107019>] kernel_thread_helper+0x5/0xc

khubd         D DFD61D94 4292690652     4      1             5     3 (L-TLB)
Call Trace:
 [<c01136a0>] do_schedule+0x2a0/0x348
 [<c0113985>] wait_for_completion+0x9d/0xe0
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0122cb2>] call_usermodehelper+0x10a/0x118
 [<c01f44d8>] usb_hotplug+0x0/0x1c4
 [<c0122b50>] __call_usermodehelper+0x0/0x58
 [<c0122b50>] __call_usermodehelper+0x0/0x58
 [<c01b5a42>] do_hotplug+0x1c2/0x1ec
 [<c01b5a91>] dev_hotplug+0x25/0x30
 [<c01f44d8>] usb_hotplug+0x0/0x1c4
 [<c01b3d9a>] device_add+0x112/0x148
 [<c01f4ef6>] usb_new_device+0x322/0x480
 [<c0117086>] printk+0x122/0x148
 [<c01f6a9f>] usb_hub_port_connect_change+0x233/0x2c4
 [<c01f6c69>] usb_hub_events+0x139/0x2c8
 [<c01f6e25>] usb_hub_thread+0x2d/0xd4
 [<c01f6df8>] usb_hub_thread+0x0/0xd4
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0107019>] kernel_thread_helper+0x5/0xc

pdflush       S DFD2FFD4 4292485228     5      1             6     4 (L-TLB)
Call Trace:
 [<c012e7e5>] __pdflush+0x95/0x1b0
 [<c012e900>] pdflush+0x0/0x14
 [<c012e90f>] pdflush+0xf/0x14
 [<c0107019>] kernel_thread_helper+0x5/0xc

pdflush       S DFD2DFD4 14388     6      1             7     5 (L-TLB)
Call Trace:
 [<c012e7e5>] __pdflush+0x95/0x1b0
 [<c012e900>] pdflush+0x0/0x14
 [<c012e90f>] pdflush+0xf/0x14
 [<c0107019>] kernel_thread_helper+0x5/0xc

kswapd0       S DFD29F44 4294958912     7      1             8     6 (L-TLB)
Call Trace:
 [<c01328fb>] kswapd+0xcb/0xf0
 [<c0132830>] kswapd+0x0/0xf0
 [<c0109d26>] math_state_restore+0x2a/0x3c
 [<c0108f05>] device_not_available+0x25/0x2a
 [<c010e3f5>] save_init_fpu+0x1d/0x3c
 [<c0113770>] preempt_schedule+0x28/0x40
 [<c0112eb3>] schedule_tail+0x2f/0x94
 [<c0108b06>] ret_from_fork+0x6/0x20
 [<c0114ba8>] autoremove_wake_function+0x0/0x38
 [<c0114ba8>] autoremove_wake_function+0x0/0x38
 [<c0107019>] kernel_thread_helper+0x5/0xc

aio/0         S DFFE8EA0 4294952400     8      1             9     7 (L-TLB)
Call Trace:
 [<c0122fa8>] worker_thread+0xfc/0x274
 [<c0122eac>] worker_thread+0x0/0x274
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0107019>] kernel_thread_helper+0x5/0xc

kpnpbiosd     Z DFFEE800 4294880232     9      1            10     8 (L-TLB)
Call Trace:
 [<c0118b99>] do_exit+0x41d/0x428
 [<c01aca44>] pnp_dock_thread+0x0/0xf4
 [<c0118bbb>] complete_and_exit+0x17/0x18
 [<c01acadc>] pnp_dock_thread+0x98/0xf4
 [<c01aca44>] pnp_dock_thread+0x0/0xf4
 [<c0107019>] kernel_thread_helper+0x5/0xc

kseriod       S DFC44000 4294030016    10      1            11     9 (L-TLB)
Call Trace:
 [<c02073e7>] serio_thread+0x9f/0x12c
 [<c0207348>] serio_thread+0x0/0x12c
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0107019>] kernel_thread_helper+0x5/0xc

reiserfs/0    S DFCBD460  8080    11      1                  10 (L-TLB)
Call Trace:
 [<c0122fa8>] worker_thread+0xfc/0x274
 [<c0122eac>] worker_thread+0x0/0x274
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0113788>] default_wake_function+0x0/0x18
 [<c0107019>] kernel_thread_helper+0x5/0xc

events/0      D DFAC7A30 4294892756    12      3                     (L-TLB)
Call Trace:
 [<c01143aa>] io_schedule+0xe/0x18
 [<c012a105>] __lock_page+0x8d/0xac
 [<c0114ba8>] autoremove_wake_function+0x0/0x38
 [<c0114ba8>] autoremove_wake_function+0x0/0x38
 [<c012a58e>] do_generic_mapping_read+0x13a/0x340
 [<c012aa5a>] __generic_file_aio_read+0x1c6/0x1e4
 [<c012a794>] file_read_actor+0x0/0x100
 [<c017f6b0>] reiserfs_get_block+0x0/0x11cc
 [<c012ab3f>] generic_file_read+0x7f/0x9c
 [<c015400c>] dput+0x1c/0x1a0
 [<c015400c>] dput+0x1c/0x1a0
 [<c012ff37>] kmem_cache_alloc+0x23/0x60
 [<c0140e57>] vfs_read+0xab/0x150
 [<c01498c4>] kernel_read+0x3c/0x48
 [<c0161f82>] load_elf_binary+0x2f2/0xbbc
 [<c012ab3f>] generic_file_read+0x7f/0x9c
 [<c014bf83>] real_lookup+0x67/0xd0
 [<c014c254>] do_lookup+0x48/0x84
 [<c015400c>] dput+0x1c/0x1a0
 [<c014c95a>] link_path_walk+0x6ca/0x848
 [<c014a4ad>] search_binary_handler+0xad/0x23c
 [<c0161c90>] load_elf_binary+0x0/0xbbc
 [<c01614c1>] load_script+0x1d1/0x1e0
 [<c012d2af>] buffered_rmqueue+0xfb/0x108
 [<c012d33c>] __alloc_pages+0x80/0x264
 [<c014a4ad>] search_binary_handler+0xad/0x23c
 [<c01612f0>] load_script+0x0/0x1e0
 [<c014a786>] do_execve+0x14a/0x1a8
 [<c0107750>] sys_execve+0x2c/0x60
 [<c0108c47>] syscall_call+0x7/0xb
 [<c0122ae8>] ____call_usermodehelper+0x78/0x94
 [<c0122a70>] ____call_usermodehelper+0x0/0x94
 [<c0107019>] kernel_thread_helper+0x5/0xc





* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-28  0:06         ` Andrew Morton
@ 2003-02-28  0:28           ` Con Kolivas
  2003-02-28  7:46             ` Duncan Sands
  2003-02-28 12:48           ` Hugh Dickins
  1 sibling, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2003-02-28  0:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dmccr, linux-kernel, linux-mm

On Fri, 28 Feb 2003 11:06 am, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> > > --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> > >
> > > <akpm@digeo.com> wrote:
> > > >> ...
> > > >> Mapped:       4294923652 kB
> > > >
> > > > Well that's gotta hurt.  This metric is used in making writeback
> > > > decisions.  Probably the objrmap patch.
> > >
> > > Oops.  You're right.  Here's a patch to fix it.
> >
> > Thanks.
> >
> > This looks better after a run:
> >
> > MemTotal:       256156 kB
> > ...
> > Mapped:        4546752 kB
>
> No, it is still wrong.  Mapped cannot exceed MemTotal.

Hmm a few more runs and io_load starts rising again and this is the meminfo in 
the middle of a run:

MemTotal:       256156 kB
MemFree:         26564 kB
Buffers:         11300 kB
Cached:         198048 kB
SwapCached:          0 kB
Active:           7164 kB
Inactive:       204736 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       256156 kB
LowFree:         26564 kB
SwapTotal:     4194272 kB
SwapFree:      4194272 kB
Dirty:            5780 kB
Writeback:           0 kB
Mapped:        6000680 kB
Slab:            13056 kB
Committed_AS:     7040 kB
PageTables:        200 kB
ReverseMaps:       664

Con



* Re: 2.5.63-mm1
  2003-02-28  0:17 ` 2.5.63-mm1 Ed Tomlinson
@ 2003-02-28  0:46   ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28  0:46 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: linux-kernel, linux-mm, piggin

Ed Tomlinson <tomlins@cam.org> wrote:
>
> On February 27, 2003 05:59 am, Andrew Morton wrote:
> > . Tons of changes to the anticipatory scheduler.  It may not be working
> >   very well at present.  Please use "elevator=deadline" if it causes
> >   problems.
> 
> The anticipatory scheduler hangs here at the same place it did in 62-mm2,
> cfq continues to work fine.  A sysrq+T of the hang follows:

I must say, Ed: you have an eerie ability to break stuff.

Please send me your .config.

>                          free                        sibling
>   task             PC    stack   pid father child younger older
> swapper       D DFF8FB20 11876     1      0     2               (L-TLB)

Interesting amount of free stack you have there.  You broke show_task() too!




* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-28  0:28           ` Con Kolivas
@ 2003-02-28  7:46             ` Duncan Sands
  2003-02-28  8:06               ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Duncan Sands @ 2003-02-28  7:46 UTC (permalink / raw)
  To: Con Kolivas, Andrew Morton; +Cc: dmccr, linux-kernel, linux-mm

Hi Con, are you sure this doesn't also happen with 2.5.63?
I left 2.5.63 running overnight (doing nothing but run
KDE), and in the morning it was swapping heavily.
About 200MB was swapped out and this did not reduce
with usage.  According to top, 10% of memory was being
used by an empty Konsole (could be a memory
leak in Konsole).  After half an hour I gave up; it was
unusable.  Maybe -mm1 just accentuates a problem
that is already there in 2.5.63.

Ciao,

Duncan.


* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-28  7:46             ` Duncan Sands
@ 2003-02-28  8:06               ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28  8:06 UTC (permalink / raw)
  To: Duncan Sands; +Cc: kernel, dmccr, linux-kernel, linux-mm

Duncan Sands <baldrick@wanadoo.fr> wrote:
>
> Hi Con, are you sure this is not the same for 2.5.63?
> I left 2.5.63 running over night (doing nothing but run
> KDE), and in the morning it was swapping heavily.
> About 200MB was swapped out and this did not reduce
> with usage.  According to top, 10% of memory was being
> used by a Konsole with nothing in it (could be a memory
> leak in Konsole).  After half an hour I gave up - it was
> too unusable.  Maybe -mm1 just accentuates a problem
> that is already there in 2.5.63.
> 

Please take a snapshot of /proc/meminfo and /proc/slabinfo
if anything like this happens.
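A small C helper along these lines could capture both files in one go for attaching to a report. The function names are hypothetical; note that on some kernels /proc/slabinfo is readable only by root, in which case the helper simply counts it as a failure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Append one file to `out`, preceded by a "==> path <==" header.
 * Returns 0 on success, -1 if the file cannot be opened. */
static int append_file(const char *path, FILE *out)
{
    char buf[4096];
    size_t n;
    FILE *in = fopen(path, "r");

    if (in == NULL)
        return -1;
    fprintf(out, "==> %s <==\n", path);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    return 0;
}

/* Hypothetical: timestamp plus meminfo and slabinfo in one report.
 * Returns the number of files that could not be read. */
static int snapshot_vm_state(FILE *out)
{
    static const char *const files[] = { "/proc/meminfo", "/proc/slabinfo" };
    int failures = 0;

    fprintf(out, "snapshot at %ld\n", (long)time(NULL));
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        if (append_file(files[i], out) != 0)
            failures++;
    return failures;
}
```

Calling snapshot_vm_state(stdout) periodically (or once when the slowdown is noticed) keeps each snapshot self-describing through its timestamp and per-file headers.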


* Re: 2.5.63-mm1
  2003-02-27 10:59 2.5.63-mm1 Andrew Morton
  2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
  2003-02-28  0:17 ` 2.5.63-mm1 Ed Tomlinson
@ 2003-02-28 12:16 ` steven roemen
  2003-02-28 12:24   ` 2.5.63-mm1 Andrew Morton
       [not found] ` <3E5F7DAD.2080306@cyberone.com.au>
  3 siblings, 1 reply; 24+ messages in thread
From: steven roemen @ 2003-02-28 12:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm


The kernel oopses when i2c is compiled into the kernel, both with plain
-mm1 and with -mm1 plus Dave McCracken's patch.

Also, when I remove i2c from the kernel and boot it with AS as the
elevator, the load average (via top) starts at 2.00, yet the processors
aren't loaded very much at all.  Is this a known issue?  (This is the
first -mm kernel I've run.)

-steve

On Thu, 2003-02-27 at 04:59, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm1/
> 
> . Tons of changes to the anticipatory scheduler.  It may not be working
>   very well at present.  Please use "elevator=deadline" if it causes
>   problems.
> 
> . Updated smalldevfs patch.
> 
> . A fix for the VMA-based reverse mapping patch.
> 
> . Added Ingo's latest CPU scheduler update.
> 
> . Lots of random fixes.
> 
> 
> 
>  linus.patch
> 
>  Latest from Linus
> 
> -initial-jiffies.patch
> -user-times-jiffies-wrap-fix.patch
> -put_page-speedup.patch
> -slab-batchcount-limit-fix.patch
> -crc32-speedup-2.patch
> -flush-tlb-all-2.patch
> -linux-2.5.62-early_ioremap_A0.patch
> -linux-2.5.62-x440disco_A0.patch
> -use-find_get_page.patch
> -irda-interruptible-sleep.patch
> -dget-BUG.patch
> -disk-accounting-fix.patch
> -hugh-inode-pruning-race-fix.patch
> -kill-bogus-wakeup-messge.patch
> -dont-sync-with-stopped-pdflush.patch
> -irq-balance-disable-fix.patch
> -oom-killer-dont-spin-on-same-task.patch
> -add-missing-global_flush_tlb-calls.patch
> -ext3-O_SYNC-speedup.patch
> -remove-MAX_BLKDEV-from-genhd.patch
> 
>  Merged
> 
> +separate.patch
> 
>  My contribution to the spelling bee.
> 
> +rpc_rmdir-fix.patch
> 
>  Fix the NFS oops
> 
> +ppc64-scruffiness.patch
> 
>  Fix some warnings
> 
> -reiserfs_file_write-4.patch
> +reiserfs_file_write-5.patch
> 
>  Updated (I don't think it changed)
> 
> +limit-write-latency.patch
> 
>  Fix potential source of write-vs-write latency in VFS
> 
> +lockd-lockup-fix-2.patch
> 
>  Updated patch from Neil for an NFS server deadlock
> 
> +loop-hack.patch
> 
>  Fix an OOM and oops in loop
> 
> +flock-fix.patch
> 
>  File locking fix from Matthew
> 
> +sysfs-dget-fix-2.patch
> 
>  Fix a sysfs dentry race (this isn't right)
> 
> +irq-sharing-fix.patch
> 
>  Fix SA_INTERRUPT for shared interrupts
> 
> +anticipation_is_killing_me.patch
> +as-fix-hughs-problem.patch
> +as-cleanup.patch
> +as-start-stop-anticipation-helpers.patch
> +as-cleanup-2.patch
> +as-cleanup-3.patch
> +as-cleanup-3-write-latency-fix.patch
> +as-handle-exitted-tasks.patch
> +as-handle-exitted-tasks-fix.patch
> +as-no-plugging-and-cleanups.patch
> +as-remove-debug.patch
> +as-track-queued-reads.patch
> +as-accounting-fix.patch
> +as-nr_reads-fix.patch
> +as-tuning.patch
> +as-disable-nr_reads.patch
> 
>  Anticipatory scheduler work
> 
>  smalldevfs.patch
> 
>  Updated
> 
> -smalldevfs-dcache_rcu-fix.patch
> 
>  Folded into smalldevfs.patch
> 
> +objrmap-X-fix.patch
> 
>  Fix VMA-based reverse mapping
> 
> +per-cpu-disk-stats.patch
> 
>  Use per-cpu data for disk accounting
> 
> +presto_get_sb-fix.patch
> 
>  Fix an intermezzo oops
> 
> +on_each_cpu.patch
> +on_each_cpu-ldt-cleanup.patch
> 
>  preempt-safety for smp_call_function()
> 
> +notsc-panic.patch
> 
>  x86 TSC cleanup
> 
> +alloc_pages_cleanup.patch
> 
>  Code consolidation
> 
> +ext2-handle-htree-flag.patch
> 
>  ext2 htree back-compatibility
> 
> +sched-a3.patch
> 
>  CPU scheduler update
> 
> +mpparse-typo-fix.patch
> 
>  Fix a printk bug
> 
> +i386-no-swap-fix.patch
> 
>  Fix ia32 CONFIG_SWAP=n
> 
> +remove-hugetlb_key.patch
> +hugetlbpage-doc-update.patch
> +hugetlb-valid-page-ranges.patch
> 
>  Hugetlbpage work
> 
> 
> 
> 
> All 88 patches:
> 
> linus.patch
>   Latest from Linus
> 
> separate.patch
> 
> mm.patch
>   add -mmN to EXTRAVERSION
> 
> rpc_rmdir-fix.patch
>   Fix nfs oops during mount
> 
> ppc64-reloc_hide.patch
> 
> ppc64-pci-patch.patch
>   Subject: pci patch
> 
> ppc64-e100-fix.patch
>   fix e100 for big-endian machines
> 
> ppc64-aio-32bit-emulation.patch
>   32/64bit emulation for aio
> 
> ppc64-64-bit-exec-fix.patch
>   Subject: 64bit exec
> 
> ppc64-scruffiness.patch
>   Fix some PPC64 compile warnings
> 
> sym-do-160.patch
>   make the SYM driver do 160 MB/sec
> 
> kgdb.patch
> 
> nfsd-disable-softirq.patch
>   Fix race in svcsock.c in 2.5.61
> 
> report-lost-ticks.patch
>   make lost-tick detection more informative
> 
> devfs-fix.patch
> 
> ptrace-flush.patch
>   cache flushing in the ptrace code
> 
> buffer-debug.patch
>   buffer.c debugging
> 
> warn-null-wakeup.patch
> 
> ext3-truncate-ordered-pages.patch
>   ext3: explicitly free truncated pages
> 
> deadline-dispatching-fix.patch
>   deadline IO scheduler dispatching fix
> 
> nfs-unstable-pages.patch
>   "unstable" page accounting for NFS.
> 
> limit-write-latency.patch
> 
> reiserfs_file_write-5.patch
> 
> tcp-wakeups.patch
>   Use fast wakeups in TCP/IPV4
> 
> lockd-lockup-fix-2.patch
>   Subject: Re: Fw: Re: 2.4.20 NFS server lock-up (SMP)
> 
> rcu-stats.patch
>   RCU statistics reporting
> 
> ext3-journalled-data-assertion-fix.patch
>   Remove incorrect assertion from ext3
> 
> nfs-speedup.patch
> 
> nfs-oom-fix.patch
>   nfs oom fix
> 
> sk-allocation.patch
>   Subject: Re: nfs oom
> 
> nfs-more-oom-fix.patch
> 
> nfs-sendfile.patch
>   Implement sendfile() for NFS
> 
> rpciod-atomic-allocations.patch
>   Make rpciod use atomic allocations
> 
> linux-isp.patch
> 
> isp-update-1.patch
> 
> remove-unused-congestion-stuff.patch
>   Subject: [PATCH] remove unused congestion stuff
> 
> aic-makefile-fix.patch
>   aicasm Makefile fix
> 
> loop-hack.patch
>   loop: Fix OOM and oops
> 
> atm_dev_sem.patch
>   convert atm_dev_lock from spinlock to semaphore
> 
> flock-fix.patch
>   flock fixes for 2.5.62
> 
> sysfs-dget-fix-2.patch
> 
> irq-sharing-fix.patch
>   fix irq sharing and SA_INTERRUPT on x86
> 
> as-iosched.patch
>   anticipatory I/O scheduler
> 
> as-comments-and-tweaks.patch
>   antsched: commentary and
> 
> as-hz-1000-fix.patch
>   Fix anticipatory scheduler for HZ=100
> 
> as-tidy-up-rename.patch
>   tidy up AS rename
> 
> anticipation_is_killing_me.patch
> 
> as-update-1.patch
>   AS update
> 
> as-break-anticipation-on-write.patch
>   AS break on write
> 
> as-break-if-readahead.patch
>   detect overlapping reads and writes
> 
> as-fix-hughs-problem.patch
>   Add a pointer to the queue into struct as_data
> 
> as-cleanup.patch
>   anticipatory scheduler cleanups
> 
> as-start-stop-anticipation-helpers.patch
>   AS: add anticipation stop/start helper functions
> 
> as-cleanup-2.patch
>   Subject: [PATCH] some cleanups 2
> 
> as-cleanup-3.patch
>   AS: more cleanups
> 
> as-cleanup-3-write-latency-fix.patch
>   Fix as-cleanup-3
> 
> as-handle-exitted-tasks.patch
> 
> as-handle-exitted-tasks-fix.patch
>   fix for as IO contexts
> 
> as-no-plugging-and-cleanups.patch
>   AS no plugging + cleanups
> 
> as-remove-debug.patch
> 
> as-track-queued-reads.patch
>   AS: track queued reads
> 
> as-accounting-fix.patch
>   AS: track queued reads (fix)
> 
> as-nr_reads-fix.patch
>   AS: read accounting fix
> 
> as-tuning.patch
>   AS: tuning
> 
> as-disable-nr_reads.patch
>   AS: disable per-process in-flight read logic
> 
> readahead-shrink-to-zero.patch
>   Allow VFS readahead to fall to zero
> 
> cfq-2.patch
>   CFQ scheduler, #2
> 
> smalldevfs.patch
>   smalldevfs
> 
> objrmap-2.5.62-5.patch
>   object-based rmap
> 
> objrmap-X-fix.patch
>   objrmap fix for X
> 
> oprofile-up-fix.patch
>   fix oprofile on UP (lockless sync)
> 
> update_atime-speedup.patch
>   speed up update_atime()
> 
> ext2-update_atime_speedup.patch
>   Use one_sec_update_atime in ext2
> 
> ext3-update_atime_speedup.patch
>   Use one_sec_update_atime in ext3
> 
> UPDATE_ATIME-to-update_atime.patch
>   Rename UPDATE_ATIME to update_atime
> 
> per-cpu-disk-stats.patch
>   Make diskstats per-cpu using kmalloc_percpu
> 
> presto_get_sb-fix.patch
>   fix presto_get_sb() return value and oops.
> 
> on_each_cpu.patch
>   fix preempt-issues with smp_call_function()
> 
> on_each_cpu-ldt-cleanup.patch
> 
> notsc-panic.patch
>   Don't panic if TSC is enabled and notsc is used
> 
> alloc_pages_cleanup.patch
>   clean up redundant code for alloc_pages
> 
> ext2-handle-htree-flag.patch
>   ext2: clear ext3 htree flag on directories
> 
> sched-a3.patch
>   "HT scheduler", sched-2.5.63-A3
> 
> mpparse-typo-fix.patch
>   fix typo in arch/i386/kernel/mpparse.c in printk
> 
> i386-no-swap-fix.patch
>   allow CONFIG_SWAP=n for i386
> 
> remove-hugetlb_key.patch
>   remove dead hugetlb_key forward decl
> 
> hugetlbpage-doc-update.patch
>   hugetlbpage documentation update
> 
> hugetlb-valid-page-ranges.patch
>   hugetlb: fix MAP_FIXED handling
> 
> 
> 
> 



* Re: 2.5.63-mm1
  2003-02-28 12:16 ` 2.5.63-mm1 steven roemen
@ 2003-02-28 12:24   ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28 12:24 UTC (permalink / raw)
  To: steven roemen; +Cc: linux-kernel, linux-mm

steven roemen <sdroemen1@cox.net> wrote:
>
> 
> the kernel oopses when i2c is compiled into the kernel with -mm1, and
> -mm1 with dave mccraken's patch.  

Please send a full report on this to the mailing list.

> also when i remove i2c from the kernel and boot into it with AS as the
> elevator, the load (via top) starts at 2.00, yet the processors aren't
> loaded very much at all.  is this a known issue(this is the first -mm
> kernel i've run)?

Run `ps aux' when the system is idle and see if there are any tasks
in "D" state.

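Andrew's question targets the load average: Linux counts tasks in uninterruptible sleep ("D" state) toward the load, not just runnable ones, so a steady load of 2.00 on an otherwise idle box often means two tasks stuck in D. Below is a minimal C sketch of the same check `ps aux` makes. It assumes the usual /proc/<pid>/stat layout, where the state letter follows the parenthesised command name. The helper names are hypothetical.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the one-letter state field from a /proc/<pid>/stat line, or 0 on
 * parse failure.  The command name (field 2) is in parentheses and may itself
 * contain spaces or ')', so locate the LAST ')' before reading the state. */
char stat_state(const char *line)
{
	const char *p = strrchr(line, ')');

	if (!p || p[1] != ' ' || p[2] == '\0')
		return 0;
	return p[2];
}

/* Print every task currently in uninterruptible sleep ("D" state).
 * Returns the number found, or -1 if /proc cannot be opened. */
int list_d_state_tasks(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int found = 0;

	if (!proc)
		return -1;
	while ((de = readdir(proc)) != NULL) {
		char path[288], buf[1024];
		FILE *f;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;	/* only numeric entries are PIDs */
		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;	/* the task may have exited meanwhile */
		if (fgets(buf, sizeof(buf), f) && stat_state(buf) == 'D') {
			printf("%s", buf);
			found++;
		}
		fclose(f);
	}
	closedir(proc);
	return found;
}
```

Calling list_d_state_tasks() on the idle system prints the stat line of each D-state task. If the count matches the unexplained load, those tasks are the ones to chase.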


* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-28  0:06         ` Andrew Morton
  2003-02-28  0:28           ` Con Kolivas
@ 2003-02-28 12:48           ` Hugh Dickins
  2003-02-28 15:56             ` Dave McCracken
  1 sibling, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2003-02-28 12:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, dmccr, linux-kernel, linux-mm

On Thu, 27 Feb 2003, Andrew Morton wrote:
> 
> No, it is still wrong.  Mapped cannot exceed MemTotal.

It needs this in addition to Dave's patch from yesterday:

--- 2.5.63-objfix-1/mm/rmap.c	Thu Feb 27 23:37:28 2003
+++ 2.5.63-objfix-2/mm/rmap.c	Fri Feb 28 12:33:58 2003
@@ -349,7 +349,8 @@
 			BUG();
 		if (atomic_read(&page->pte.mapcount) == 0)
 			BUG();
-		atomic_dec(&page->pte.mapcount);
+		if (atomic_dec_and_test(&page->pte.mapcount))
+			dec_page_state(nr_mapped);
 		return;
 	}
 

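Hugh's fix depends on a return value. atomic_dec() only decrements. atomic_dec_and_test() also reports whether the counter reached zero, and that is the moment the page stops being mapped and nr_mapped must drop. Without that check, nr_mapped was incremented on the 0 -> 1 transition but never decremented, so "Mapped" could drift upward past MemTotal.

The sketch below models this in user space with C11 atomics; the names are hypothetical. One difference from rmap.c: the add side here uses the old value returned by atomic_fetch_add() instead of a separate read followed by an increment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Global count of mapped pages (models the kernel's nr_mapped statistic). */
static atomic_int nr_mapped;

struct model_page {
	atomic_int mapcount;	/* how many ptes map this page */
};

/* Like atomic_dec_and_test(): decrement, report whether the result is 0.
 * atomic_fetch_sub() returns the OLD value, so old == 1 means new == 0. */
static bool dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

void model_page_add_rmap(struct model_page *page)
{
	/* First mapping of this page: it now counts as mapped. */
	if (atomic_fetch_add(&page->mapcount, 1) == 0)
		atomic_fetch_add(&nr_mapped, 1);
}

void model_page_remove_rmap(struct model_page *page)
{
	/* A plain decrement would never tell nr_mapped that the page became
	 * unmapped; testing the decrement catches the 1 -> 0 transition. */
	if (dec_and_test(&page->mapcount))
		atomic_fetch_sub(&nr_mapped, 1);
}
```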


* Re: Rising io_load results Re: 2.5.63-mm1
  2003-02-28 12:48           ` Hugh Dickins
@ 2003-02-28 15:56             ` Dave McCracken
  0 siblings, 0 replies; 24+ messages in thread
From: Dave McCracken @ 2003-02-28 15:56 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton; +Cc: Con Kolivas, linux-kernel, linux-mm


--On Friday, February 28, 2003 12:48:06 +0000 Hugh Dickins
<hugh@veritas.com> wrote:

> On Thu, 27 Feb 2003, Andrew Morton wrote:
>> 
>> No, it is still wrong.  Mapped cannot exceed MemTotal.
> 
> It needs this in addition to Dave's patch from yesterday:
> 
> --- 2.5.63-objfix-1/mm/rmap.c	Thu Feb 27 23:37:28 2003
> +++ 2.5.63-objfix-2/mm/rmap.c	Fri Feb 28 12:33:58 2003
> @@ -349,7 +349,8 @@
>  			BUG();
>  		if (atomic_read(&page->pte.mapcount) == 0)
>  			BUG();
> -		atomic_dec(&page->pte.mapcount);
> +		if (atomic_dec_and_test(&page->pte.mapcount))
> +			dec_page_state(nr_mapped);
>  		return;
>  	}

D'oh.  I should have seen that one.  Thanks.

Dave McCracken

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmccr@us.ibm.com                                        T/L   678-3059



* [PATCH] tiobench on UP and ptg-D3-mm1
       [not found]   ` <200302282227.56311.tomlins@cam.org>
@ 2003-03-01 15:04     ` Ed Tomlinson
  0 siblings, 0 replies; 24+ messages in thread
From: Ed Tomlinson @ 2003-03-01 15:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew,

You mentioned problems with tiobench on UP.  This message was partly 
composed with this script running:

for dir in /pool{a,e,g}/tio
do
        (       cd $dir
                tiobench --size 128 --threads 16 > /dev/null 2>&1 &
        )
done
  
Response was slow but usable.  It's actually a fairly good example of 
what the ptg patch can do.  Here is a "vmstat -a 5" of the run.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy id wa
 3  0  72188 255068  39132 180080    0    0     0    67 1052   621  4  3 93  0
 2  0  72188 254868  39236 180144    0    0     0    66 1062   639  5  3 92  0
39  0  72188 250604  39324 183796    0    0     2    82 1201  1163 27 13 60  0
49  0  72188 250196  39416 183924    0    0     0    65 1200  1053 92  8  0  0
52  3  72188 129660 159300 184104    0    0     0 12670 1228   782 27 73  0  0
52  2  72188  10364 275964 185144    0    0     0 16706 1304   970 16 84  0  0
58  6  74248   5292 275584 190256   13  394    21 19348 1401   925 15 85  0  0
22 29  74248   2248 277480 191160   36    0    65 19530 1543  1249 37 55  0  8
31 27  74248   2284 277472 191124    0    0    19  8378 1277   686 87 13  0  0
11 34  74248   4308 275360 191124    0    0     6  5119 1576  1174 54 19  0 28
 3 51  74248   4164 275036 191544    0    0    51  1805 1603  1005 44 11  0 45
 1 49  74248   3524 274308 192388    0    0   133  1690 1613  1694 21  9  0 69
 2 38  74248   3484 274664 193212    0    0    56  1755 1485   831  7  6  0 87
19 11  74248   3300 273792 194276    0    0   204  1741 1502   955 24  7  0 69
16  7  74248   3584 216036 252272    0    0 10351  1333 1716  1456 32 33  0 35
14 25  74248 128772 147112 196100   39    0  5041   376 1413  1565 70 29  0  1
 3 16  74248  57316 156176 259012    0    0 14367     0 1698  1393 51 49  0  0
 4  4  74248 150240  83964 238672    0    0  7649   722 1396  1096 66 34  0  0
 9  3  74248 142896  85368 244680    0    0  1466    12 1286  1053 90 10  0  0
 8  0  74248 220180  33184 219640   82    0   917    77 1263   985 86 14  0  0
 2  0  74248 270764   9512 193160    0    0     0    58 1057   665 69  6 25  0
 4  0  74248 270788   9576 193220    0    0     0    60 1056   720 15  4 81  0

This is using CFQ on a K6-III 400 with 512MB of RAM; all the affected filesystems are reiserfs.

What this does is detect thread groups (defined as processes sharing both mm and
FDs, or processes tagged as members of a kernel thread group) and reduce the
timeslices given to these processes when too many processes in a group are active.
This allows other tasks to get CPU IF there is demand.  There is also a governor
for user tasks; in this case it will not affect the test.

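The thread-group test in setup_governor() (in the fork.c hunk of the patch) can be read as a small predicate: a child joins its parent's pseudo thread group if it shares both the address space and the file table, or if it is created with CLONE_THREAD. Below is a sketch in C. joins_ptgroup() is a hypothetical name. The fallback CLONE_* values are the standard Linux ones and are used only if <sched.h> does not expose them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc exposes CLONE_* only with this */
#endif
#include <sched.h>
#include <stdbool.h>

#ifndef CLONE_VM
#define CLONE_VM	0x00000100	/* share the address space (mm) */
#endif
#ifndef CLONE_FILES
#define CLONE_FILES	0x00000400	/* share the file descriptor table */
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD	0x00010000	/* same thread group as the parent */
#endif

/* Same condition as setup_governor(): true if the child should be charged
 * to the parent's pseudo thread group rather than to its user. */
bool joins_ptgroup(unsigned long clone_flags)
{
	bool shares_mm_and_files =
		(clone_flags & CLONE_VM) && (clone_flags & CLONE_FILES);

	return shares_mm_and_files || (clone_flags & CLONE_THREAD);
}
```

A plain fork() (no CLONE_VM) or a vfork-style clone that shares only the mm therefore stays under the per-user governor. A pthread-style clone joins the group.
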
The patch has been tested on UP and compiles for SMP.  It should be OK on SMP.  On
NUMA boxes it would really benefit from a dynamic way to allocate per-node storage:
the ptgroup->active[] and user->active[] arrays should really point to atomic_t(s)
in per-node storage.

I have been using variants of this patch since the beginning of January; it lets me
run a Java Freenet server, which is heavily threaded, without it impacting my
interactive response much.

Ed Tomlinson

PS. The patch applies to 2.5.63-mm1; with a little twiddling it should also
apply to .63 (sched.c) or .63bk (sched.c, fork.c).

--------------- 
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1026  -> 1.1028 
#	include/linux/sched.h	1.139   -> 1.140  
#	       kernel/fork.c	1.111   -> 1.113  
#	       kernel/user.c	1.8     -> 1.9    
#	      kernel/sched.c	1.164   -> 1.165  
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/02/28	ed@oscar.et.ca	1.1027
# Add user and thread group governors to prevent either from monoplizing
# the system.  The governors work by limiting the sum of the timeslices
# of active tasks in a group to <n> timeslices.  The defaults set <n> to
# 1.5 for thread groups and to 30 for user tasks.  For numa systems the
# governors are per node.
# --------------------------------------------
#
diff -Nru a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h	Fri Feb 28 07:33:49 2003
+++ b/include/linux/sched.h	Fri Feb 28 07:33:49 2003
@@ -195,6 +195,11 @@
 
 #include <linux/aio.h>
 
+struct ptg_struct {		/* pseudo thread groups */
+	atomic_t active[MAX_NUMNODES];
+        atomic_t count;         /* number of refs */
+};
+
 struct mm_struct {
 	struct vm_area_struct * mmap;		/* list of VMAs */
 	struct rb_root mm_rb;
@@ -295,6 +300,7 @@
 struct user_struct {
 	atomic_t __count;	/* reference count */
 	atomic_t processes;	/* How many processes does this user have? */
+	atomic_t active[MAX_NUMNODES];
 	atomic_t files;		/* How many open files does this user have? */
 
 	/* Hash table maintenance information */
@@ -361,6 +367,8 @@
 	struct list_head ptrace_list;
 
 	struct mm_struct *mm, *active_mm;
+	struct ptg_struct * ptgroup;		/* pseudo thread group for this task */
+	atomic_t *governor;			/* the atomic_t that governs this task */
 
 /* task state */
 	struct linux_binfmt *binfmt;
diff -Nru a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c	Fri Feb 28 07:33:49 2003
+++ b/kernel/fork.c	Fri Feb 28 07:33:49 2003
@@ -72,12 +72,24 @@
 	return total;
 }
 
+void free_ptgroup(struct task_struct *tsk)
+{
+	if (tsk->ptgroup && atomic_sub_and_test(1,&tsk->ptgroup->count)) {
+                kfree(tsk->ptgroup);
+                tsk->ptgroup = NULL;
+                tsk->governor = &tsk->user->active[cpu_to_node(task_cpu(tsk))];
+                if (tsk == current)
+                        atomic_inc(tsk->governor);
+        }
+}
+
 void __put_task_struct(struct task_struct *tsk)
 {
 	WARN_ON(!(tsk->state & (TASK_DEAD | TASK_ZOMBIE)));
 	WARN_ON(atomic_read(&tsk->usage));
 	WARN_ON(tsk == current);
 
+	free_ptgroup(tsk);
 	security_task_free(tsk);
 	free_uid(tsk->user);
 
@@ -465,6 +477,7 @@
 
 	tsk->mm = NULL;
 	tsk->active_mm = NULL;
+	tsk->ptgroup = NULL;
 
 	/*
 	 * Are we cloning a kernel thread?
@@ -730,6 +743,32 @@
 	p->flags = new_flags;
 }
 
+static inline int setup_governor(unsigned long clone_flags, struct task_struct *p)
+{
+	if ( ((clone_flags & CLONE_VM) && (clone_flags & CLONE_FILES)) ||
+	     (clone_flags & CLONE_THREAD)) {
+		if (current->ptgroup)
+			atomic_inc(&current->ptgroup->count);
+		else {
+			int i;
+			current->ptgroup = kmalloc(sizeof(struct ptg_struct), GFP_ATOMIC);
+			if (!current->ptgroup)
+				return 1;
+			/* printk(KERN_INFO "ptgroup - pid %u\n",current->pid); */
+			atomic_set(&current->ptgroup->count,2);
+			for(i=0; i < MAX_NUMNODES; i++)
+				atomic_set(&current->ptgroup->active[i], 0);
+			atomic_set(&current->ptgroup->active[numa_node_id()], 1);
+			atomic_dec(current->governor);
+			current->governor = &current->ptgroup->active[numa_node_id()];
+		}
+		p->ptgroup = current->ptgroup;
+		p->governor = &p->ptgroup->active[numa_node_id()];
+	} else
+		p->governor = &p->user->active[numa_node_id()];
+	return 0;
+}
+
 asmlinkage int sys_set_tid_address(int *tidptr)
 {
 	current->clear_child_tid = tidptr;
@@ -872,6 +911,12 @@
 		goto bad_fork_cleanup_mm;
 	retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
 	if (retval)
+		goto bad_fork_cleanup_namespace;
+	/*
+	 * Setup the governor pointer for the new process, allocating a new ptg as
+	 * required if the process is a thread. 
+	 */
+	if (setup_governor(clone_flags, p))
 		goto bad_fork_cleanup_namespace;
 
 	if (clone_flags & CLONE_CHILD_SETTID)
diff -Nru a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c	Fri Feb 28 07:33:49 2003
+++ b/kernel/sched.c	Fri Feb 28 07:33:49 2003
@@ -69,6 +69,9 @@
 #define STARVATION_LIMIT	(2*HZ)
 #define AGRESSIVE_IDLE_STEAL	1
 #define NODE_THRESHOLD          125
+#define THREAD_GOVERNOR		15	/* allow threads groups 1.5 full timeslices */
+#define USER_GOVERNOR		300	/* allow user 30 full timeslices */
+
 
 /*
  * If a task is 'interactive' then we reinsert it in the active
@@ -124,7 +127,26 @@
 
 static inline unsigned int task_timeslice(task_t *p)
 {
-	return BASE_TIMESLICE(p);
+	int slice = BASE_TIMESLICE(p);
+	int threads = atomic_read(p->governor) * 10;
+	int govern = threads;
+	if (p->user->uid)
+		govern = (p->ptgroup) ? THREAD_GOVERNOR : USER_GOVERNOR;
+	if (threads > govern) {
+		slice = (slice * govern) / threads;
+		slice = (slice > MIN_TIMESLICE) ? slice : MIN_TIMESLICE;
+	}
+#if 1
+	{
+		static int next;
+		if (time_after(jiffies, next)) {
+			printk(KERN_INFO "uid %d pid %d nod %d ptg %x gov %x threads %d lim %d slice %d\n",
+			  p->uid, p->pid, numa_node_id(), p->ptgroup, p->governor, threads/10, govern, slice);
+			next = jiffies + HZ*300;
+		}
+	}
+#endif
+	return slice;
 }
 
 /*
@@ -251,16 +273,18 @@
 	rq->node_nr_running = &node_nr_running[0];
 }
 
-static inline void nr_running_inc(runqueue_t *rq)
+static inline void nr_running_inc(task_t *p, runqueue_t *rq)
 {
 	atomic_inc(rq->node_nr_running);
 	rq->nr_running++;
+	atomic_inc(p->governor);
 }
 
-static inline void nr_running_dec(runqueue_t *rq)
+static inline void nr_running_dec(task_t *p, runqueue_t *rq)
 {
 	atomic_dec(rq->node_nr_running);
 	rq->nr_running--;
+	atomic_dec(p->governor);
 }
 
 __init void node_nr_running_init(void)
@@ -274,8 +298,8 @@
 #else /* !CONFIG_NUMA */
 
 # define nr_running_init(rq)   do { } while (0)
-# define nr_running_inc(rq)    do { (rq)->nr_running++; } while (0)
-# define nr_running_dec(rq)    do { (rq)->nr_running--; } while (0)
+# define nr_running_inc(p, rq)    do { (rq)->nr_running++; atomic_inc((p)->governor); } while (0)
+# define nr_running_dec(p, rq)    do { (rq)->nr_running--; atomic_dec((p)->governor); } while (0)
 
 #endif /* CONFIG_NUMA */
 
@@ -380,7 +404,7 @@
 static inline void __activate_task(task_t *p, runqueue_t *rq)
 {
 	enqueue_task(p, rq->active);
-	nr_running_inc(rq);
+	nr_running_inc(p, rq);
 }
 
 static inline void activate_task(task_t *p, runqueue_t *rq)
@@ -408,7 +432,7 @@
  */
 static inline void deactivate_task(struct task_struct *p, runqueue_t *rq)
 {
-	nr_running_dec(rq);
+	nr_running_dec(p, rq);
 	if (p->state == TASK_UNINTERRUPTIBLE)
 		rq->nr_uninterruptible++;
 	dequeue_task(p, p->array);
@@ -1068,9 +1092,15 @@
 static inline void pull_task(runqueue_t *src_rq, prio_array_t *src_array, task_t *p, runqueue_t *this_rq, int this_cpu)
 {
 	dequeue_task(p, src_array);
-	nr_running_dec(src_rq);
+	nr_running_dec(p, src_rq);
 	set_task_cpu(p, this_cpu);
-	nr_running_inc(this_rq);
+#ifdef CONFIG_NUMA
+        if (p->ptgroup)
+                p->governor = &p->ptgroup->active[cpu_to_node(this_cpu)];
+        else
+                p->governor = &p->user->active[cpu_to_node(this_cpu)];
+#endif
+	nr_running_inc(p, this_rq);
 	enqueue_task(p, this_rq->active);
 	wake_up_cpu(this_rq, this_cpu, p);
 }
@@ -2729,6 +2759,8 @@
 	cpu_idle_ptr(smp_processor_id()) = current;
 
 	set_task_cpu(current, smp_processor_id());
+        current->governor = &current->user->active[numa_node_id()];
+	atomic_inc(current->governor);
 	wake_up_forked_process(current);
 
 	init_timers();
diff -Nru a/kernel/user.c b/kernel/user.c
--- a/kernel/user.c	Fri Feb 28 07:33:49 2003
+++ b/kernel/user.c	Fri Feb 28 07:33:49 2003
@@ -30,6 +30,7 @@
 struct user_struct root_user = {
 	.__count	= ATOMIC_INIT(1),
 	.processes	= ATOMIC_INIT(1),
+	.active		= {[0 ...MAX_NUMNODES-1] = ATOMIC_INIT(0)},
 	.files		= ATOMIC_INIT(0)
 };
 
@@ -89,6 +90,7 @@
 
 	if (!up) {
 		struct user_struct *new;
+		int i;
 
 		new = kmem_cache_alloc(uid_cachep, SLAB_KERNEL);
 		if (!new)
@@ -96,6 +98,8 @@
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
 		atomic_set(&new->processes, 0);
+		for(i=0; i < MAX_NUMNODES; i++)
+			atomic_set(&new->active[i], 0);
 		atomic_set(&new->files, 0);
 
 		/*
@@ -130,6 +134,11 @@
 	atomic_inc(&new_user->processes);
 	atomic_dec(&old_user->processes);
 	current->user = new_user;
+	if (!current->ptgroup) {
+		atomic_dec(current->governor);
+		current->governor = &current->user->active[numa_node_id()];
+		atomic_inc(current->governor);
+	}
 	free_uid(old_user);
 }
 




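The scaling in task_timeslice() above works in tenths of a timeslice. Each active task in the governing group counts as 10. Once the total exceeds the limit (15 for a thread group, 300 for a non-root user's other tasks), every task's slice is scaled by limit/total, with MIN_TIMESLICE as a floor. Root (uid 0) is never throttled, because its limit is set equal to the total.

Below is a standalone model of that arithmetic. The base slice and the floor are parameters here, since their real values come from BASE_TIMESLICE() and MIN_TIMESLICE in sched.c.

```c
/* Limits in tenths of a full timeslice, as in the patch. */
#define THREAD_GOVERNOR	15	/* 1.5 slices per thread group */
#define USER_GOVERNOR	300	/* 30 slices per user */

/* base:      the task's unscaled slice (BASE_TIMESLICE(p) in the patch)
 * active:    tasks counted by the governor (atomic_read(p->governor))
 * uid:       owner; 0 (root) is exempt from governing
 * in_ptgroup: nonzero if the task belongs to a pseudo thread group
 * min_slice: stand-in for MIN_TIMESLICE (its value is an assumption here) */
int governed_timeslice(int base, int active, int uid, int in_ptgroup,
		       int min_slice)
{
	int slice = base;
	int threads = active * 10;	/* scale to tenths */
	int govern = threads;		/* root: limit == total, no scaling */

	if (uid)
		govern = in_ptgroup ? THREAD_GOVERNOR : USER_GOVERNOR;
	if (threads > govern) {
		/* share 'govern' tenths of a slice across the active tasks */
		slice = (slice * govern) / threads;
		if (slice < min_slice)
			slice = min_slice;
	}
	return slice;
}
```

For example, take a hypothetical base slice of 100 and a floor of 10. Two active threads in a group each get 100 * 15 / 20 = 75. A group with 16 active threads would get 100 * 15 / 160 = 9, which is clamped to 10.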


* [PATCH 2.5.63] Teach page_mapped about the anon flag
  2003-02-27 22:24       ` Andrew Morton
@ 2003-03-03 21:06         ` Dave McCracken
  2003-03-03 21:12           ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Dave McCracken @ 2003-03-03 21:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1177 bytes --]


--On Thursday, February 27, 2003 14:24:50 -0800 Andrew Morton
<akpm@digeo.com> wrote:

> I'm just looking at page_mapped().  It is now implicitly assuming that the
> architecture's representation of a zero-count atomic_t is all-bits-zero.
> 
> This is not true on sparc32 if some other CPU is in the middle of an
> atomic_foo() against that counter.  Maybe the assumption is false on other
> architectures too.
> 
> So page_mapped() really should be performing an atomic_read() if that is
> appropriate to the particular page.  I guess this involves testing
> page->mapping.  Which is stable only when the page is locked or
> mapping->page_lock is held.
> 
> It appears that all page_mapped() callers are inside lock_page() at
> present, so a quick audit and addition of a comment would be appropriate
> there please.

I'm not at all confident that page_mapped() is adequately protected.
Here's a patch that explicitly handles the atomic_t case.

Dave McCracken

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmccr@us.ibm.com                                        T/L   678-3059

[-- Attachment #2: objfix-2.5.63-1.diff --]
[-- Type: text/plain, Size: 738 bytes --]

--- 2.5.63-objrmap/include/linux/mm.h	2003-02-27 15:58:34.000000000 -0600
+++ 2.5.63-objfix/include/linux/mm.h	2003-02-28 14:21:56.000000000 -0600
@@ -363,10 +363,16 @@
  * Return true if this page is mapped into pagetables.  Subtle: test pte.direct
  * rather than pte.chain.  Because sometimes pte.direct is 64-bit, and .chain
  * is only 32-bit.
+ *
+ * If the page is an object-mapped page, we need to do an atomic read of
+ * pte.mapcount instead, since atomic values may not be zero in the upper bits.
  */
 static inline int page_mapped(struct page *page)
 {
-	return page->pte.direct != 0;
+	if (PageAnon(page))
+		return page->pte.direct != 0;
+	else
+		return atomic_read(&page->pte.mapcount) != 0;
 }
 
 /*

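Andrew's sparc32 point can be made concrete with a simplified model. Suppose the lock byte lives in the low 8 bits of the word and the value in the upper 24 bits, as in the sparc32 SMP atomic_t of this era. Then testing the raw word, which is what page_mapped() effectively does through pte.direct, sees a nonzero word while another CPU holds the lock, even though the counter's value is 0. The names below are hypothetical, and the real sparc32 code differs in detail.

```c
/* Simplified model: counter holds (value << 8) | lock byte. */
struct model_atomic {
	int counter;
};

/* What page_mapped() effectively did: test the raw word. */
int model_raw_nonzero(const struct model_atomic *v)
{
	return v->counter != 0;
}

/* What an atomic_read() does in this model: discard the lock byte. */
int model_atomic_read(const struct model_atomic *v)
{
	return v->counter >> 8;
}

/* Simulate another CPU taking / releasing the embedded lock byte
 * while it is in the middle of an atomic operation. */
void model_lock(struct model_atomic *v)
{
	v->counter |= 0xff;
}

void model_unlock(struct model_atomic *v)
{
	v->counter &= ~0xff;
}
```

With the value at 0 and the lock byte held, model_raw_nonzero() reports "mapped" while model_atomic_read() correctly returns 0. This is the false positive Dave describes below.
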

* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
  2003-03-03 21:06         ` [PATCH 2.5.63] Teach page_mapped about the anon flag Dave McCracken
@ 2003-03-03 21:12           ` Andrew Morton
  2003-03-03 21:24             ` Dave McCracken
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-03-03 21:12 UTC (permalink / raw)
  To: Dave McCracken; +Cc: linux-kernel, linux-mm

Dave McCracken <dmccr@us.ibm.com> wrote:
>
> 
> --On Thursday, February 27, 2003 14:24:50 -0800 Andrew Morton
> <akpm@digeo.com> wrote:
> 
> > I'm just looking at page_mapped().  It is now implicitly assuming that the
> > architecture's representation of a zero-count atomic_t is all-bits-zero.
> > 
> > This is not true on sparc32 if some other CPU is in the middle of an
> > atomic_foo() against that counter.  Maybe the assumption is false on other
> > architectures too.
> > 
> > So page_mapped() really should be performing an atomic_read() if that is
> > appropriate to the particular page.  I guess this involves testing
> > page->mapping.  Which is stable only when the page is locked or
> > mapping->page_lock is held.
> > 
> > It appears that all page_mapped() callers are inside lock_page() at
> > present, so a quick audit and addition of a comment would be appropriate
> > there please.
> 
> I'm not at all confident that page_mapped() is adequately protected.

It is.  All callers which need to be 100% accurate are under
pte_chain_lock().

> Here's a patch that explicitly handles the atomic_t case.

OK..  But it increases dependency on PageAnon.  Wasn't the plan to remove
that at some time?



* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
  2003-03-03 21:12           ` Andrew Morton
@ 2003-03-03 21:24             ` Dave McCracken
  2003-03-03 21:35               ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Dave McCracken @ 2003-03-03 21:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm


--On Monday, March 03, 2003 13:12:10 -0800 Andrew Morton <akpm@digeo.com>
wrote:

> It is.  All callers which need to be 100% accurate are under
> pte_chain_lock().

Hmm, good point.  Some places may not need perfect accuracy.  Also, if it
gives a false positive it means someone else is doing an atomic op on it,
so it's likely to be in transition to/from true anyway.

Ok, you've convinced me.  Please ignore the patch.  I'll hang onto it in
case we get proved wrong at some point.

Dave

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmccr@us.ibm.com                                        T/L   678-3059



* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
  2003-03-03 21:24             ` Dave McCracken
@ 2003-03-03 21:35               ` Andrew Morton
  2003-03-03 21:52                 ` Dave McCracken
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-03-03 21:35 UTC (permalink / raw)
  To: Dave McCracken; +Cc: linux-kernel, linux-mm

Dave McCracken <dmccr@us.ibm.com> wrote:
>
> 
> --On Monday, March 03, 2003 13:12:10 -0800 Andrew Morton <akpm@digeo.com>
> wrote:
> 
> > It is.  All callers which need to be 100% accurate are under
> > pte_chain_lock().
> 
> Hmm, good point.  Some places may not need perfect accuracy.  Also, if it
> gives a false positive it means someone else is doing an atomic op on it,
> so it's likely to be in transition to/from true anyway.
> 
> Ok, you've convinced me.  Please ignore the patch.  I'll hang onto it in
> case we get proved wrong at some point.

We do need a patch I think.  page_mapped() is still assuming that an
all-bits-zero atomic_t corresponds to a zero-value atomic_t.

This does appear to be true for all supported architectures, but it's a bit
grubby.



* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
  2003-03-03 21:35               ` Andrew Morton
@ 2003-03-03 21:52                 ` Dave McCracken
  2003-03-03 22:15                   ` Andrew Morton
  0 siblings, 1 reply; 24+ messages in thread
From: Dave McCracken @ 2003-03-03 21:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm


--On Monday, March 03, 2003 13:35:39 -0800 Andrew Morton <akpm@digeo.com>
wrote:

> We do need a patch I think.  page_mapped() is still assuming that an
> all-bits-zero atomic_t corresponds to a zero-value atomic_t.
> 
> This does appear to be true for all supported architectures, but it's a
> bit grubby.

If that's ever not true then we need extra code to initialize/rezero that
field, since we assume it's zero on alloc, and the pte_chain code also
assumes it's zero for a new page.

Dave

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmccr@us.ibm.com                                        T/L   678-3059



* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
  2003-03-03 21:52                 ` Dave McCracken
@ 2003-03-03 22:15                   ` Andrew Morton
  2003-03-04 18:32                     ` [PATCH 2.5.63] Make objrmap mapcount non-atomic Dave McCracken
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-03-03 22:15 UTC (permalink / raw)
  To: Dave McCracken; +Cc: linux-kernel, linux-mm

Dave McCracken <dmccr@us.ibm.com> wrote:
>
> 
> --On Monday, March 03, 2003 13:35:39 -0800 Andrew Morton <akpm@digeo.com>
> wrote:
> 
> > We do need a patch I think.  page_mapped() is still assuming that an
> > all-bits-zero atomic_t corresponds to a zero-value atomic_t.
> > 
> > This does appear to be true for all supported architectures, but it's a
> > bit grubby.
> 
> If that's ever not true then we need extra code to initialize/rezero that
> field, since we assume it's zero on alloc, and the pte_chain code also
> assumes it's zero for a new page.

Well why not make mapcount an "int" and move the places where it is modified
inside pte_chain_lock()?

That does not increase the number of atomic operations, and it makes me stop
wondering if this:

                if (atomic_read(&page->pte.mapcount) == 0)
                        inc_page_state(nr_mapped);

is racy ;)

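Here is the race Andrew suspects. With an atomic mapcount, the `atomic_read() == 0` test and the `atomic_inc()` are separate operations. Two CPUs can both read 0 before either increments, and nr_mapped is then bumped twice for one page. Making mapcount a plain int and doing the check and the update under one lock closes that window.

Below is a user-space sketch of that approach. A C11 atomic_flag spinlock stands in for pte_chain_lock(), and all names are hypothetical. nr_mapped_pages stays atomic because different pages take different locks.

```c
#include <stdatomic.h>

struct locked_page {
	atomic_flag chain_lock;	/* stand-in for the pte_chain bit lock */
	int mapcount;		/* plain int: only touched under chain_lock */
};

/* Shared across all pages, so it cannot rely on any one page's lock. */
static atomic_int nr_mapped_pages;

static void chain_lock(struct locked_page *page)
{
	while (atomic_flag_test_and_set_explicit(&page->chain_lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void chain_unlock(struct locked_page *page)
{
	atomic_flag_clear_explicit(&page->chain_lock, memory_order_release);
}

void locked_add_rmap(struct locked_page *page)
{
	chain_lock(page);
	/* Check and increment form one critical section, so only one
	 * caller can observe the 0 -> 1 transition. */
	if (page->mapcount == 0)
		atomic_fetch_add(&nr_mapped_pages, 1);
	page->mapcount++;
	chain_unlock(page);
}

void locked_remove_rmap(struct locked_page *page)
{
	chain_lock(page);
	if (--page->mapcount == 0)
		atomic_fetch_sub(&nr_mapped_pages, 1);
	chain_unlock(page);
}
```

As Andrew notes, this costs no extra atomic operations on the hot path: the lock replaces the atomic increment rather than adding to it.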


* [PATCH 2.5.63] Make objrmap mapcount non-atomic
  2003-03-03 22:15                   ` Andrew Morton
@ 2003-03-04 18:32                     ` Dave McCracken
  0 siblings, 0 replies; 24+ messages in thread
From: Dave McCracken @ 2003-03-04 18:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]


--On Monday, March 03, 2003 14:15:18 -0800 Andrew Morton <akpm@digeo.com>
wrote:

> Well why not make mapcount an "int" and move the places where it is
> modified inside pte_chain_lock()?
> 
> That does not increase the number of atomic operations, and it makes me
> stop wondering if this:
> 
>                 if (atomic_read(&page->pte.mapcount) == 0)
>                         inc_page_state(nr_mapped);
> 
> is racy ;)

That would be entirely too easy a solution :)

You're entirely right, of course.  Here's the patch that makes it an int
instead of atomic, with the appropriate locking.

Dave

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmccr@us.ibm.com                                        T/L   678-3059

[-- Attachment #2: objfix-2.5.63-2.diff --]
[-- Type: text/plain, Size: 2270 bytes --]

--- 2.5.63-objrmap/./include/linux/mm.h	2003-02-27 15:58:34.000000000 -0600
+++ 2.5.63-objfix/./include/linux/mm.h	2003-03-03 16:26:21.000000000 -0600
@@ -172,7 +172,7 @@
 		struct pte_chain *chain;/* Reverse pte mapping pointer.
 					 * protected by PG_chainlock */
 		pte_addr_t direct;
-		atomic_t mapcount;
+		int mapcount;
 	} pte;
 	unsigned long private;		/* mapping-private opaque data */
 
--- 2.5.63-objrmap/./mm/rmap.c	2003-02-28 14:19:10.000000000 -0600
+++ 2.5.63-objfix/./mm/rmap.c	2003-03-03 20:08:43.000000000 -0600
@@ -144,7 +144,7 @@
 	struct vm_area_struct *vma;
 	int referenced = 0;
 
-	if (atomic_read(&page->pte.mapcount) == 0)
+	if (!page->pte.mapcount)
 		return 0;
 
 	if (!mapping)
@@ -243,19 +243,20 @@
 	if (!pfn_valid(page_to_pfn(page)) || PageReserved(page))
 		return pte_chain;
 
+	pte_chain_lock(page);
+
 	if (!PageAnon(page)) {
 		if (!page->mapping)
 			BUG();
 		if (PageSwapCache(page))
 			BUG();
-		if (atomic_read(&page->pte.mapcount) == 0)
+		if (!page->pte.mapcount)
 			inc_page_state(nr_mapped);
-		atomic_inc(&page->pte.mapcount);
+		page->pte.mapcount++;
+		pte_chain_unlock(page);
 		return pte_chain;
 	}
 
-	pte_chain_lock(page);
-
 #ifdef DEBUG_RMAP
 	/*
 	 * This stuff needs help to get up to highmem speed.
@@ -342,20 +343,22 @@
 	if (!page_mapped(page))
 		return;		/* remap_page_range() from a driver? */
 
+	pte_chain_lock(page);
+
 	if (!PageAnon(page)) {
 		if (!page->mapping)
 			BUG();
 		if (PageSwapCache(page))
 			BUG();
-		if (atomic_read(&page->pte.mapcount) == 0)
+		if (!page->pte.mapcount)
 			BUG();
-		if (atomic_dec_and_test(&page->pte.mapcount))
+		page->pte.mapcount--;
+		if (!page->pte.mapcount)
 			dec_page_state(nr_mapped);
+		pte_chain_unlock(page);
 		return;
 	}
 
-	pte_chain_lock(page);
-
 	if (PageDirect(page)) {
 		if (page->pte.direct == pte_paddr) {
 			page->pte.direct = 0;
@@ -471,11 +474,11 @@
 	if (pte_dirty(pteval))
 		set_page_dirty(page);
 
-	if (atomic_read(&page->pte.mapcount) == 0)
+	if (!page->pte.mapcount)
 		BUG();
 
 	mm->rss--;
-	atomic_dec(&page->pte.mapcount);
+	page->pte.mapcount--;
 	page_cache_release(page);
 
 out_unmap:
@@ -516,7 +519,7 @@
 			goto out;
 	}
 
-	if (atomic_read(&page->pte.mapcount) != 0)
+	if (page->pte.mapcount)
 		BUG();
 
 out:

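The patch closes that window by placing the zero test and the update in one critical section under `pte_chain_lock()`. What follows is a rough userspace sketch of the two file-backed branches. It assumes a `pthread_mutex_t` as a stand-in for the `PG_chainlock` bit lock and an atomic `nr_mapped` in place of the kernel's page_state counter. `struct fake_page`, `add_obj`, `remove_obj`, `worker` and `run_workers` are illustrative names only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical model of the patched non-anonymous path; chain_lock
 * stands in for pte_chain_lock()/pte_chain_unlock(). */
struct fake_page {
    pthread_mutex_t chain_lock;
    int mapcount;                  /* plain int, only touched under chain_lock */
};

static atomic_long nr_mapped;      /* stand-in for the nr_mapped page_state */

/* Mirrors the patched page_add_rmap() file-backed branch. */
static void add_obj(struct fake_page *page)
{
    pthread_mutex_lock(&page->chain_lock);
    if (!page->mapcount)
        atomic_fetch_add(&nr_mapped, 1);    /* first mapping of the page */
    page->mapcount++;
    pthread_mutex_unlock(&page->chain_lock);
}

/* Mirrors the patched page_remove_rmap() file-backed branch. */
static void remove_obj(struct fake_page *page)
{
    pthread_mutex_lock(&page->chain_lock);
    assert(page->mapcount > 0);             /* stands in for BUG() */
    page->mapcount--;
    if (!page->mapcount)
        atomic_fetch_sub(&nr_mapped, 1);    /* last mapping went away */
    pthread_mutex_unlock(&page->chain_lock);
}

static struct fake_page shared_page = {
    .chain_lock = PTHREAD_MUTEX_INITIALIZER,
    .mapcount = 0,
};

/* Each worker maps and unmaps the shared page repeatedly. */
static void *worker(void *arg)
{
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        add_obj(&shared_page);
        remove_obj(&shared_page);
    }
    return NULL;
}

/* Runs nthreads workers (1..16) and returns the final nr_mapped. */
long run_workers(int nthreads, int iters)
{
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);
    atomic_store(&nr_mapped, 0);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&nr_mapped);
}
```

Because the test and the update happen under the same lock, `nr_mapped` only changes when `mapcount` moves between 0 and 1, so `run_workers(4, 10000)` should return 0 however the threads interleave. Build with `-pthread`.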

end of thread

Thread overview: 24+ messages
2003-02-27 10:59 2.5.63-mm1 Andrew Morton
2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
2003-02-27 21:44   ` Andrew Morton
2003-02-27 22:01     ` Dave McCracken
2003-02-27 22:24       ` Andrew Morton
2003-03-03 21:06         ` [PATCH 2.5.63] Teach page_mapped about the anon flag Dave McCracken
2003-03-03 21:12           ` Andrew Morton
2003-03-03 21:24             ` Dave McCracken
2003-03-03 21:35               ` Andrew Morton
2003-03-03 21:52                 ` Dave McCracken
2003-03-03 22:15                   ` Andrew Morton
2003-03-04 18:32                     ` [PATCH 2.5.63] Make objrmap mapcount non-atomic Dave McCracken
2003-02-27 23:56       ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
2003-02-28  0:06         ` Andrew Morton
2003-02-28  0:28           ` Con Kolivas
2003-02-28  7:46             ` Duncan Sands
2003-02-28  8:06               ` Andrew Morton
2003-02-28 12:48           ` Hugh Dickins
2003-02-28 15:56             ` Dave McCracken
2003-02-28  0:17 ` 2.5.63-mm1 Ed Tomlinson
2003-02-28  0:46   ` 2.5.63-mm1 Andrew Morton
2003-02-28 12:16 ` 2.5.63-mm1 steven roemen
2003-02-28 12:24   ` 2.5.63-mm1 Andrew Morton
     [not found] ` <3E5F7DAD.2080306@cyberone.com.au>
     [not found]   ` <200302282227.56311.tomlins@cam.org>
2003-03-01 15:04     ` [PATCH] tiobench on UP and ptg-D3-mm1 Ed Tomlinson
