* 2.5.63-mm1
@ 2003-02-27 10:59 Andrew Morton
2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
` (3 more replies)
0 siblings, 4 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-27 10:59 UTC (permalink / raw)
To: linux-kernel, linux-mm
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm1/
. Tons of changes to the anticipatory scheduler. It may not be working
very well at present. Please use "elevator=deadline" if it causes
problems.
. Updated smalldevfs patch.
. A fix for the VMA-based reverse mapping patch.
. Added Ingo's latest CPU scheduler update.
. Lots of random fixes.
linus.patch
Latest from Linus
-initial-jiffies.patch
-user-times-jiffies-wrap-fix.patch
-put_page-speedup.patch
-slab-batchcount-limit-fix.patch
-crc32-speedup-2.patch
-flush-tlb-all-2.patch
-linux-2.5.62-early_ioremap_A0.patch
-linux-2.5.62-x440disco_A0.patch
-use-find_get_page.patch
-irda-interruptible-sleep.patch
-dget-BUG.patch
-disk-accounting-fix.patch
-hugh-inode-pruning-race-fix.patch
-kill-bogus-wakeup-messge.patch
-dont-sync-with-stopped-pdflush.patch
-irq-balance-disable-fix.patch
-oom-killer-dont-spin-on-same-task.patch
-add-missing-global_flush_tlb-calls.patch
-ext3-O_SYNC-speedup.patch
-remove-MAX_BLKDEV-from-genhd.patch
Merged
+separate.patch
My contribution to the spelling bee.
+rpc_rmdir-fix.patch
Fix the NFS oops
+ppc64-scruffiness.patch
Fix some warnings
-reiserfs_file_write-4.patch
+reiserfs_file_write-5.patch
Updated (I don't think it changed)
+limit-write-latency.patch
Fix potential source of write-vs-write latency in VFS
+lockd-lockup-fix-2.patch
Updated patch from Neil for an NFS server deadlock
+loop-hack.patch
Fix an OOM and oops in loop
+flock-fix.patch
File locking fix from Matthew
+sysfs-dget-fix-2.patch
Fix a sysfs dentry race (this isn't right)
+irq-sharing-fix.patch
Fix SA_INTERRUPT for shared interrupts
+anticipation_is_killing_me.patch
+as-fix-hughs-problem.patch
+as-cleanup.patch
+as-start-stop-anticipation-helpers.patch
+as-cleanup-2.patch
+as-cleanup-3.patch
+as-cleanup-3-write-latency-fix.patch
+as-handle-exitted-tasks.patch
+as-handle-exitted-tasks-fix.patch
+as-no-plugging-and-cleanups.patch
+as-remove-debug.patch
+as-track-queued-reads.patch
+as-accounting-fix.patch
+as-nr_reads-fix.patch
+as-tuning.patch
+as-disable-nr_reads.patch
Anticipatory scheduler work
smalldevfs.patch
Updated
-smalldevfs-dcache_rcu-fix.patch
Folded into smalldevfs.patch
+objrmap-X-fix.patch
Fix VMA-based reverse mapping
+per-cpu-disk-stats.patch
Use per-cpu data for disk accounting
+presto_get_sb-fix.patch
Fix an intermezzo oops
+on_each_cpu.patch
+on_each_cpu-ldt-cleanup.patch
preempt-safety for smp_call_function()
+notsc-panic.patch
x86 TSC cleanup
+alloc_pages_cleanup.patch
Code consolidation
+ext2-handle-htree-flag.patch
ext2 htree back-compatibility
+sched-a3.patch
CPU scheduler update
+mpparse-typo-fix.patch
Fix a printk bug
+i386-no-swap-fix.patch
Fix ia32 CONFIG_SWAP=n
+remove-hugetlb_key.patch
+hugetlbpage-doc-update.patch
+hugetlb-valid-page-ranges.patch
Hugetlbpage work
All 88 patches:
linus.patch
Latest from Linus
separate.patch
mm.patch
add -mmN to EXTRAVERSION
rpc_rmdir-fix.patch
Fix nfs oops during mount
ppc64-reloc_hide.patch
ppc64-pci-patch.patch
Subject: pci patch
ppc64-e100-fix.patch
fix e100 for big-endian machines
ppc64-aio-32bit-emulation.patch
32/64bit emulation for aio
ppc64-64-bit-exec-fix.patch
Subject: 64bit exec
ppc64-scruffiness.patch
Fix some PPC64 compile warnings
sym-do-160.patch
make the SYM driver do 160 MB/sec
kgdb.patch
nfsd-disable-softirq.patch
Fix race in svcsock.c in 2.5.61
report-lost-ticks.patch
make lost-tick detection more informative
devfs-fix.patch
ptrace-flush.patch
cache flushing in the ptrace code
buffer-debug.patch
buffer.c debugging
warn-null-wakeup.patch
ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages
deadline-dispatching-fix.patch
deadline IO scheduler dispatching fix
nfs-unstable-pages.patch
"unstable" page accounting for NFS.
limit-write-latency.patch
reiserfs_file_write-5.patch
tcp-wakeups.patch
Use fast wakeups in TCP/IPV4
lockd-lockup-fix-2.patch
Subject: Re: Fw: Re: 2.4.20 NFS server lock-up (SMP)
rcu-stats.patch
RCU statistics reporting
ext3-journalled-data-assertion-fix.patch
Remove incorrect assertion from ext3
nfs-speedup.patch
nfs-oom-fix.patch
nfs oom fix
sk-allocation.patch
Subject: Re: nfs oom
nfs-more-oom-fix.patch
nfs-sendfile.patch
Implement sendfile() for NFS
rpciod-atomic-allocations.patch
Make rpciod use atomic allocations
linux-isp.patch
isp-update-1.patch
remove-unused-congestion-stuff.patch
Subject: [PATCH] remove unused congestion stuff
aic-makefile-fix.patch
aicasm Makefile fix
loop-hack.patch
loop: Fix OOM and oops
atm_dev_sem.patch
convert atm_dev_lock from spinlock to semaphore
flock-fix.patch
flock fixes for 2.5.62
sysfs-dget-fix-2.patch
irq-sharing-fix.patch
fix irq sharing and SA_INTERRUPT on x86
as-iosched.patch
anticipatory I/O scheduler
as-comments-and-tweaks.patch
antsched: commentary and
as-hz-1000-fix.patch
Fix anticipatory scheduler for HZ=1000
as-tidy-up-rename.patch
tidy up AS rename
anticipation_is_killing_me.patch
as-update-1.patch
AS update
as-break-anticipation-on-write.patch
AS break on write
as-break-if-readahead.patch
detect overlapping reads and writes
as-fix-hughs-problem.patch
Add a pointer to the queue into struct as_data
as-cleanup.patch
anticipatory scheduler cleanups
as-start-stop-anticipation-helpers.patch
AS: add anticipation stop/start helper functions
as-cleanup-2.patch
Subject: [PATCH] some cleanups 2
as-cleanup-3.patch
AS: more cleanups
as-cleanup-3-write-latency-fix.patch
Fix as-cleanup-3
as-handle-exitted-tasks.patch
as-handle-exitted-tasks-fix.patch
fix for as IO contexts
as-no-plugging-and-cleanups.patch
AS no plugging + cleanups
as-remove-debug.patch
as-track-queued-reads.patch
AS: track queued reads
as-accounting-fix.patch
AS: track queued reads (fix)
as-nr_reads-fix.patch
AS: read accounting fix
as-tuning.patch
AS: tuning
as-disable-nr_reads.patch
AS: disable per-process in-flight read logic
readahead-shrink-to-zero.patch
Allow VFS readahead to fall to zero
cfq-2.patch
CFQ scheduler, #2
smalldevfs.patch
smalldevfs
objrmap-2.5.62-5.patch
object-based rmap
objrmap-X-fix.patch
objrmap fix for X
oprofile-up-fix.patch
fix oprofile on UP (lockless sync)
update_atime-speedup.patch
speed up update_atime()
ext2-update_atime_speedup.patch
Use one_sec_update_atime in ext2
ext3-update_atime_speedup.patch
Use one_sec_update_atime in ext3
UPDATE_ATIME-to-update_atime.patch
Rename UPDATE_ATIME to update_atime
per-cpu-disk-stats.patch
Make diskstats per-cpu using kmalloc_percpu
presto_get_sb-fix.patch
fix presto_get_sb() return value and oops.
on_each_cpu.patch
fix preempt-issues with smp_call_function()
on_each_cpu-ldt-cleanup.patch
notsc-panic.patch
Don't panic if TSC is enabled and notsc is used
alloc_pages_cleanup.patch
clean up redundant code for alloc_pages
ext2-handle-htree-flag.patch
ext2: clear ext3 htree flag on directories
sched-a3.patch
"HT scheduler", sched-2.5.63-A3
mpparse-typo-fix.patch
fix typo in arch/i386/kernel/mpparse.c in printk
i386-no-swap-fix.patch
allow CONFIG_SWAP=n for i386
remove-hugetlb_key.patch
remove dead hugetlb_key forward decl
hugetlbpage-doc-update.patch
hugetlbpage documentation update
hugetlb-valid-page-ranges.patch
hugetlb: fix MAP_FIXED handling
^ permalink raw reply [flat|nested] 24+ messages in thread
* Rising io_load results Re: 2.5.63-mm1
2003-02-27 10:59 2.5.63-mm1 Andrew Morton
@ 2003-02-27 21:22 ` Con Kolivas
2003-02-27 21:44 ` Andrew Morton
2003-02-28 0:17 ` 2.5.63-mm1 Ed Tomlinson
` (2 subsequent siblings)
3 siblings, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2003-02-27 21:22 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, linux-mm
I mentioned this previously; it's still happening.
This started some time around 2.5.62-mm3 with the io_load results on contest
benchmarking (http://contest.kolivas.org) rising with each run. It still
occurs with 2.5.63-mm1 regardless of which elevator is specified. These are the
io_load results (kernel compile time in seconds) for 6 consecutive runs:
111
147
221
284
334
358
/proc/meminfo after 6 runs and mem flushing:
MemTotal: 256156 kB
MemFree: 238708 kB
Buffers: 2320 kB
Cached: 1552 kB
SwapCached: 1780 kB
Active: 5876 kB
Inactive: 2120 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 256156 kB
LowFree: 238708 kB
SwapTotal: 4194272 kB
SwapFree: 4192416 kB
Dirty: 28 kB
Writeback: 0 kB
Mapped: 4294923652 kB
Slab: 4872 kB
Committed_AS: 7032 kB
PageTables: 200 kB
ReverseMaps: 631
I am refraining from publishing any benchmark results with this happening. It
doesn't seem to occur on 2.5.63.
Con
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
@ 2003-02-27 21:44 ` Andrew Morton
2003-02-27 22:01 ` Dave McCracken
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-02-27 21:44 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux-kernel, linux-mm
Con Kolivas <kernel@kolivas.org> wrote:
>
>
> This started some time around 2.5.62-mm3 with the io_load results on contest
> benchmarking (http://contest.kolivas.org) rising with each run.
> ...
> Mapped: 4294923652 kB
Well that's gotta hurt. This metric is used in making writeback decisions.
Probably the objrmap patch.
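The reported value looks like a small negative page count printed as an unsigned number. A minimal sketch of that arithmetic follows, assuming 4 kB pages and a 32-bit unsigned counter as on i386. The helper names and PAGE_SHIFT_ASSUMED are hypothetical and not taken from the kernel source.

```c
#include <stdint.h>

/* Assumption: 4 kB pages, so pages -> kB is a left shift by 2. */
#define PAGE_SHIFT_ASSUMED 12

/* Convert a page count to kB in 32-bit unsigned arithmetic, so a
 * negative count wraps modulo 2^32 instead of printing as negative. */
static uint32_t pages_to_kb32(int32_t nr_pages)
{
    return (uint32_t)nr_pages << (PAGE_SHIFT_ASSUMED - 10);
}

/* Recover the signed page count that would be printed as kb_reported. */
static int32_t kb32_to_signed_pages(uint32_t kb_reported)
{
    if (kb_reported > INT32_MAX)
        return -(int32_t)((0u - kb_reported) / 4);  /* wrapped negative */
    return (int32_t)(kb_reported / 4);
}
```

Under these assumptions, 4294923652 kB corresponds to a count of -10911 pages, which is consistent with decrements that have no matching increments.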
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-27 21:44 ` Andrew Morton
@ 2003-02-27 22:01 ` Dave McCracken
2003-02-27 22:24 ` Andrew Morton
2003-02-27 23:56 ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
0 siblings, 2 replies; 24+ messages in thread
From: Dave McCracken @ 2003-02-27 22:01 UTC (permalink / raw)
To: Andrew Morton, Con Kolivas; +Cc: linux-kernel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 517 bytes --]
--On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
<akpm@digeo.com> wrote:
>> ...
>> Mapped: 4294923652 kB
>
> Well that's gotta hurt. This metric is used in making writeback
> decisions. Probably the objrmap patch.
Oops. You're right. Here's a patch to fix it.
Dave McCracken
======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
dmccr@us.ibm.com T/L 678-3059
[-- Attachment #2: objmapped-2.5.63-1.diff --]
[-- Type: text/plain, Size: 337 bytes --]
--- 2.5.63-objrmap/mm/rmap.c 2003-02-27 15:58:34.000000000 -0600
+++ 2.5.63-objfix/mm/rmap.c 2003-02-27 15:56:56.000000000 -0600
@@ -248,6 +248,8 @@
 			BUG();
 		if (PageSwapCache(page))
 			BUG();
+		if (atomic_read(&page->pte.mapcount) == 0)
+			inc_page_state(nr_mapped);
 		atomic_inc(&page->pte.mapcount);
 		return pte_chain;
 	}
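The invariant the patch appears to restore is that nr_mapped counts pages with at least one mapping, so it changes only when a page's mapcount crosses zero. Below is a single-threaded user-space sketch of that bookkeeping. The struct, the counter and both function names are hypothetical stand-ins, and the symmetric decrement on removal is an assumption about the rest of the objrmap code, which is not shown in this thread.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-page mapping count. */
struct page_sketch {
    int mapcount;               /* number of mappings of this page */
};

/* Hypothetical stand-in for the global nr_mapped page-state counter. */
static long nr_mapped_sketch;

/* Count the page as mapped only on the 0 -> 1 transition. */
static void add_mapping(struct page_sketch *p)
{
    if (p->mapcount == 0)
        nr_mapped_sketch++;
    p->mapcount++;
}

/* Uncount the page only on the 1 -> 0 transition. */
static void remove_mapping(struct page_sketch *p)
{
    assert(p->mapcount > 0);
    p->mapcount--;
    if (p->mapcount == 0)
        nr_mapped_sketch--;
}
```

If the increment is skipped while the decrement still runs, the counter goes negative by one for every page that is mapped and later unmapped, which matches the symptom reported above.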
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-27 22:01 ` Dave McCracken
@ 2003-02-27 22:24 ` Andrew Morton
2003-03-03 21:06 ` [PATCH 2.5.63] Teach page_mapped about the anon flag Dave McCracken
2003-02-27 23:56 ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-02-27 22:24 UTC (permalink / raw)
To: Dave McCracken; +Cc: kernel, linux-kernel, linux-mm
Dave McCracken <dmccr@us.ibm.com> wrote:
>
>
> --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> <akpm@digeo.com> wrote:
>
> >> ...
> >> Mapped: 4294923652 kB
> >
> > Well that's gotta hurt. This metric is used in making writeback
> > decisions. Probably the objrmap patch.
>
> Oops. You're right. Here's a patch to fix it.
>
Thanks.
I'm just looking at page_mapped(). It is now implicitly assuming that the
architecture's representation of a zero-count atomic_t is all-bits-zero.
This is not true on sparc32 if some other CPU is in the middle of an
atomic_foo() against that counter. Maybe the assumption is false on other
architectures too.
So page_mapped() really should be performing an atomic_read() if that is
appropriate to the particular page. I guess this involves testing
page->mapping. Which is stable only when the page is locked or
mapping->page_lock is held.
It appears that all page_mapped() callers are inside lock_page() at present,
so a quick audit and addition of a comment would be appropriate there please.
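To illustrate the representation concern, here is a sketch of an atomic counter whose storage word also carries a lock byte. The layout (count in the upper 24 bits, lock in the low 8) is an assumption made for illustration, loosely modelled on the sparc32 case mentioned above. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: count in bits 8..31, lock byte in bits 0..7. */
typedef struct {
    volatile uint32_t raw;
} atomic24_sketch_t;

/* Read the count, ignoring the lock byte. */
static int32_t atomic24_read(const atomic24_sketch_t *v)
{
    return (int32_t)(v->raw >> 8);
}

/* What a raw "is the word zero?" test sees. */
static int raw_is_zero(const atomic24_sketch_t *v)
{
    return v->raw == 0;
}

/* What a test through the accessor sees. */
static int count_is_zero(const atomic24_sketch_t *v)
{
    return atomic24_read(v) == 0;
}
```

With the lock byte set and the count at zero, raw_is_zero() reports "not zero" while count_is_zero() reports "zero". That is the kind of disagreement a raw test of the word can produce while another CPU is in the middle of an update.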
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-27 22:01 ` Dave McCracken
2003-02-27 22:24 ` Andrew Morton
@ 2003-02-27 23:56 ` Con Kolivas
2003-02-28 0:06 ` Andrew Morton
1 sibling, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2003-02-27 23:56 UTC (permalink / raw)
To: Dave McCracken, Andrew Morton; +Cc: linux-kernel, linux-mm
On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
>
> <akpm@digeo.com> wrote:
> >> ...
> >> Mapped: 4294923652 kB
> >
> > Well that's gotta hurt. This metric is used in making writeback
> > decisions. Probably the objrmap patch.
>
> Oops. You're right. Here's a patch to fix it.
Thanks.
This looks better after a run:
MemTotal: 256156 kB
MemFree: 189448 kB
Buffers: 46744 kB
Cached: 4176 kB
SwapCached: 0 kB
Active: 51840 kB
Inactive: 1768 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 256156 kB
LowFree: 189448 kB
SwapTotal: 4194272 kB
SwapFree: 4194272 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 4546752 kB
Slab: 8468 kB
Committed_AS: 7032 kB
PageTables: 200 kB
ReverseMaps: 662
Con
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-27 23:56 ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
@ 2003-02-28 0:06 ` Andrew Morton
2003-02-28 0:28 ` Con Kolivas
2003-02-28 12:48 ` Hugh Dickins
0 siblings, 2 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28 0:06 UTC (permalink / raw)
To: Con Kolivas; +Cc: dmccr, linux-kernel, linux-mm
Con Kolivas <kernel@kolivas.org> wrote:
>
> On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> > --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> >
> > <akpm@digeo.com> wrote:
> > >> ...
> > >> Mapped: 4294923652 kB
> > >
> > > Well that's gotta hurt. This metric is used in making writeback
> > > decisions. Probably the objrmap patch.
> >
> > Oops. You're right. Here's a patch to fix it.
>
> Thanks.
>
> This looks better after a run:
>
> MemTotal: 256156 kB
> ...
> Mapped: 4546752 kB
No, it is still wrong. Mapped cannot exceed MemTotal.
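The sanity rule stated here (Mapped must not exceed MemTotal) can be checked mechanically against a saved /proc/meminfo snapshot. A minimal user-space sketch follows, assuming the "Key: value kB" layout shown in this thread. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Find "Key: <value> kB" in a meminfo-style buffer.
 * Returns 0 and stores the value on success, -1 if the key is absent. */
static int meminfo_field(const char *text, const char *key,
                         unsigned long long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return -1;
}

/* Returns 1 if Mapped <= MemTotal, 0 if not, -1 if a field is missing. */
static int mapped_is_sane(const char *text)
{
    unsigned long long total, mapped;

    if (meminfo_field(text, "MemTotal", &total) != 0 ||
        meminfo_field(text, "Mapped", &mapped) != 0)
        return -1;
    return mapped <= total;
}
```

Fed the two lines quoted above (MemTotal 256156 kB, Mapped 4546752 kB), mapped_is_sane() returns 0.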
* Re: 2.5.63-mm1
2003-02-27 10:59 2.5.63-mm1 Andrew Morton
2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
@ 2003-02-28 0:17 ` Ed Tomlinson
2003-02-28 0:46 ` 2.5.63-mm1 Andrew Morton
2003-02-28 12:16 ` 2.5.63-mm1 steven roemen
[not found] ` <3E5F7DAD.2080306@cyberone.com.au>
3 siblings, 1 reply; 24+ messages in thread
From: Ed Tomlinson @ 2003-02-28 0:17 UTC (permalink / raw)
To: Andrew Morton, linux-kernel, linux-mm; +Cc: Nick Piggin
On February 27, 2003 05:59 am, Andrew Morton wrote:
> . Tons of changes to the anticipatory scheduler. It may not be working
> very well at present. Please use "elevator=deadline" if it causes
> problems.
The anticipatory scheduler hangs here at the same place it did in 62-mm2,
cfq continues to work fine. A sysrq+T of the hang follows:
Hope this helps,
Ed Tomlinson
SysRq : Show State
free sibling
task PC stack pid father child younger older
swapper D DFF8FB20 11876 1 0 2 (L-TLB)
Call Trace:
[<c01143aa>] io_schedule+0xe/0x18
[<c012a105>] __lock_page+0x8d/0xac
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c012a58e>] do_generic_mapping_read+0x13a/0x340
[<c012aa5a>] __generic_file_aio_read+0x1c6/0x1e4
[<c012a794>] file_read_actor+0x0/0x100
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c015400c>] dput+0x1c/0x1a0
[<c015400c>] dput+0x1c/0x1a0
[<c012ff37>] kmem_cache_alloc+0x23/0x60
[<c0140e57>] vfs_read+0xab/0x150
[<c01498c4>] kernel_read+0x3c/0x48
[<c0161f82>] load_elf_binary+0x2f2/0xbbc
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c012f91c>] cache_init_objs+0x34/0x60
[<c012d2af>] buffered_rmqueue+0xfb/0x108
[<c012d33c>] __alloc_pages+0x80/0x264
[<c014a4ad>] search_binary_handler+0xad/0x23c
[<c0161c90>] load_elf_binary+0x0/0xbbc
[<c014a786>] do_execve+0x14a/0x1a8
[<c0107750>] sys_execve+0x2c/0x60
[<c0108c47>] syscall_call+0x7/0xb
[<c0105175>] init+0x109/0x174
[<c010506c>] init+0x0/0x174
[<c0107019>] kernel_thread_helper+0x5/0xc
ksoftirqd/0 S DFF8A000 4294963836 2 1 3 (L-TLB)
Call Trace:
[<c011a1fc>] ksoftirqd+0x24/0xa4
[<c011a23e>] ksoftirqd+0x66/0xa4
[<c011a1d8>] ksoftirqd+0x0/0xa4
[<c0107019>] kernel_thread_helper+0x5/0xc
events/0 D DFF89ED4 4294953708 3 1 12 4 2 (L-TLB)
Call Trace:
[<c0113985>] wait_for_completion+0x9d/0xe0
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0116363>] do_fork+0x113/0x14c
[<c010708e>] kernel_thread+0x6e/0x84
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c0122a70>] ____call_usermodehelper+0x0/0x94
[<c0107014>] kernel_thread_helper+0x0/0xc
[<c0122b80>] __call_usermodehelper+0x30/0x58
[<c0122a70>] ____call_usermodehelper+0x0/0x94
[<c012304f>] worker_thread+0x1a3/0x274
[<c0122eac>] worker_thread+0x0/0x274
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc
khubd D DFD61D94 4292690652 4 1 5 3 (L-TLB)
Call Trace:
[<c01136a0>] do_schedule+0x2a0/0x348
[<c0113985>] wait_for_completion+0x9d/0xe0
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0122cb2>] call_usermodehelper+0x10a/0x118
[<c01f44d8>] usb_hotplug+0x0/0x1c4
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c01b5a42>] do_hotplug+0x1c2/0x1ec
[<c01b5a91>] dev_hotplug+0x25/0x30
[<c01f44d8>] usb_hotplug+0x0/0x1c4
[<c01b3d9a>] device_add+0x112/0x148
[<c01f4ef6>] usb_new_device+0x322/0x480
[<c0117086>] printk+0x122/0x148
[<c01f6a9f>] usb_hub_port_connect_change+0x233/0x2c4
[<c01f6c69>] usb_hub_events+0x139/0x2c8
[<c01f6e25>] usb_hub_thread+0x2d/0xd4
[<c01f6df8>] usb_hub_thread+0x0/0xd4
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc
pdflush S DFD2FFD4 4292485228 5 1 6 4 (L-TLB)
Call Trace:
[<c012e7e5>] __pdflush+0x95/0x1b0
[<c012e900>] pdflush+0x0/0x14
[<c012e90f>] pdflush+0xf/0x14
[<c0107019>] kernel_thread_helper+0x5/0xc
pdflush S DFD2DFD4 14388 6 1 7 5 (L-TLB)
Call Trace:
[<c012e7e5>] __pdflush+0x95/0x1b0
[<c012e900>] pdflush+0x0/0x14
[<c012e90f>] pdflush+0xf/0x14
[<c0107019>] kernel_thread_helper+0x5/0xc
kswapd0 S DFD29F44 4294958912 7 1 8 6 (L-TLB)
Call Trace:
[<c01328fb>] kswapd+0xcb/0xf0
[<c0132830>] kswapd+0x0/0xf0
[<c0109d26>] math_state_restore+0x2a/0x3c
[<c0108f05>] device_not_available+0x25/0x2a
[<c010e3f5>] save_init_fpu+0x1d/0x3c
[<c0113770>] preempt_schedule+0x28/0x40
[<c0112eb3>] schedule_tail+0x2f/0x94
[<c0108b06>] ret_from_fork+0x6/0x20
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0107019>] kernel_thread_helper+0x5/0xc
aio/0 S DFFE8EA0 4294952400 8 1 9 7 (L-TLB)
Call Trace:
[<c0122fa8>] worker_thread+0xfc/0x274
[<c0122eac>] worker_thread+0x0/0x274
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc
kpnpbiosd Z DFFEE800 4294880232 9 1 10 8 (L-TLB)
Call Trace:
[<c0118b99>] do_exit+0x41d/0x428
[<c01aca44>] pnp_dock_thread+0x0/0xf4
[<c0118bbb>] complete_and_exit+0x17/0x18
[<c01acadc>] pnp_dock_thread+0x98/0xf4
[<c01aca44>] pnp_dock_thread+0x0/0xf4
[<c0107019>] kernel_thread_helper+0x5/0xc
kseriod S DFC44000 4294030016 10 1 11 9 (L-TLB)
Call Trace:
[<c02073e7>] serio_thread+0x9f/0x12c
[<c0207348>] serio_thread+0x0/0x12c
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc
reiserfs/0 S DFCBD460 8080 11 1 10 (L-TLB)
Call Trace:
[<c0122fa8>] worker_thread+0xfc/0x274
[<c0122eac>] worker_thread+0x0/0x274
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc
events/0 D DFAC7A30 4294892756 12 3 (L-TLB)
Call Trace:
[<c01143aa>] io_schedule+0xe/0x18
[<c012a105>] __lock_page+0x8d/0xac
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c012a58e>] do_generic_mapping_read+0x13a/0x340
[<c012aa5a>] __generic_file_aio_read+0x1c6/0x1e4
[<c012a794>] file_read_actor+0x0/0x100
[<c017f6b0>] reiserfs_get_block+0x0/0x11cc
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c015400c>] dput+0x1c/0x1a0
[<c015400c>] dput+0x1c/0x1a0
[<c012ff37>] kmem_cache_alloc+0x23/0x60
[<c0140e57>] vfs_read+0xab/0x150
[<c01498c4>] kernel_read+0x3c/0x48
[<c0161f82>] load_elf_binary+0x2f2/0xbbc
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c014bf83>] real_lookup+0x67/0xd0
[<c014c254>] do_lookup+0x48/0x84
[<c015400c>] dput+0x1c/0x1a0
[<c014c95a>] link_path_walk+0x6ca/0x848
[<c014a4ad>] search_binary_handler+0xad/0x23c
[<c0161c90>] load_elf_binary+0x0/0xbbc
[<c01614c1>] load_script+0x1d1/0x1e0
[<c012d2af>] buffered_rmqueue+0xfb/0x108
[<c012d33c>] __alloc_pages+0x80/0x264
[<c014a4ad>] search_binary_handler+0xad/0x23c
[<c01612f0>] load_script+0x0/0x1e0
[<c014a786>] do_execve+0x14a/0x1a8
[<c0107750>] sys_execve+0x2c/0x60
[<c0108c47>] syscall_call+0x7/0xb
[<c0122ae8>] ____call_usermodehelper+0x78/0x94
[<c0122a70>] ____call_usermodehelper+0x0/0x94
[<c0107019>] kernel_thread_helper+0x5/0xc
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-28 0:06 ` Andrew Morton
@ 2003-02-28 0:28 ` Con Kolivas
2003-02-28 7:46 ` Duncan Sands
2003-02-28 12:48 ` Hugh Dickins
1 sibling, 1 reply; 24+ messages in thread
From: Con Kolivas @ 2003-02-28 0:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: dmccr, linux-kernel, linux-mm
On Fri, 28 Feb 2003 11:06 am, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> > > --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> > >
> > > <akpm@digeo.com> wrote:
> > > >> ...
> > > >> Mapped: 4294923652 kB
> > > >
> > > > Well that's gotta hurt. This metric is used in making writeback
> > > > decisions. Probably the objrmap patch.
> > >
> > > Oops. You're right. Here's a patch to fix it.
> >
> > Thanks.
> >
> > This looks better after a run:
> >
> > MemTotal: 256156 kB
> > ...
> > Mapped: 4546752 kB
>
> No, it is still wrong. Mapped cannot exceed MemTotal.
Hmm, after a few more runs io_load starts rising again. This is the meminfo in
the middle of a run:
MemTotal: 256156 kB
MemFree: 26564 kB
Buffers: 11300 kB
Cached: 198048 kB
SwapCached: 0 kB
Active: 7164 kB
Inactive: 204736 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 256156 kB
LowFree: 26564 kB
SwapTotal: 4194272 kB
SwapFree: 4194272 kB
Dirty: 5780 kB
Writeback: 0 kB
Mapped: 6000680 kB
Slab: 13056 kB
Committed_AS: 7040 kB
PageTables: 200 kB
ReverseMaps: 664
Con
* Re: 2.5.63-mm1
2003-02-28 0:17 ` 2.5.63-mm1 Ed Tomlinson
@ 2003-02-28 0:46 ` Andrew Morton
0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28 0:46 UTC (permalink / raw)
To: Ed Tomlinson; +Cc: linux-kernel, linux-mm, piggin
Ed Tomlinson <tomlins@cam.org> wrote:
>
> On February 27, 2003 05:59 am, Andrew Morton wrote:
> > . Tons of changes to the anticipatory scheduler. It may not be working
> > very well at present. Please use "elevator=deadline" if it causes
> > problems.
>
> The anticipatory scheduler hangs here at the same place it did in 62-mm2,
> cfq continues to work fine. A sysrq+T of the hang follows:
I must say, Ed: you have an eerie ability to break stuff.
Please send me your .config.
> free sibling
> task PC stack pid father child younger older
> swapper D DFF8FB20 11876 1 0 2 (L-TLB)
Interesting amount of free stack you have there. You broke show_task() too!
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-28 0:28 ` Con Kolivas
@ 2003-02-28 7:46 ` Duncan Sands
2003-02-28 8:06 ` Andrew Morton
0 siblings, 1 reply; 24+ messages in thread
From: Duncan Sands @ 2003-02-28 7:46 UTC (permalink / raw)
To: Con Kolivas, Andrew Morton; +Cc: dmccr, linux-kernel, linux-mm
Hi Con, are you sure this is not the same for 2.5.63?
I left 2.5.63 running over night (doing nothing but run
KDE), and in the morning it was swapping heavily.
About 200MB was swapped out and this did not reduce
with usage. According to top, 10% of memory was being
used by a Konsole with nothing in it (could be a memory
leak in Konsole). After half an hour I gave up - it was
too unusable. Maybe -mm1 just accentuates a problem
that is already there in 2.5.63.
Ciao,
Duncan.
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-28 7:46 ` Duncan Sands
@ 2003-02-28 8:06 ` Andrew Morton
0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28 8:06 UTC (permalink / raw)
To: Duncan Sands; +Cc: kernel, dmccr, linux-kernel, linux-mm
Duncan Sands <baldrick@wanadoo.fr> wrote:
>
> Hi Con, are you sure this is not the same for 2.5.63?
> I left 2.5.63 running over night (doing nothing but run
> KDE), and in the morning it was swapping heavily.
> About 200MB was swapped out and this did not reduce
> with usage. According to top, 10% of memory was being
> used by a Konsole with nothing in it (could be a memory
> leak in Konsole). After half an hour I gave up - it was
> too unusable. Maybe -mm1 just accentuates a problem
> that is already there in 2.5.63.
>
Please take a snapshot of /proc/meminfo and /proc/slabinfo
if anything like this happens.
* Re: 2.5.63-mm1
2003-02-27 10:59 2.5.63-mm1 Andrew Morton
2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
2003-02-28 0:17 ` 2.5.63-mm1 Ed Tomlinson
@ 2003-02-28 12:16 ` steven roemen
2003-02-28 12:24 ` 2.5.63-mm1 Andrew Morton
[not found] ` <3E5F7DAD.2080306@cyberone.com.au>
3 siblings, 1 reply; 24+ messages in thread
From: steven roemen @ 2003-02-28 12:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
The kernel oopses when i2c is compiled into the kernel with -mm1, and with
-mm1 plus Dave McCracken's patch.
Also, when I remove i2c from the kernel and boot into it with AS as the
elevator, the load (via top) starts at 2.00, yet the processors aren't
loaded very much at all. Is this a known issue? (This is the first -mm
kernel I've run.)
-steve
On Thu, 2003-02-27 at 04:59, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm1/
>
> . Tons of changes to the anticipatory scheduler. It may not be working
> very well at present. Please use "elevator=deadline" if it causes
> problems.
>
> . Updated smalldevfs patch.
>
> . A fix for the VMA-based reverse mapping patch.
>
> . Added Ingo's latest CPU scheduler update.
>
> . Lots of random fixes.
>
>
>
> linus.patch
>
> Latest from Linus
>
> -initial-jiffies.patch
> -user-times-jiffies-wrap-fix.patch
> -put_page-speedup.patch
> -slab-batchcount-limit-fix.patch
> -crc32-speedup-2.patch
> -flush-tlb-all-2.patch
> -linux-2.5.62-early_ioremap_A0.patch
> -linux-2.5.62-x440disco_A0.patch
> -use-find_get_page.patch
> -irda-interruptible-sleep.patch
> -dget-BUG.patch
> -disk-accounting-fix.patch
> -hugh-inode-pruning-race-fix.patch
> -kill-bogus-wakeup-messge.patch
> -dont-sync-with-stopped-pdflush.patch
> -irq-balance-disable-fix.patch
> -oom-killer-dont-spin-on-same-task.patch
> -add-missing-global_flush_tlb-calls.patch
> -ext3-O_SYNC-speedup.patch
> -remove-MAX_BLKDEV-from-genhd.patch
>
> Merged
>
> +separate.patch
>
> My contribution to the spelling bee.
>
> +rpc_rmdir-fix.patch
>
> Fix the NFS oops
>
> +ppc64-scruffiness.patch
>
> Fix some warnings
>
> -reiserfs_file_write-4.patch
> +reiserfs_file_write-5.patch
>
> Updated (I don't think it changed)
>
> +limit-write-latency.patch
>
> Fix potential source of write-vs-write latency in VFS
>
> +lockd-lockup-fix-2.patch
>
> Updated patch from Neil for an NFS server deadlock
>
> +loop-hack.patch
>
> Fix an OOM and oops in loop
>
> +flock-fix.patch
>
> File locking fix from Matthew
>
> +sysfs-dget-fix-2.patch
>
> Fix a sysfs dentry race (this isn't right)
>
> +irq-sharing-fix.patch
>
> Fix SA_INTERRUPT for shared interrupts
>
> +anticipation_is_killing_me.patch
> +as-fix-hughs-problem.patch
> +as-cleanup.patch
> +as-start-stop-anticipation-helpers.patch
> +as-cleanup-2.patch
> +as-cleanup-3.patch
> +as-cleanup-3-write-latency-fix.patch
> +as-handle-exitted-tasks.patch
> +as-handle-exitted-tasks-fix.patch
> +as-no-plugging-and-cleanups.patch
> +as-remove-debug.patch
> +as-track-queued-reads.patch
> +as-accounting-fix.patch
> +as-nr_reads-fix.patch
> +as-tuning.patch
> +as-disable-nr_reads.patch
>
> Anticipatory scheduler work
>
> smalldevfs.patch
>
> Updated
>
> -smalldevfs-dcache_rcu-fix.patch
>
> Folded into smalldevfs.patch
>
> +objrmap-X-fix.patch
>
> Fix VMA-based reverse mapping
>
> +per-cpu-disk-stats.patch
>
> Use per-cpu data for disk accounting
>
> +presto_get_sb-fix.patch
>
> Fix an intermezzo oops
>
> +on_each_cpu.patch
> +on_each_cpu-ldt-cleanup.patch
>
> preempt-safety for smp_call_function()
>
> +notsc-panic.patch
>
> x86 TSC cleanup
>
> +alloc_pages_cleanup.patch
>
> Code consolidation
>
> +ext2-handle-htree-flag.patch
>
> ext2 htree back-compatibility
>
> +sched-a3.patch
>
> CPU scheduler update
>
> +mpparse-typo-fix.patch
>
> Fix a printk bug
>
> +i386-no-swap-fix.patch
>
> Fix ia32 CONFIG_SWAP=n
>
> +remove-hugetlb_key.patch
> +hugetlbpage-doc-update.patch
> +hugetlb-valid-page-ranges.patch
>
> Hugetlbpage work
>
>
>
>
> All 88 patches:
>
> linus.patch
> Latest from Linus
>
> separate.patch
>
> mm.patch
> add -mmN to EXTRAVERSION
>
> rpc_rmdir-fix.patch
> Fix nfs oops during mount
>
> ppc64-reloc_hide.patch
>
> ppc64-pci-patch.patch
> Subject: pci patch
>
> ppc64-e100-fix.patch
> fix e100 for big-endian machines
>
> ppc64-aio-32bit-emulation.patch
> 32/64bit emulation for aio
>
> ppc64-64-bit-exec-fix.patch
> Subject: 64bit exec
>
> ppc64-scruffiness.patch
> Fix some PPC64 compile warnings
>
> sym-do-160.patch
> make the SYM driver do 160 MB/sec
>
> kgdb.patch
>
> nfsd-disable-softirq.patch
> Fix race in svcsock.c in 2.5.61
>
> report-lost-ticks.patch
> make lost-tick detection more informative
>
> devfs-fix.patch
>
> ptrace-flush.patch
> cache flushing in the ptrace code
>
> buffer-debug.patch
> buffer.c debugging
>
> warn-null-wakeup.patch
>
> ext3-truncate-ordered-pages.patch
> ext3: explicitly free truncated pages
>
> deadline-dispatching-fix.patch
> deadline IO scheduler dispatching fix
>
> nfs-unstable-pages.patch
> "unstable" page accounting for NFS.
>
> limit-write-latency.patch
>
> reiserfs_file_write-5.patch
>
> tcp-wakeups.patch
> Use fast wakeups in TCP/IPV4
>
> lockd-lockup-fix-2.patch
> Subject: Re: Fw: Re: 2.4.20 NFS server lock-up (SMP)
>
> rcu-stats.patch
> RCU statistics reporting
>
> ext3-journalled-data-assertion-fix.patch
> Remove incorrect assertion from ext3
>
> nfs-speedup.patch
>
> nfs-oom-fix.patch
> nfs oom fix
>
> sk-allocation.patch
> Subject: Re: nfs oom
>
> nfs-more-oom-fix.patch
>
> nfs-sendfile.patch
> Implement sendfile() for NFS
>
> rpciod-atomic-allocations.patch
> Make rpciod use atomic allocations
>
> linux-isp.patch
>
> isp-update-1.patch
>
> remove-unused-congestion-stuff.patch
> Subject: [PATCH] remove unused congestion stuff
>
> aic-makefile-fix.patch
> aicasm Makefile fix
>
> loop-hack.patch
> loop: Fix OOM and oops
>
> atm_dev_sem.patch
> convert atm_dev_lock from spinlock to semaphore
>
> flock-fix.patch
> flock fixes for 2.5.62
>
> sysfs-dget-fix-2.patch
>
> irq-sharing-fix.patch
> fix irq sharing and SA_INTERRUPT on x86
>
> as-iosched.patch
> anticipatory I/O scheduler
>
> as-comments-and-tweaks.patch
> antsched: commentary and
>
> as-hz-1000-fix.patch
> Fix anticipatory scheduler for HZ=100
>
> as-tidy-up-rename.patch
> tidy up AS rename
>
> anticipation_is_killing_me.patch
>
> as-update-1.patch
> AS update
>
> as-break-anticipation-on-write.patch
> AS break on write
>
> as-break-if-readahead.patch
> detect overlapping reads and writes
>
> as-fix-hughs-problem.patch
> Add a pointer to the queue into struct as_data
>
> as-cleanup.patch
> anticipatory scheduler cleanups
>
> as-start-stop-anticipation-helpers.patch
> AS: add anticipation stop/start helper functions
>
> as-cleanup-2.patch
> Subject: [PATCH] some cleanups 2
>
> as-cleanup-3.patch
> AS: more cleanups
>
> as-cleanup-3-write-latency-fix.patch
> Fix as-cleanup-3
>
> as-handle-exitted-tasks.patch
>
> as-handle-exitted-tasks-fix.patch
> fix for as IO contexts
>
> as-no-plugging-and-cleanups.patch
> AS no plugging + cleanups
>
> as-remove-debug.patch
>
> as-track-queued-reads.patch
> AS: track queued reads
>
> as-accounting-fix.patch
> AS: track queued reads (fix)
>
> as-nr_reads-fix.patch
> AS: read accounting fix
>
> as-tuning.patch
> AS: tuning
>
> as-disable-nr_reads.patch
> AS: disable per-process in-flight read logic
>
> readahead-shrink-to-zero.patch
> Allow VFS readahead to fall to zero
>
> cfq-2.patch
> CFQ scheduler, #2
>
> smalldevfs.patch
> smalldevfs
>
> objrmap-2.5.62-5.patch
> object-based rmap
>
> objrmap-X-fix.patch
> objrmap fix for X
>
> oprofile-up-fix.patch
> fix oprofile on UP (lockless sync)
>
> update_atime-speedup.patch
> speed up update_atime()
>
> ext2-update_atime_speedup.patch
> Use one_sec_update_atime in ext2
>
> ext3-update_atime_speedup.patch
> Use one_sec_update_atime in ext3
>
> UPDATE_ATIME-to-update_atime.patch
> Rename UPDATE_ATIME to update_atime
>
> per-cpu-disk-stats.patch
> Make diskstats per-cpu using kmalloc_percpu
>
> presto_get_sb-fix.patch
> fix presto_get_sb() return value and oops.
>
> on_each_cpu.patch
> fix preempt-issues with smp_call_function()
>
> on_each_cpu-ldt-cleanup.patch
>
> notsc-panic.patch
> Don't panic if TSC is enabled and notsc is used
>
> alloc_pages_cleanup.patch
> clean up redundant code for alloc_pages
>
> ext2-handle-htree-flag.patch
> ext2: clear ext3 htree flag on directories
>
> sched-a3.patch
> "HT scheduler", sched-2.5.63-A3
>
> mpparse-typo-fix.patch
> fix typo in arch/i386/kernel/mpparse.c in printk
>
> i386-no-swap-fix.patch
> allow CONFIG_SWAP=n for i386
>
> remove-hugetlb_key.patch
> remove dead hugetlb_key forward decl
>
> hugetlbpage-doc-update.patch
> hugetlbpage documentation update
>
> hugetlb-valid-page-ranges.patch
> hugetlb: fix MAP_FIXED handling
>
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 2.5.63-mm1
2003-02-28 12:16 ` 2.5.63-mm1 steven roemen
@ 2003-02-28 12:24 ` Andrew Morton
0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2003-02-28 12:24 UTC (permalink / raw)
To: steven roemen; +Cc: linux-kernel, linux-mm
steven roemen <sdroemen1@cox.net> wrote:
>
>
> the kernel oopses when i2c is compiled into the kernel with -mm1, and
> with -mm1 plus Dave McCracken's patch.
Please send a full report on this to the mailing list.
> also when i remove i2c from the kernel and boot into it with AS as the
> elevator, the load (via top) starts at 2.00, yet the processors aren't
> loaded very much at all. is this a known issue (this is the first -mm
> kernel i've run)?
Run `ps aux' when the system is idle and see if there are any tasks
in "D" state.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-28 0:06 ` Andrew Morton
2003-02-28 0:28 ` Con Kolivas
@ 2003-02-28 12:48 ` Hugh Dickins
2003-02-28 15:56 ` Dave McCracken
1 sibling, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2003-02-28 12:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, dmccr, linux-kernel, linux-mm
On Thu, 27 Feb 2003, Andrew Morton wrote:
>
> No, it is still wrong. Mapped cannot exceed MemTotal.
It needs this in addition to Dave's patch from yesterday:
--- 2.5.63-objfix-1/mm/rmap.c Thu Feb 27 23:37:28 2003
+++ 2.5.63-objfix-2/mm/rmap.c Fri Feb 28 12:33:58 2003
@@ -349,7 +349,8 @@
BUG();
if (atomic_read(&page->pte.mapcount) == 0)
BUG();
- atomic_dec(&page->pte.mapcount);
+ if (atomic_dec_and_test(&page->pte.mapcount))
+ dec_page_state(nr_mapped);
return;
}
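For anyone following along, the accounting rule this restores can be modelled in userspace. This is only a sketch with C11 atomics, not the kernel code: struct page_model, nr_mapped_model and both helpers are made-up names, and the add side uses fetch-and-add rather than the read-then-increment currently in the tree.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for struct page and the global nr_mapped count. */
struct page_model {
	atomic_int mapcount;
};

static atomic_long nr_mapped_model;

/* Add one mapping; the 0 -> 1 transition makes the page "mapped". */
static void model_add_rmap(struct page_model *page)
{
	if (atomic_fetch_add(&page->mapcount, 1) == 0)
		atomic_fetch_add(&nr_mapped_model, 1);
}

/* Remove one mapping; the 1 -> 0 transition makes it unmapped again. */
static void model_remove_rmap(struct page_model *page)
{
	int old = atomic_fetch_sub(&page->mapcount, 1);

	assert(old > 0);	/* the kernel code uses BUG() here */
	if (old == 1)		/* same condition as atomic_dec_and_test() */
		atomic_fetch_sub(&nr_mapped_model, 1);
}
```

Without the decrement on the last unmap the counter only ever grows, which is consistent with Mapped creeping past MemTotal.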
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Rising io_load results Re: 2.5.63-mm1
2003-02-28 12:48 ` Hugh Dickins
@ 2003-02-28 15:56 ` Dave McCracken
0 siblings, 0 replies; 24+ messages in thread
From: Dave McCracken @ 2003-02-28 15:56 UTC (permalink / raw)
To: Hugh Dickins, Andrew Morton; +Cc: Con Kolivas, linux-kernel, linux-mm
--On Friday, February 28, 2003 12:48:06 +0000 Hugh Dickins
<hugh@veritas.com> wrote:
> On Thu, 27 Feb 2003, Andrew Morton wrote:
>>
>> No, it is still wrong. Mapped cannot exceed MemTotal.
>
> It needs this in addition to Dave's patch from yesterday:
>
> --- 2.5.63-objfix-1/mm/rmap.c Thu Feb 27 23:37:28 2003
> +++ 2.5.63-objfix-2/mm/rmap.c Fri Feb 28 12:33:58 2003
> @@ -349,7 +349,8 @@
> BUG();
> if (atomic_read(&page->pte.mapcount) == 0)
> BUG();
> - atomic_dec(&page->pte.mapcount);
> + if (atomic_dec_and_test(&page->pte.mapcount))
> + dec_page_state(nr_mapped);
> return;
> }
D'oh. I should have seen that one. Thanks.
Dave McCracken
======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
dmccr@us.ibm.com T/L 678-3059
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH] tiobench on UP and ptg-D3-mm1
[not found] ` <200302282227.56311.tomlins@cam.org>
@ 2003-03-01 15:04 ` Ed Tomlinson
0 siblings, 0 replies; 24+ messages in thread
From: Ed Tomlinson @ 2003-03-01 15:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Andrew,
You mentioned problems with tiobench on UP. This message was partly
composed with this script running:
for dir in /pool{a,e,g}/tio
do
( cd $dir
tiobench --size 128 --threads 16 > /dev/null 2>&1 &
)
done
Response was slow but usable. It's actually a fairly good example of
what the ptg patch can do. Here is a "vmstat -a 5" of the run:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free inact active si so bi bo in cs us sy id wa
3 0 72188 255068 39132 180080 0 0 0 67 1052 621 4 3 93 0
2 0 72188 254868 39236 180144 0 0 0 66 1062 639 5 3 92 0
39 0 72188 250604 39324 183796 0 0 2 82 1201 1163 27 13 60 0
49 0 72188 250196 39416 183924 0 0 0 65 1200 1053 92 8 0 0
52 3 72188 129660 159300 184104 0 0 0 12670 1228 782 27 73 0 0
52 2 72188 10364 275964 185144 0 0 0 16706 1304 970 16 84 0 0
58 6 74248 5292 275584 190256 13 394 21 19348 1401 925 15 85 0 0
22 29 74248 2248 277480 191160 36 0 65 19530 1543 1249 37 55 0 8
31 27 74248 2284 277472 191124 0 0 19 8378 1277 686 87 13 0 0
11 34 74248 4308 275360 191124 0 0 6 5119 1576 1174 54 19 0 28
3 51 74248 4164 275036 191544 0 0 51 1805 1603 1005 44 11 0 45
1 49 74248 3524 274308 192388 0 0 133 1690 1613 1694 21 9 0 69
2 38 74248 3484 274664 193212 0 0 56 1755 1485 831 7 6 0 87
19 11 74248 3300 273792 194276 0 0 204 1741 1502 955 24 7 0 69
16 7 74248 3584 216036 252272 0 0 10351 1333 1716 1456 32 33 0 35
14 25 74248 128772 147112 196100 39 0 5041 376 1413 1565 70 29 0 1
3 16 74248 57316 156176 259012 0 0 14367 0 1698 1393 51 49 0 0
4 4 74248 150240 83964 238672 0 0 7649 722 1396 1096 66 34 0 0
9 3 74248 142896 85368 244680 0 0 1466 12 1286 1053 90 10 0 0
8 0 74248 220180 33184 219640 82 0 917 77 1263 985 86 14 0 0
2 0 74248 270764 9512 193160 0 0 0 58 1057 665 69 6 25 0
4 0 74248 270788 9576 193220 0 0 0 60 1056 720 15 4 81 0
This is using cfq on a K6-III 400 with 512M of memory; all affected filesystems are reiserfs.
What the patch does is detect thread groups (defined as processes sharing both mm
and FDs, or processes tagged as members of a kernel thread group) and reduce the
timeslices given to these processes when too many processes are active in a
group. This allows other tasks to get CPU if there is demand. There is also a
governor for user tasks; in this case it does not affect the test.
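The scaling itself is simple arithmetic. Here is a standalone sketch of it, mirroring the task_timeslice() change in the patch. The function name, its parameters and the MIN_TIMESLICE_MODEL floor of 10 are assumptions made for illustration; only the two governor values come from the patch.

```c
#include <assert.h>

#define THREAD_GOVERNOR_MODEL 15	/* 1.5 timeslices, scaled by 10 */
#define USER_GOVERNOR_MODEL   300	/* 30 timeslices, scaled by 10 */
#define MIN_TIMESLICE_MODEL   10	/* assumed floor, stands in for MIN_TIMESLICE */

/*
 * base_slice: the unscaled timeslice
 * active:     number of runnable tasks counted by this task's governor
 * is_root:    nonzero for uid 0, which is never governed
 * in_ptgroup: nonzero if the task belongs to a pseudo thread group
 */
static int governed_timeslice(int base_slice, int active, int is_root,
			      int in_ptgroup)
{
	int threads = active * 10;
	int govern = threads;	/* root: limit equals demand, so no scaling */
	int slice = base_slice;

	if (!is_root)
		govern = in_ptgroup ? THREAD_GOVERNOR_MODEL : USER_GOVERNOR_MODEL;
	if (threads > govern) {
		/* Shrink each slice so the group's total stays near the limit. */
		slice = (slice * govern) / threads;
		if (slice < MIN_TIMESLICE_MODEL)
			slice = MIN_TIMESLICE_MODEL;
	}
	return slice;
}
```

With 16 active threads in one group and a base slice of 150, each thread gets 150 * 15 / 160 = 14, so the group as a whole consumes roughly one and a half normal slices.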
The patch has been tested on UP and compiles for SMP. It should be OK on SMP. On
NUMA boxes it would really benefit from a dynamic way to allocate per-node storage.
The ptgroup->active[] and user->active[] arrays should really point to atomic_t(s)
in per-node storage.
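To make the per-node counters concrete, this is roughly how a task ends up with its governor. It is a userspace sketch only: the MODEL_* flag values, MODEL_NODES and the struct names are invented for the example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NODES 4	/* assumed; stands in for MAX_NUMNODES */

/* Illustrative flag bits, not the values from the kernel headers. */
enum {
	MODEL_CLONE_VM     = 1 << 0,
	MODEL_CLONE_FILES  = 1 << 1,
	MODEL_CLONE_THREAD = 1 << 2,
};

struct model_group { int active[MODEL_NODES]; };
struct model_user  { int active[MODEL_NODES]; };

struct model_task {
	struct model_group *ptgroup;	/* NULL when not in a thread group */
	struct model_user *user;
};

/* A child joins a pseudo thread group if it shares both mm and files
 * with its parent, or if it was explicitly created as a thread. */
static int joins_thread_group(unsigned long flags)
{
	return ((flags & MODEL_CLONE_VM) && (flags & MODEL_CLONE_FILES)) ||
	       (flags & MODEL_CLONE_THREAD);
}

/* The governing counter is per node: the group's if there is one,
 * otherwise the owning user's. */
static int *pick_governor(struct model_task *t, int node)
{
	if (t->ptgroup)
		return &t->ptgroup->active[node];
	return &t->user->active[node];
}
```

When a task migrates to a CPU on another node, the same selection is repeated with the new node index, which is why the arrays are indexed by node.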
I have been using variants of this patch since the beginning of Jan - it lets me run
a java freenet server, which is heavily threaded, without it impacting my interactive
response much.
Ed Tomlinson
PS. The patch applies to 2.5.63-mm1; with a little twiddling it should also
apply to .63 (sched.c) or .63bk (sched.c, fork.c).
---------------
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1026 -> 1.1028
# include/linux/sched.h 1.139 -> 1.140
# kernel/fork.c 1.111 -> 1.113
# kernel/user.c 1.8 -> 1.9
# kernel/sched.c 1.164 -> 1.165
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/02/28 ed@oscar.et.ca 1.1027
# Add user and thread group governors to prevent either from monopolizing
# the system. The governors work by limiting the sum of the timeslices
# of active tasks in a group to <n> timeslices. The defaults set <n> to
# 1.5 for thread groups and to 30 for user tasks. For numa systems the
# governors are per node.
# --------------------------------------------
#
diff -Nru a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h Fri Feb 28 07:33:49 2003
+++ b/include/linux/sched.h Fri Feb 28 07:33:49 2003
@@ -195,6 +195,11 @@
#include <linux/aio.h>
+struct ptg_struct { /* pseudo thread groups */
+ atomic_t active[MAX_NUMNODES];
+ atomic_t count; /* number of refs */
+};
+
struct mm_struct {
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
@@ -295,6 +300,7 @@
struct user_struct {
atomic_t __count; /* reference count */
atomic_t processes; /* How many processes does this user have? */
+ atomic_t active[MAX_NUMNODES];
atomic_t files; /* How many open files does this user have? */
/* Hash table maintenance information */
@@ -361,6 +367,8 @@
struct list_head ptrace_list;
struct mm_struct *mm, *active_mm;
+ struct ptg_struct * ptgroup; /* pseudo thread group for this task */
+ atomic_t *governor; /* the atomic_t that governs this task */
/* task state */
struct linux_binfmt *binfmt;
diff -Nru a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c Fri Feb 28 07:33:49 2003
+++ b/kernel/fork.c Fri Feb 28 07:33:49 2003
@@ -72,12 +72,24 @@
return total;
}
+void free_ptgroup(struct task_struct *tsk)
+{
+ if (tsk->ptgroup && atomic_sub_and_test(1,&tsk->ptgroup->count)) {
+ kfree(tsk->ptgroup);
+ tsk->ptgroup = NULL;
+ tsk->governor = &tsk->user->active[cpu_to_node(task_cpu(tsk))];
+ if (tsk == current)
+ atomic_inc(tsk->governor);
+ }
+}
+
void __put_task_struct(struct task_struct *tsk)
{
WARN_ON(!(tsk->state & (TASK_DEAD | TASK_ZOMBIE)));
WARN_ON(atomic_read(&tsk->usage));
WARN_ON(tsk == current);
+ free_ptgroup(tsk);
security_task_free(tsk);
free_uid(tsk->user);
@@ -465,6 +477,7 @@
tsk->mm = NULL;
tsk->active_mm = NULL;
+ tsk->ptgroup = NULL;
/*
* Are we cloning a kernel thread?
@@ -730,6 +743,32 @@
p->flags = new_flags;
}
+static inline int setup_governor(unsigned long clone_flags, struct task_struct *p)
+{
+ if ( ((clone_flags & CLONE_VM) && (clone_flags & CLONE_FILES)) ||
+ (clone_flags & CLONE_THREAD)) {
+ if (current->ptgroup)
+ atomic_inc(&current->ptgroup->count);
+ else {
+ int i;
+ current->ptgroup = kmalloc(sizeof(struct ptg_struct), GFP_ATOMIC);
+ if (!current->ptgroup)
+ return 1;
+ /* printk(KERN_INFO "ptgroup - pid %u\n",current->pid); */
+ atomic_set(&current->ptgroup->count,2);
+ for(i=0; i < MAX_NUMNODES; i++)
+ atomic_set(&current->ptgroup->active[i], 0);
+ atomic_set(&current->ptgroup->active[numa_node_id()], 1);
+ atomic_dec(current->governor);
+ current->governor = &current->ptgroup->active[numa_node_id()];
+ }
+ p->ptgroup = current->ptgroup;
+ p->governor = &p->ptgroup->active[numa_node_id()];
+ } else
+ p->governor = &p->user->active[numa_node_id()];
+ return 0;
+}
+
asmlinkage int sys_set_tid_address(int *tidptr)
{
current->clear_child_tid = tidptr;
@@ -872,6 +911,12 @@
goto bad_fork_cleanup_mm;
retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
if (retval)
+ goto bad_fork_cleanup_namespace;
+ /*
+ * Setup the governor pointer for the new process, allocating a new ptg as
+ * required if the process is a thread.
+ */
+ if (setup_governor(clone_flags, p))
goto bad_fork_cleanup_namespace;
if (clone_flags & CLONE_CHILD_SETTID)
diff -Nru a/kernel/sched.c b/kernel/sched.c
--- a/kernel/sched.c Fri Feb 28 07:33:49 2003
+++ b/kernel/sched.c Fri Feb 28 07:33:49 2003
@@ -69,6 +69,9 @@
#define STARVATION_LIMIT (2*HZ)
#define AGRESSIVE_IDLE_STEAL 1
#define NODE_THRESHOLD 125
+#define THREAD_GOVERNOR 15 /* allow thread groups 1.5 full timeslices */
+#define USER_GOVERNOR 300 /* allow user 30 full timeslices */
+
/*
* If a task is 'interactive' then we reinsert it in the active
@@ -124,7 +127,26 @@
static inline unsigned int task_timeslice(task_t *p)
{
- return BASE_TIMESLICE(p);
+ int slice = BASE_TIMESLICE(p);
+ int threads = atomic_read(p->governor) * 10;
+ int govern = threads;
+ if (p->user->uid)
+ govern = (p->ptgroup) ? THREAD_GOVERNOR : USER_GOVERNOR;
+ if (threads > govern) {
+ slice = (slice * govern) / threads;
+ slice = (slice > MIN_TIMESLICE) ? slice : MIN_TIMESLICE;
+ }
+#if 1
+ {
+ static int next;
+ if (time_after(jiffies, next)) {
+ printk(KERN_INFO "uid %d pid %d nod %d ptg %x gov %x threads %d lim %d slice %d\n",
+ p->uid, p->pid, numa_node_id(), p->ptgroup, p->governor, threads/10, govern, slice);
+ next = jiffies + HZ*300;
+ }
+ }
+#endif
+ return slice;
}
/*
@@ -251,16 +273,18 @@
rq->node_nr_running = &node_nr_running[0];
}
-static inline void nr_running_inc(runqueue_t *rq)
+static inline void nr_running_inc(task_t *p, runqueue_t *rq)
{
atomic_inc(rq->node_nr_running);
rq->nr_running++;
+ atomic_inc(p->governor);
}
-static inline void nr_running_dec(runqueue_t *rq)
+static inline void nr_running_dec(task_t *p, runqueue_t *rq)
{
atomic_dec(rq->node_nr_running);
rq->nr_running--;
+ atomic_dec(p->governor);
}
__init void node_nr_running_init(void)
@@ -274,8 +298,8 @@
#else /* !CONFIG_NUMA */
# define nr_running_init(rq) do { } while (0)
-# define nr_running_inc(rq) do { (rq)->nr_running++; } while (0)
-# define nr_running_dec(rq) do { (rq)->nr_running--; } while (0)
+# define nr_running_inc(p, rq) do { (rq)->nr_running++; atomic_inc((p)->governor); } while (0)
+# define nr_running_dec(p, rq) do { (rq)->nr_running--; atomic_dec((p)->governor); } while (0)
#endif /* CONFIG_NUMA */
@@ -380,7 +404,7 @@
static inline void __activate_task(task_t *p, runqueue_t *rq)
{
enqueue_task(p, rq->active);
- nr_running_inc(rq);
+ nr_running_inc(p, rq);
}
static inline void activate_task(task_t *p, runqueue_t *rq)
@@ -408,7 +432,7 @@
*/
static inline void deactivate_task(struct task_struct *p, runqueue_t *rq)
{
- nr_running_dec(rq);
+ nr_running_dec(p, rq);
if (p->state == TASK_UNINTERRUPTIBLE)
rq->nr_uninterruptible++;
dequeue_task(p, p->array);
@@ -1068,9 +1092,15 @@
static inline void pull_task(runqueue_t *src_rq, prio_array_t *src_array, task_t *p, runqueue_t *this_rq, int this_cpu)
{
dequeue_task(p, src_array);
- nr_running_dec(src_rq);
+ nr_running_dec(p, src_rq);
set_task_cpu(p, this_cpu);
- nr_running_inc(this_rq);
+#ifdef CONFIG_NUMA
+ if (p->ptgroup)
+ p->governor = &p->ptgroup->active[cpu_to_node(this_cpu)];
+ else
+ p->governor = &p->user->active[cpu_to_node(this_cpu)];
+#endif
+ nr_running_inc(p, this_rq);
enqueue_task(p, this_rq->active);
wake_up_cpu(this_rq, this_cpu, p);
}
@@ -2729,6 +2759,8 @@
cpu_idle_ptr(smp_processor_id()) = current;
set_task_cpu(current, smp_processor_id());
+ current->governor = &current->user->active[numa_node_id()];
+ atomic_inc(current->governor);
wake_up_forked_process(current);
init_timers();
diff -Nru a/kernel/user.c b/kernel/user.c
--- a/kernel/user.c Fri Feb 28 07:33:49 2003
+++ b/kernel/user.c Fri Feb 28 07:33:49 2003
@@ -30,6 +30,7 @@
struct user_struct root_user = {
.__count = ATOMIC_INIT(1),
.processes = ATOMIC_INIT(1),
+ .active = {[0 ...MAX_NUMNODES-1] = ATOMIC_INIT(0)},
.files = ATOMIC_INIT(0)
};
@@ -89,6 +90,7 @@
if (!up) {
struct user_struct *new;
+ int i;
new = kmem_cache_alloc(uid_cachep, SLAB_KERNEL);
if (!new)
@@ -96,6 +98,8 @@
new->uid = uid;
atomic_set(&new->__count, 1);
atomic_set(&new->processes, 0);
+ for(i=0; i < MAX_NUMNODES; i++)
+ atomic_set(&new->active[i], 0);
atomic_set(&new->files, 0);
/*
@@ -130,6 +134,11 @@
atomic_inc(&new_user->processes);
atomic_dec(&old_user->processes);
current->user = new_user;
+ if (!current->ptgroup) {
+ atomic_dec(current->governor);
+ current->governor = &current->user->active[numa_node_id()];
+ atomic_inc(current->governor);
+ }
free_uid(old_user);
}
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2.5.63] Teach page_mapped about the anon flag
2003-02-27 22:24 ` Andrew Morton
@ 2003-03-03 21:06 ` Dave McCracken
2003-03-03 21:12 ` Andrew Morton
0 siblings, 1 reply; 24+ messages in thread
From: Dave McCracken @ 2003-03-03 21:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 1177 bytes --]
--On Thursday, February 27, 2003 14:24:50 -0800 Andrew Morton
<akpm@digeo.com> wrote:
> I'm just looking at page_mapped(). It is now implicitly assuming that the
> architecture's representation of a zero-count atomic_t is all-bits-zero.
>
> This is not true on sparc32 if some other CPU is in the middle of an
> atomic_foo() against that counter. Maybe the assumption is false on other
> architectures too.
>
> So page_mapped() really should be performing an atomic_read() if that is
> appropriate to the particular page. I guess this involves testing
> page->mapping. Which is stable only when the page is locked or
> mapping->page_lock is held.
>
> It appears that all page_mapped() callers are inside lock_page() at
> present, so a quick audit and addition of a comment would be appropriate
> there please.
I'm not at all confident that page_mapped() is adequately protected.
Here's a patch that explicitly handles the atomic_t case.
Dave McCracken
======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
dmccr@us.ibm.com T/L 678-3059
[-- Attachment #2: objfix-2.5.63-1.diff --]
[-- Type: text/plain, Size: 738 bytes --]
--- 2.5.63-objrmap/include/linux/mm.h 2003-02-27 15:58:34.000000000 -0600
+++ 2.5.63-objfix/include/linux/mm.h 2003-02-28 14:21:56.000000000 -0600
@@ -363,10 +363,16 @@
* Return true if this page is mapped into pagetables. Subtle: test pte.direct
* rather than pte.chain. Because sometimes pte.direct is 64-bit, and .chain
* is only 32-bit.
+ *
+ * If the page is an object-mapped page, we need to do an atomic read of
+ * pte.mapcount instead, since atomic values may not be zero in the upper bits.
*/
static inline int page_mapped(struct page *page)
{
- return page->pte.direct != 0;
+ if (PageAnon(page))
+ return page->pte.direct != 0;
+ else
+ return atomic_read(&page->pte.mapcount) != 0;
}
/*
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
2003-03-03 21:06 ` [PATCH 2.5.63] Teach page_mapped about the anon flag Dave McCracken
@ 2003-03-03 21:12 ` Andrew Morton
2003-03-03 21:24 ` Dave McCracken
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-03-03 21:12 UTC (permalink / raw)
To: Dave McCracken; +Cc: linux-kernel, linux-mm
Dave McCracken <dmccr@us.ibm.com> wrote:
>
>
> --On Thursday, February 27, 2003 14:24:50 -0800 Andrew Morton
> <akpm@digeo.com> wrote:
>
> > I'm just looking at page_mapped(). It is now implicitly assuming that the
> > architecture's representation of a zero-count atomic_t is all-bits-zero.
> >
> > This is not true on sparc32 if some other CPU is in the middle of an
> > atomic_foo() against that counter. Maybe the assumption is false on other
> > architectures too.
> >
> > So page_mapped() really should be performing an atomic_read() if that is
> > appropriate to the particular page. I guess this involves testing
> > page->mapping. Which is stable only when the page is locked or
> > mapping->page_lock is held.
> >
> > It appears that all page_mapped() callers are inside lock_page() at
> > present, so a quick audit and addition of a comment would be appropriate
> > there please.
>
> I'm not at all confident that page_mapped() is adequately protected.
It is. All callers which need to be 100% accurate are under
pte_chain_lock().
> Here's a patch that explicitly handles the atomic_t case.
OK.. But it increases dependency on PageAnon. Wasn't the plan to remove
that at some time?
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
2003-03-03 21:12 ` Andrew Morton
@ 2003-03-03 21:24 ` Dave McCracken
2003-03-03 21:35 ` Andrew Morton
0 siblings, 1 reply; 24+ messages in thread
From: Dave McCracken @ 2003-03-03 21:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
--On Monday, March 03, 2003 13:12:10 -0800 Andrew Morton <akpm@digeo.com>
wrote:
> It is. All callers which need to be 100% accurate are under
> pte_chain_lock().
Hmm, good point. Some places may not need perfect accuracy. Also, if it
gives a false positive it means someone else is doing an atomic op on it,
so it's likely to be in transition to/from true anyway.
Ok, you've convinced me. Please ignore the patch. I'll hang onto it in
case we get proved wrong at some point.
Dave
======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
dmccr@us.ibm.com T/L 678-3059
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
2003-03-03 21:24 ` Dave McCracken
@ 2003-03-03 21:35 ` Andrew Morton
2003-03-03 21:52 ` Dave McCracken
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-03-03 21:35 UTC (permalink / raw)
To: Dave McCracken; +Cc: linux-kernel, linux-mm
Dave McCracken <dmccr@us.ibm.com> wrote:
>
>
> --On Monday, March 03, 2003 13:12:10 -0800 Andrew Morton <akpm@digeo.com>
> wrote:
>
> > It is. All callers which need to be 100% accurate are under
> > pte_chain_lock().
>
> Hmm, good point. Some places may not need perfect accuracy. Also, if it
> gives a false positive it means someone else is doing an atomic op on it,
> so it's likely to be in transition to/from true anyway.
>
> Ok, you've convinced me. Please ignore the patch. I'll hang onto it in
> case we get proved wrong at some point.
We do need a patch I think. page_mapped() is still assuming that an
all-bits-zero atomic_t corresponds to a zero-value atomic_t.
This does appear to be true for all supported architectures, but it's a bit
grubby.
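To spell out the assumption, here is a small userspace model of the union. It is a sketch only: union pte_model and the helpers are invented names, C11 atomic_int stands in for atomic_t, and reading one union member after writing another relies on the usual compiler behaviour rather than anything the kernel guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical model of the page->pte union. */
union pte_model {
	unsigned long direct;	/* anon pages: pte address or chain */
	atomic_int mapcount;	/* object-mapped pages: map count */
};

/* A freshly allocated page is assumed to be all-bits-zero. */
static void pte_model_clear(union pte_model *pte)
{
	memset(pte, 0, sizeof(*pte));
}

/* Looks at the raw bits, whatever the page type. */
static int mapped_by_raw_bits(const union pte_model *pte)
{
	return pte->direct != 0;
}

/* Goes through the counter's own accessor instead. */
static int mapped_by_atomic_read(union pte_model *pte)
{
	return atomic_load(&pte->mapcount) != 0;
}
```

The two tests agree only where a zero count is represented as all-bits-zero, which is exactly the property being relied upon.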
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
2003-03-03 21:35 ` Andrew Morton
@ 2003-03-03 21:52 ` Dave McCracken
2003-03-03 22:15 ` Andrew Morton
0 siblings, 1 reply; 24+ messages in thread
From: Dave McCracken @ 2003-03-03 21:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
--On Monday, March 03, 2003 13:35:39 -0800 Andrew Morton <akpm@digeo.com>
wrote:
> We do need a patch I think. page_mapped() is still assuming that an
> all-bits-zero atomic_t corresponds to a zero-value atomic_t.
>
> This does appear to be true for all supported architectures, but it's a
> bit grubby.
If that's ever not true then we need extra code to initialize/rezero that
field, since we assume it's zero on alloc, and the pte_chain code also
assumes it's zero for a new page.
Dave
======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
dmccr@us.ibm.com T/L 678-3059
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2.5.63] Teach page_mapped about the anon flag
2003-03-03 21:52 ` Dave McCracken
@ 2003-03-03 22:15 ` Andrew Morton
2003-03-04 18:32 ` [PATCH 2.5.63] Make objrmap mapcount non-atomic Dave McCracken
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2003-03-03 22:15 UTC (permalink / raw)
To: Dave McCracken; +Cc: linux-kernel, linux-mm
Dave McCracken <dmccr@us.ibm.com> wrote:
>
>
> --On Monday, March 03, 2003 13:35:39 -0800 Andrew Morton <akpm@digeo.com>
> wrote:
>
> > We do need a patch I think. page_mapped() is still assuming that an
> > all-bits-zero atomic_t corresponds to a zero-value atomic_t.
> >
> > This does appear to be true for all supported architectures, but it's a
> > bit grubby.
>
> If that's ever not true then we need extra code to initialize/rezero that
> field, since we assume it's zero on alloc, and the pte_chain code also
> assumes it's zero for a new page.
Well why not make mapcount an "int" and move the places where it is modified
inside pte_chain_lock()?
That does not increase the number of atomic operations, and it makes me stop
wondering if this:
if (atomic_read(&page->pte.mapcount) == 0)
inc_page_state(nr_mapped);
is racy ;)
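The worry is that two CPUs can both read a zero count before either increments it, so nr_mapped gets bumped twice for one page. A userspace sketch of the locked alternative follows. The names are hypothetical, and an atomic_flag spinlock stands in for pte_chain_lock(); this is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

struct locked_page_model {
	atomic_flag chain_lock;	/* stands in for the PG_chainlock bit */
	int mapcount;		/* plain int, only touched under the lock */
};

static atomic_long nr_mapped_locked;

static void model_chain_lock(struct locked_page_model *page)
{
	while (atomic_flag_test_and_set_explicit(&page->chain_lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void model_chain_unlock(struct locked_page_model *page)
{
	atomic_flag_clear_explicit(&page->chain_lock, memory_order_release);
}

/* The zero test and the increment now form one critical section. */
static void locked_add_rmap(struct locked_page_model *page)
{
	model_chain_lock(page);
	if (!page->mapcount)
		atomic_fetch_add(&nr_mapped_locked, 1);
	page->mapcount++;
	model_chain_unlock(page);
}

static void locked_remove_rmap(struct locked_page_model *page)
{
	model_chain_lock(page);
	assert(page->mapcount > 0);	/* the kernel code would BUG() */
	page->mapcount--;
	if (!page->mapcount)
		atomic_fetch_sub(&nr_mapped_locked, 1);
	model_chain_unlock(page);
}
```

One lock/unlock pair replaces the atomic read plus atomic increment, so the number of atomic operations per call does not go up.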
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2.5.63] Make objrmap mapcount non-atomic
2003-03-03 22:15 ` Andrew Morton
@ 2003-03-04 18:32 ` Dave McCracken
0 siblings, 0 replies; 24+ messages in thread
From: Dave McCracken @ 2003-03-04 18:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
[-- Attachment #1: Type: text/plain, Size: 815 bytes --]
--On Monday, March 03, 2003 14:15:18 -0800 Andrew Morton <akpm@digeo.com>
wrote:
> Well why not make mapcount an "int" and move the places where it is
> modified inside pte_chain_lock()?
>
> That does not increase the number of atomic operations, and it makes me
> stop wondering if this:
>
> if (atomic_read(&page->pte.mapcount) == 0)
> inc_page_state(nr_mapped);
>
> is racy ;)
That would be entirely too easy a solution :)
You're entirely right, of course. Here's the patch that makes it an int
instead of atomic, with the appropriate locking.
Dave
======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
dmccr@us.ibm.com T/L 678-3059
[-- Attachment #2: objfix-2.5.63-2.diff --]
[-- Type: text/plain, Size: 2270 bytes --]
--- 2.5.63-objrmap/./include/linux/mm.h 2003-02-27 15:58:34.000000000 -0600
+++ 2.5.63-objfix/./include/linux/mm.h 2003-03-03 16:26:21.000000000 -0600
@@ -172,7 +172,7 @@
struct pte_chain *chain;/* Reverse pte mapping pointer.
* protected by PG_chainlock */
pte_addr_t direct;
- atomic_t mapcount;
+ int mapcount;
} pte;
unsigned long private; /* mapping-private opaque data */
--- 2.5.63-objrmap/./mm/rmap.c 2003-02-28 14:19:10.000000000 -0600
+++ 2.5.63-objfix/./mm/rmap.c 2003-03-03 20:08:43.000000000 -0600
@@ -144,7 +144,7 @@
struct vm_area_struct *vma;
int referenced = 0;
- if (atomic_read(&page->pte.mapcount) == 0)
+ if (!page->pte.mapcount)
return 0;
if (!mapping)
@@ -243,19 +243,20 @@
if (!pfn_valid(page_to_pfn(page)) || PageReserved(page))
return pte_chain;
+ pte_chain_lock(page);
+
if (!PageAnon(page)) {
if (!page->mapping)
BUG();
if (PageSwapCache(page))
BUG();
- if (atomic_read(&page->pte.mapcount) == 0)
+ if (!page->pte.mapcount)
inc_page_state(nr_mapped);
- atomic_inc(&page->pte.mapcount);
+ page->pte.mapcount++;
+ pte_chain_unlock(page);
return pte_chain;
}
- pte_chain_lock(page);
-
#ifdef DEBUG_RMAP
/*
* This stuff needs help to get up to highmem speed.
@@ -342,20 +343,22 @@
if (!page_mapped(page))
return; /* remap_page_range() from a driver? */
+ pte_chain_lock(page);
+
if (!PageAnon(page)) {
if (!page->mapping)
BUG();
if (PageSwapCache(page))
BUG();
- if (atomic_read(&page->pte.mapcount) == 0)
+ if (!page->pte.mapcount)
BUG();
- if (atomic_dec_and_test(&page->pte.mapcount))
+ page->pte.mapcount--;
+ if (!page->pte.mapcount)
dec_page_state(nr_mapped);
+ pte_chain_unlock(page);
return;
}
- pte_chain_lock(page);
-
if (PageDirect(page)) {
if (page->pte.direct == pte_paddr) {
page->pte.direct = 0;
@@ -471,11 +474,11 @@
if (pte_dirty(pteval))
set_page_dirty(page);
- if (atomic_read(&page->pte.mapcount) == 0)
+ if (!page->pte.mapcount)
BUG();
mm->rss--;
- atomic_dec(&page->pte.mapcount);
+ page->pte.mapcount--;
page_cache_release(page);
out_unmap:
@@ -516,7 +519,7 @@
goto out;
}
- if (atomic_read(&page->pte.mapcount) != 0)
+ if (page->pte.mapcount)
BUG();
out:
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2003-03-04 18:21 UTC | newest]
Thread overview: 24+ messages
2003-02-27 10:59 2.5.63-mm1 Andrew Morton
2003-02-27 21:22 ` Rising io_load results 2.5.63-mm1 Con Kolivas
2003-02-27 21:44 ` Andrew Morton
2003-02-27 22:01 ` Dave McCracken
2003-02-27 22:24 ` Andrew Morton
2003-03-03 21:06 ` [PATCH 2.5.63] Teach page_mapped about the anon flag Dave McCracken
2003-03-03 21:12 ` Andrew Morton
2003-03-03 21:24 ` Dave McCracken
2003-03-03 21:35 ` Andrew Morton
2003-03-03 21:52 ` Dave McCracken
2003-03-03 22:15 ` Andrew Morton
2003-03-04 18:32 ` [PATCH 2.5.63] Make objrmap mapcount non-atomic Dave McCracken
2003-02-27 23:56 ` Rising io_load results Re: 2.5.63-mm1 Con Kolivas
2003-02-28 0:06 ` Andrew Morton
2003-02-28 0:28 ` Con Kolivas
2003-02-28 7:46 ` Duncan Sands
2003-02-28 8:06 ` Andrew Morton
2003-02-28 12:48 ` Hugh Dickins
2003-02-28 15:56 ` Dave McCracken
2003-02-28 0:17 ` 2.5.63-mm1 Ed Tomlinson
2003-02-28 0:46 ` 2.5.63-mm1 Andrew Morton
2003-02-28 12:16 ` 2.5.63-mm1 steven roemen
2003-02-28 12:24 ` 2.5.63-mm1 Andrew Morton
[not found] ` <3E5F7DAD.2080306@cyberone.com.au>
[not found] ` <200302282227.56311.tomlins@cam.org>
2003-03-01 15:04 ` [PATCH] tiobench on UP and ptg-D3-mm1 Ed Tomlinson