linux-kernel.vger.kernel.org archive mirror
* Feedback on preemptible kernel patch
@ 2001-09-08  5:22 grue
  2001-09-08  5:47 ` Robert Love
  0 siblings, 1 reply; 32+ messages in thread
From: grue @ 2001-09-08  5:22 UTC
  To: Robert Love; +Cc: linux-kernel

I am running 2.4.10-pre4 with the rml-preempt patch.
I built and rebooted this on my workstation yesterday when I saw the patch
posted, and it's been working great.

I'm running it on a dual P3-550 with 256MB RAM with CONFIG_SMP and have had no
problems whatsoever, although it hasn't been worked 'real' hard yet
(load no higher than 4) ;)

Figured I'd give some positive feedback about the patch. If you want,
Rob, I could run some benchmarks on this against an unpatched kernel, or
if you have some ideas for me to really stress this thing to see if it
breaks, let me know.

--
Gregory Finch




* Re: Feedback on preemptible kernel patch
  2001-09-08  5:22 Feedback on preemptible kernel patch grue
@ 2001-09-08  5:47 ` Robert Love
  2001-09-08 17:33   ` Arjan Filius
  2001-09-09 18:57   ` grue
  0 siblings, 2 replies; 32+ messages in thread
From: Robert Love @ 2001-09-08  5:47 UTC
  To: grue; +Cc: linux-kernel

On Sat, 2001-09-08 at 01:22, grue@lakesweb.com wrote:
> I am running 2.4.10-pre4 with the rml-preempt patch.
> built and rebooted this on my workstation yesterday when I saw the patch
> posted and it's been working great.

_Very_ glad to hear this.

> I'm running it on a dual P3-550 with 256MB ram with CONFIG_SMP and no
> problems whatsoever although it hasn't been worked 'real' hard yet.
> (load no higher than 4) ;)

I am surprised you have no problems with CONFIG_SMP=y &&
CONFIG_PREEMPT=y.  Promising.

> Figured I'd give some positive feedback about the patch. If you want,
> Rob, I could run some benchmarks on this against an unpatched kernel, or
> if you have some ideas for me to really stress this thing to see if it
> breaks, let me know.

I would love this.  We could use some SMP datapoints badly.

You can run some of the tests made especially for testing latency, like
an audio benchmark.  One such test is at
http://www.gardena.net/benno/linux/latencytest-0.42.tar.gz

Obviously a heavily tasked I/O benchmark is useful; I have used dbench
in the past (ftp://samba.org/pub/tridge/dbench/) (try it with 16
processes or so), but I have been told I should use bonnie.

You can time normal things, too. `time make dep clean bzImage' is always
a favorite :)
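
A minimal sketch of how such timings could be collected on both kernels and compared. It is not part of the patch or of any benchmark named above: `time_command` is a hypothetical helper, and the demo call only times a no-op Python process. On a 2.4 tree one would pass `["make", "dep", "clean", "bzImage"]` instead.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failed build abort instead of being timed as a success.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Demo only: time a trivial command so the sketch runs anywhere.
results = time_command([sys.executable, "-c", "pass"])
print("runs:", ", ".join(f"{t:.3f}s" for t in results))
print(f"min {min(results):.3f}s  mean {sum(results) / len(results):.3f}s")
```

Reporting the minimum alongside the mean is one way to separate a consistent change from run-to-run noise; the same number of runs should be used on the patched and unpatched kernel.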

Under UP, enabling preemption helps all of this (the recent linuxdevices
article on preemption shows a 30x decrease in latency).  Both Nigel and I
have run dbench with good results for -16.  See
http://kpreempt.sourceforge.net/ for some more.

Whatever you can do... anyhow, thank you for the positive feedback.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: Feedback on preemptible kernel patch
  2001-09-08  5:47 ` Robert Love
@ 2001-09-08 17:33   ` Arjan Filius
  2001-09-08 18:22     ` safemode
                       ` (3 more replies)
  2001-09-09 18:57   ` grue
  1 sibling, 4 replies; 32+ messages in thread
From: Arjan Filius @ 2001-09-08 17:33 UTC
  To: Robert Love; +Cc: linux-kernel

Hello Robert,


I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1,
but it seems to hit a problem in highmem (see below) (I do have 1.5GB RAM).
Plain 2.4.10-pre4 runs just fine.

With the kernel option mem=850M the patched kernel boots and seems to run
fine. I haven't done any stress testing yet, but I still notice
hiccups while playing mp3 files at nice level -10 with mpg123 on a 1.1GHz
Athlon when, for example, removing a _large_ file (reiser-on-lvm).

My syslog output with highmem:

Sep  8 18:10:16 sjoerd kernel: kernel BUG at /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95!
Sep  8 18:10:16 sjoerd kernel: invalid operand: 0000
Sep  8 18:10:16 sjoerd kernel: CPU:    0
Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
Sep  8 18:10:16 sjoerd kernel: EFLAGS: 00010282
Sep  8 18:10:16 sjoerd kernel: eax: 00000043   ebx: 080bdd5c   ecx: f5764260   edx: f4d4c000
Sep  8 18:10:16 sjoerd kernel: esi: c26cca60   edi: ffffffff   ebp: c26ca134   esp: f4d4dec8
Sep  8 18:10:16 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep  8 18:10:16 sjoerd kernel: Process S11dhcpd (pid: 2507, stackpage=f4d4d000)
Sep  8 18:10:16 sjoerd kernel: Stack: c0210bd2 c0210cc0 0000005f 080bdd5c f5805f00 ffffffff 00000001 c012437d
Sep  8 18:10:16 sjoerd kernel:        f5805f00 f4d49a00 080bdd5c f4c822f4 55d54065 f4d4c000 f4d49a00 f5805f00
Sep  8 18:10:16 sjoerd kernel:        f5805f1c c0111a17 f5805f00 f4d49a00 080bdd5c 00000001 f4d4c000 00000007
Sep  8 18:10:16 sjoerd kernel: Call Trace: [handle_mm_fault+141/224] [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64] [do_exit+595/640]
Sep  8 18:10:16 sjoerd kernel:    [error_code+52/64]
Sep  8 18:10:16 sjoerd kernel:
Sep  8 18:10:16 sjoerd kernel: Code: 0f 0b 83 c4 0c 8b 15 e8 2f 2a c0 89 f0 2b 05 ac ba 2a c0 69
Sep  8 18:10:16 sjoerd kernel: MAC unknown INTRUDERS?? (tf) IN=eth0 OUT= MAC= SRC=192.168.0.5 DST=192.168.0.255 LEN=241 TOS=0x02 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=138 DPT=138 LEN=221
Sep  8 18:10:16 sjoerd kernel: MAC unknown INTRUDERS?? (tf) IN=eth0 OUT= MAC= SRC=192.168.0.5 DST=192.168.0.255 LEN=96 TOS=0x02 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=137 DPT=137 LEN=76
Sep  8 18:10:16 sjoerd kernel: kernel BUG at /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95!
Sep  8 18:10:16 sjoerd kernel: invalid operand: 0000
Sep  8 18:10:16 sjoerd kernel: CPU:    0
Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_anonymous_page+130/368]
Sep  8 18:10:16 sjoerd kernel: EFLAGS: 00010286
Sep  8 18:10:16 sjoerd kernel: eax: 00000043   ebx: 080c501c   ecx: f5764260   edx: f4d4c000
Sep  8 18:10:16 sjoerd kernel: esi: c26c4fec   edi: f5805f00   ebp: f4d497c0   esp: f4d4dea0
Sep  8 18:10:16 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep  8 18:10:16 sjoerd kernel: Process dhcpd (pid: 2508, stackpage=f4d4d000)
Sep  8 18:10:16 sjoerd kernel: Stack: c0210bd2 c0210cc0 0000005f 080c501c f4d497c0 f5805f00 00000001 c012420f
Sep  8 18:10:16 sjoerd kernel:        f5805f00 f4d497c0 f4c63314 00000001 080c501c 080c501c f5805f00 ffffffff
Sep  8 18:10:16 sjoerd kernel:        00000001 c012434e f5805f00 f4d497c0 080c501c 00000001 f4c63314 f4d4c000
Sep  8 18:10:16 sjoerd kernel: Call Trace: [do_no_page+47/272] [handle_mm_fault+94/224] [do_page_fault+375/1136] [do_page_fault+0/1136] [do_munmap+86/640]
Sep  8 18:10:16 sjoerd kernel:    [fput+116/224] [do_brk+176/368] [sys_brk+187/240] [error_code+52/64]
Sep  8 18:10:16 sjoerd kernel:
Sep  8 18:10:16 sjoerd kernel: Code: 0f 0b 83 c4 0c 8b 15 e8 2f 2a c0 89 f0 2b 05 ac ba 2a c0 69
Sep  8 18:10:16 sjoerd kernel: kernel BUG at /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95!
Sep  8 18:10:16 sjoerd kernel: invalid operand: 0000
Sep  8 18:10:16 sjoerd kernel: CPU:    0
Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_anonymous_page+130/368]
Sep  8 18:10:16 sjoerd kernel: EFLAGS: 00010282
Sep  8 18:10:16 sjoerd kernel: eax: 00000043   ebx: 40017000   ecx: f5735f7c   edx: f4c88000
Sep  8 18:10:16 sjoerd kernel: esi: c26c9298   edi: f5805d80   ebp: f4c945c0   esp: f4c89dc8
Sep  8 18:10:16 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep  8 18:10:16 sjoerd kernel: Process python (pid: 2456, stackpage=f4c89000)
Sep  8 18:10:16 sjoerd kernel: Stack: c0210bd2 c0210cc0 0000005f 40017000 f4c945c0 f5805d80 00000001 c012420f
Sep  8 18:10:16 sjoerd kernel:        f5805d80 f4c945c0 f4c9c05c 00000001 40017000 40017000 f5805d80 ffffffff
Sep  8 18:10:16 sjoerd kernel:        00000001 c012434e f5805d80 f4c945c0 40017000 00000001 f4c9c05c f4c88000
Sep  8 18:10:16 sjoerd kernel: Call Trace: [do_no_page+47/272] [handle_mm_fault+94/224] [do_page_fault+375/1136] [do_page_fault+0/1136] [block_read_full_page+240/688]
Sep  8 18:10:16 sjoerd kernel:    [error_code+52/64] [file_read_actor+113/224] [do_generic_file_read+505/1344] [generic_file_read+99/128] [file_read_actor+0/224] [sys_read+150/208]
Sep  8 18:10:16 sjoerd kernel:    [system_call+51/56]
Sep  8 18:10:16 sjoerd kernel:
Sep  8 18:10:16 sjoerd kernel: Code: 0f 0b 83 c4 0c 8b 15 e8 2f 2a c0 89 f0 2b 05 ac ba 2a c0 69
Sep  8 18:10:16 sjoerd kernel: kernel BUG at /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95!
Sep  8 18:10:16 sjoerd kernel: kernel BUG at /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95!
Sep  8 18:10:16 sjoerd kernel: invalid operand: 0000
Sep  8 18:10:16 sjoerd kernel: CPU:    0
Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
Sep  8 18:10:16 sjoerd kernel: EFLAGS: 00010282
Sep  8 18:10:16 sjoerd kernel: eax: 00000043   ebx: bffff960   ecx: f5764260   edx: f4ce4000
Sep  8 18:10:16 sjoerd kernel: esi: c26d04d0   edi: ffffffff   ebp: c26ca4a8   esp: f4ce5ec8
Sep  8 18:10:16 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep  8 18:10:16 sjoerd kernel: Process rc (pid: 2514, stackpage=f4ce5000)
Sep  8 18:10:16 sjoerd kernel: Stack: c0210bd2 c0210cc0 0000005f bffff960 f5805780 ffffffff 00000001 c012437d
Sep  8 18:10:16 sjoerd kernel:        f5805780 f4c54dc0 bffff960 f4ca8ffc 55e30065 f4ce4000 f4c54dc0 f5805780
Sep  8 18:10:16 sjoerd kernel:        f580579c c0111a17 f5805780 f4c54dc0 bffff960 00000001 f4ce4000 00000007
Sep  8 18:10:16 sjoerd kernel: Call Trace: [handle_mm_fault+141/224] [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64] [do_exit+595/640]
Sep  8 18:10:16 sjoerd kernel:    [error_code+52/64]
Sep  8 18:10:16 sjoerd kernel:
Sep  8 18:10:16 sjoerd kernel: Code: 0f 0b 83 c4 0c 8b 15 e8 2f 2a c0 89 f0 2b 05 ac ba 2a c0 69
Sep  8 18:10:16 sjoerd kernel: kernel BUG at /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95!
Sep  8 18:10:16 sjoerd kernel: invalid operand: 0000
Sep  8 18:10:16 sjoerd kernel: CPU:    0
Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[filemap_nopage+300/1344]
Sep  8 18:10:16 sjoerd kernel: EFLAGS: 00010282
Sep  8 18:10:16 sjoerd kernel: eax: 00000043   ebx: 00000001   ecx: f5764260   edx: f4c3e000
Sep  8 18:10:16 sjoerd kernel: esi: c297ac20   edi: 00000015   ebp: c270df9c   esp: f4c3fb30
Sep  8 18:10:16 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep  8 18:10:16 sjoerd kernel: Process ncpserv (pid: 2513, stackpage=f4c3f000)
Sep  8 18:10:16 sjoerd kernel: Stack: c02110b2 c0211160 0000005f 40016000 f4c54f00 f4c62140 00000001 00000019
Sep  8 18:10:16 sjoerd kernel:        f7af9960 f74f7a24 f74f7980 f4db9c40 c0124252 f4c54f00 40016000 00000001
Sep  8 18:10:16 sjoerd kernel:        400162a8 f4c62140 ffffffff 00000001 c012434e f4c62140 f4c54f00 400162a8
Sep  8 18:10:16 sjoerd kernel: Call Trace: [do_no_page+114/272] [handle_mm_fault+94/224] [do_page_fault+375/1136] [do_page_fault+0/1136] [file_read_actor+177/224]
Sep  8 18:10:16 sjoerd kernel:    [update_atime+68/80] [do_generic_file_read+1333/1344] [do_munmap+86/640] [update_atime+68/80] [error_code+52/64] [clear_user+46/64]
Sep  8 18:10:16 sjoerd kernel:    [padzero+28/32] [load_elf_interp+619/704] [load_elf_binary+1959/2704] [load_elf_binary+0/2704] [nfsd:__insmod_nfsd_O/lib/modules/2.4.10-pre4/kernel/fs/nfsd/nfsd+-13721617/96] [search_binary_handler+152/496]
Sep  8 18:10:16 sjoerd kernel:    [do_execve+380/496] [do_execve+403/496] [sys_execve+47/96] [system_call+51/56]
Sep  8 18:10:16 sjoerd kernel:
Sep  8 18:10:16 sjoerd kernel: Code: 0f 0b 83 c4 0c 8b 15 e8 2f 2a c0 89 f0 2b 05 ac ba 2a c0 69
Sep  8 18:10:16 sjoerd kernel: LOOUT REJECT TCP IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=356 TOS=0x02 PREC=0x00 TTL=64 ID=32512 PROTO=TCP SPT=32775 DPT=15607 WINDOW=32767 RES=0x00 ACK PSH FIN URGP=0
Sep  8 18:10:16 sjoerd kernel: invalid operand: 0000
Sep  8 18:10:16 sjoerd kernel: CPU:    0
Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
Sep  8 18:10:16 sjoerd kernel: EFLAGS: 00010282
Sep  8 18:10:16 sjoerd kernel: eax: 00000043   ebx: 080b170c   ecx: f4ce4260   edx: f5946000
Sep  8 18:10:16 sjoerd kernel: esi: c26dec2c   edi: ffffffff   ebp: c26ca2cc   esp: f5947ec8
Sep  8 18:10:16 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep  8 18:10:16 sjoerd kernel: Process rc (pid: 156, stackpage=f5947000)
Sep  8 18:10:16 sjoerd kernel: Stack: c0210bd2 c0210cc0 0000005f 080b170c f752a080 ffffffff 00000001 c012437d
Sep  8 18:10:16 sjoerd kernel:        f752a080 f75282c0 080b170c f59de2c4 56197065 f5946000 f75282c0 f752a080
Sep  8 18:10:16 sjoerd kernel:        f752a09c c0111a17 f752a080 f75282c0 080b170c 00000001 f5946000 00000007
Sep  8 18:10:16 sjoerd kernel: Call Trace: [handle_mm_fault+141/224] [do_page_fault+375/1136] [do_page_fault+0/1136] [copy_thread+136/160] [do_fork+1619/1792]
Sep  8 18:10:16 sjoerd kernel:    [write_chan+0/544] [sys_fork+20/32] [error_code+52/64]
Sep  8 18:10:16 sjoerd kernel:
Sep  8 18:10:16 sjoerd kernel: Code: 0f 0b 83 c4 0c 8b 15 e8 2f 2a c0 89 f0 2b 05 ac ba 2a c0 69
Sep  8 18:10:16 sjoerd kernel: kernel BUG at /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95!
Sep  8 18:10:16 sjoerd kernel: invalid operand: 0000
Sep  8 18:10:16 sjoerd kernel: CPU:    0
Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
Sep  8 18:10:16 sjoerd kernel: EFLAGS: 00010282
Sep  8 18:10:16 sjoerd kernel: eax: 00000043   ebx: 080b04e0   ecx: f5735f7c   edx: c299a000
Sep  8 18:10:16 sjoerd kernel: esi: c2962850   edi: ffffffff   ebp: c292d82c   esp: c299bec8
Sep  8 18:10:16 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep  8 18:10:16 sjoerd kernel: Process init (pid: 1, stackpage=c299b000)
Sep  8 18:10:16 sjoerd kernel: Stack: c0210bd2 c0210cc0 0000005f 080b04e0 f752a140 ffffffff 00000001 c012437d
Sep  8 18:10:16 sjoerd kernel:        f752a140 f7528180 080b04e0 f751a2c0 5f910065 c299a000 f7528180 f752a140
Sep  8 18:10:16 sjoerd kernel:        f752a15c c0111a17 f752a140 f7528180 080b04e0 00000001 c299a000 00000007
Sep  8 18:10:16 sjoerd kernel: Call Trace: [handle_mm_fault+141/224] [do_page_fault+375/1136] [do_page_fault+0/1136] [copy_thread+136/160] [do_fork+1619/1792]
Sep  8 18:10:16 sjoerd kernel:    [sys_fork+20/32] [error_code+52/64]
Sep  8 18:10:16 sjoerd kernel:
Sep  8 18:10:16 sjoerd kernel: Code: 0f 0b 83 c4 0c 8b 15 e8 2f 2a c0 89 f0 2b 05 ac ba 2a c0 69
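
For a quick overview of a log like this, the faulting functions can be tallied from the EIP lines. The following is a rough sketch that assumes the syslog line format shown above; `tally_eip` and `EIP_RE` are hypothetical names, and the sample holds only a few lines copied from the log.

```python
import re
from collections import Counter

# Matches e.g. "EIP:    0010:[do_wp_page+636/1088]" and captures the symbol name.
EIP_RE = re.compile(r"EIP:\s+[0-9a-fA-F]{4}:\[([^\]+]+)\+")


def tally_eip(lines):
    """Count how often each kernel symbol appears on an EIP line."""
    counts = Counter()
    for line in lines:
        match = EIP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]",
    "Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_anonymous_page+130/368]",
    "Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_anonymous_page+130/368]",
    "Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[filemap_nopage+300/1344]",
]
for symbol, count in tally_eip(sample).most_common():
    print(f"{count:3d}  {symbol}")
```

Feeding it the whole log (for instance the lines of a saved syslog file) gives one count per faulting function, which makes it easier to see whether the oopses cluster in a few code paths.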

-- 
Arjan Filius
mailto:iafilius@xs4all.nl



* Re: Feedback on preemptible kernel patch
  2001-09-08 17:33   ` Arjan Filius
@ 2001-09-08 18:22     ` safemode
  2001-09-08 20:58     ` [SMP lock BUG?] " Roger Larsson
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: safemode @ 2001-09-08 18:22 UTC
  To: Arjan Filius, Robert Love; +Cc: linux-kernel

On Saturday 08 September 2001 13:33, Arjan Filius wrote:
> Hello Robert,
>
>
> I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1.
> But it seems to hit highmem (see below) (i do have 1.5GB ram)
> 2.4.10-pre4 plain runs just fine.
>
> With the kernel option mem=850M the patched kernel boots an seems to run
> fine. However i didn't do any stress testing yet, but i still notice
> hickups while playing mp3 files at -10 nice level with mpg123 on a 1.1GHz
> Athlon, and removing for example a _large_ file (reiser-on-lvm).

Have you tried running it without an altered priority level (altered by you,
that is)?  I run freeamp without any special nice level, and whether
dbench is pushing the system to a load of 25 or something else is pinning the
CPU at 100% and hogging 200MB of RAM, it never skips, with or
without the patch.  Some programs actually suffer when moved off the default
priority; I sometimes see that with hdparm -t.  This is on an 850MHz Athlon,
by the way.  I find mpg123 to be slower than most other players, so try some of
the others.  If you're console only, freeamp is both a console and GUI mp3 player.
I imagine xmms would be pretty fast too once you turn off any extras.


> My syslog output with highmem:
>
> [- - -]


* [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-08 17:33   ` Arjan Filius
  2001-09-08 18:22     ` safemode
@ 2001-09-08 20:58     ` Roger Larsson
  2001-09-08 22:18       ` Arjan Filius
  2001-09-09 14:55       ` george anzinger
  2001-09-09  4:40     ` Robert Love
  2001-09-09 17:09     ` Robert Love
  3 siblings, 2 replies; 32+ messages in thread
From: Roger Larsson @ 2001-09-08 20:58 UTC
  To: Arjan Filius, Robert Love; +Cc: linux-kernel, linux-mm

Hi,

This is interesting. [Assuming a UP Athlon - correct?]
Note that all of these BUG out at highmem.h:95 (kmap_atomic),
and that test is only on if you have enabled HIGHMEM_DEBUG
[my analysis was done with a 2.4.10-pre2 kernel, but I checked
later patches and I do not think they fix it either...]

The preemptive kernel puts more SMP stress on the kernel than
running with multiple CPUs.

So this might be a potential bug in the kernel proper, running on
an SMP computer.

If I understand the bug correctly, a process takes a page fault and
starts to map in the page. But before the final step it checks -
and the page is already there!!! Correct?

On Saturday den 8 September 2001 19:33, Arjan Filius wrote:
> Hello Robert,
>
>
> I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1.
> But it seems to hit highmem (see below) (i do have 1.5GB ram)
> 2.4.10-pre4 plain runs just fine.
>
> With the kernel option mem=850M the patched kernel boots an seems to run
> fine. However i didn't do any stress testing yet, but i still notice
> hickups while playing mp3 files at -10 nice level with mpg123 on a 1.1GHz
> Athlon, and removing for example a _large_ file (reiser-on-lvm).
>
> My syslog output with highmem:
>
> Sep  8 18:10:16 sjoerd kernel: kernel BUG at
> /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95! Sep  8 18:10:16 sjoerd
> kernel: invalid operand: 0000
> Sep  8 18:10:16 sjoerd kernel: CPU:    0
> Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
> [- - -]
> sjoerd kernel: Call Trace: [handle_mm_fault+141/224]
> [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64]
> [do_exit+595/640] Sep  8 18:10:16 sjoerd kernel:    [error_code+52/64]

Let's look at this example. You need to add in some inline functions that do
not appear in the call trace...

handle_mm_fault
	takes the mm->page_table_lock [this should prevent reschedules]
	allocs pmd
	allocs pte
	handle_pte_fault(...)
handle_pte_fault [inline, most likely path]
	pte is present
	it is a write access
	but the pte is not writeable  - call do_wp_page
do_wp_page
	plays some games with the lock...
	finally calls copy_cow_page [inline] with the page_table_lock
	UNLOCKED!
copy_cow_page
	calls clear_user_highpage or copy_user_highpage
both clear_user_highpage and copy_user_highpage
	calls kmap_atomic
kmap_atomic
	page is a highmem page
	but during the time this process was unlocked some other
	thread has allocated the page in question... BUG out.

So somewhere between the UNLOCK (it might be a lot later) and the
BUG test in kmap_atomic, the process running in the kernel got preempted
(most likely during the page copy, since that takes some time).

Another process (thread) started to run, hit the same page fault,
and succeeded in its alloc.

Back in the first process, it continues, finally checks - the page
is there... and it BUGs.

Note that this can also happen in a pure SMP kernel:
let the processes (threads) run on two CPUs, and let the
first get an interrupt/bh after the unlock - the other can pass
and add the page before the first one can continue - same
result!
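
The interleaving described above can be played through in a toy model. This is only a sketch in Python, not kernel code: `fault_handler`, `run`, and the `slot` dictionary are made-up stand-ins for the fault path and the mapping being checked, and each `yield` marks a point where a task may be preempted. It shows the check-then-act pattern only and does not confirm that this is what actually happens in do_wp_page.

```python
class Bug(Exception):
    """Stands in for the kernel's BUG() check firing."""


def fault_handler(name, slot):
    """Toy fault path: unlocked work first, then expect the slot to be empty."""
    if slot["owner"] is not None:
        yield f"{name}: page already present, nothing to do"
        return
    # Slow, unlocked part; the yield is where the task can be preempted.
    yield f"{name}: copying page with the lock dropped"
    if slot["owner"] is not None:
        raise Bug(f"{name}: slot already filled by {slot['owner']}")
    slot["owner"] = name
    yield f"{name}: installed page"


def run(schedule):
    """Step the named tasks in the given order and return a log of events."""
    slot = {"owner": None}
    tasks = {}
    log = []
    for name in schedule:
        if name not in tasks:
            tasks[name] = fault_handler(name, slot)
        try:
            step = next(tasks[name], None)
        except Bug as exc:
            log.append(f"BUG: {exc}")
            break
        if step is not None:
            log.append(step)
    return log


# No preemption: A finishes before B starts, so B finds the page present.
print(run(["A", "A", "B"]))
# A is preempted mid-copy, B completes the fault, A resumes and trips the check.
print(run(["A", "B", "B", "A"]))
```

The only difference between the two runs is the order in which the tasks are stepped, which is the point of the argument: the check is safe only if nothing else can run between the start of the fault and the final test.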

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden


* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-08 20:58     ` [SMP lock BUG?] " Roger Larsson
@ 2001-09-08 22:18       ` Arjan Filius
  2001-09-09 14:55       ` george anzinger
  1 sibling, 0 replies; 32+ messages in thread
From: Arjan Filius @ 2001-09-08 22:18 UTC
  To: Roger Larsson; +Cc: Robert Love, linux-kernel, linux-mm

Hello Roger,

On Sat, 8 Sep 2001, Roger Larsson wrote:

> Hi,
>
> This is interesting. [Assumes UP Athlon - correct]

UP Athlon, and compiled as UP (as always).
I haven't tested my system with an SMP kernel for a long while.



> Note that all of these BUG out at highmem.h:95 (kmap_atomic),
> and that test is only on if you have enabled HIGHMEM_DEBUG
It seems to be on indeed.

> [my analysis was done with a 2.4.10-pre2 kernel, but I checked
> later patches and I do not think they fix it either...]
>
> The preemptive kernel puts more SMP stress on the kernel than
> running with multiple CPUs.
>
> So this might be a potential bug in the kernel proper, running on
> an SMP computer.

>
> If I understand the bug correctly, a process gets a page fault.
> Starts to map in the page. But before the final part it checks -
> and the page is already there!!! Correct?

ehh.. Should compiling SMP on UP (just for test) trigger this?


Greetings,


>
> On Saturday den 8 September 2001 19:33, Arjan Filius wrote:
> > Hello Robert,
> >
> >
> > I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1.
> > But it seems to hit highmem (see below) (i do have 1.5GB ram)
> > 2.4.10-pre4 plain runs just fine.
> >
> > With the kernel option mem=850M the patched kernel boots an seems to run
> > fine. However i didn't do any stress testing yet, but i still notice
> > hickups while playing mp3 files at -10 nice level with mpg123 on a 1.1GHz
> > Athlon, and removing for example a _large_ file (reiser-on-lvm).
> >
> > My syslog output with highmem:
> >
> > Sep  8 18:10:16 sjoerd kernel: kernel BUG at
> > /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95! Sep  8 18:10:16 sjoerd
> > kernel: invalid operand: 0000
> > Sep  8 18:10:16 sjoerd kernel: CPU:    0
> > Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
> > [- - -]
> > sjoerd kernel: Call Trace: [handle_mm_fault+141/224]
> > [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64]
> > [do_exit+595/640] Sep  8 18:10:16 sjoerd kernel:    [error_code+52/64]
>
> Lets look at this example. You need to add some inline functions...
>
> handle_mm_fault
> 	takes the mm->page_table_lock [this should prevent reschedules]
> 	allocs pmd
> 	allocs pte
> 	handle_pte_fault(...)
> handle_pte_fault [inline, most likely path]
> 	pte is present
> 	it is a write access
> 	but the pte is not writeable  - call do_wp_page
> do_wp_page
> 	plays some games with the lock...
> 	finally calls copy_cow_page [inline] with the page_table_lock
> 	UNLOCKED!
> copy_cow_page
> 	calls clear_user_highpage or copy_user_highpage
> both clear_user_highpage and copy_user_highpage
> 	calls kmap_atomic
> kmap_atomic
> 	page is a highmem page
> 	but during the time this process was unlocked some other
> 	thread has allocated the page in question... BUG out.
>
> So somewere between the UNLOCK (might be a lot later) and the
> BUG test in kmap_atomic the process running in kernel got preempted.
> (most likely during the page copy since it will take some time)
>
> Another process (thread) started to run - hit the same page fault
> but succeeded in its alloc.
>
> Back to the first process it continues, finally checks - the page
> is there... and BUGS.
>
> Note that this can happen in a pure SMP kernel.
>
> But let the processes (threads) run on two CPUs. And let the
> first get an interrupt/bh after unlock - the other can pass
> and add the page before the first one can continue - same
> result!
>
> /RogerL
>
>

-- 
Arjan Filius
mailto:iafilius@xs4all.nl


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-08 17:33   ` Arjan Filius
  2001-09-08 18:22     ` safemode
  2001-09-08 20:58     ` [SMP lock BUG?] " Roger Larsson
@ 2001-09-09  4:40     ` Robert Love
  2001-09-09 17:09     ` Robert Love
  3 siblings, 0 replies; 32+ messages in thread
From: Robert Love @ 2001-09-09  4:40 UTC (permalink / raw)
  To: Arjan Filius; +Cc: linux-kernel

On Sat, 2001-09-08 at 13:33, Arjan Filius wrote:
> I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1.
> But it seems to hit highmem (see below) (i do have 1.5GB ram)
> 2.4.10-pre4 plain runs just fine.

going through this thread and now looking at the highmem code, it is
clear highmem is not going to be preempt safe.

until Nigel or I (or someone?) can go through it all and add appropriate
locks, it's a bomb waiting to blow.

thank you for the feedback...stay tuned.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-08 20:58     ` [SMP lock BUG?] " Roger Larsson
  2001-09-08 22:18       ` Arjan Filius
@ 2001-09-09 14:55       ` george anzinger
  2001-09-09 22:25         ` Arjan Filius
  1 sibling, 1 reply; 32+ messages in thread
From: george anzinger @ 2001-09-09 14:55 UTC (permalink / raw)
  To: Roger Larsson; +Cc: Arjan Filius, Robert Love, linux-kernel, linux-mm

If the page is the correct one when it is found mapped, the code
should just exit, not BUG(), IMHO.

George


Roger Larsson wrote:
> 
> Hi,
> 
> This is interesting. [Assumes UP Athlon - correct]
> Note that all BUGs out in highmem.h:95 (kmap_atomic)
> and that test is only on if you have enabled HIGHMEM_DEBUG
> [my analyze is done with a 2.4.10-pre2 kernel, but I checked with
> later patches and I do not think they fix it either...]
> 
> The preemptive kernel puts more SMP stress on the kernel than
> running with multiple CPUs.
> 
> So this might be a potential bug in the kernel proper, running with
> a SMP computer.
> 
> If I understand the bug correctly, a process gets a page fault.
> Starts to map in the page. But before the final part it checks -
> and the page is already there!!! Correct?
> 
> On Saturday den 8 September 2001 19:33, Arjan Filius wrote:
> > Hello Robert,
> >
> >
> > I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1.
> > But it seems to hit highmem (see below) (i do have 1.5GB ram)
> > 2.4.10-pre4 plain runs just fine.
> >
> > With the kernel option mem=850M the patched kernel boots an seems to run
> > fine. However i didn't do any stress testing yet, but i still notice
> > hickups while playing mp3 files at -10 nice level with mpg123 on a 1.1GHz
> > Athlon, and removing for example a _large_ file (reiser-on-lvm).
> >
> > My syslog output with highmem:
> >
> > Sep  8 18:10:16 sjoerd kernel: kernel BUG at
> > /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95! Sep  8 18:10:16 sjoerd
> > kernel: invalid operand: 0000
> > Sep  8 18:10:16 sjoerd kernel: CPU:    0
> > Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
> > [- - -]
> > sjoerd kernel: Call Trace: [handle_mm_fault+141/224]
> > [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64]
> > [do_exit+595/640] Sep  8 18:10:16 sjoerd kernel:    [error_code+52/64]
> 
> Lets look at this example. You need to add some inline functions...
> 
> handle_mm_fault
>         takes the mm->page_table_lock [this should prevent reschedules]
>         allocs pmd
>         allocs pte
>         handle_pte_fault(...)
> handle_pte_fault [inline, most likely path]
>         pte is present
>         it is a write access
>         but the pte is not writeable  - call do_wp_page
> do_wp_page
>         plays some games with the lock...
>         finally calls copy_cow_page [inline] with the page_table_lock
>         UNLOCKED!
> copy_cow_page
>         calls clear_user_highpage or copy_user_highpage
> both clear_user_highpage and copy_user_highpage
>         calls kmap_atomic
> kmap_atomic
>         page is a highmem page
>         but during the time this process was unlocked some other
>         thread has allocated the page in question... BUG out.
> 
> So somewere between the UNLOCK (might be a lot later) and the
> BUG test in kmap_atomic the process running in kernel got preempted.
> (most likely during the page copy since it will take some time)
> 
> Another process (thread) started to run - hit the same page fault
> but succeeded in its alloc.
> 
> Back to the first process it continues, finally checks - the page
> is there... and BUGS.
> 
> Note that this can happen in a pure SMP kernel.
> 
> But let the processes (threads) run on two CPUs. And let the
> first get an interrupt/bh after unlock - the other can pass
> and add the page before the first one can continue - same
> result!
> 
> /RogerL
> 
> --
> Roger Larsson
> Skellefteå
> Sweden

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-08 17:33   ` Arjan Filius
                       ` (2 preceding siblings ...)
  2001-09-09  4:40     ` Robert Love
@ 2001-09-09 17:09     ` Robert Love
  2001-09-09 21:07       ` Arjan Filius
  2001-09-09 21:23       ` Arjan Filius
  3 siblings, 2 replies; 32+ messages in thread
From: Robert Love @ 2001-09-09 17:09 UTC (permalink / raw)
  To: iafilius; +Cc: linux-kernel

Arjan,

the following patch was written by Manfred Spraul to fix your highmem
bug.  I haven't had a chance to go over it, but I would like it if you
could test it.  It can't hurt.  Patch it on top of the preempt patch and
enable CONFIG_PREEMPT, CONFIG_HIGHMEM, and CONFIG_HIGHMEM_DEBUG.

let me know what happens...any relevant messages, etc. please pass
along. if it does work, I'd be curious if there are any slowdowns


--- highmem.h.prev      Sun Sep  9 08:59:04 2001
+++ highmem.h   Sun Sep  9 09:00:07 2001
@@ -88,6 +88,7 @@
        if (page < highmem_start_page)
                return page_address(page);

+       ctx_sw_off();
        idx = type + KM_TYPE_NR*smp_processor_id();
        vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
#if HIGHMEM_DEBUG
@@ -119,6 +120,7 @@
        pte_clear(kmap_pte-idx);
        __flush_tlb_one(vaddr);
#endif
+       ctx_sw_on();
}

#endif /* __KERNEL__ */
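
As a rough illustration of what the two added lines do, here is a toy
model of a preemption counter. It assumes ctx_sw_off()/ctx_sw_on() in
the preempt patch behave like a nesting count that the scheduler
consults; all toy_* names are hypothetical and nothing here is the
actual kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task preemption counter. */
static int toy_preempt_count;

static void toy_ctx_sw_off(void)
{
    toy_preempt_count++;
}

static void toy_ctx_sw_on(void)
{
    assert(toy_preempt_count > 0);  /* an unbalanced "on" would be a bug */
    toy_preempt_count--;
}

/* The scheduler may only switch tasks while the counter is zero. */
static bool toy_may_preempt(void)
{
    return toy_preempt_count == 0;
}

/* Same shape as the patch: "off" when mapping, "on" when unmapping. */
static void toy_kmap_atomic(void)
{
    toy_ctx_sw_off();
    /* ... claim the per-CPU slot ... */
}

static void toy_kunmap_atomic(void)
{
    /* ... release the per-CPU slot ... */
    toy_ctx_sw_on();
}
```

While a mapping is held, toy_may_preempt() stays false, so no other task
on the same CPU can claim the slot in between. Nested map/unmap pairs
keep the counter above zero until the outermost unmap.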



-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-08  5:47 ` Robert Love
  2001-09-08 17:33   ` Arjan Filius
@ 2001-09-09 18:57   ` grue
  2001-09-09 21:44     ` Robert Love
  1 sibling, 1 reply; 32+ messages in thread
From: grue @ 2001-09-09 18:57 UTC (permalink / raw)
  To: rml; +Cc: linux-kernel

Hi Rob,

Just finished running some benchmarks on my workstation.
Dual P3-550, 256MB ram, no swap, both kernels with CONFIG_SMP=y, 440BX,
2-20GB 5400rpm drives.

All benchmarks running in an Eterm on E on XF86-4.1.0-DRI with xmms running to
listen for latency probs. Benchmarks run as root, everything else as regular
user.

linux-2.4.10-pre6 with rml-netdev-random patch and rml-preempt patch (pre5 patches
applied to pre6), with CONFIG_PREEMPT=y

dbench 16
Throughput 23.4608 MB/sec (NB=29.326 MB/sec  234.608 MBit/sec)
Throughput 22.6915 MB/sec (NB=28.3644 MB/sec  226.915 MBit/sec) - .5sec hiccup in xmms
Throughput 20.4314 MB/sec (NB=25.5392 MB/sec  204.314 MBit/sec) - .5sec hiccup in xmms
Throughput 27.2849 MB/sec (NB=34.1061 MB/sec  272.849 MBit/sec) - .5sec hiccup in xmms
Throughput 20.5148 MB/sec (NB=25.6435 MB/sec  205.148 MBit/sec) - 2sec and .5sec in xmms
loadavg around 14

Bonnie
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         1024  9002 98.0 15893 15.1  6519 10.8  6101 78.6 23330 24.4 104.3  2.2

linux-2.4.10-pre6

dbench 16
Throughput 18.1821 MB/sec (NB=22.7276 MB/sec  181.821 MBit/sec)
Throughput 22.4247 MB/sec (NB=28.0309 MB/sec  224.247 MBit/sec) - .5sec hiccup in xmms
Throughput 20.2662 MB/sec (NB=25.3328 MB/sec  202.662 MBit/sec) - 2sec hiccup in xmms
Throughput 28.4072 MB/sec (NB=35.5089 MB/sec  284.072 MBit/sec) - 3sec hiccup in xmms
Throughput 24.0549 MB/sec (NB=30.0686 MB/sec  240.549 MBit/sec)
loadavg around 14

Bonnie
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         1024  9173 99.4 16104 14.5  6488 10.2  6139 78.9 23260 22.1 105.1  2.3

There's no real difference in speed with either kernel, the hard drives
seem to be the bottleneck on this system.
-(root@g-box)-(~)-
->hdparm /dev/hda

/dev/hda:
 multcount    = 16 (on)
 I/O support  =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  1 (on)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 2491/255/63, sectors = 40031712, start = 0

The throughput from bonnie is very close to the max physical throughput
of the drives, so I'm not too concerned that there isn't much of a
difference in speed.

The biggest difference is in the usability of the system under the load.
With 2.4.10-pre6 vanilla, dbench seriously affects the interactivity of
X, as well as causing some long interruptions in xmms. With preempt
enabled, this is much improved. Still some interruptions in xmms, but the
system is still usable, although a little sluggish. On the vanilla
kernel, even bonnie caused a couple of hiccups in xmms; with preempt
enabled, bonnie didn't affect xmms at all. All programs were run without
altering nice values at all.

For a workstation, I like the difference: my system is still usable even
with the load upwards of 14. Without preempt, it's like looking at a
couple of screenshots. This is not something that I would recommend for
a server, but for a workstation, it works great.

Out of morbid curiosity, I ran a make -j bzImage to see how well this
would handle being driven into the ground. No oopsen, the OOM killer
worked great, killed everything that wasn't being used. The only prob is
that it hosed my console and killed my ssh daemon. oops ;) Sending a
sysrq-SAK and then a 3-finger-salute rebooted the system perfectly, no
fsck or anything.

I couldn't build that latency test for my system, so no results from it.
I haven't had a chance to look at its source, so I'm not sure if I can
make it work here or not.

I'll keep up with your patches and keep you posted.

--
Gregory Finch


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-09 17:09     ` Robert Love
@ 2001-09-09 21:07       ` Arjan Filius
  2001-09-09 21:26         ` Robert Love
  2001-09-09 21:23       ` Arjan Filius
  1 sibling, 1 reply; 32+ messages in thread
From: Arjan Filius @ 2001-09-09 21:07 UTC (permalink / raw)
  To: Robert Love; +Cc: linux-kernel

Hello Robert,


I tried 2.4.10-pre4+preempt+this-patch.
Just booted up, and don't notice anything unusual.

On 9 Sep 2001, Robert Love wrote:

> Arjan,
>
> the following patch was written by Manfred Spraul to fix your highmem
> bug.  I haven't had a chance to go over it, but I would like it if you
> could test it.  It can't hurt.  Patch it on top of the preempt patch and
> enable CONFIG_PREEMPT, CONFIG_HIGHMEM, and CONFIG_HIGHMEM_DEBUG.

I found I only have a '#define HIGHMEM_DEBUG 1' in
./include/asm/highmem.h, which is default in 2.4.10-pre4.

>
> let me know what happens...any relevant messages, etc. please pass
> along. if it does work, id be curious if they are any slowdowns

Booting up, X, compiling kernel.. no problems.
For speed, I DO notice other processes seem not to wait on that one
program which has much disk access, so the (really) sluggish feeling has
gone. This is however with the preempt patch, and the ctx_sw_ patch below
seems only to affect stability in a positive sense.

Can you advise what and how to test performance/latency?
The graphics/statistics on the websites you named are impressive..


Greetings,


>
>
> --- highmem.h.prev      Sun Sep  9 08:59:04 2001
> +++ highmem.h   Sun Sep  9 09:00:07 2001
> @@ -88,6 +88,7 @@
>         if (page < highmem_start_page)
>                 return page_address(page);
>
> +       ctx_sw_off();
>         idx = type + KM_TYPE_NR*smp_processor_id();
>         vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> #if HIGHMEM_DEBUG
> @@ -119,6 +120,7 @@
>         pte_clear(kmap_pte-idx);
>         __flush_tlb_one(vaddr);
> #endif
> +       ctx_sw_on();
> }
>
> #endif /* __KERNEL__ */
>
>
>
>

-- 
Arjan Filius
mailto:iafilius@xs4all.nl


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-09 17:09     ` Robert Love
  2001-09-09 21:07       ` Arjan Filius
@ 2001-09-09 21:23       ` Arjan Filius
  2001-09-09 21:37         ` Robert Love
  1 sibling, 1 reply; 32+ messages in thread
From: Arjan Filius @ 2001-09-09 21:23 UTC (permalink / raw)
  To: Robert Love; +Cc: linux-kernel

Hi,

After my success report I _did_ notice something unusual:

I'm not sure it's preempt related, but you wanted feedback :)

Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
Sep  9 23:08:02 sjoerd last message repeated 93 times
Sep  9 23:08:02 sjoerd kernel: cation failed (gfp=0x70/1).
Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
Sep  9 23:08:02 sjoerd last message repeated 281 times

This is at the very moment i make a ppp connection to internet, and
get/set the time with netdate (for the first time after a reboot).
I didn't see this a second time (yet).

Btw this is 2.4.10-pre4+preempt-patch+patch-below.

Greetings,

On 9 Sep 2001, Robert Love wrote:

> Arjan,
>
> the following patch was written by Manfred Spraul to fix your highmem
> bug.  I haven't had a chance to go over it, but I would like it if you
> could test it.  It can't hurt.  Patch it on top of the preempt patch and
> enable CONFIG_PREEMPT, CONFIG_HIGHMEM, and CONFIG_HIGHMEM_DEBUG.
>
> let me know what happens...any relevant messages, etc. please pass
> along. if it does work, id be curious if they are any slowdowns
>
>
> --- highmem.h.prev      Sun Sep  9 08:59:04 2001
> +++ highmem.h   Sun Sep  9 09:00:07 2001
> @@ -88,6 +88,7 @@
>         if (page < highmem_start_page)
>                 return page_address(page);
>
> +       ctx_sw_off();
>         idx = type + KM_TYPE_NR*smp_processor_id();
>         vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
> #if HIGHMEM_DEBUG
> @@ -119,6 +120,7 @@
>         pte_clear(kmap_pte-idx);
>         __flush_tlb_one(vaddr);
> #endif
> +       ctx_sw_on();
> }
>
> #endif /* __KERNEL__ */
>
>
>
>

-- 
Arjan Filius
mailto:iafilius@xs4all.nl


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-09 21:07       ` Arjan Filius
@ 2001-09-09 21:26         ` Robert Love
  0 siblings, 0 replies; 32+ messages in thread
From: Robert Love @ 2001-09-09 21:26 UTC (permalink / raw)
  To: Arjan Filius; +Cc: linux-kernel

On Sun, 2001-09-09 at 17:07, Arjan Filius wrote:
> I tried 2.4.10-pre4+preempt+this-patch.
> Just booted up, and don't notice anything unusual.

very good so far...

> I found i do anly have a '#define HIGHMEM_DEBUG 1' in
> ./include/asm/highmem.h, which is default in 2.4.10-pre4.

OK, then no problem there.

> Booting up, X, compiling kernel.. no problems.

good...

> For speed, i DO notice other processes seem not to wait on that one
> programm which has much disk-access, so the (real) sluggish feeling has
> gone. This is however with the preempt patch, and the ctx_sw_ patch below
> seems only to affect stability in positive sense.

_GREAT_ ... now, the reason I asked if you notice any new slowdowns is
exactly what you seem to realize: I feared the ctx_sw patch may cause
obvious slowdown.  This could be because the ctx_sw_on/offs are in the
wrong place, and causing much too much locking.

It seems like you notice no problems, and I am happy.

I am glad to hear this news, I am going to take a look at highmem's code
and then integrate a final solution into the preemption patch.

> Can you advice what and how to test performance/latency?
> The grafics/statistics on the websites you named are impressive..

Sure, you can run dbench <ftp://samba.org/pub/tridge/dbench/> try it
with around 16 threads (dbench 16).  You might also want to try playing
an mp3 in the background during this.  Notice it should not have large
skips (one user reported 3s skips dropping to 0.5s and 0s).

You can run the audio latency test
<http://www.gardena.net/benno/linux/latencytest-0.42.tar.gz>, although I
heard there are problems compiling it from some other preemption users.

Finally, simply time a kernel compile `time make dep clean bzImage' ...

We can use these for preemption enabled and disabled, highmem enabled
and disabled, etc...

Thank you for your help, 

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-09 21:23       ` Arjan Filius
@ 2001-09-09 21:37         ` Robert Love
  2001-09-10  3:24           ` Daniel Phillips
                             ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Robert Love @ 2001-09-09 21:37 UTC (permalink / raw)
  To: Arjan Filius; +Cc: linux-kernel

On Sun, 2001-09-09 at 17:23, Arjan Filius wrote:
> After my succes report i _do_ noticed something unusual:
> 
> I'm not sure it's preempt related, but you wanted feedback :)
> 
> Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> Sep  9 23:08:02 sjoerd last message repeated 93 times
> Sep  9 23:08:02 sjoerd kernel: cation failed (gfp=0x70/1).
> Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> Sep  9 23:08:02 sjoerd last message repeated 281 times
> 
> This is at the very moment i make a ppp connection to internet, and
> get/set the time with netdate (for the first time after a reboot).
> I didn't see this a second time (yet).
> 

damn, I was excited we had solved everything :)

actually, I am not confident of what could cause these results.  the
2.4.10-pre series is going through yet another set of changes that it
should not be, and one of them concerns exactly what you are reporting.

SO, I suggest two options: try pre6.  I don't have patches yet, but I
will diff them soon.  pre5 should apply fairly cleanly, anyhow.

Even better, try 2.4.9-ac10.  It is what I use, and there seem to be
fewer reported problems.  Plus, Alan is not messing with all the VM work
Linus is playing with right now.  Patches for 2.4.9-ac10 are available.

Both can be had at:
http://tech9.net/rml/linux/

I am curious if you see the error again, and what seems to cause it, but
honestly there is too much work being done in 2.4.10-pre to figure
things out.

Nevertheless, I will look into it -- keep me posted.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-09 18:57   ` grue
@ 2001-09-09 21:44     ` Robert Love
  0 siblings, 0 replies; 32+ messages in thread
From: Robert Love @ 2001-09-09 21:44 UTC (permalink / raw)
  To: grue; +Cc: linux-kernel

On Sun, 2001-09-09 at 14:57, grue@lakesweb.com wrote:
> Just finished running some benchmarks on my workstation.
> Dual P3-550, 256MB ram, no swap, both kernels with CONFIG_SMP=y, 440BX,
> 2-20GB 5400rpm drives.
> 
> All benchmarks running in an Eterm on E on XF86-4.1.0-DRI with xmms running to
> listen for latency probs. Benchmarks run as root, everything else as regular
> user.

The XMMS bit is a neat idea -- good thinking :)

> linux-2.4.10-pre6 with rml-netdev-random patch and rml-preempt patch (pre5 patches
> applied to pre6), with CONFIG_PREEMPT=y
> 
> dbench 16
> Throughput 23.4608 MB/sec (NB=29.326 MB/sec  234.608 MBit/sec)
> Throughput 22.6915 MB/sec (NB=28.3644 MB/sec  226.915 MBit/sec) - .5sec hiccup in xmms
> Throughput 20.4314 MB/sec (NB=25.5392 MB/sec  204.314 MBit/sec) - .5sec hiccup in xmms
> Throughput 27.2849 MB/sec (NB=34.1061 MB/sec  272.849 MBit/sec) - .5sec hiccup in xmms
> Throughput 20.5148 MB/sec (NB=25.6435 MB/sec  205.148 MBit/sec) - 2sec and .5sec in xmms
> loadavg around 14
> 
> Bonnie
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>          1024  9002 98.0 15893 15.1  6519 10.8  6101 78.6 23330 24.4 104.3  2.2
> 
> linux-2.4.10-pre6
> 
> dbench 16
> Throughput 18.1821 MB/sec (NB=22.7276 MB/sec  181.821 MBit/sec)
> Throughput 22.4247 MB/sec (NB=28.0309 MB/sec  224.247 MBit/sec) - .5sec hiccup in xmms
> Throughput 20.2662 MB/sec (NB=25.3328 MB/sec  202.662 MBit/sec) - 2sec hiccup in xmms
> Throughput 28.4072 MB/sec (NB=35.5089 MB/sec  284.072 MBit/sec) - 3sec hiccup in xmms
> Throughput 24.0549 MB/sec (NB=30.0686 MB/sec  240.549 MBit/sec)
> loadavg around 14
> 
> Bonnie
>               -------Sequential Output-------- ---Sequential Input-- --Random--
>               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>          1024  9173 99.4 16104 14.5  6488 10.2  6139 78.9 23260 22.1 105.1  2.3
> 

I didn't average it out, but the dbench results seem to be slightly
better with preemption enabled, as they should be.  We have seen
significantly better results under preemption than without, but oh
well.  Your system is already SMP, so the benefit would not be as
noticeable.
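
For reference, the averaging is easy to do by hand or with a few lines
of C. The sketch below just takes the arithmetic mean of the five dbench
throughput figures quoted above for each kernel; the array and function
names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* dbench throughput in MB/sec, copied from the five runs quoted above. */
static const double preempt_runs[] = {
    23.4608, 22.6915, 20.4314, 27.2849, 20.5148
};
static const double vanilla_runs[] = {
    18.1821, 22.4247, 20.2662, 28.4072, 24.0549
};

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}
```

The means come out at roughly 22.88 MB/sec with preemption and 22.67
MB/sec without, a gap of under 1%. That is small next to the run-to-run
spread of about 18 to 28 MB/sec, so five runs do not really separate the
two kernels on throughput.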

Bonnie should not be much different if it is not threading the I/O
across multiple processes, since there is nothing else to be preempted. 
If anything, the throughput should drop slightly with preemption
enabled.

> The biggest difference is in the usability of the system under the load.
> With 2.4.10-pre6 vanilla, dbench seriously affects the interactivity of
> X, as well as causes some long interuptions in xmms. With preempt
> enabled, this is much improved. Still some interuptions in xmms, but the
> system is still usable, although a little sluggish. On the vanilla
> kernel, even bonnie caused a couple of hiccups in xmms, with preempt
> enabled, bonnie didn't affect xmms at all. all programs were run without
> altering nice values at all.

This is exactly what I want to hear.  I am glad you did the
XMMS/how-it-feels test and that preemption came out ahead.

We can work on cutting the latency even further.  There are still some
long held locks in the kernel, and we can not preempt around them.

> For a workstation, I like the difference, my system is still usable even
> with the load upwards of 14, without preempt, it's like looking at a
> couple of screenshots. This is not something that I would recommend for
> a server, but for a workstation, it works great.

Great.

> Out of morbid curiousity, I ran a make -j bzImage to see how well this
> would handle being driven into the ground. No oopsen, the OOM killer
> worked great, killed everything that wasn't being used. The only prob is
> that it hosed my console and killed my ssh daemon. oops ;) Sending a
> sysrq-SAK and then a 3-finger-salute rebooted the system perfectly, no
> fsck or anything.
> 
> I couldn't build that latency test for my system, so no results from it.
> I haven't had a chance to look at it's source, so I'm not sure if I can
> make it work here or not.

Someone else reported to me privately it did not compile.  I don't think
they looked into it.

I would assume the latency is going to drop in the same manner it does
for a UP system (_very_ much).

> I'll keep up with your patches and keep you posted.

Great, thanks for the feedback.  While it is great to hear "preemption
seems to be an improvement under SMP", I am most excited that it works
without faults.

You can always find the newest diffs at http://tech9.net/rml/linux

Thanks again,

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-09 14:55       ` george anzinger
@ 2001-09-09 22:25         ` Arjan Filius
  0 siblings, 0 replies; 32+ messages in thread
From: Arjan Filius @ 2001-09-09 22:25 UTC (permalink / raw)
  To: george anzinger; +Cc: Roger Larsson, Robert Love, linux-kernel, linux-mm


Hi,

On Sun, 9 Sep 2001, george anzinger wrote:

> If the page it is the correct one, when it is found mapped, the code
> should just exit, not BUG() IHMO.


I'll try the ac10 +preempt, see what happens.

>
> George
>
>
> Roger Larsson wrote:
> >
> > Hi,
> >
> > This is interesting. [Assumes UP Athlon - correct]
> > Note that all BUGs out in highmem.h:95 (kmap_atomic)
> > and that test is only on if you have enabled HIGHMEM_DEBUG
> > [my analyze is done with a 2.4.10-pre2 kernel, but I checked with
> > later patches and I do not think they fix it either...]
> >
> > The preemptive kernel puts more SMP stress on the kernel than
> > running with multiple CPUs.
> >
> > So this might be a potential bug in the kernel proper, running with
> > a SMP computer.
> >
> > If I understand the bug correctly, a process gets a page fault.
> > Starts to map in the page. But before the final part it checks -
> > and the page is already there!!! Correct?
> >
> > On Saturday den 8 September 2001 19:33, Arjan Filius wrote:
> > > Hello Robert,
> > >
> > >
> > > I tried 2.4.10-pre4 with patch-rml-2.4.10-pre4-preempt-kernel-1.
> > > But it seems to hit highmem (see below) (i do have 1.5GB ram)
> > > 2.4.10-pre4 plain runs just fine.
> > >
> > > With the kernel option mem=850M the patched kernel boots an seems to run
> > > fine. However i didn't do any stress testing yet, but i still notice
> > > hickups while playing mp3 files at -10 nice level with mpg123 on a 1.1GHz
> > > Athlon, and removing for example a _large_ file (reiser-on-lvm).
> > >
> > > My syslog output with highmem:
> > >
> > > Sep  8 18:10:16 sjoerd kernel: kernel BUG at
> > > /usr/src/linux-2.4.10-pre4/include/asm/highmem.h:95! Sep  8 18:10:16 sjoerd
> > > kernel: invalid operand: 0000
> > > Sep  8 18:10:16 sjoerd kernel: CPU:    0
> > > Sep  8 18:10:16 sjoerd kernel: EIP:    0010:[do_wp_page+636/1088]
> > > [- - -]
> > > sjoerd kernel: Call Trace: [handle_mm_fault+141/224]
> > > [do_page_fault+375/1136] [do_page_fault+0/1136] [__mmdrop+58/64]
> > > [do_exit+595/640] Sep  8 18:10:16 sjoerd kernel:    [error_code+52/64]
> >
> > Lets look at this example. You need to add some inline functions...
> >
> > handle_mm_fault
> >         takes the mm->page_table_lock [this should prevent reschedules]
> >         allocs pmd
> >         allocs pte
> >         handle_pte_fault(...)
> > handle_pte_fault [inline, most likely path]
> >         pte is present
> >         it is a write access
> >         but the pte is not writeable  - call do_wp_page
> > do_wp_page
> >         plays some games with the lock...
> >         finally calls copy_cow_page [inline] with the page_table_lock
> >         UNLOCKED!
> > copy_cow_page
> >         calls clear_user_highpage or copy_user_highpage
> > both clear_user_highpage and copy_user_highpage
> >         calls kmap_atomic
> > kmap_atomic
> >         page is a highmem page
> >         but during the time this process was unlocked some other
> >         thread has allocated the page in question... BUG out.
> >
> > So somewere between the UNLOCK (might be a lot later) and the
> > BUG test in kmap_atomic the process running in kernel got preempted.
> > (most likely during the page copy since it will take some time)
> >
> > Another process (thread) started to run - hit the same page fault
> > but succeeded in its alloc.
> >
> > Back to the first process it continues, finally checks - the page
> > is there... and BUGS.
> >
> > Note that this can happen in a pure SMP kernel.
> >
> > But let the processes (threads) run on two CPUs. And let the
> > first get an interrupt/bh after unlock - the other can pass
> > and add the page before the first one can continue - same
> > result!
> >
> > /RogerL
> >
> > --
> > Roger Larsson
> > Skellefteå
> > Sweden
>

-- 
Arjan Filius
mailto:iafilius@xs4all.nl


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Feedback on preemptible kernel patch
  2001-09-09 21:37         ` Robert Love
@ 2001-09-10  3:24           ` Daniel Phillips
  2001-09-10  3:37             ` Jeremy Zawodny
  2001-09-10  5:09           ` Robert Love
  2001-09-11 19:47           ` Arjan Filius
  2 siblings, 1 reply; 32+ messages in thread
From: Daniel Phillips @ 2001-09-10  3:24 UTC (permalink / raw)
  To: Robert Love, Arjan Filius; +Cc: linux-kernel

On September 9, 2001 11:37 pm, Robert Love wrote:
> On Sun, 2001-09-09 at 17:23, Arjan Filius wrote:
> > After my succes report i _do_ noticed something unusual:
> > 
> > I'm not sure it's preempt related, but you wanted feedback :)
> > 
> > Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> > Sep  9 23:08:02 sjoerd last message repeated 93 times
> > Sep  9 23:08:02 sjoerd kernel: cation failed (gfp=0x70/1).
> > Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> > Sep  9 23:08:02 sjoerd last message repeated 281 times
> > 
> > This is at the very moment i make a ppp connection to internet, and
> > get/set the time with netdate (for the first time after a reboot).
> > I didn't see this a second time (yet).
> > 
> 
> damn, I was exciting we had solved everything :)
> 
> actually, I am not confident of what could cause these results.  the
> 2.4.10-pre is going through another set of changes it should not, and
> one of them concerns exactly what you are reporting.

This may not be your fault.  It's a GFP_NOFS recursive allocation - this
comes either from grow_buffers or ReiserFS, probably the former.  In
either case, it means we ran completely out of free pages, even though
the caller is willing to wait.  Hmm.  It smells like a loophole in vm
scanning.
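
The gfp=0x70 in the log can be decoded bit by bit. What follows is only a
minimal sketch: the flag values (WAIT=0x10, HIGH=0x20, IO=0x40, FS=0x80) are
assumed from 2.4.10-era headers and the SK_ names and decode_gfp() are
hypothetical, so check them against the tree actually in use.

```c
#include <stddef.h>
#include <string.h>

/* Assumed flag values (illustrative; verify against your kernel headers). */
#define SK_GFP_WAIT 0x10u  /* caller may sleep */
#define SK_GFP_HIGH 0x20u  /* may dip into emergency reserves */
#define SK_GFP_IO   0x40u  /* may start low-level I/O */
#define SK_GFP_FS   0x80u  /* may call back into the filesystem */

/* Write the names of the flags set in mask into buf, separated by '|'. */
static void decode_gfp(unsigned int mask, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } flags[] = {
        { SK_GFP_WAIT, "WAIT" },
        { SK_GFP_HIGH, "HIGH" },
        { SK_GFP_IO,   "IO"   },
        { SK_GFP_FS,   "FS"   },
    };
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(mask & flags[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
}
```

Under those assumed values, decode_gfp(0x70, ...) yields WAIT|HIGH|IO with
the FS bit clear, which is consistent with reading the request as a
GFP_NOFS allocation whose caller is willing to wait.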

--
Daniel


* Re: Feedback on preemptible kernel patch
  2001-09-10  3:24           ` Daniel Phillips
@ 2001-09-10  3:37             ` Jeremy Zawodny
  0 siblings, 0 replies; 32+ messages in thread
From: Jeremy Zawodny @ 2001-09-10  3:37 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Robert Love, Arjan Filius, linux-kernel

On Mon, Sep 10, 2001 at 05:24:36AM +0200, Daniel Phillips wrote:
> On September 9, 2001 11:37 pm, Robert Love wrote:
> > On Sun, 2001-09-09 at 17:23, Arjan Filius wrote:
> > > After my succes report i _do_ noticed something unusual:
> > > 
> > > I'm not sure it's preempt related, but you wanted feedback :)
> > > 
> > > Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> > > Sep  9 23:08:02 sjoerd last message repeated 93 times
> > > Sep  9 23:08:02 sjoerd kernel: cation failed (gfp=0x70/1).
> > > Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
>
> 
> This may not be your fault.  It's a GFP_NOFS recursive allocation -
> this comes either from grow_buffers or ReiserFS, probably the
> former.  In either case, it means we ran completely out of free
> pages, even though the caller is willing to wait.  Hmm.  It smells
> like a loophole in vm scanning.

I've seen that error on a couple 2.4.9 systems at work.  It's
certainly VM related, 'cause it doesn't happen when I disable swap on
them.  I've disabled it for performance reasons (the VM system is a
little retarded in 2.4.x IMHO, so I'm just not letting it swap).

Jeremy
-- 
Jeremy D. Zawodny     |  Perl, Web, MySQL, Linux Magazine, Yahoo!
<Jeremy@Zawodny.com>  |  http://jeremy.zawodny.com/


* Re: Feedback on preemptible kernel patch
  2001-09-09 21:37         ` Robert Love
  2001-09-10  3:24           ` Daniel Phillips
@ 2001-09-10  5:09           ` Robert Love
  2001-09-10 18:25             ` Daniel Phillips
  2001-09-10 21:29             ` Arjan Filius
  2001-09-11 19:47           ` Arjan Filius
  2 siblings, 2 replies; 32+ messages in thread
From: Robert Love @ 2001-09-10  5:09 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Arjan Filius, linux-kernel

On Sun, 2001-09-09 at 23:24, Daniel Phillips wrote:
> This may not be your fault.  It's a GFP_NOFS recursive allocation - this
> comes either from grow_buffers or ReiserFS, probably the former.  In
> either case, it means we ran completely out of free pages, even though
> the caller is willing to wait.  Hmm.  It smells like a loophole in vm
> scanning.

I am not a VM hacker -- can you tell me where to start? what do you
suspect it is?

If the user stops seeing the error with preemption disabled, is your
theory nullified, or does that just mean the problem is aggravated by
preemption?

I don't think Arjan was using ReiserFS, so it's from grow_buffers...

I appreciate your help.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: Feedback on preemptible kernel patch
  2001-09-10  5:09           ` Robert Love
@ 2001-09-10 18:25             ` Daniel Phillips
  2001-09-10 21:29             ` Arjan Filius
  1 sibling, 0 replies; 32+ messages in thread
From: Daniel Phillips @ 2001-09-10 18:25 UTC (permalink / raw)
  To: Robert Love; +Cc: Arjan Filius, linux-kernel

On September 10, 2001 07:09 am, Robert Love wrote:
> On Sun, 2001-09-09 at 23:24, Daniel Phillips wrote:
> > This may not be your fault.  It's a GFP_NOFS recursive allocation - this
> > comes either from grow_buffers or ReiserFS, probably the former.  In
> > either case, it means we ran completely out of free pages, even though
> > the caller is willing to wait.  Hmm.  It smells like a loophole in vm
> > scanning.
> 
> I am not a VM hacker -- can you tell me where to start? what do you
> suspect it is?
> 
> If the user stops seeing the error with preemption disabled, is your
> theory nulled, or does that just mean the problem is agitated by
> preemption?
> 
> I don't think Arjan was using ReiserFS, so its from grow_buffers...
> 
> I appreciate your help.

The first thing to check is whether memory is really exhausted at the
time the errors are logged (cat /proc/meminfo).  Then you want to see
which paths in __alloc_pages could possibly allow this PF_MEMALLOC +
GFP_WAIT allocation request to drop all the way through without being
serviced.  Sorry, I haven't had time to do that and won't for a few
days.  Even if you triggered it, it is probably a hole in the scan
logic.  We have __GFP_WAIT, so it should wait.

Here's a hint, look very critically at this part of page_alloc.c:

455    /*
456     * Fail in case no progress was made and the
457     * allocation may not be able to block on IO.
458     */
459    return NULL;

--
Daniel


* Re: Feedback on preemptible kernel patch
  2001-09-10  5:09           ` Robert Love
  2001-09-10 18:25             ` Daniel Phillips
@ 2001-09-10 21:29             ` Arjan Filius
  2001-09-13 17:27               ` Robert Love
  1 sibling, 1 reply; 32+ messages in thread
From: Arjan Filius @ 2001-09-10 21:29 UTC (permalink / raw)
  To: Robert Love; +Cc: Daniel Phillips, linux-kernel

Hello,


On 10 Sep 2001, Robert Love wrote:

> On Sun, 2001-09-09 at 23:24, Daniel Phillips wrote:
> > This may not be your fault.  It's a GFP_NOFS recursive allocation - this
> > comes either from grow_buffers or ReiserFS, probably the former.  In
> > either case, it means we ran completely out of free pages, even though
> > the caller is willing to wait.  Hmm.  It smells like a loophole in vm
> > scanning.
>
> I am not a VM hacker -- can you tell me where to start? what do you
> suspect it is?
>
> If the user stops seeing the error with preemption disabled, is your
> theory nulled, or does that just mean the problem is agitated by
> preemption?
>
> I don't think Arjan was using ReiserFS, so its from grow_buffers...

Yes, I am using ReiserFS (for "ages"); more precisely, ReiserFS on LVM.

A short description of my system and setup:
scsi-disk,scsi-cdrom,ide-disk,ide-scsi,ext2,reiser, iptables, ipv6,
acenic-Gbit-ethernet, ramdisk, highmem (1.5GB-ram), Athlon 1.1GHz, Asus
a7v MB (via).


 Greetings,


>
> I appreciate your help.
>
>

-- 
Arjan Filius
mailto:iafilius@xs4all.nl



* Re: Feedback on preemptible kernel patch
  2001-09-09 21:37         ` Robert Love
  2001-09-10  3:24           ` Daniel Phillips
  2001-09-10  5:09           ` Robert Love
@ 2001-09-11 19:47           ` Arjan Filius
  2 siblings, 0 replies; 32+ messages in thread
From: Arjan Filius @ 2001-09-11 19:47 UTC (permalink / raw)
  To: Robert Love; +Cc: linux-kernel

Hello,

On 9 Sep 2001, Robert Love wrote:

> On Sun, 2001-09-09 at 17:23, Arjan Filius wrote:
> > After my succes report i _do_ noticed something unusual:
> >
> > I'm not sure it's preempt related, but you wanted feedback :)
> >
> > Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> > Sep  9 23:08:02 sjoerd last message repeated 93 times
> > Sep  9 23:08:02 sjoerd kernel: cation failed (gfp=0x70/1).
> > Sep  9 23:08:02 sjoerd kernel: __alloc_pages: 0-order allocation failed (gfp=0x70/1).
> > Sep  9 23:08:02 sjoerd last message repeated 281 times
> >
> > This is at the very moment i make a ppp connection to internet, and
> > get/set the time with netdate (for the first time after a reboot).
> > I didn't see this a second time (yet).
> >
>
> damn, I was exciting we had solved everything :)
>
> actually, I am not confident of what could cause these results.  the
> 2.4.10-pre is going through another set of changes it should not, and
> one of them concerns exactly what you are reporting.
>
> SO, I suggest two options: try pre6.  I don't have patches yet, but I
> will diff them soon.  pre5 should apply fairly cleanly, anyhow.
>
> Even better, try 2.4.9-ac10.  It is what I use, and there seems to be
> less reported problems.  Plus, Alan is not messing with all the VM work
> Linus is playing with right now.  Patches for 2.4.9-ac10 are available.
>
> Both can be had at:
> http://tech9.net/rml/linux/
>
> I am curious if you see the error again, and what seems to cause it, but
> honestly there is too much work being done in 2.4.10-pre to figure
> things out.
>
> Nevertheless, I will look into it -- keep me posted.
>

It took some time, but here are the results.

I seem to have no problems at all with plain 2.4.10-pre6 and plain 2.4.9-ac10.

As advised, I tried 2.4.9-ac10 with the preempt patch (without the extra
lock patch posted on lkml).
Right after booting the system I noticed several (4x) "invalid operand"
errors; the corresponding parts of my syslog are below.

Sep 11 19:47:31 sjoerd kernel: klogd 1.3-3, log source = /proc/kmsg started.
Sep 11 19:47:31 sjoerd kernel: Inspecting /boot/System.map-2.4.9-ac10
Sep 11 19:47:31 sjoerd kernel: Loaded 13938 symbols from /boot/System.map-2.4.9-ac10.
Sep 11 19:47:31 sjoerd kernel: Symbols match kernel version 2.4.9.
Sep 11 19:47:31 sjoerd kernel: Loaded 584 symbols from 42 modules.
Sep 11 19:47:31 sjoerd kernel: IPv6 v0.8 for NET4.0
Sep 11 19:47:31 sjoerd kernel: IPv6 over IPv4 tunneling driver
Sep 11 19:47:31 sjoerd kernel: divert: not allocating divert_blk for non-ethernet device sit0
Sep 11 19:47:31 sjoerd kernel: divert: allocating divert_blk for eth0
Sep 11 19:47:31 sjoerd kernel: acenic.c: v0.81 04/20/2001  Jes Sorensen, linux-acenic@SunSITE.dk
Sep 11 19:47:31 sjoerd kernel:                             http://home.cern.ch/~jes/gige/acenic.html
Sep 11 19:47:31 sjoerd kernel: eth0: 3Com 3C985 Gigabit Ethernet at 0xdd800000, irq 11
Sep 11 19:47:31 sjoerd kernel:   Tigon II (Rev. 6), Firmware: 12.4.11, MAC: 00:60:08:f6:1d:5b
Sep 11 19:47:31 sjoerd kernel:   PCI cache line size set incorrectly (32 bytes) by BIOS/FW, correcting to 64
Sep 11 19:47:31 sjoerd kernel:   PCI bus width: 32 bits, speed: 33MHz, latency: 64 clks
Sep 11 19:47:31 sjoerd kernel:   Disabling PCI memory write and invalidate
Sep 11 19:47:31 sjoerd kernel: eth0: Firmware up and running
Sep 11 19:47:31 sjoerd kernel: eth0: Optical link UP (Full Duplex, Flow Control: TX RX)
Sep 11 19:47:31 sjoerd kernel: ip_conntrack (8192 buckets, 65536 max)
Sep 11 19:47:31 sjoerd kernel: ip_tables: (c)2000 Netfilter core team
Sep 11 19:47:31 sjoerd kernel: LOIN ICMP rate to high IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=90 TOS=0x02 PREC=0xC0 TTL=255 ID=32406 PROTO=ICMP TYPE=3 CODE=3 [SRC=127.0.0.1 DST=127.0.0.1 LEN=62 TOS=0x02 PREC=0x00 TTL=64 ID=6694 DF PROTO=UDP SPT=32768 DPT=53 LEN=42 ]
Sep 11 19:47:31 sjoerd kernel: LOIN ICMP rate to high IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=102 TOS=0x02 PREC=0xC0 TTL=255 ID=32413 PROTO=ICMP TYPE=3 CODE=3 [SRC=127.0.0.1 DST=127.0.0.1 LEN=74 TOS=0x02 PREC=0x00 TTL=64 ID=6783 DF PROTO=UDP SPT=32769 DPT=53 LEN=54 ]
Sep 11 19:47:31 sjoerd kernel: LOIN ICMP rate to high IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=92 TOS=0x02 PREC=0xC0 TTL=255 ID=32421 PROTO=ICMP TYPE=3 CODE=3 [SRC=127.0.0.1 DST=127.0.0.1 LEN=64 TOS=0x02 PREC=0x00 TTL=64 ID=7284 DF PROTO=UDP SPT=32770 DPT=53 LEN=44 ]
Sep 11 19:47:31 sjoerd kernel: eth0: no IPv6 routers present
Sep 11 19:47:31 sjoerd kernel: LOIN ICMP rate to high IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=102 TOS=0x02 PREC=0xC0 TTL=255 ID=32427 PROTO=ICMP TYPE=3 CODE=3 [SRC=127.0.0.1 DST=127.0.0.1 LEN=74 TOS=0x02 PREC=0x00 TTL=64 ID=7785 DF PROTO=UDP SPT=32770 DPT=53 LEN=54 ]
Sep 11 19:47:31 sjoerd kernel: device eth0 entered promiscuous mode
Sep 11 19:47:31 sjoerd kernel: invalid operand: 0000
Sep 11 19:47:31 sjoerd kernel: CPU:    0
Sep 11 19:47:31 sjoerd kernel: EIP:    0010:[do_anonymous_page+160/304]
Sep 11 19:47:31 sjoerd kernel: EFLAGS: 00010206
Sep 11 19:47:31 sjoerd kernel: eax: f54fc000   ebx: f7771e00   ecx: c023eee8   edx: c0001ff8
Sep 11 19:47:31 sjoerd kernel: esi: c289b37c   edi: f7632c00   ebp: f5519068   esp: f54fddd0
Sep 11 19:47:31 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep 11 19:47:31 sjoerd kernel: Process snort (pid: 2059, stackpage=f54fd000)
Sep 11 19:47:31 sjoerd kernel: Stack: 4001a000 00000000 f7632c00 f7771e00 c01236b1 f7771e00 f7632c00 f5519068
Sep 11 19:47:31 sjoerd kernel:        00000001 4001a000 4001a000 f7771e00 ffffffff 00000001 c012382e f7771e00
Sep 11 19:47:31 sjoerd kernel:        f7632c00 4001a000 00000001 f5519068 f54fc000 f7632c00 f7771e00 f7771e1c
Sep 11 19:47:31 sjoerd kernel: Call Trace: [do_no_page+49/336] [handle_mm_fault+94/240] [do_page_fault+407/1200] [do_page_fault+0/1200] [do_rw_disk+290/832]
Sep 11 19:47:31 sjoerd kernel:    [start_request+312/528] [start_request+405/528] [ide_do_request+711/800] [error_code+56/68] [file_read_actor+96/192] [do_generic_file_read+517/1264]
Sep 11 19:47:31 sjoerd kernel:    [generic_file_read+99/128] [file_read_actor+0/192] [sys_read+150/208] [system_call+51/56]
Sep 11 19:47:31 sjoerd kernel:
Sep 11 19:47:31 sjoerd kernel: Code: 0f 0b 89 f0 2b 05 ec 63 29 c0 69 c0 f1 f0 f0 f0 c1 f8 02 c1
Sep 11 19:47:31 sjoerd kernel:  <6>device eth0 left promiscuous mode
Sep 11 19:47:31 sjoerd kernel: device eth0 entered promiscuous mode
Sep 11 19:47:31 sjoerd kernel: eth0: Enabling Jumbo frame support
Sep 11 19:47:31 sjoerd kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Sep 11 19:47:31 sjoerd kernel: LOOUT REJECT UDP IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x02 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=605 DPT=111 LEN=64
Sep 11 19:47:31 sjoerd kernel: LOOUT REJECT UDP IN= OUT=lo SRC=127.0.0.1 DST=127.0.0.1 LEN=84 TOS=0x02 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=607 DPT=111 LEN=64

Sep 11 19:48:43 sjoerd : time=1000230523  unable to read "uptime"   ^I../SRC/dqs_load_avg.c 239 /usr/sbin/dqs_execd sjoerd.sjoerdnet
Sep 11 19:48:43 sjoerd kernel: invalid operand: 0000
Sep 11 19:48:43 sjoerd kernel: CPU:    0
Sep 11 19:48:43 sjoerd kernel: EIP:    0010:[filemap_nopage+284/1168]
Sep 11 19:48:43 sjoerd kernel: EFLAGS: 00010206
Sep 11 19:48:43 sjoerd kernel: eax: c2658524   ebx: 00000001   ecx: c023eee8   edx: c0001ff8
Sep 11 19:48:43 sjoerd kernel: esi: c297d838   edi: 00000015   ebp: c2658524   esp: f4169b3c
Sep 11 19:48:43 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep 11 19:48:43 sjoerd kernel: Process sh (pid: 2708, stackpage=f4169000)
Sep 11 19:48:43 sjoerd kernel: Stack: 40016000 f77153c0 f4372500 f4a67740 00000019 f7b173b8 f7715478 f77153c0
Sep 11 19:48:43 sjoerd kernel:        f4439f40 c0123720 f4372500 40016000 00000001 400162a8 f4a67740 ffffffff
Sep 11 19:48:43 sjoerd kernel:        00000001 c012382e f4a67740 f4372500 400162a8 00000001 f409a058 f4168000
Sep 11 19:48:43 sjoerd kernel: Call Trace: [do_no_page+160/336] [handle_mm_fault+94/240] [do_page_fault+407/1200] [do_page_fault+0/1200] [kunmap_high+86/128]
Sep 11 19:48:43 sjoerd kernel:    [file_read_actor+145/192] [update_atime+68/80] [do_generic_file_read+1246/1264] [do_munmap+86/608] [update_atime+68/80] [error_code+56/68]
Sep 11 19:48:43 sjoerd kernel:    [clear_user+46/64] [padzero+28/32] [load_elf_interp+619/704] [load_elf_binary+1927/2672] [load_elf_binary+0/2672] [kunmap_high+86/128]
Sep 11 19:48:43 sjoerd kernel:    [file_read_actor+145/192] [search_binary_handler+152/496] [do_execve+380/496] [do_execve+403/496] [sys_execve+47/96] [system_call+51/56]
Sep 11 19:48:43 sjoerd kernel:
Sep 11 19:48:43 sjoerd kernel: Code: 0f 0b 89 f0 2b 05 ec 63 29 c0 69 c0 f1 f0 f0 f0 c1 f8 02 c1

Sep 11 19:48:45 sjoerd kernel:  invalid operand: 0000
Sep 11 19:48:45 sjoerd kernel: CPU:    0
Sep 11 19:48:45 sjoerd kernel: EIP:    0010:[do_wp_page+604/912]
Sep 11 19:48:45 sjoerd kernel: EFLAGS: 00010206
Sep 11 19:48:45 sjoerd kernel: eax: c1004520   ebx: c1000010   ecx: c1000010   edx: c0001ff8
Sep 11 19:48:45 sjoerd kernel: esi: c25c0ff8   edi: c25984ec   ebp: 51e5a065   esp: f4001ed4
Sep 11 19:48:45 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep 11 19:48:45 sjoerd kernel: Process oracle (pid: 2718, stackpage=f4001000)
Sep 11 19:48:45 sjoerd kernel: Stack: 093e1a6c f4a678c0 ffffffff 00000001 c0123861 f4a678c0 f4024380 093e1a6c
Sep 11 19:48:45 sjoerd kernel:        f4016f84 51e5a065 f4000000 f4024380 f4a678c0 f4a678dc c0110ca7 f4a678c0
Sep 11 19:48:45 sjoerd kernel:        f4024380 093e1a6c 00000001 f4000000 00000007 c0110b10 bfffca1c f4a678dc
Sep 11 19:48:45 sjoerd kernel: Call Trace: [handle_mm_fault+145/240] [do_page_fault+407/1200] [do_page_fault+0/1200] [dput+25/416] [fput+116/224]
Sep 11 19:48:45 sjoerd kernel:    [filp_close+156/176] [sys_close+93/144] [error_code+56/68]
Sep 11 19:48:45 sjoerd kernel:
Sep 11 19:48:45 sjoerd kernel: Code: 0f 0b 89 f0 29 c8 69 c0 f1 f0 f0 f0 c1 f8 02 c1 e0 0c 0b 05

Sep 11 19:53:04 sjoerd kernel: invalid operand: 0000
Sep 11 19:53:04 sjoerd kernel: CPU:    0
Sep 11 19:53:04 sjoerd kernel: EIP:    0010:[filemap_nopage+284/1168]
Sep 11 19:53:04 sjoerd kernel: EFLAGS: 00010206
Sep 11 19:53:04 sjoerd kernel: eax: c23c8b9c   ebx: 00000001   ecx: c023eee8   edx: c0001ff8
Sep 11 19:53:04 sjoerd kernel: esi: c297d838   edi: 00000015   ebp: c23c8b9c   esp: f34a5b3c
Sep 11 19:53:04 sjoerd kernel: ds: 0018   es: 0018   ss: 0018
Sep 11 19:53:04 sjoerd kernel: Process qmail-remote (pid: 3084, stackpage=f34a5000)
Sep 11 19:53:04 sjoerd kernel: Stack: 40016000 f77153c0 f35da900 f3b70d40 00000019 f7b173b8 f7715478 f77153c0
Sep 11 19:53:04 sjoerd kernel:        f484e240 c0123720 f35da900 40016000 00000001 400162a8 f3b70d40 ffffffff
Sep 11 19:53:04 sjoerd kernel:        00000001 c012382e f3b70d40 f35da900 400162a8 00000001 f34c4058 f34a4000
Sep 11 19:53:04 sjoerd kernel: Call Trace: [do_no_page+160/336] [handle_mm_fault+94/240] [do_page_fault+407/1200] [do_page_fault+0/1200] [kunmap_high+86/128]
Sep 11 19:53:04 sjoerd kernel:    [file_read_actor+145/192] [update_atime+68/80] [do_generic_file_read+1246/1264] [do_munmap+86/608] [update_atime+68/80] [error_code+56/68]
Sep 11 19:53:04 sjoerd kernel:    [clear_user+46/64] [padzero+28/32] [load_elf_interp+619/704] [load_elf_binary+1927/2672] [load_elf_binary+0/2672] [search_binary_handler+152/496]
Sep 11 19:53:04 sjoerd kernel:    [do_execve+380/496] [do_execve+403/496] [sys_execve+47/96] [system_call+51/56]
Sep 11 19:53:04 sjoerd kernel:
Sep 11 19:53:04 sjoerd kernel: Code: 0f 0b 89 f0 2b 05 ec 63 29 c0 69 c0 f1 f0 f0 f0 c1 f8 02 c1


I rebooted the same patched kernel with the mem=850M option (so no
highmem is used), and it has now been running for a few hours without
logging any unusual kernel messages. It seems to run _just fine_.



Greetings,


-- 
Arjan Filius
mailto:iafilius@xs4all.nl



* Re: Feedback on preemptible kernel patch
  2001-09-10 21:29             ` Arjan Filius
@ 2001-09-13 17:27               ` Robert Love
  2001-09-14  7:30                 ` george anzinger
  2001-09-14 15:01                 ` Robert Love
  0 siblings, 2 replies; 32+ messages in thread
From: Robert Love @ 2001-09-13 17:27 UTC (permalink / raw)
  To: Arjan Filius; +Cc: linux-kernel

On Mon, 2001-09-10 at 17:29, Arjan Filius wrote:
> Yes I am using reiserfs (for "ages"). better said, reiser on LVM.
> 
> Small discription of my system and used setup:
> scsi-disk,scsi-cdrom,ide-disk,ide-scsi,ext2,reiser, iptables, ipv6,
> acenic-Gbit-ethernet, ramdisk, highmem (1.5GB-ram), Athlon 1.1GHz, Asus
> a7v MB (via).

Hi Arjan,

first, highmem is fixed and the original patch you have from me is good.
second, Daniel Phillips gave me some feedback into how to figure out the
VM error.  I am working on it, although just the VM potential --
ReiserFS may be another problem.

third, you may be experiencing problems with a kernel optimized for
Athlon.  this may or may not be related to the current issues with an
Athlon-optimized kernel.  Basically, functions in arch/i386/lib/mmx.c
seem to need some locking to prevent preemption.  I have a basic patch
and we are working on a final one.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: Feedback on preemptible kernel patch
  2001-09-13 17:27               ` Robert Love
@ 2001-09-14  7:30                 ` george anzinger
  2001-09-14 15:01                 ` Robert Love
  1 sibling, 0 replies; 32+ messages in thread
From: george anzinger @ 2001-09-14  7:30 UTC (permalink / raw)
  To: Robert Love; +Cc: Arjan Filius, linux-kernel

Robert Love wrote:
> 
> On Mon, 2001-09-10 at 17:29, Arjan Filius wrote:
> > Yes I am using reiserfs (for "ages"). better said, reiser on LVM.
> >
> > Small discription of my system and used setup:
> > scsi-disk,scsi-cdrom,ide-disk,ide-scsi,ext2,reiser, iptables, ipv6,
> > acenic-Gbit-ethernet, ramdisk, highmem (1.5GB-ram), Athlon 1.1GHz, Asus
> > a7v MB (via).
> 
> Hi Arjan,
> 
> first, highmem is fixed and the original patch you have from me is good.
> second, Daniel Phillips gave me some feedback into how to figure out the
> VM error.  I am working on it, although just the VM potential --
> ReiserFS may be another problem.
> 
> third, you may be experiencing problems with a kernel optimized for
> Athlon.  this may or may not be related to the current issues with an
> Athlon-optimized kernel.  Basically, functions in arch/i386/lib/mmx.c
> seem to need some locking to prevent preemption.  I have a basic patch
> and we are working on a final one.
> 
Right, the same problem as using floating point in the kernel (mmx uses
the FP regs and they are not saved).  The question is: Just how long do
these routines take?  If it is very long it may be best to just say no. 
One way would be to always pretend that the "in_interrupt" flag is set. 
I think possibly some routines are short and the switch off/ switch on
pair is right, but for the long ones, well the preemption patch is
supposed to make the kernel more preemptable, not less.  Does anyone have
execution times for these functions?

George


* Re: Feedback on preemptible kernel patch
  2001-09-13 17:27               ` Robert Love
  2001-09-14  7:30                 ` george anzinger
@ 2001-09-14 15:01                 ` Robert Love
  1 sibling, 0 replies; 32+ messages in thread
From: Robert Love @ 2001-09-14 15:01 UTC (permalink / raw)
  To: george anzinger; +Cc: Arjan Filius, linux-kernel

On Fri, 2001-09-14 at 03:30, george anzinger wrote:
> Right, the same problem as using floating point in the kernel (mmx uses
> the FP regs and they are not saved).

Right, and I suspect we will find more problems of this type as we go
on.  In fact, the more general case "things that are SMP-safe but not
preempt safe" will be issues, too.  The highmem bug was one of these -
code that was SMP-safe but did not have lock points because it was
per-CPU code.  Preemption ruins all that.
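
The "SMP-safe but not preempt-safe" pattern can be shown with a toy model of
one CPU that owns a single mapping slot. This is a sketch only: every name
below is hypothetical and the scheduler is reduced to a single check, so it
illustrates the idea rather than the kernel code.

```c
#include <stdbool.h>

/* Toy model of one CPU with one per-CPU mapping slot. */
struct toy_cpu {
    bool slot_busy;    /* slot currently holds a live mapping */
    int  preempt_off;  /* > 0: scheduler must not switch tasks */
};

/* Claim the slot; returns false where a debug check would BUG(). */
static bool toy_map(struct toy_cpu *cpu, bool guard)
{
    if (cpu->slot_busy)
        return false;
    if (guard)
        cpu->preempt_off++;   /* plays the role of ctx_sw_off() */
    cpu->slot_busy = true;
    return true;
}

/* Release the slot and, if guarded, re-allow preemption. */
static void toy_unmap(struct toy_cpu *cpu, bool guard)
{
    cpu->slot_busy = false;
    if (guard)
        cpu->preempt_off--;   /* plays the role of ctx_sw_on() */
}

/* Task A maps the slot and is then offered for preemption; task B, if it
 * gets to run on the same CPU, tries to map too. Returns true if B
 * collides with A's still-live mapping. */
static bool toy_race(bool guard)
{
    struct toy_cpu cpu = { false, 0 };
    bool collided = false;

    toy_map(&cpu, guard);                     /* task A, mid-copy */
    if (cpu.preempt_off == 0)                 /* preemption point */
        collided = !toy_map(&cpu, guard);     /* task B runs */
    toy_unmap(&cpu, guard);                   /* task A finishes */
    return collided;
}
```

In this model toy_race(false) reports a collision and toy_race(true) does
not, because the guarded mapping keeps task B from running until task A has
released the slot.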

> The question is: Just how long do these routines take?  If it is very long
> it may be best to just say no. One way would be to always pretend that
> the"in_interrupt" flag is set.  I think possibly some routines are
> short and the switch off/ switch on pair is right, but for the long ones,
> well the preemption patch is supposed to make the kernel more preemptable,
> not less.  Any one have execution times for these functions?

Well, it's the routines in arch/i386/lib/mmx.c -- and just the ones that
call kernel_fpu_begin/end.  My patch pushes a ctx_sw_off/on pair into
those functions.  Anyhow, if you look, they aren't too long.

However, I agree that we may be defeating our purpose here.  A user of
the patch actually put together a patch that disables the CONFIG option
for the fast MMX memcpy stuff when preemption is enabled.  He benchmarked
the two configurations and I can send you those results when I sort
through them.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-14  9:15     ` Pavel Machek
  2001-09-17 22:40       ` Manfred Spraul
@ 2001-09-18  0:19       ` Robert Love
  1 sibling, 0 replies; 32+ messages in thread
From: Robert Love @ 2001-09-18  0:19 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Pavel Machek, Roger Larsson, linux-kernel, nigel

On Mon, 2001-09-17 at 18:40, Manfred Spraul wrote:
> > is it legal to kmap_atomic(a,b); kmap_atomic(c,d); kunmap_atomic(a,b);
>
> Yes, that's legal - just think about one kmap_atomic from process
> context, and another one in irq context.
> 
> > If so, your patch may need some counting....
> > Pavel
> 
> I hope ctx_sw_off does internal counting, correct?

yes, ctx_sw_off atomically increments a counter and ctx_sw_on
atomic_dec_and_test()s it.
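
A user-space sketch of that counting, using C11 atomics in place of the
kernel's atomic_t helpers. The sketch_ names are hypothetical and the real
ctx_sw_on presumably also triggers a reschedule check, which is left out
here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means preemption is allowed; each nested "off" adds one. */
static atomic_int sketch_preempt_count;

/* Stand-in for ctx_sw_off(): forbid preemption, nesting allowed. */
static void sketch_ctx_sw_off(void)
{
    atomic_fetch_add(&sketch_preempt_count, 1);
}

/* Stand-in for ctx_sw_on(): returns true only when this call dropped the
 * count to zero, i.e. it was the outermost "on" (dec-and-test). */
static bool sketch_ctx_sw_on(void)
{
    return atomic_fetch_sub(&sketch_preempt_count, 1) == 1;
}

/* May the scheduler switch tasks right now? */
static bool sketch_preemptible(void)
{
    return atomic_load(&sketch_preempt_count) == 0;
}
```

With Pavel's sequence (two maps, then one unmap) the count goes 1, 2, 1, so
preemption stays off until the second unmap brings it back to zero.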

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
       [not found]   ` <001a01c1390262c7f30/mnt/sendme10411ac@local>
  2001-09-14  9:15     ` Pavel Machek
@ 2001-09-17 22:41     ` Robert Love
  1 sibling, 0 replies; 32+ messages in thread
From: Robert Love @ 2001-09-17 22:41 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Manfred Spraul, Roger Larsson, linux-kernel, nigel

On Fri, 2001-09-14 at 05:15, Pavel Machek wrote:
> is it legal to kmap_atomic(a,b); kmap_atomic(c,d); kunmap_atomic(a,b); ?
> If so, your patch may need some counting....

ctx_sw_on and ctx_sw_off use a recursive spinlock, so the calls to
kunmap_atomic won't drop the lock until the last call.

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-14  9:15     ` Pavel Machek
@ 2001-09-17 22:40       ` Manfred Spraul
  2001-09-18  0:19       ` Robert Love
  1 sibling, 0 replies; 32+ messages in thread
From: Manfred Spraul @ 2001-09-17 22:40 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Robert Love, Roger Larsson, linux-kernel, nigel

>
> is it legal to kmap_atomic(a,b); kmap_atomic(c,d); kunmap_atomic(a,b); ?
>
Yes, that's legal - just think about one kmap_atomic from process
context, and another one in irq context.

> If so, your patch may need some counting....
> Pavel

I hope ctx_sw_off does internal counting, correct?

--
    Manfred



* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
       [not found]   ` <001a01c1390262c7f30/mnt/sendme10411ac@local>
@ 2001-09-14  9:15     ` Pavel Machek
  2001-09-17 22:40       ` Manfred Spraul
  2001-09-18  0:19       ` Robert Love
  2001-09-17 22:41     ` Robert Love
  1 sibling, 2 replies; 32+ messages in thread
From: Pavel Machek @ 2001-09-14  9:15 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Robert Love, Roger Larsson, linux-kernel, nigel

Hi!

> > #define kmap_atomic(page,idx) ctx_sw_off(); kmap(page);
> > #define kunmap_atomic(page,idx) ctx_sw_on(); kunmap(page);
> >
> No. kmap_atomic is called from interrupt context, and kmap calls
> schedule().
> 
> I thought about the attached patch (completely untested).

is it legal to kmap_atomic(a,b); kmap_atomic(c,d); kunmap_atomic(a,b); ?
If so, your patch may need some counting....
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-09  3:44 ` Robert Love
@ 2001-09-09  7:38   ` Manfred Spraul
       [not found]   ` <001a01c1390262c7f30/mnt/sendme10411ac@local>
  1 sibling, 0 replies; 32+ messages in thread
From: Manfred Spraul @ 2001-09-09  7:38 UTC (permalink / raw)
  To: Robert Love; +Cc: Roger Larsson, linux-kernel, nigel

[-- Attachment #1: Type: text/plain, Size: 274 bytes --]


> #define kmap_atomic(page,idx) ctx_sw_off(); kmap(page);
> #define kunmap_atomic(page,idx) ctx_sw_on(); kunmap(page);
>
No. kmap_atomic is called from interrupt context, and kmap calls
schedule().

I thought about the attached patch (completely untested).

--
    Manfred

[-- Attachment #2: patch-untested --]
[-- Type: application/octet-stream, Size: 438 bytes --]

--- highmem.h.prev	Sun Sep  9 08:59:04 2001
+++ highmem.h	Sun Sep  9 09:00:07 2001
@@ -88,6 +88,7 @@
 	if (page < highmem_start_page)
 		return page_address(page);
 
+	ctx_sw_off();
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 #if HIGHMEM_DEBUG
@@ -119,6 +120,7 @@
 	pte_clear(kmap_pte-idx);
 	__flush_tlb_one(vaddr);
 #endif
+	ctx_sw_on();
 }
 
 #endif /* __KERNEL__ */
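
The slot index in the patch context above depends on the CPU number, which
is why the mapping must not outlive a task switch. A small sketch of that
arithmetic follows; SKETCH_KM_TYPE_NR and both helper names are hypothetical
stand-ins, and the value 4 is chosen only for illustration.

```c
/* Hypothetical stand-in for the number of atomic-kmap slot types per CPU. */
#define SKETCH_KM_TYPE_NR 4

/* Same arithmetic as the patch context: each CPU owns a contiguous block
 * of SKETCH_KM_TYPE_NR slots, indexed by mapping type. */
static int sketch_kmap_idx(int type, int cpu)
{
    return type + SKETCH_KM_TYPE_NR * cpu;
}

/* Nonzero if a task that mapped on map_cpu and unmapped on unmap_cpu
 * would touch the same slot both times. */
static int sketch_same_slot(int type, int map_cpu, int unmap_cpu)
{
    return sketch_kmap_idx(type, map_cpu) == sketch_kmap_idx(type, unmap_cpu);
}
```

On SMP, a task preempted between map and unmap and resumed on another CPU
would compute a different index at unmap time. On a UP box, as in the
report, the index never changes, so the danger is instead a second task
reusing the same slot while the first mapping is still live.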


* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
  2001-09-08 23:11 [SMP lock BUG?] " Manfred Spraul
@ 2001-09-09  3:44 ` Robert Love
  2001-09-09  7:38   ` Manfred Spraul
       [not found]   ` <001a01c1390262c7f30/mnt/sendme10411ac@local>
  0 siblings, 2 replies; 32+ messages in thread
From: Robert Love @ 2001-09-09  3:44 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Roger Larsson, linux-kernel, nigel

On Sat, 2001-09-08 at 19:11, Manfred Spraul wrote:
> No.
> It seems to be a missing ctx_sw_off() in highmem.h:
> kmap_atomic uses a per-cpu variable, thus ctx_sw_off() is needed in
> kmap_atomic, and ctx_sw_on() in kunmap_atomic().

in my tree, kmap_atomic and kunmap_atomic are just defined to
kmap/kunmap.  are you suggesting something like this?

#define kmap_atomic(page,idx)	ctx_sw_off(); kmap(page);
#define kunmap_atomic(page,idx)	ctx_sw_on(); kunmap(page);

-- 
Robert M. Love
rml at ufl.edu
rml at tech9.net



* Re: [SMP lock BUG?] Re: Feedback on preemptible kernel patch
@ 2001-09-08 23:11 Manfred Spraul
  2001-09-09  3:44 ` Robert Love
  0 siblings, 1 reply; 32+ messages in thread
From: Manfred Spraul @ 2001-09-08 23:11 UTC (permalink / raw)
  To: Roger Larsson; +Cc: linux-kernel, Robert Love, nigel

> This is interesting. [Assumes UP Athlon - correct]
> Note that all BUGs out in highmem.h:95 (kmap_atomic)
> and that test is only on if you have enabled HIGHMEM_DEBUG
> [my analyze is done with a 2.4.10-pre2 kernel, but I checked with
> later patches and I do not think they fix it either...]
>
> The preemptive kernel puts more SMP stress on the kernel than
> running with multiple CPUs.
>
> So this might be a potential bug in the kernel proper, running with
> a SMP computer.

No.
It seems to be a missing ctx_sw_off() in highmem.h:
kmap_atomic uses a per-cpu variable, thus ctx_sw_off() is needed in
kmap_atomic, and ctx_sw_on() in kunmap_atomic().
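
[Editorial sketch, not part of the original message.] The reasoning
above can be illustrated with a small user-space model. The slot index
is computed as `type + KM_TYPE_NR*smp_processor_id()`, so if the task
were preempted and resumed on another CPU between map and unmap, the
unmap would compute a different slot. Everything below is an assumption
made for illustration: the constants, the `current_cpu` variable
standing in for smp_processor_id(), the `try_migrate()` helper standing
in for the scheduler, and the `model_` function names. It is not kernel
code.

```c
#include <assert.h>

#define KM_TYPE_NR 4	/* hypothetical number of slot types per CPU */
#define NR_CPUS    2	/* hypothetical CPU count */

static int preempt_count;	/* models the preemption-disable depth */
static int current_cpu;		/* models smp_processor_id() */
static int slot_in_use[KM_TYPE_NR * NR_CPUS];

static void ctx_sw_off(void) { preempt_count++; }
static void ctx_sw_on(void)  { assert(preempt_count > 0); preempt_count--; }

/*
 * Models the scheduler moving the task to another CPU.
 * Refuses while preemption is disabled; returns 1 if migrated.
 */
static int try_migrate(int new_cpu)
{
	if (preempt_count > 0)
		return 0;
	current_cpu = new_cpu;
	return 1;
}

/* Pin to the current CPU first, then pick that CPU's slot. */
static int model_kmap_atomic(int type)
{
	int idx;

	ctx_sw_off();
	idx = type + KM_TYPE_NR * current_cpu;
	assert(!slot_in_use[idx]);	/* slot must be free */
	slot_in_use[idx] = 1;
	return idx;
}

/* Release the same slot, then allow preemption again. */
static void model_kunmap_atomic(int type)
{
	int idx = type + KM_TYPE_NR * current_cpu;

	assert(slot_in_use[idx]);	/* would fail after a migration */
	slot_in_use[idx] = 0;
	ctx_sw_on();
}

/* Returns 0 if the map/unmap pair stayed on one CPU and one slot. */
static int demo_no_migration(void)
{
	int idx, migrated;

	current_cpu = 0;
	idx = model_kmap_atomic(1);
	migrated = try_migrate(1);	/* refused: preemption is off */
	model_kunmap_atomic(1);
	return (idx == 1 && !migrated && preempt_count == 0) ? 0 : 1;
}
```

In this model, dropping the ctx_sw_off() call lets try_migrate()
succeed between the two calls, and the unmap then trips its assertion
because it looks at the other CPU's slot.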

--
    Manfred





end of thread, other threads:[~2001-09-18  0:18 UTC | newest]

Thread overview: 32+ messages
2001-09-08  5:22 Feedback on preemptible kernel patch grue
2001-09-08  5:47 ` Robert Love
2001-09-08 17:33   ` Arjan Filius
2001-09-08 18:22     ` safemode
2001-09-08 20:58     ` [SMP lock BUG?] " Roger Larsson
2001-09-08 22:18       ` Arjan Filius
2001-09-09 14:55       ` george anzinger
2001-09-09 22:25         ` Arjan Filius
2001-09-09  4:40     ` Robert Love
2001-09-09 17:09     ` Robert Love
2001-09-09 21:07       ` Arjan Filius
2001-09-09 21:26         ` Robert Love
2001-09-09 21:23       ` Arjan Filius
2001-09-09 21:37         ` Robert Love
2001-09-10  3:24           ` Daniel Phillips
2001-09-10  3:37             ` Jeremy Zawodny
2001-09-10  5:09           ` Robert Love
2001-09-10 18:25             ` Daniel Phillips
2001-09-10 21:29             ` Arjan Filius
2001-09-13 17:27               ` Robert Love
2001-09-14  7:30                 ` george anzinger
2001-09-14 15:01                 ` Robert Love
2001-09-11 19:47           ` Arjan Filius
2001-09-09 18:57   ` grue
2001-09-09 21:44     ` Robert Love
2001-09-08 23:11 [SMP lock BUG?] " Manfred Spraul
2001-09-09  3:44 ` Robert Love
2001-09-09  7:38   ` Manfred Spraul
     [not found]   ` <001a01c1390262c7f30/mnt/sendme10411ac@local>
2001-09-14  9:15     ` Pavel Machek
2001-09-17 22:40       ` Manfred Spraul
2001-09-18  0:19       ` Robert Love
2001-09-17 22:41     ` Robert Love
