linux-kernel.vger.kernel.org archive mirror
* 2.4.20pre11aa1
@ 2002-10-16 16:51 Andrea Arcangeli
  2002-10-17 12:04 ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-16 16:51 UTC
  To: linux-kernel; +Cc: Srihari Vijayaraghavan

Srihari, I would like it if you could try to reproduce with this new one
with CONFIG_SOUND=n.  Thanks!

URL:

	http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20pre11aa1.gz
	http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20pre11aa1/

Only in 2.4.20pre10aa1: 00_extraversion-10
Only in 2.4.20pre11aa1: 00_extraversion-11
Only in 2.4.20pre10aa1: 00_max_bytes-5
Only in 2.4.20pre11aa1: 00_max_bytes-6
Only in 2.4.20pre10aa1: 60_pagecache-atomic-6
Only in 2.4.20pre11aa1: 60_pagecache-atomic-7
Only in 2.4.20pre10aa1: 70_intermezzo-junk-1
Only in 2.4.20pre11aa1: 70_intermezzo-junk-2

	Rediffed.

Only in 2.4.20pre11aa1: 00_fcntl_getfl-largefile-1

	Clear the implicit O_LARGEFILE on 64-bit archs.

Only in 2.4.20pre11aa1: 00_o_direct-read-overflow-write-locking-xfs-2

	Fix xfs compilation (from Christoph).

Only in 2.4.20pre10aa1: 20_sched-o1-fixes-4
Only in 2.4.20pre11aa1: 20_sched-o1-fixes-5

	Take the expired queue into account in sched_yield; sched_yield is still
	a cpu-local operation, unlike in 2.4 mainline.

	Fix idle rescheduling so we don't waste 80% of the cpu power of some
	big irons.

	Fixed a race that could explain some instability (in my tree only).

Only in 2.4.20pre10aa1: 86_x86_64-tsc-hpet-pit-1

	Dropped temporarily.

Only in 2.4.20pre10aa1: 9900_aio-11.gz
Only in 2.4.20pre11aa1: 9900_aio-12.gz

	Unplug the queue properly in the next_chunk passes too. (from
	Chris Mason)

Andrea


* Re: 2.4.20pre11aa1
  2002-10-16 16:51 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-17 12:04 ` Srihari Vijayaraghavan
  2002-10-17 12:10   ` 2.4.20pre11aa1 Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-17 12:04 UTC
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

> Srihari, I would like it if you could try to reproduce with this new one
> with CONFIG_SOUND=n.  Thanks!

No worries!

I tried it without sound and unfortunately it crashed a few times. The good news
is that it is very stable without agpgart and radeon (module or not) support.

These are the three oopses with agpgart and radeon as modules:
------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-pre11aa1/ (default)
     -m /boot/System.map-2.4.20-pre11aa1 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 17 20:27:24 localhost kernel: Unable to handle kernel paging request at 
virtual address c68b8008
Oct 17 20:27:24 localhost kernel: c01180ae
Oct 17 20:27:24 localhost kernel: *pde = 068001e3
Oct 17 20:27:24 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #3 Thu Oct 17 
20:18:58 EST 2002
Oct 17 20:27:24 localhost kernel: CPU:    0
Oct 17 20:27:24 localhost kernel: EIP:    0010:[<c01180ae>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 17 20:27:24 localhost kernel: EFLAGS: 00013206
Oct 17 20:27:24 localhost kernel: eax: bfffec7c   ebx: c68b8000   ecx: 
c020de0c   edx: 00000018
Oct 17 20:27:24 localhost kernel: esi: 00000100   edi: bfffec7c   ebp: 
ffffffff   esp: c58f5f78
Oct 17 20:27:24 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888, 
stackpage=c58f5000)
Oct 17 20:27:24 localhost kernel: Stack: dff82e04 00000000 00001000 00000000 
00000000 ffffffea c020dda0 bfffec7c 
Oct 17 20:27:24 localhost kernel:        080640e8 c01188a4 080640e8 00000100 
bfffec7c 00000004 c58f4000 00000100 
Oct 17 20:27:24 localhost kernel:        bfffec7c bfffeca8 c01074ff 00000000 
00000001 080640e8 00000100 bfffec7c 
Oct 17 20:27:24 localhost kernel: Call Trace:    [<c01188a4>] [<c01074ff>]
Oct 17 20:27:24 localhost kernel: Code: 8b 7b 08 89 e9 31 c0 f2 ae f7 d1 49 8d 
79 01 39 f7 77 7f 8b 


>>EIP; c01180ae <qm_modules+2e/140>   <=====

>>eax; bfffec7c Before first symbol
>>ebx; c68b8000 <[agpgart].bss.end+2c031e5/1c0a3265>
>>ecx; c020de0c <modlist_lock+0/0>
>>edi; bfffec7c Before first symbol
>>ebp; ffffffff <END_OF_CODE+202a3a58/????>
>>esp; c58f5f78 <[agpgart].bss.end+1c4115d/1c0a3265>

Trace; c01188a4 <sys_query_module+d4/1b0>
Trace; c01074ff <system_call+33/38>

Code;  c01180ae <qm_modules+2e/140>
00000000 <_EIP>:
Code;  c01180ae <qm_modules+2e/140>   <=====
   0:   8b 7b 08                  mov    0x8(%ebx),%edi   <=====
Code;  c01180b1 <qm_modules+31/140>
   3:   89 e9                     mov    %ebp,%ecx
Code;  c01180b3 <qm_modules+33/140>
   5:   31 c0                     xor    %eax,%eax
Code;  c01180b5 <qm_modules+35/140>
   7:   f2 ae                     repnz scas %es:(%edi),%al
Code;  c01180b7 <qm_modules+37/140>
   9:   f7 d1                     not    %ecx
Code;  c01180b9 <qm_modules+39/140>
   b:   49                        dec    %ecx
Code;  c01180ba <qm_modules+3a/140>
   c:   8d 79 01                  lea    0x1(%ecx),%edi
Code;  c01180bd <qm_modules+3d/140>
   f:   39 f7                     cmp    %esi,%edi
Code;  c01180bf <qm_modules+3f/140>
  11:   77 7f                     ja     92 <_EIP+0x92>
Code;  c01180c1 <qm_modules+41/140>
  13:   8b 00                     mov    (%eax),%eax

Oct 17 20:27:24 localhost kernel:  <1>Unable to handle kernel paging request 
at virtual address c56ac098
Oct 17 20:27:24 localhost kernel: c0119dd0
Oct 17 20:27:24 localhost kernel: *pde = 054001e3
Oct 17 20:27:24 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #3 Thu Oct 17 
20:18:58 EST 2002
Oct 17 20:27:24 localhost kernel: CPU:    0
Oct 17 20:27:24 localhost kernel: EIP:    0010:[<c0119dd0>]    Not tainted
Oct 17 20:27:24 localhost kernel: EFLAGS: 00013206
Oct 17 20:27:24 localhost kernel: eax: 00000000   ebx: c56ac000   ecx: 
c4ad9000   edx: 00000000
Oct 17 20:27:24 localhost kernel: esi: c58f4000   edi: 000000b8   ebp: 
0000000b   esp: c58f5e2c
Oct 17 20:27:24 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888, 
stackpage=c58f5000)
Oct 17 20:27:24 localhost kernel: Stack: c1587bb8 c4ad9ac0 c58f4000 00000000 
c58f4000 000000b8 0000000b c011a2c0 
Oct 17 20:27:24 localhost kernel:        c58f4000 c16f1880 c58f5f44 00000000 
000000b8 c58f4000 c0107bef 0000000b 
Oct 17 20:27:24 localhost kernel:        c01f1e2a 00000000 00000000 c01125a4 
c01f1e2a c58f5f44 00000000 dff82e00 
Oct 17 20:27:24 localhost kernel: Call Trace:    [<c011a2c0>] [<c0107bef>] 
[<c01125a4>] [<c0126aaa>] [<c01314e5>]
Oct 17 20:27:24 localhost kernel:   [<c0126dde>] [<c011244a>] [<c01276dc>] 
[<c01122a0>] [<c01075f0>] [<c01180ae>]
Oct 17 20:27:24 localhost kernel:   [<c01188a4>] [<c01074ff>]
Oct 17 20:27:24 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b 
5b 50 81 fb 00 a0 21 


>>EIP; c0119dd0 <exit_notify+20/300>   <=====

>>ebx; c56ac000 <[agpgart].bss.end+19f71e5/1c0a3265>
>>ecx; c4ad9000 <[agpgart].bss.end+e241e5/1c0a3265>
>>esi; c58f4000 <[agpgart].bss.end+1c3f1e5/1c0a3265>
>>esp; c58f5e2c <[agpgart].bss.end+1c41011/1c0a3265>

Trace; c011a2c0 <do_exit+210/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c0126aaa <do_no_page+8a/1c0>
Trace; c01314e5 <lru_cache_add+65/70>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01180ae <qm_modules+2e/140>
Trace; c01188a4 <sys_query_module+d4/1b0>
Trace; c01074ff <system_call+33/38>

Code;  c0119dd0 <exit_notify+20/300>
00000000 <_EIP>:
Code;  c0119dd0 <exit_notify+20/300>   <=====
   0:   39 b3 98 00 00 00         cmp    %esi,0x98(%ebx)   <=====
Code;  c0119dd6 <exit_notify+26/300>
   6:   0f 84 85 02 00 00         je     291 <_EIP+0x291>
Code;  c0119ddc <exit_notify+2c/300>
   c:   8b 5b 50                  mov    0x50(%ebx),%ebx
Code;  c0119ddf <exit_notify+2f/300>
   f:   81 fb 00 a0 21 00         cmp    $0x21a000,%ebx

Oct 17 20:27:24 localhost kernel:  <1>Unable to handle kernel paging request 
at virtual address c4db8098
Oct 17 20:27:24 localhost kernel: c0119dd0
Oct 17 20:27:24 localhost kernel: *pde = 04c001e3
Oct 17 20:27:24 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #3 Thu Oct 17 
20:18:58 EST 2002
Oct 17 20:27:24 localhost kernel: CPU:    0
Oct 17 20:27:24 localhost kernel: EIP:    0010:[<c0119dd0>]    Not tainted
Oct 17 20:27:24 localhost kernel: EFLAGS: 00013206
Oct 17 20:27:24 localhost kernel: eax: 00000000   ebx: c4db8000   ecx: 
00000000   edx: 00000000
Oct 17 20:27:24 localhost kernel: esi: c58f4000   edi: 000002ac   ebp: 
0000000b   esp: c58f5ce0
Oct 17 20:27:24 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888, 
stackpage=c58f5000)
Oct 17 20:27:24 localhost kernel: Stack: 00000020 00000400 c58f4000 00000000 
c58f4000 000002ac 0000000b c011a2c0 
Oct 17 20:27:24 localhost kernel:        c58f4000 00000000 c58f5df8 00000000 
000002ac c58f4000 c0107bef 0000000b 
Oct 17 20:27:24 localhost kernel:        c01f1e2a 00000000 00000000 c01125a4 
c01f1e2a c58f5df8 00000000 33323130 
Oct 17 20:27:24 localhost kernel: Call Trace:    [<c011a2c0>] [<c0107bef>] 
[<c01125a4>] [<c0131577>] [<c01278e8>]
Oct 17 20:27:24 localhost kernel:   [<c01122a0>] [<c01276dc>] [<c01122a0>] 
[<c01075f0>] [<c0119dd0>] [<c011a2c0>]
Oct 17 20:27:24 localhost kernel:   [<c0107bef>] [<c01125a4>] [<c0126aaa>] 
[<c01314e5>] [<c0126dde>] [<c011244a>]
Oct 17 20:27:24 localhost kernel:   [<c01276dc>] [<c01122a0>] [<c01075f0>] 
[<c01180ae>] [<c01188a4>] [<c01074ff>]
Oct 17 20:27:24 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b 
5b 50 81 fb 00 a0 21 


>>EIP; c0119dd0 <exit_notify+20/300>   <=====

>>ebx; c4db8000 <[agpgart].bss.end+11031e5/1c0a3265>
>>esi; c58f4000 <[agpgart].bss.end+1c3f1e5/1c0a3265>
>>esp; c58f5ce0 <[agpgart].bss.end+1c40ec5/1c0a3265>

Trace; c011a2c0 <do_exit+210/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c0131577 <__lru_cache_del+87/90>
Trace; c01278e8 <zap_pte_range+f8/150>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c0119dd0 <exit_notify+20/300>
Trace; c011a2c0 <do_exit+210/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c0126aaa <do_no_page+8a/1c0>
Trace; c01314e5 <lru_cache_add+65/70>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01180ae <qm_modules+2e/140>
Trace; c01188a4 <sys_query_module+d4/1b0>
Trace; c01074ff <system_call+33/38>

Code;  c0119dd0 <exit_notify+20/300>
00000000 <_EIP>:
Code;  c0119dd0 <exit_notify+20/300>   <=====
   0:   39 b3 98 00 00 00         cmp    %esi,0x98(%ebx)   <=====
Code;  c0119dd6 <exit_notify+26/300>
   6:   0f 84 85 02 00 00         je     291 <_EIP+0x291>
Code;  c0119ddc <exit_notify+2c/300>
   c:   8b 5b 50                  mov    0x50(%ebx),%ebx
Code;  c0119ddf <exit_notify+2f/300>
   f:   81 fb 00 a0 21 00         cmp    $0x21a000,%ebx


1 warning issued.  Results may not be reliable.
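
The ksymoops warning above says the default symbol sources were used. As a rough sketch only, a command line that names the symbol sources explicitly could be assembled as below. It uses just the -k, -l, -o and -m options that appear in the report itself. The function name ksymoops_cmd and the file names ksyms.saved, modules.saved and oops.txt are hypothetical placeholders, not files from this report.

```shell
#!/bin/sh
# Sketch: print (do not run) a ksymoops command line with explicit
# symbol sources, so the paths can be checked before decoding.
# ksymoops_cmd and all file names below are illustrative assumptions.
ksymoops_cmd() {
    kver=$1      # release string of the kernel that produced the oops
    ksyms=$2     # saved copy of /proc/ksyms from that boot
    modules=$3   # saved copy of /proc/modules from that boot
    oops=$4      # file holding the raw oops text from the system log
    echo "ksymoops -k $ksyms -l $modules -o /lib/modules/$kver/ -m /boot/System.map-$kver < $oops"
}

# Example with placeholder file names.
ksymoops_cmd 2.4.20-pre11aa1 ksyms.saved modules.saved oops.txt
```

Printing the command instead of running it keeps the sketch harmless on a machine where ksymoops or the saved files are not present.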

These are the two oopses with agpgart and radeon built into the kernel:
------------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1-agpdrm.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-pre11aa1-agpdrm/ (default)
     -m /boot/System.map-2.4.20-pre11aa1-agpdrm (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 17 21:22:29 localhost kernel: Unable to handle kernel paging request at 
virtual address c72b4034
Oct 17 21:22:29 localhost kernel: c0112b57
Oct 17 21:22:29 localhost kernel: *pde = 070001e3
Oct 17 21:22:29 localhost kernel: Oops: 0000 2.4.20-pre11aa1-agpdrm #6 Thu Oct 
17 21:11:50 EST 2002
Oct 17 21:22:29 localhost kernel: CPU:    0
Oct 17 21:22:29 localhost kernel: EIP:    0010:[<c0112b57>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 17 21:22:29 localhost kernel: EFLAGS: 00013086
Oct 17 21:22:29 localhost kernel: eax: 00000000   ebx: c8aaa000   ecx: 
c72b4000   edx: c8aabe78
Oct 17 21:22:29 localhost kernel: esi: 00000002   edi: c01f5f22   ebp: 
00003246   esp: c8aabd9c
Oct 17 21:22:29 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 21:22:29 localhost kernel: Process modprobe (pid: 1036, 
stackpage=c8aab000)
Oct 17 21:22:29 localhost kernel: Stack: c8aaa000 00000002 c6e3c000 c8aaa000 
c01124c2 c01f5f22 c8aaa000 00000000 
Oct 17 21:22:29 localhost kernel:        c6270f8e c110eb5c c8aaa000 c8aabfc4 
0001ff9d c022326f c6270000 c110eb5c 
Oct 17 21:22:29 localhost kernel:        c2d94000 00000000 c0223360 c8aabfc8 
c0141c50 c8aabdfc c8aabf6c c8aabdfc 
Oct 17 21:22:29 localhost kernel: Call Trace:    [<c01124c2>] [<c01f5f22>] 
[<c0141c50>] [<c01122a0>] [<c01075f0>]
Oct 17 21:22:29 localhost kernel:   [<c01f5f22>] [<c01269b2>] [<c0126dde>] 
[<c011244a>] [<c01286df>] [<c0128a37>]
Oct 17 21:22:29 localhost kernel:   [<c0128ab4>] [<c01122a0>] [<c01075f0>]
Oct 17 21:22:29 localhost kernel: Code: 8b 51 34 85 d2 74 3f f7 41 14 41 00 00 
00 74 36 8b 71 38 89 


>>EIP; c0112b57 <search_exception_table+17/80>   <=====

>>ebx; c8aaa000 <[sr_mod].bss.end+1da61a9/1902c229>
>>ecx; c72b4000 <[sr_mod].bss.end+5b01a9/1902c229>
>>edx; c8aabe78 <[sr_mod].bss.end+1da8021/1902c229>
>>edi; c01f5f22 <fast_clear_page+12/50>
>>ebp; 00003246 Before first symbol
>>esp; c8aabd9c <[sr_mod].bss.end+1da7f45/1902c229>

Trace; c01124c2 <do_page_fault+222/5a0>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c0141c50 <do_execve+180/220>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c01269b2 <do_anonymous_page+a2/110>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01286df <unmap_fixup+12f/140>
Trace; c0128a37 <do_munmap+297/2d0>
Trace; c0128ab4 <sys_munmap+44/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code;  c0112b57 <search_exception_table+17/80>
00000000 <_EIP>:
Code;  c0112b57 <search_exception_table+17/80>   <=====
   0:   8b 51 34                  mov    0x34(%ecx),%edx   <=====
Code;  c0112b5a <search_exception_table+1a/80>
   3:   85 d2                     test   %edx,%edx
Code;  c0112b5c <search_exception_table+1c/80>
   5:   74 3f                     je     46 <_EIP+0x46>
Code;  c0112b5e <search_exception_table+1e/80>
   7:   f7 41 14 41 00 00 00      testl  $0x41,0x14(%ecx)
Code;  c0112b65 <search_exception_table+25/80>
   e:   74 36                     je     46 <_EIP+0x46>
Code;  c0112b67 <search_exception_table+27/80>
  10:   8b 71 38                  mov    0x38(%ecx),%esi
Code;  c0112b6a <search_exception_table+2a/80>
  13:   89 00                     mov    %eax,(%eax)

Oct 17 21:22:29 localhost kernel:  <1>Unable to handle kernel paging request 
at virtual address c77340c4
Oct 17 21:22:29 localhost kernel: c0139b5e
Oct 17 21:22:29 localhost kernel: *pde = 07769163
Oct 17 21:22:29 localhost kernel: Oops: 0003 2.4.20-pre11aa1-agpdrm #6 Thu Oct 
17 21:11:50 EST 2002
Oct 17 21:22:29 localhost kernel: CPU:    0
Oct 17 21:22:29 localhost kernel: EIP:    0010:[<c0139b5e>]    Not tainted
Oct 17 21:22:29 localhost kernel: EFLAGS: 00013246
Oct 17 21:22:29 localhost kernel: eax: c27e7340   ebx: c779cdc0   ecx: 
00000000   edx: c77340c0
Oct 17 21:22:29 localhost kernel: esi: c158e380   edi: c1689dc0   ebp: 
c1ac8540   esp: c8aabc20
Oct 17 21:22:29 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 17 21:22:29 localhost kernel: Process modprobe (pid: 1036, 
stackpage=c8aab000)
Oct 17 21:22:29 localhost kernel: Stack: c1689dc0 c779cdc0 c1c338c0 00001000 
dfe572c0 08060000 c0128e85 dfe572c0 
Oct 17 21:22:29 localhost kernel:        08060000 00001000 c1c33940 dfe572c0 
c8aaa000 000002b4 0000000b c0115076 
Oct 17 21:22:29 localhost kernel:        dfe572c0 00003202 dfe572c0 c011a137 
dfe572c0 00000000 c8aabd68 00000000 
Oct 17 21:22:29 localhost kernel: Call Trace:    [<c0128e85>] [<c0115076>] 
[<c011a137>] [<c0107bef>] [<c01125a4>]
Oct 17 21:22:29 localhost kernel:   [<c014322b>] [<c01122a0>] [<c01075f0>] 
[<c01f5f22>] [<c0112b57>] [<c01124c2>]
Oct 17 21:22:29 localhost kernel:   [<c01f5f22>] [<c0141c50>] [<c01122a0>] 
[<c01075f0>] [<c01f5f22>] [<c01269b2>]
Oct 17 21:22:29 localhost kernel:   [<c0126dde>] [<c011244a>] [<c01286df>] 
[<c0128a37>] [<c0128ab4>] [<c01122a0>]
Oct 17 21:22:29 localhost kernel:   [<c01075f0>]
Oct 17 21:22:29 localhost kernel: Code: 89 42 04 c7 03 00 00 00 00 a1 b4 3e 22 
c0 89 58 04 89 03 89 


>>EIP; c0139b5e <fput+9e/120>   <=====

>>eax; c27e7340 <[floppy].bss.end+599905/4ab2645>
>>ebx; c779cdc0 <[sr_mod].bss.end+a98f69/1902c229>
>>edx; c77340c0 <[sr_mod].bss.end+a30269/1902c229>
>>esi; c158e380 <_end+12f1d10/15aaa10>
>>edi; c1689dc0 <_end+13ed750/15aaa10>
>>ebp; c1ac8540 <[md].bss.end+25a861/3123a1>
>>esp; c8aabc20 <[sr_mod].bss.end+1da7dc9/1902c229>

Trace; c0128e85 <exit_mmap+125/140>
Trace; c0115076 <mmput+56/d0>
Trace; c011a137 <do_exit+87/260>
Trace; c0107bef <die+7f/80>
Trace; c01125a4 <do_page_fault+304/5a0>
Trace; c014322b <cached_lookup+1b/70>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c0112b57 <search_exception_table+17/80>
Trace; c01124c2 <do_page_fault+222/5a0>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c0141c50 <do_execve+180/220>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>
Trace; c01f5f22 <fast_clear_page+12/50>
Trace; c01269b2 <do_anonymous_page+a2/110>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01286df <unmap_fixup+12f/140>
Trace; c0128a37 <do_munmap+297/2d0>
Trace; c0128ab4 <sys_munmap+44/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code;  c0139b5e <fput+9e/120>
00000000 <_EIP>:
Code;  c0139b5e <fput+9e/120>   <=====
   0:   89 42 04                  mov    %eax,0x4(%edx)   <=====
Code;  c0139b61 <fput+a1/120>
   3:   c7 03 00 00 00 00         movl   $0x0,(%ebx)
Code;  c0139b67 <fput+a7/120>
   9:   a1 b4 3e 22 c0            mov    0xc0223eb4,%eax
Code;  c0139b6c <fput+ac/120>
   e:   89 58 04                  mov    %ebx,0x4(%eax)
Code;  c0139b6f <fput+af/120>
  11:   89 03                     mov    %eax,(%ebx)
Code;  c0139b71 <fput+b1/120>
  13:   89 00                     mov    %eax,(%eax)


1 warning issued.  Results may not be reliable.
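
For reference, the hexadecimal value after "Oops:" in these reports (0000 and 0003 above) is the i386 page-fault error code. The following is a minimal sketch of a decoder, assuming the usual meaning of the three low bits: bit 0 set means a protection fault rather than a page that was not present, bit 1 set means a write rather than a read, and bit 2 set means user mode rather than kernel mode. The function name decode_oops_code is made up for illustration.

```shell
#!/bin/sh
# Sketch: decode the hex error code printed after "Oops:" on i386.
#   bit 0: 0 = page not present, 1 = protection fault
#   bit 1: 0 = read access,      1 = write access
#   bit 2: 0 = kernel mode,      1 = user mode
# decode_oops_code is a hypothetical helper name.
decode_oops_code() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then cause="protection-fault"; else cause="not-present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    echo "$cause $access $mode"
}

# The two codes that appear in the reports above.
for c in 0000 0003; do
    echo "Oops $c: $(decode_oops_code "$c")"
done
```

With these inputs, 0000 decodes as a kernel-mode read of a page that was not present, and 0003 as a kernel-mode write that hit a protection fault.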

The mainline (2.4.20-pre11) is fine with agpgart and radeon as modules. I
haven't tested it with agpgart and radeon built into the kernel.

I am trying to find out whether any of my friends has a different Radeon card
(mine is a Radeon VE QY) or any video card that has DRM support in the official
kernel tree. If I find one, I will try it and see whether -aa works fine with it.

Thanks for your help. 
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-17 12:04 ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-17 12:10   ` Andrea Arcangeli
  2002-10-17 13:01     ` 2.4.20pre11aa1 Keith Owens
  2002-10-17 13:02     ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 2 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-17 12:10 UTC
  To: Srihari Vijayaraghavan; +Cc: linux-kernel

On Thu, Oct 17, 2002 at 10:04:50PM +1000, Srihari Vijayaraghavan wrote:
> Hello Andrea,
> 
> > Srihari, I would like it if you could try to reproduce with this new one
> > with CONFIG_SOUND=n.  Thanks!
> 
> No worries!
> 
> I tried it without sound and unfortunately it crashed a few times. The good news
> is that it is very stable without agpgart and radeon (module or not) support.

I've no idea what could be wrong with the graphics drivers; there are no
changes there.

> ffffffff   esp: c58f5f78
> Oct 17 20:27:24 localhost kernel: ds: 0018   es: 0018   ss: 0018
> Oct 17 20:27:24 localhost kernel: Process modprobe (pid: 888, 


Please try to find out which module this is: replace modprobe with a script
that does:

#!/bin/sh
echo "$@" >>/tmp/log
sync
modprobe.orig "$@"

Then look at the log after the crash. You said in your last email that the
gart code wasn't the culprit. If it isn't the sound drivers I've no
clue what it is. What does it mean that without agpgart it is very
stable? That it crashes less frequently? (I recall it crashed even
without those modules.)

It doesn't make any sense that 2.4.20-pre11 works and my tree doesn't;
there are no changes to those sound and graphics drivers. Can you make
sure that modversions is enabled, and please send me your .config.

Andrea


* Re: 2.4.20pre11aa1
  2002-10-17 13:02     ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-17 13:00       ` Andrea Arcangeli
  0 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-17 13:00 UTC
  To: Srihari Vijayaraghavan; +Cc: linux-kernel

On Thu, Oct 17, 2002 at 11:02:24PM +1000, Srihari Vijayaraghavan wrote:
> Sorry if it was not clear. The -aa kernel crashes _only_ when I have agpgart
> and radeon support (either as modules or built into the kernel). If there is
> no agpgart and radeon support enabled, it does not crash.

OK. So the mystery is why it crashes only with my tree. There are no
changes to the graphics/gart drivers as far as I can tell. Now I even
wonder about a collision of dma with the sound driver or something weird
like that ;)

> > It doesn't make any sense that 2.4.20-pre11 works and my tree doesn't;
> > there are no changes to those sound and graphics drivers. Can you make
> > sure that modversions is enabled, and please send me your .config.
> 
> Here is my current .config. While this one doesn't have modversions enabled I 
> have seen crashes even when it is enabled.

OK, but you can leave modversions enabled; I do that myself too ;)

Andrea


* Re: 2.4.20pre11aa1
  2002-10-17 12:10   ` 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-17 13:01     ` Keith Owens
  2002-10-17 15:26       ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  2002-10-17 13:02     ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  1 sibling, 1 reply; 25+ messages in thread
From: Keith Owens @ 2002-10-17 13:01 UTC
  To: Andrea Arcangeli; +Cc: Srihari Vijayaraghavan, linux-kernel

On Thu, 17 Oct 2002 14:10:05 +0200, 
Andrea Arcangeli <andrea@suse.de> wrote:
>Please try to find out which module this is: replace modprobe with a script
>that does:
>
>#!/bin/sh
>echo "$@" >>/tmp/log
>sync
>modprobe.orig "$@"

You don't need that, just mkdir /var/log/ksymoops.  modprobe/insmod
will create a daily log file and snapshot a copy of lsmod and
/proc/ksyms for every module loaded or unloaded.  All with sync in the
right places.
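
A small sketch of how the newest snapshots could be picked out of that directory after a crash. It assumes the snapshot files carry a sortable timestamp prefix and the suffixes .ksyms and .modules, which should be checked against the modutils man pages on the system in question. LOGDIR and latest_snapshot are names invented for this example.

```shell
#!/bin/sh
# Sketch: find the most recent ksyms/modules snapshots.
# Assumption: files are named <timestamp>.ksyms and <timestamp>.modules,
# so lexical order matches chronological order.
# The directory itself is created once, as root: mkdir /var/log/ksymoops
LOGDIR=${LOGDIR:-/var/log/ksymoops}

latest_snapshot() {
    # $1 = suffix (ksyms or modules); prints the newest match, or nothing
    ls "$LOGDIR"/*."$1" 2>/dev/null | sort | tail -n 1
}

echo "latest ksyms:   $(latest_snapshot ksyms)"
echo "latest modules: $(latest_snapshot modules)"
```

The two paths printed here are the kind of files one would hand to ksymoops with -k and -l when decoding an oops from an earlier boot.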



* Re: 2.4.20pre11aa1
  2002-10-17 12:10   ` 2.4.20pre11aa1 Andrea Arcangeli
  2002-10-17 13:01     ` 2.4.20pre11aa1 Keith Owens
@ 2002-10-17 13:02     ` Srihari Vijayaraghavan
  2002-10-17 13:00       ` 2.4.20pre11aa1 Andrea Arcangeli
  1 sibling, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-17 13:02 UTC
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello,

> Please try to find out which module this is: replace modprobe with a script
> that does:
>
> #!/bin/sh
> echo "$@" >>/tmp/log
> sync
> modprobe.orig "$@"

I will try that.

> Then look at the log after the crash. You said in your last email that the
> gart code wasn't the culprit. If it isn't the sound drivers I've no
> clue what it is. What does it mean that without agpgart it is very
> stable? That it crashes less frequently? (I recall it crashed even
> without those modules.)

Sorry if it was not clear. The -aa kernel crashes _only_ when I have agpgart
and radeon support (either as modules or built into the kernel). If there is
no agpgart and radeon support enabled, it does not crash.

> It doesn't make any sense that 2.4.20-pre11 works and my tree doesn't;
> there are no changes to those sound and graphics drivers. Can you make
> sure that modversions is enabled, and please send me your .config.

Here is my current .config. While this one doesn't have modversions enabled I 
have seen crashes even when it is enabled.

CONFIG_X86=y
CONFIG_UID16=y
CONFIG_MODULES=y
CONFIG_KMOD=y
CONFIG_MK7=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG8=y
CONFIG_X86_HAS_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_F00F_WORKS_OK=y
CONFIG_X86_MCE=y
CONFIG_NOHIGHMEM=y
CONFIG_1GB=y
CONFIG_MTRR=y
CONFIG_X86_TSC=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_BLK_DEV_FD=m
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_RAID0=m
CONFIG_PACKET=m
CONFIG_NETFILTER=y
CONFIG_UNIX=m
CONFIG_INET=y
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_TOS=m
CONFIG_IP_NF_MATCH_STATE=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDESCSI=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_SCSI=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m
CONFIG_SCSI_DEBUG_QUEUES=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_NETDEVICES=y
CONFIG_PPP=m
CONFIG_PPP_ASYNC=m
CONFIG_PPP_DEFLATE=m
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=m
CONFIG_SERIAL_EXTENDED=y
CONFIG_UNIX98_PTYS=y
CONFIG_MOUSE=m
CONFIG_PSMOUSE=y
CONFIG_RTC=m
CONFIG_AGP=y
CONFIG_AGP_AMD=y
CONFIG_DRM=y
CONFIG_DRM_NEW=y
CONFIG_DRM_RADEON=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_VGA_CONSOLE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m

Thanks
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-17 13:01     ` 2.4.20pre11aa1 Keith Owens
@ 2002-10-17 15:26       ` Srihari Vijayaraghavan
  2002-10-17 16:27         ` 2.4.20pre11aa1 Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-17 15:26 UTC
  To: Keith Owens, Andrea Arcangeli; +Cc: linux-kernel

Hello Keith,

> You don't need that, just mkdir /var/log/ksymoops.  modprobe/insmod
> will create a daily log file and snapshot a copy of lsmod and
> /proc/ksyms for every module loaded or unloaded.  All with sync in the
> right places.

Thanks, and that works fine.

Hello Andrea,

1. To simplify and to prove that the crashes are associated with agpgart
and/or radeon, I have compiled the kernel with _only_ agpgart and radeon as
modules and nothing else.

$ cat /lib/modules/2.4.20-pre11aa1/modules.dep
/lib/modules/2.4.20-pre11aa1/kernel/drivers/char/agp/agpgart.o:

/lib/modules/2.4.20-pre11aa1/kernel/drivers/char/drm/radeon.o:

Here is some decoded oops output that appeared in the system logs:
------------------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-pre11aa1/ (default)
     -m /boot/System.map-2.4.20-pre11aa1 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 18 00:29:02 localhost kernel: Unable to handle kernel paging request at 
virtual address c73ae000
Oct 18 00:29:02 localhost kernel: c0210ee2
Oct 18 00:29:02 localhost kernel: *pde = 070001e3
Oct 18 00:29:02 localhost kernel: Oops: 0002 2.4.20-pre11aa1 #9 Fri Oct 18 
00:06:42 EST 2002
Oct 18 00:29:02 localhost kernel: CPU:    0
Oct 18 00:29:02 localhost kernel: EIP:    0010:[<c0210ee2>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 18 00:29:02 localhost kernel: EFLAGS: 00013246
Oct 18 00:29:02 localhost kernel: eax: 0000003f   ebx: c73ae000   ecx: 
c7c8c000   edx: 00000000
Oct 18 00:29:02 localhost kernel: esi: c2daffe0   edi: 00000fe0   ebp: 
c113e204   esp: c7c8deac
Oct 18 00:29:02 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 18 00:29:02 localhost kernel: Process modprobe (pid: 944, 
stackpage=c7c8d000)
Oct 18 00:29:03 localhost kernel: Stack: 00104025 c01269b2 c73ae000 c7534bfc 
bfff8e50 c2d9f480 c4e97a40 c0126dde 
Oct 18 00:29:03 localhost kernel:        c2d9f480 c4e97a40 c2daffe0 c7534bfc 
00000001 bfff8e50 c7c8df24 c2d9f480 
Oct 18 00:29:03 localhost kernel:        c4e97a40 bfff8e50 c7c8c000 c011244a 
c2d9f480 c4e97a40 bfff8e50 00000001 
Oct 18 00:29:03 localhost kernel: Call Trace:    [<c01269b2>] [<c0126dde>] 
[<c011244a>] [<c0127bb6>] [<c0128cc7>]
Oct 18 00:29:03 localhost kernel:   [<c0127ab1>] [<c01122a0>] [<c01075f0>]
Oct 18 00:29:03 localhost kernel: Code: 0f e7 03 0f e7 43 08 0f e7 43 10 0f e7 
43 18 0f e7 43 20 0f 


>>EIP; c0210ee2 <fast_clear_page+12/50>   <=====

>>ebx; c73ae000 <END_OF_CODE+35e90a5/????>
>>ecx; c7c8c000 <END_OF_CODE+3ec70a5/????>
>>esi; c2daffe0 <_end+2ad7c48/3a47ce8>
>>edi; 00000fe0 Before first symbol
>>ebp; c113e204 <_end+e65e6c/3a47ce8>
>>esp; c7c8deac <END_OF_CODE+3ec8f51/????>

Trace; c01269b2 <do_anonymous_page+a2/110>
Trace; c0126dde <handle_mm_fault+8e/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c0127bb6 <__vma_link+56/d0>
Trace; c0128cc7 <do_brk+1d7/210>
Trace; c0127ab1 <sys_brk+f1/130>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code;  c0210ee2 <fast_clear_page+12/50>
00000000 <_EIP>:
Code;  c0210ee2 <fast_clear_page+12/50>   <=====
   0:   0f e7 03                  movntq %mm0,(%ebx)   <=====
Code;  c0210ee5 <fast_clear_page+15/50>
   3:   0f e7 43 08               movntq %mm0,0x8(%ebx)
Code;  c0210ee9 <fast_clear_page+19/50>
   7:   0f e7 43 10               movntq %mm0,0x10(%ebx)
Code;  c0210eed <fast_clear_page+1d/50>
   b:   0f e7 43 18               movntq %mm0,0x18(%ebx)
Code;  c0210ef1 <fast_clear_page+21/50>
   f:   0f e7 43 20               movntq %mm0,0x20(%ebx)
Code;  c0210ef5 <fast_clear_page+25/50>
  13:   0f 00 00                  sldtl  (%eax)

Oct 18 00:29:03 localhost kernel:  <1>Unable to handle kernel NULL pointer 
dereference at virtual address 00000044
Oct 18 00:29:03 localhost kernel: c014ca41
Oct 18 00:29:03 localhost kernel: *pde = 0752b067
Oct 18 00:29:03 localhost kernel: Oops: 0000 2.4.20-pre11aa1 #9 Fri Oct 18 
00:06:42 EST 2002
Oct 18 00:29:03 localhost kernel: CPU:    0
Oct 18 00:29:03 localhost kernel: EIP:    0010:[<c014ca41>]    Not tainted
Oct 18 00:29:03 localhost kernel: EFLAGS: 00013217
Oct 18 00:29:03 localhost kernel: eax: dff32cf8   ebx: 00000010   ecx: 
00000010   edx: dff00000
Oct 18 00:29:03 localhost kernel: esi: 00000000   edi: 00000000   ebp: 
0003b0c1   esp: c64d9d74
Oct 18 00:29:03 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 18 00:29:03 localhost kernel: Process X (pid: 945, stackpage=c64d9000)
Oct 18 00:29:03 localhost kernel: Stack: 00000000 00000000 00000000 00000000 
00000000 dff32cf8 dfe66005 00000002 
Oct 18 00:29:03 localhost kernel:        dfe66005 dfe66007 00000000 c64d9e14 
c014322b c16d7540 c64d9dd4 dfe66005 
Oct 18 00:29:03 localhost kernel:        c0143854 c16d7540 c64d9dd4 00000000 
00000009 00000000 c16c29c0 00000000 
Oct 18 00:29:03 localhost kernel: Call Trace:    [<c014322b>] [<c0143854>] 
[<c0143d37>] [<c0141187>] [<c0141af7>]
Oct 18 00:29:03 localhost kernel:   [<c0132ecf>] [<c01314e5>] [<c0126510>] 
[<c0126e69>] [<c011244a>] [<c0142fd7>]
Oct 18 00:29:03 localhost kernel:   [<c0105c90>] [<c01074ff>]
Oct 18 00:29:03 localhost kernel: Code: 39 6e 44 8b 1b 75 e8 8b 7c 24 34 39 7e 
0c 75 df 8b 57 4c 85 


>>EIP; c014ca41 <d_lookup+61/110>   <=====

>>eax; dff32cf8 <END_OF_CODE+1c16dd9d/????>
>>edx; dff00000 <END_OF_CODE+1c13b0a5/????>
>>ebp; 0003b0c1 Before first symbol
>>esp; c64d9d74 <END_OF_CODE+2714e19/????>

Trace; c014322b <cached_lookup+1b/70>
Trace; c0143854 <link_path_walk+3c4/6f0>
Trace; c0143d37 <path_lookup+37/40>
Trace; c0141187 <open_exec+27/e0>
Trace; c0141af7 <do_execve+27/220>
Trace; c0132ecf <__alloc_pages+5f/280>
Trace; c01314e5 <lru_cache_add+65/70>
Trace; c0126510 <do_wp_page+140/1f0>
Trace; c0126e69 <handle_mm_fault+119/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c0142fd7 <getname+97/d0>
Trace; c0105c90 <sys_execve+50/80>
Trace; c01074ff <system_call+33/38>

Code;  c014ca41 <d_lookup+61/110>
00000000 <_EIP>:
Code;  c014ca41 <d_lookup+61/110>   <=====
   0:   39 6e 44                  cmp    %ebp,0x44(%esi)   <=====
Code;  c014ca44 <d_lookup+64/110>
   3:   8b 1b                     mov    (%ebx),%ebx
Code;  c014ca46 <d_lookup+66/110>
   5:   75 e8                     jne    ffffffef <_EIP+0xffffffef>
Code;  c014ca48 <d_lookup+68/110>
   7:   8b 7c 24 34               mov    0x34(%esp,1),%edi
Code;  c014ca4c <d_lookup+6c/110>
   b:   39 7e 0c                  cmp    %edi,0xc(%esi)
Code;  c014ca4f <d_lookup+6f/110>
   e:   75 df                     jne    ffffffef <_EIP+0xffffffef>
Code;  c014ca51 <d_lookup+71/110>
  10:   8b 57 4c                  mov    0x4c(%edi),%edx
Code;  c014ca54 <d_lookup+74/110>
  13:   85 00                     test   %eax,(%eax)

Oct 18 00:29:04 localhost kernel:  <1>Unable to handle kernel paging request 
at virtual address c6b917c4
Oct 18 00:29:04 localhost kernel: c0139920
Oct 18 00:29:04 localhost kernel: *pde = 0748a163
Oct 18 00:29:04 localhost kernel: Oops: 0003 2.4.20-pre11aa1 #9 Fri Oct 18 
00:06:42 EST 2002
Oct 18 00:29:04 localhost kernel: CPU:    0
Oct 18 00:29:04 localhost kernel: EIP:    0010:[<c0139920>]    Not tainted
Oct 18 00:29:04 localhost kernel: EFLAGS: 00010216
Oct 18 00:29:04 localhost kernel: eax: c6b917c0   ebx: c4a132c0   ecx: 
00000004   edx: c0251474
Oct 18 00:29:04 localhost kernel: esi: 00000000   edi: ffffffe9   ebp: 
c158e380   esp: c8bb7f44
Oct 18 00:29:04 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 18 00:29:04 localhost kernel: Process sh (pid: 950, stackpage=c8bb7000)
Oct 18 00:29:04 localhost kernel: Stack: c167e440 00000004 c57acbe4 00000000 
c0137e29 00000004 c16d77c0 00000000 
Oct 18 00:29:04 localhost kernel:        c1be5000 4001edcd bfffeb68 c0137e07 
c16d77c0 c158e380 00000000 c8bb7f84 
Oct 18 00:29:04 localhost kernel:        c16d77c0 c158e380 c1be5000 c2dbc61c 
00000003 00000001 00000001 4001edcd 
Oct 18 00:29:04 localhost kernel: Call Trace:    [<c0137e29>] [<c0137e07>] 
[<c01381e3>] [<c01074ff>]
Oct 18 00:29:04 localhost kernel: Code: 89 50 04 89 02 c7 43 04 00 00 00 00 c7 
03 00 00 00 00 ff 0d 


>>EIP; c0139920 <get_empty_filp+20/130>   <=====

>>eax; c6b917c0 <END_OF_CODE+2dcc865/????>
>>ebx; c4a132c0 <END_OF_CODE+c4e365/????>
>>edx; c0251474 <free_list+0/8>
>>edi; ffffffe9 <END_OF_CODE+3c23b08e/????>
>>ebp; c158e380 <_end+12b5fe8/3a47ce8>
>>esp; c8bb7f44 <END_OF_CODE+4df2fe9/????>

Trace; c0137e29 <dentry_open+19/210>
Trace; c0137e07 <filp_open+67/70>
Trace; c01381e3 <sys_open+53/a0>
Trace; c01074ff <system_call+33/38>

Code;  c0139920 <get_empty_filp+20/130>
00000000 <_EIP>:
Code;  c0139920 <get_empty_filp+20/130>   <=====
   0:   89 50 04                  mov    %edx,0x4(%eax)   <=====
Code;  c0139923 <get_empty_filp+23/130>
   3:   89 02                     mov    %eax,(%edx)
Code;  c0139925 <get_empty_filp+25/130>
   5:   c7 43 04 00 00 00 00      movl   $0x0,0x4(%ebx)
Code;  c013992c <get_empty_filp+2c/130>
   c:   c7 03 00 00 00 00         movl   $0x0,(%ebx)
Code;  c0139932 <get_empty_filp+32/130>
  12:   ff 0d 00 00 00 00         decl   0x0

Oct 18 00:29:10 localhost kernel:  <1>Unable to handle kernel paging request 
at virtual address c6895b44
Oct 18 00:29:10 localhost kernel: c0139920
Oct 18 00:29:10 localhost kernel: *pde = 0748a163
Oct 18 00:29:10 localhost kernel: Oops: 0003 2.4.20-pre11aa1 #9 Fri Oct 18 
00:06:42 EST 2002
Oct 18 00:29:10 localhost kernel: CPU:    0
Oct 18 00:29:10 localhost kernel: EIP:    0010:[<c0139920>]    Not tainted
Oct 18 00:29:10 localhost kernel: EFLAGS: 00010216
Warning (Oops_read): Code line not seen, dumping what data is available


>>EIP; c0139920 <get_empty_filp+20/130>   <=====


2 warnings issued.  Results may not be reliable.

2. Then I compiled the kernel with one and only one module, i.e. radeon, and
nothing else.
$ cat /lib/modules/2.4.20-pre11aa1/modules.dep
/lib/modules/2.4.20-pre11aa1/kernel/drivers/char/drm/radeon.o:

Here is the decoded output of the oops that appeared in the system logs:
----------------------------------------------------------------------------------------------------
ksymoops 2.4.5 on i686 2.4.20-pre11aa1.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-pre11aa1/ (default)
     -m /boot/System.map-2.4.20-pre11aa1 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 18 01:00:26 localhost kernel: Unable to handle kernel paging request at 
virtual address c3d50000
Oct 18 01:00:26 localhost kernel: c021389a
Oct 18 01:00:26 localhost kernel: *pde = 03c001e3
Oct 18 01:00:26 localhost kernel: Oops: 0002 2.4.20-pre11aa1 #10 Fri Oct 18 
00:39:27 EST 2002
Oct 18 01:00:26 localhost kernel: CPU:    0
Oct 18 01:00:26 localhost kernel: EIP:    0010:[<c021389a>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 18 01:00:26 localhost kernel: EFLAGS: 00013246
Oct 18 01:00:26 localhost kernel: eax: 0000003a   ebx: c1730000   ecx: 
c3df4000   edx: 00000000
Oct 18 01:00:26 localhost kernel: esi: c3d50000   edi: 01730025   ebp: 
c10a89dc   esp: c3df5e9c
Oct 18 01:00:26 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 18 01:00:26 localhost kernel: Process modprobe (pid: 712, 
stackpage=c3df5000)
Oct 18 01:00:26 localhost kernel: Stack: c103fc5c c3fad498 c01264ce c3d50000 
c1730000 dfe1ce00 c10a89dc c4c99420 
Oct 18 01:00:26 localhost kernel:        42126000 dfe1ce00 c164ed40 c0126e69 
dfe1ce00 c164ed40 42126000 c3fad498 
Oct 18 01:00:26 localhost kernel:        c4c99420 01730025 c164e5c0 dfe1ce00 
c164ed40 42126000 c3df4000 c011244a 
Oct 18 01:00:26 localhost kernel: Call Trace:    [<c01264ce>] [<c0126e69>] 
[<c011244a>] [<c01276dc>] [<c0139b8c>]
Oct 18 01:00:26 localhost kernel:   [<c01286df>] [<c0128a37>] [<c0128ab4>] 
[<c01122a0>] [<c01075f0>]
Oct 18 01:00:26 localhost kernel: Code: 0f e7 06 0f 6f 4b 08 0f e7 4e 08 0f 6f 
53 10 0f e7 56 10 0f 


>>EIP; c021389a <fast_copy_page+3a/e0>   <=====

>>ebx; c1730000 <_end+1455ba8/3a85c28>
>>ecx; c3df4000 <END_OF_CODE+7ea89/????>
>>esi; c3d50000 <_end+3a75ba8/3a85c28>
>>edi; 01730025 Before first symbol
>>ebp; c10a89dc <_end+dce584/3a85c28>
>>esp; c3df5e9c <END_OF_CODE+80925/????>

Trace; c01264ce <do_wp_page+fe/1f0>
Trace; c0126e69 <handle_mm_fault+119/160>
Trace; c011244a <do_page_fault+1aa/5a0>
Trace; c01276dc <zap_pmd_range+7c/80>
Trace; c0139b8c <fput+cc/120>
Trace; c01286df <unmap_fixup+12f/140>
Trace; c0128a37 <do_munmap+297/2d0>
Trace; c0128ab4 <sys_munmap+44/80>
Trace; c01122a0 <do_page_fault+0/5a0>
Trace; c01075f0 <error_code+34/3c>

Code;  c021389a <fast_copy_page+3a/e0>
00000000 <_EIP>:
Code;  c021389a <fast_copy_page+3a/e0>   <=====
   0:   0f e7 06                  movntq %mm0,(%esi)   <=====
Code;  c021389d <fast_copy_page+3d/e0>
   3:   0f 6f 4b 08               movq   0x8(%ebx),%mm1
Code;  c02138a1 <fast_copy_page+41/e0>
   7:   0f e7 4e 08               movntq %mm1,0x8(%esi)
Code;  c02138a5 <fast_copy_page+45/e0>
   b:   0f 6f 53 10               movq   0x10(%ebx),%mm2
Code;  c02138a9 <fast_copy_page+49/e0>
   f:   0f e7 56 10               movntq %mm2,0x10(%esi)
Code;  c02138ad <fast_copy_page+4d/e0>
  13:   0f 00 00                  sldtl  (%eax)


1 warning issued.  Results may not be reliable.

I can provide .config upon request, but it is basically the same as the 
previous one except I have deselected the whole Netfilter stuff.

Thanks.
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-17 15:26       ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-17 16:27         ` Andrea Arcangeli
       [not found]           ` <200210190014.19357.harisri@bigpond.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-17 16:27 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: Keith Owens, linux-kernel

On Fri, Oct 18, 2002 at 01:26:36AM +1000, Srihari Vijayaraghavan wrote:
> Hello Keith,
> 
> > You don't need that, just mkdir /var/log/ksymoops.  modprobe/insmod
> > will create a daily log file and snapshot a copy of lsmod and
> > /proc/ksyms for every module loaded or unloaded.  All with sync in the
> > right places.
> 
> Thanks, and that works fine.

if you enabled it before getting the new oopses, it would help if you
sent me a tarball of /var/log/ksymoops, so that I can resolve the module
addresses too (please also send me your agpgart.o and radeon.o modules,
all from the same kernel: the .o files, the ksymoops logs and the oopses
below).

thanks,

Andrea


* Re: 2.4.20pre11aa1
       [not found]           ` <200210190014.19357.harisri@bigpond.com>
@ 2002-10-18 14:52             ` Andrea Arcangeli
  2002-10-18 15:21               ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  2002-10-18 15:34               ` 2.4.20pre11aa1 Keith Owens
  0 siblings, 2 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-18 14:52 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: linux-kernel

On Sat, Oct 19, 2002 at 12:14:19AM +1000, Srihari Vijayaraghavan wrote:
> Oct 18 23:40:42 localhost kernel: Process modprobe (pid: 957, 

modprobe was running at 234042, now in the log I see:

20021018 234001 start /sbin/modprobe -s -k -- char-major-14 safemode=1
20021018 234001 probe ended
20021018 234004 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
20021018 234004 probe ended
20021018 234014 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
20021018 234014 probe ended
20021018 234021 start /sbin/modprobe -s -k -- char-major-14 safemode=1
20021018 234021 probe ended
20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
20021018 234022 probe ended
20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
20021018 234022 probe ended
20021018 234040 start /sbin/modprobe -s -k -- char-major-14 safemode=1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20021018 234040 probe ended
20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
20021018 234051 probe ended
20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
20021018 234051 probe ended

I don't see any modprobe in the logs at 234042, and the one at 234040
wrote "probe ended" at 234040. Maybe it was another modprobe that
crashed before it could write into the logs? Or maybe it was the
underlined one that crashed after writing "probe ended"? Either way, it
looks like modprobe is innocent if it didn't log any newly loaded
module. Do you agree, Keith?

if you still have the .config used to build the kernel please send it
too, thanks!

I've no idea why radeon or agpgart would generate corruption in my tree
and not in mainline, and I can't reproduce it. The best would be if you
could do a binary search on all the patches applied (first apply all
the [012]* and see if you can reproduce, and so on).
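
A minimal sketch of that binary search, assuming the patches are applied
in `ls` order and that once a prefix of the patch list reproduces the
oops, every longer prefix does too. The names first_bad_prefix and
reproduces_fn are hypothetical; the callback stands in for "apply the
first n patches, build, boot and test".

```c
#include <assert.h>

/* Hypothetical callback: returns nonzero if a kernel built with the
 * first n patches (in ls order) still reproduces the oops. */
typedef int (*reproduces_fn)(int n);

/* Return the smallest n in 1..total such that applying the first n
 * patches reproduces the bug, or 0 if even the full set does not.
 * Assumes n = 0 (no patches applied) is good and that "bad" is
 * monotonic in n. */
static int first_bad_prefix(int total, reproduces_fn reproduces)
{
	int good = 0;      /* largest prefix known to be fine */
	int bad = total;   /* smallest prefix known to oops */

	assert(total >= 0);
	if (!reproduces(total))
		return 0;
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (reproduces(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Stand-in for a real build-and-test cycle: pretend that the 37th
 * patch in the list is the guilty one. */
static int fake_reproduces(int n)
{
	return n >= 37;
}
```

With 100 patches this needs about seven build-and-boot cycles after the
initial check instead of up to a hundred, which is why starting with a
coarse split such as [012]* is worth the effort.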

Andrea


* Re: 2.4.20pre11aa1
  2002-10-18 14:52             ` 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-18 15:21               ` Srihari Vijayaraghavan
  2002-10-18 15:34               ` 2.4.20pre11aa1 Keith Owens
  1 sibling, 0 replies; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-18 15:21 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello,

On Saturday 19 October 2002 00:52, Andrea Arcangeli wrote:
> if you still have the .config used to build the kernel please send it
> too, thanks!

CONFIG_X86=y
CONFIG_UID16=y
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y
CONFIG_MK7=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG8=y
CONFIG_X86_HAS_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_F00F_WORKS_OK=y
CONFIG_X86_MCE=y
CONFIG_NOHIGHMEM=y
CONFIG_1GB=y
CONFIG_MTRR=y
CONFIG_X86_TSC=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_BLK_DEV_FD=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID0=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_NETDEVICES=y
CONFIG_PPP=y
CONFIG_PPP_ASYNC=y
CONFIG_PPP_DEFLATE=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_SERIAL_EXTENDED=y
CONFIG_UNIX98_PTYS=y
CONFIG_MOUSE=y
CONFIG_PSMOUSE=y
CONFIG_RTC=y
CONFIG_AGP=m
CONFIG_AGP_AMD=y
CONFIG_DRM=y
CONFIG_DRM_NEW=y
CONFIG_DRM_RADEON=m
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_VGA_CONSOLE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y

> I've no idea why radeon or agpgart could generate corruption in my tree
> and not in mainline and I can't reproduce. the best would be if you
> could do a binary search on all the patches applied (first applying all
> the [012]* and see if you can reproduce, and so on)

I will try that.

Thanks
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-18 14:52             ` 2.4.20pre11aa1 Andrea Arcangeli
  2002-10-18 15:21               ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-18 15:34               ` Keith Owens
  2002-10-18 16:00                 ` 2.4.20pre11aa1 Andrea Arcangeli
  1 sibling, 1 reply; 25+ messages in thread
From: Keith Owens @ 2002-10-18 15:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Srihari Vijayaraghavan, linux-kernel

On Fri, 18 Oct 2002 16:52:04 +0200, 
Andrea Arcangeli <andrea@suse.de> wrote:
>On Sat, Oct 19, 2002 at 12:14:19AM +1000, Srihari Vijayaraghavan wrote:
>> Oct 18 23:40:42 localhost kernel: Process modprobe (pid: 957, 
>
>modprobe was running at 234042, now in the log I see:
>
>20021018 234001 start /sbin/modprobe -s -k -- char-major-14 safemode=1
>20021018 234001 probe ended
>20021018 234004 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
>20021018 234004 probe ended
>20021018 234014 start /sbin/modprobe -s -k -- char-major-10-134 safemode=1
>20021018 234014 probe ended
>20021018 234021 start /sbin/modprobe -s -k -- char-major-14 safemode=1
>20021018 234021 probe ended
>20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
>20021018 234022 probe ended
>20021018 234022 start /sbin/modprobe -s -k -- ide-cd safemode=1
>20021018 234022 probe ended
>20021018 234040 start /sbin/modprobe -s -k -- char-major-14 safemode=1
>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>20021018 234040 probe ended
>20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
>20021018 234051 probe ended
>20021018 234051 start /sbin/modprobe -s -k -- binfmt-ffff safemode=1
>20021018 234051 probe ended
>
>I don't see any modprobe in the logs at 234042 and the one at 234040 is
>writing "probe ended" at 234040. maybe it was another modprobe that
>crashed before it could write into the logs? or maybe it was the
>underlined one that crashed after writing "probe ended"? But anyways it
>looks like modprobe is innocent if it didn't write into the log any new
>module loaded. Do you agree Keith?

modprobe appends to the log for all operations that might change the
module state.  The data is flushed before changing module state, with

snap_shot_log()
	fprintf(log, "\n");
	fflush(log);
	fdatasync(fileno(log));
	fclose(log);

so the log should always be valid, even if modprobe then crashes.
There is no system code after modprobe writes 'probe ended', so crashes
after writing 'probe ended' should not be possible.
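
As a self-contained illustration of that flush sequence (a sketch for a
Linux/POSIX system, not the actual modutils source; append_log_line and
its arguments are hypothetical names):

```c
#define _POSIX_C_SOURCE 200112L  /* for fileno() and fdatasync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: append one line to a log file and push it to
 * stable storage before returning, so the record survives even if the
 * caller dies right afterwards.  Returns 0 on success, -1 on error. */
static int append_log_line(const char *path, const char *line)
{
	FILE *log;
	int failed;

	assert(path != NULL && line != NULL);
	log = fopen(path, "a");
	if (!log)
		return -1;
	failed = fprintf(log, "%s\n", line) < 0;
	/* stdio buffer -> kernel page cache */
	failed |= fflush(log) != 0;
	/* page cache -> disk (file data, plus the metadata needed to read it) */
	failed |= fdatasync(fileno(log)) != 0;
	failed |= fclose(log) != 0;
	return failed ? -1 : 0;
}
```

The order matters: fflush() alone only hands the data to the kernel, so
an oops shortly afterwards could still lose it; fdatasync() is what
makes the entry durable before any module state is touched.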

Three possibilities :-

(a) The modprobe at 234040 completed the load successfully then the
oops occurred before the modprobe task was completely purged.  IOW, the
module loaded, module_init() ran, modprobe returned to user space then
the module died handling some event.

(b) The failing modprobe at 234042 is real, but is performing an
operation that will not change module state.  For example, it is
doing modprobe -n, this will not log but will still invoke some module
syscalls.  The oops is then caused by corrupt module tables.

(c) modprobe is not being run as root so it cannot log.  Although it
cannot actually change module state, it will do part of the work in
extracting existing module symbols.  Again, the oops is caused by
corrupt module tables.



* Re: 2.4.20pre11aa1
  2002-10-18 15:34               ` 2.4.20pre11aa1 Keith Owens
@ 2002-10-18 16:00                 ` Andrea Arcangeli
  2002-10-19  1:21                   ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-18 16:00 UTC (permalink / raw)
  To: Keith Owens; +Cc: Srihari Vijayaraghavan, linux-kernel

On Sat, Oct 19, 2002 at 01:34:06AM +1000, Keith Owens wrote:
> Three possibilities :-
> 
> (a) The modprobe at 234040 completed the load successfully then the
> oops occurred before the modprobe task was completely purged.  IOW, the
> module loaded, module_init() ran, modprobe returned to user space then
> the module died handling some event.
> 
> (b) The failing modprobe at 234042 is real, but is performing an
> operation that will not change module state.  For example, it is
> doing modprobe -n, this will not log but will still invoke some module
> syscalls.  The oops is then caused by corrupt module tables.
> 
> (c) modprobe is not being run as root so it cannot log.  Although it
> cannot actually change module state, it will do part of the work in
> extracting existing module symbols.  Again, the oops is caused by
> corrupt module tables.

thanks for the help.

the corrupted module tables ring a bell. I fixed the wrong locking in
the module code that could corrupt these tables (it was relying on
the bkl, but the bkl means nothing if you copy_user in the middle of the
loop like the module code does, since copy_user can sleep and the bkl is
dropped across a sleep, so I replaced the bkl with a semaphore
and that should fix things), but I wonder if I broke something else
with these fixes.

Here's the patch that I'm talking about. You may want to start the
binary search by backing this out to see if the problem goes away. If it
goes away I clearly need to double check it ;)

diff -urNp x-ref/kernel/module.c x/kernel/module.c
--- x-ref/kernel/module.c	Tue Jan 22 18:56:00 2002
+++ x/kernel/module.c	Thu Oct 10 23:47:20 2002
@@ -78,6 +78,8 @@ static int kmalloc_failed;
  
 spinlock_t modlist_lock = SPIN_LOCK_UNLOCKED;
 
+static DECLARE_MUTEX(module_mutex);
+
 /**
  * inter_module_register - register a new set of inter module data.
  * @im_name: an arbitrary string to identify the data, must be unique
@@ -298,7 +300,7 @@ sys_create_module(const char *name_user,
 
 	if (!capable(CAP_SYS_MODULE))
 		return -EPERM;
-	lock_kernel();
+	down(&module_mutex);
 	if ((namelen = get_mod_name(name_user, &name)) < 0) {
 		error = namelen;
 		goto err0;
@@ -334,7 +336,7 @@ sys_create_module(const char *name_user,
 err1:
 	put_mod_name(name);
 err0:
-	unlock_kernel();
+	up(&module_mutex);
 	return error;
 }
 
@@ -353,7 +355,7 @@ sys_init_module(const char *name_user, s
 
 	if (!capable(CAP_SYS_MODULE))
 		return -EPERM;
-	lock_kernel();
+	down(&module_mutex);
 	if ((namelen = get_mod_name(name_user, &name)) < 0) {
 		error = namelen;
 		goto err0;
@@ -549,13 +551,16 @@ sys_init_module(const char *name_user, s
 	/* Initialize the module.  */
 	atomic_set(&mod->uc.usecount,1);
 	mod->flags |= MOD_INITIALIZING;
+	up(&module_mutex);
 	if (mod->init && (error = mod->init()) != 0) {
+		down(&module_mutex);
 		atomic_set(&mod->uc.usecount,0);
 		mod->flags &= ~MOD_INITIALIZING;
 		if (error > 0)	/* Buggy module */
 			error = -EBUSY;
 		goto err0;
 	}
+	down(&module_mutex);
 	atomic_dec(&mod->uc.usecount);
 
 	/* And set it running.  */
@@ -571,7 +576,7 @@ err2:
 err1:
 	put_mod_name(name);
 err0:
-	unlock_kernel();
+	up(&module_mutex);
 	kfree(name_tmp);
 	return error;
 }
@@ -602,7 +607,7 @@ sys_delete_module(const char *name_user)
 	if (!capable(CAP_SYS_MODULE))
 		return -EPERM;
 
-	lock_kernel();
+	down(&module_mutex);
 	if (name_user) {
 		if ((error = get_mod_name(name_user, &name)) < 0)
 			goto out;
@@ -664,7 +669,7 @@ restart:
 	
 	error = 0;
 out:
-	unlock_kernel();
+	up(&module_mutex);
 	return error;
 }
 
@@ -887,7 +892,7 @@ sys_query_module(const char *name_user, 
 	struct module *mod;
 	int err;
 
-	lock_kernel();
+	down(&module_mutex);
 	if (name_user == NULL)
 		mod = &kernel_module;
 	else {
@@ -937,7 +942,7 @@ sys_query_module(const char *name_user, 
 	atomic_dec(&mod->uc.usecount);
 	
 out:
-	unlock_kernel();
+	up(&module_mutex);
 	return err;
 }
 
@@ -956,7 +961,7 @@ sys_get_kernel_syms(struct kernel_sym *t
 	int i;
 	struct kernel_sym ksym;
 
-	lock_kernel();
+	down(&module_mutex);
 	for (mod = module_list, i = 0; mod; mod = mod->next) {
 		/* include the count for the module name! */
 		i += mod->nsyms + 1;
@@ -999,7 +1004,7 @@ sys_get_kernel_syms(struct kernel_sym *t
 		}
 	}
 out:
-	unlock_kernel();
+	up(&module_mutex);
 	return i;
 }
 
@@ -1037,8 +1042,11 @@ free_module(struct module *mod, int tag_
 
 	if (mod->flags & MOD_RUNNING)
 	{
-		if(mod->cleanup)
+		if(mod->cleanup) {
+			up(&module_mutex);
 			mod->cleanup();
+			down(&module_mutex);
+		}
 		mod->flags &= ~MOD_RUNNING;
 	}
 
@@ -1082,6 +1090,7 @@ int get_module_list(char *p)
 	char tmpstr[64];
 	struct module_ref *ref;
 
+	down(&module_mutex);
 	for (mod = module_list; mod != &kernel_module; mod = mod->next) {
 		long len;
 		const char *q;
@@ -1150,6 +1159,7 @@ int get_module_list(char *p)
 	}
 
 fini:
+	up(&module_mutex);
 	return PAGE_SIZE - left;
 }
 
@@ -1172,7 +1182,7 @@ static void *s_start(struct seq_file *m,
 
 	if (!p)
 		return ERR_PTR(-ENOMEM);
-	lock_kernel();
+	down(&module_mutex);
 	for (v = module_list, n = *pos; v; n -= v->nsyms, v = v->next) {
 		if (n < v->nsyms) {
 			p->mod = v;
@@ -1180,7 +1190,7 @@ static void *s_start(struct seq_file *m,
 			return p;
 		}
 	}
-	unlock_kernel();
+	up(&module_mutex);
 	kfree(p);
 	return NULL;
 }
@@ -1193,7 +1203,7 @@ static void *s_next(struct seq_file *m, 
 		do {
 			v->mod = v->mod->next;
 			if (!v->mod) {
-				unlock_kernel();
+				up(&module_mutex);
 				kfree(p);
 				return NULL;
 			}
@@ -1206,7 +1216,7 @@ static void *s_next(struct seq_file *m, 
 static void s_stop(struct seq_file *m, void *p)
 {
 	if (p && !IS_ERR(p)) {
-		unlock_kernel();
+		up(&module_mutex);
 		kfree(p);
 	}
 }
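
The shape of the init path after this change, reduced to a userspace
sketch with a pthread mutex standing in for the kernel semaphore. All
names below are hypothetical and none of this is kernel API: the lock
protects the table, but it is released around the module's own init
callback, because that callback may sleep or re-enter the module code.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

struct fake_module {
	int initializing;    /* plays the role of MOD_INITIALIZING */
	int running;         /* plays the role of MOD_RUNNING */
	int (*init)(void);   /* may block or take table_lock itself */
};

static int load_fake_module(struct fake_module *mod)
{
	int error = 0;

	assert(mod != NULL);
	pthread_mutex_lock(&table_lock);
	mod->initializing = 1;
	pthread_mutex_unlock(&table_lock);   /* like up(&module_mutex) */

	/* The callback runs without the lock held. */
	if (mod->init)
		error = mod->init();

	pthread_mutex_lock(&table_lock);     /* like down(&module_mutex) */
	mod->initializing = 0;
	if (error == 0)
		mod->running = 1;
	pthread_mutex_unlock(&table_lock);
	return error;
}

/* Example callback that re-enters the locked region.  If the lock were
 * still held across init(), this would typically deadlock on a
 * default (non-recursive) mutex. */
static int reentrant_init(void)
{
	pthread_mutex_lock(&table_lock);
	pthread_mutex_unlock(&table_lock);
	return 0;
}
```

The price of dropping the lock is that the table can change while the
callback runs, so the state that matters has to be marked (here with
the initializing flag) before the lock is released.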


Andrea


* Re: 2.4.20pre11aa1
  2002-10-18 16:00                 ` 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-19  1:21                   ` Srihari Vijayaraghavan
  2002-10-19  1:25                     ` 2.4.20pre11aa1 Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-19  1:21 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Keith Owens

Hello Andrea,

On Saturday 19 October 2002 02:00, Andrea Arcangeli wrote:
> the corrupted module tables rings a bell. I fixed the wrong locking in
> the module code that could corrupt these tables (they were relying on
> the bkl but the bkl means nothing if you copy_user in the middle of the
> loop like the module code does, so I replaced the bkl with a semaphore
> and that should fix things), but I wonder if I broken something else
> with these fixes.
>
> Here's the patch that I'm talking about, you may want to start the
> binary search backing this out and see if the problem goes away. if it
> goes away I clearly need to double check it ;)

Unfortunately, backing that change out of kernel/module.c did not help.

I may be wrong, but considering that in my case the kernel crashes whether 
agpgart/radeon are compiled as modules or built in, I suspect that this issue 
is larger than just the module subsystem.

Anyway, I will start applying the patches from 00* onwards from your tree to 
see if I can reliably prove where the problem is.

Thanks.
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-19  1:21                   ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-19  1:25                     ` Andrea Arcangeli
  2002-10-22 10:48                       ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-19  1:25 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: linux-kernel, Keith Owens

On Sat, Oct 19, 2002 at 11:21:19AM +1000, Srihari Vijayaraghavan wrote:
> I may be wrong but considering in my case the kernel is crashing whether 
> agpgart/radeon are compiled as modules or built-in, I suspect that this issue 
> is larger than just modules sub-system.

agreed. the oops in modprobe sounds more like a coincidence now.

> Anyway I will start applying the patches from 00* on-wards from your tree to 
> see if I can reliably prove where the problem is.

that will help a lot, thanks!

Andrea


* Re: 2.4.20pre11aa1
  2002-10-19  1:25                     ` 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-22 10:48                       ` Srihari Vijayaraghavan
  2002-10-22 14:55                         ` 2.4.20pre11aa1 Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-22 10:48 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

On Saturday 19 October 2002 11:25, Andrea Arcangeli wrote:
> that will help a lot, thanks!

Is there a quick HOWTO on how to apply the individual patches?

Do I apply 00*gz patches after applying 00* patches?

When I tried the above procedure there were a lot of hunks and it did not 
compile bzImage, agpgart.o etc.

Thanks
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-22 10:48                       ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-22 14:55                         ` Andrea Arcangeli
  2002-10-23 12:27                           ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-22 14:55 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: linux-kernel

On Tue, Oct 22, 2002 at 08:48:05PM +1000, Srihari Vijayaraghavan wrote:
> Hello Andrea,
> 
> On Saturday 19 October 2002 11:25, Andrea Arcangeli wrote:
> > that will help a lot, thanks!
> 
> Is there a quick HOWTO on how to apply the individual patches?
> 
> Do I apply 00*gz patches after applying 00* patches?

gz doesn't matter, the `ls` ordering is the only thing that matters. You
can gzip -d * and then apply [0123]* and see if it still breaks.

> When I tried the above procedure there were a lot of hunks and it did not 
> compile bzImage and agpgart.o etc..

something like this will apply cleanly, and if every patch is self-contained
as it should be, it will compile correctly too:

	rm ../2.4.20pre11aa1/*.bz2
	gzip -d ../2.4.20pre11aa1/*.gz
	for i in ../2.4.20pre11aa1/[0123]*; do patch -p1 < "$i"; done

Andrea


* Re: 2.4.20pre11aa1
  2002-10-22 14:55                         ` 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-23 12:27                           ` Srihari Vijayaraghavan
  2002-10-23 12:46                             ` 2.4.20pre11aa1 Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-23 12:27 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

On Wednesday 23 October 2002 00:55, Andrea Arcangeli wrote:
> something like this will apply cleanly, if every patch is self contained
> as it should, it will compile correctly too:
>
> 	rm ../2.4.20pre11aa1/*.bz2
> 	gzip -d ../2.4.20pre11aa1/*.gz
> 	for i in ../2.4.20pre11aa1/[0123]*; patch -p1 < $i; done

Thanks, that is neat.

I was able to trigger a few oopses with the [0123]* patches applied.

ksymoops 2.4.5 on i686 2.4.20-pre11aa1-0123.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-pre11aa1-0123/ (default)
     -m /boot/System.map-2.4.20-pre11aa1-0123 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 23 21:23:22 localhost kernel: Unable to handle kernel paging request at virtual address c463b440
Oct 23 21:23:22 localhost kernel: c01485d1
Oct 23 21:23:22 localhost kernel: *pde = 045fe163
Oct 23 21:23:22 localhost kernel: Oops: 0003
Oct 23 21:23:22 localhost kernel: CPU:    0
Oct 23 21:23:22 localhost kernel: EIP:    0010:[<c01485d1>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 23 21:23:22 localhost kernel: EFLAGS: 00010282
Oct 23 21:23:22 localhost kernel: eax: c463b440   ebx: c463b440   ecx: c5938080   edx: 00000296
Oct 23 21:23:22 localhost kernel: esi: c6a6f6c8   edi: c6a6f680   ebp: 00000001   esp: c85d1f18
Oct 23 21:23:22 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 23 21:23:22 localhost kernel: Process bonobo-activati (pid: 795, stackpage=c85d1000)
Oct 23 21:23:22 localhost kernel: Stack: c01a492d c158f2dc 00000000 c6a6f6c8 c01de68b c463b440 00000000 00000217
Oct 23 21:23:22 localhost kernel:        c158e480 c463b440 c5706b50 c158e200 c5706a40 c6cf4140 c01de987 c6a6f680
Oct 23 21:23:22 localhost kernel:        00000000 c01a236c c5706b50 c5706a40 c01a2949 c5706b50 c641f3c0 00000000
Oct 23 21:23:22 localhost kernel: Call Trace:    [<c01a492d>] [<c01de68b>] [<c01de987>] [<c01a236c>] [<c01a2949>]
Oct 23 21:23:22 localhost kernel:   [<c0136782>] [<c0134e6d>] [<c0134eee>] [<c010737f>]
Oct 23 21:23:22 localhost kernel: Code: ff 0b 0f 94 c0 84 c0 0f 84 8f 00 00 00 8d 73 18 39 73 18 74


>>EIP; c01485d1 <dput+11/110>   <=====

>>eax; c463b440 <END_OF_CODE+66e505/????>
>>ebx; c463b440 <END_OF_CODE+66e505/????>
>>ecx; c5938080 <END_OF_CODE+196b145/????>
>>esi; c6a6f6c8 <END_OF_CODE+2aa278d/????>
>>edi; c6a6f680 <END_OF_CODE+2aa2745/????>
>>esp; c85d1f18 <END_OF_CODE+4604fdd/????>

Trace; c01a492d <sk_free+2d/60>
Trace; c01de68b <unix_release_sock+11b/1d0>
Trace; c01de987 <unix_release+27/30>
Trace; c01a236c <sock_release+5c/60>
Trace; c01a2949 <sock_close+39/60>
Trace; c0136782 <fput+102/130>
Trace; c0134e6d <filp_close+4d/80>
Trace; c0134eee <sys_close+4e/60>
Trace; c010737f <system_call+33/38>

Code;  c01485d1 <dput+11/110>
00000000 <_EIP>:
Code;  c01485d1 <dput+11/110>   <=====
   0:   ff 0b                     decl   (%ebx)   <=====
Code;  c01485d3 <dput+13/110>
   2:   0f 94 c0                  sete   %al
Code;  c01485d6 <dput+16/110>
   5:   84 c0                     test   %al,%al
Code;  c01485d8 <dput+18/110>
   7:   0f 84 8f 00 00 00         je     9c <_EIP+0x9c>
Code;  c01485de <dput+1e/110>
   d:   8d 73 18                  lea    0x18(%ebx),%esi
Code;  c01485e1 <dput+21/110>
  10:   39 73 18                  cmp    %esi,0x18(%ebx)
Code;  c01485e4 <dput+24/110>
  13:   74 00                     je     15 <_EIP+0x15>

Oct 23 21:23:22 localhost kernel:  <1>Unable to handle kernel paging request at virtual address c4c6a360
Oct 23 21:23:22 localhost kernel: c0137103
Oct 23 21:23:22 localhost kernel: *pde = 04c001e3
Oct 23 21:23:22 localhost kernel: Oops: 0002
Oct 23 21:23:22 localhost kernel: CPU:    0
Oct 23 21:23:22 localhost kernel: EIP:    0010:[<c0137103>]    Not tainted
Oct 23 21:23:22 localhost kernel: EFLAGS: 00013286
Oct 23 21:23:22 localhost kernel: eax: c4c6a340   ebx: 00000000   ecx: c916b940   edx: c025ec44
Oct 23 21:23:22 localhost kernel: esi: c916b940   edi: c1ee3930   ebp: c1ee3cc0   esp: c1c11e54
Oct 23 21:23:22 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 23 21:23:22 localhost kernel: Process kjournald (pid: 136, stackpage=c1c11000)
Oct 23 21:23:22 localhost kernel: Stack: 00000000 c01379e8 c916b940 00000000 c916b940 c1ee3450 c0169b7e c916b940
Oct 23 21:23:22 localhost kernel:        0000002d c1c11ea8 000002fa ffffffff c1c10000 dffceaf4 00000000 00000000
Oct 23 21:23:22 localhost kernel:        00000000 00000000 c1ca2c40 c1b72540 000002fa c90e1640 c90e15c0 c8576a40
Oct 23 21:23:22 localhost kernel: Call Trace:    [<c01379e8>] [<c0169b7e>] [<c011350b>] [<c016bf5c>] [<c016be00>]
Oct 23 21:23:22 localhost kernel:   [<c010576e>] [<c016be20>]
Oct 23 21:23:22 localhost kernel: Code: 89 48 20 8b 02 89 48 24 ff 04 9d 50 ec 25 c0 0f b7 41 08 01


>>EIP; c0137103 <__insert_into_lru_list+43/60>   <=====

>>eax; c4c6a340 <END_OF_CODE+c9d405/????>
>>ecx; c916b940 <END_OF_CODE+519ea05/????>
>>edx; c025ec44 <lru_list+0/c>
>>esi; c916b940 <END_OF_CODE+519ea05/????>
>>edi; c1ee3930 <[md].bss.end+216dd1/2273521>
>>ebp; c1ee3cc0 <[md].bss.end+217161/2273521>
>>esp; c1c11e54 <_end+1997e04/1a32030>

Trace; c01379e8 <__refile_buffer+58/70>
Trace; c0169b7e <journal_commit_transaction+105e/11c0>
Trace; c011350b <schedule+15b/240>
Trace; c016bf5c <kjournald+13c/1d0>
Trace; c016be00 <commit_timeout+0/10>
Trace; c010576e <kernel_thread+2e/40>
Trace; c016be20 <kjournald+0/1d0>

Code;  c0137103 <__insert_into_lru_list+43/60>
00000000 <_EIP>:
Code;  c0137103 <__insert_into_lru_list+43/60>   <=====
   0:   89 48 20                  mov    %ecx,0x20(%eax)   <=====
Code;  c0137106 <__insert_into_lru_list+46/60>
   3:   8b 02                     mov    (%edx),%eax
Code;  c0137108 <__insert_into_lru_list+48/60>
   5:   89 48 24                  mov    %ecx,0x24(%eax)
Code;  c013710b <__insert_into_lru_list+4b/60>
   8:   ff 04 9d 50 ec 25 c0      incl   0xc025ec50(,%ebx,4)
Code;  c0137112 <__insert_into_lru_list+52/60>
   f:   0f b7 41 08               movzwl 0x8(%ecx),%eax
Code;  c0137116 <__insert_into_lru_list+56/60>
  13:   01 00                     add    %eax,(%eax)

Oct 23 21:23:22 localhost kernel:  <1>Unable to handle kernel paging request at virtual address c51c0098
Oct 23 21:23:22 localhost kernel: c0119a10
Oct 23 21:23:22 localhost kernel: *pde = 050001e3
Oct 23 21:23:22 localhost kernel: Oops: 0000
Oct 23 21:23:22 localhost kernel: CPU:    0
Oct 23 21:23:22 localhost kernel: EIP:    0010:[<c0119a10>]    Not tainted
Oct 23 21:23:22 localhost kernel: EFLAGS: 00013206
Oct 23 21:23:22 localhost kernel: eax: 00000000   ebx: c51c0000   ecx: c193f000   edx: 00000000
Oct 23 21:23:22 localhost kernel: esi: c1c10000   edi: 0000006a   ebp: 0000000b   esp: c1c11d08
Oct 23 21:23:22 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 23 21:23:22 localhost kernel: Process kjournald (pid: 136, stackpage=c1c11000)
Oct 23 21:23:22 localhost kernel: Stack: c1587bb8 c193f040 c1c10000 00000000 c1c10000 0000006a 0000000b c0119f00
Oct 23 21:23:22 localhost kernel:        c1c10000 00000002 c1c11e20 00000002 0000006a c1c10000 c01079f2 0000000b
Oct 23 21:23:22 localhost kernel:        c01edc4a 00000002 4942412e c01123c4 c01edc4a c1c11e20 00000002 c0276784
Oct 23 21:23:22 localhost kernel: Call Trace:    [<c0119f00>] [<c01079f2>] [<c01123c4>] [<c019bc12>] [<c0137cab>]
Oct 23 21:23:22 localhost kernel:   [<c018f4ec>] [<c018f8d5>] [<c018fac5>] [<c01120b0>] [<c0107470>] [<c0137103>]
Oct 23 21:23:22 localhost kernel:   [<c01379e8>] [<c0169b7e>] [<c011350b>] [<c016bf5c>] [<c016be00>] [<c010576e>]
Oct 23 21:23:22 localhost kernel:   [<c016be20>]
Oct 23 21:23:22 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b 5b 50 81 fb 00 80 21


>>EIP; c0119a10 <exit_notify+20/300>   <=====

>>ebx; c51c0000 <END_OF_CODE+11f30c5/????>
>>ecx; c193f000 <_end+16c4fb0/1a32030>
>>esi; c1c10000 <_end+1995fb0/1a32030>
>>esp; c1c11d08 <_end+1997cb8/1a32030>

Trace; c0119f00 <do_exit+210/260>
Trace; c01079f2 <die+72/80>
Trace; c01123c4 <do_page_fault+314/5d0>
Trace; c019bc12 <do_rw_disk+4b2/5c0>
Trace; c0137cab <create_buffers+6b/e0>
Trace; c018f4ec <ide_wait_stat+bc/130>
Trace; c018f8d5 <start_request+1b5/250>
Trace; c018fac5 <ide_do_request+c5/1c0>
Trace; c01120b0 <do_page_fault+0/5d0>
Trace; c0107470 <error_code+34/3c>
Trace; c0137103 <__insert_into_lru_list+43/60>
Trace; c01379e8 <__refile_buffer+58/70>
Trace; c0169b7e <journal_commit_transaction+105e/11c0>
Trace; c011350b <schedule+15b/240>
Trace; c016bf5c <kjournald+13c/1d0>
Trace; c016be00 <commit_timeout+0/10>
Trace; c010576e <kernel_thread+2e/40>
Trace; c016be20 <kjournald+0/1d0>

Code;  c0119a10 <exit_notify+20/300>
00000000 <_EIP>:
Code;  c0119a10 <exit_notify+20/300>   <=====
   0:   39 b3 98 00 00 00         cmp    %esi,0x98(%ebx)   <=====
Code;  c0119a16 <exit_notify+26/300>
   6:   0f 84 85 02 00 00         je     291 <_EIP+0x291>
Code;  c0119a1c <exit_notify+2c/300>
   c:   8b 5b 50                  mov    0x50(%ebx),%ebx
Code;  c0119a1f <exit_notify+2f/300>
   f:   81 fb 00 80 21 00         cmp    $0x218000,%ebx

Oct 23 21:23:22 localhost kernel:  <1>Unable to handle kernel paging request at virtual address c54bc098
Oct 23 21:23:22 localhost kernel: c0119a10
Oct 23 21:23:22 localhost kernel: *pde = 054001e3
Oct 23 21:23:22 localhost kernel: Oops: 0000
Oct 23 21:23:22 localhost kernel: CPU:    0
Oct 23 21:23:22 localhost kernel: EIP:    0010:[<c0119a10>]    Not tainted
Oct 23 21:23:23 localhost kernel: EFLAGS: 00013206
Oct 23 21:23:23 localhost kernel: eax: 00000000   ebx: c54bc000   ecx: 00000000   edx: 00000000
Oct 23 21:23:23 localhost kernel: esi: c1c10000   edi: 000001c0   ebp: 0000000b   esp: c1c11bbc
Oct 23 21:23:23 localhost kernel: ds: 0018   es: 0018   ss: 0018
Oct 23 21:23:23 localhost kernel: Process kjournald (pid: 136, stackpage=c1c11000)
Oct 23 21:23:23 localhost kernel: Stack: 00000020 00000400 c1c10000 00000000 c1c10000 000001c0 0000000b c0119f00
Oct 23 21:23:23 localhost kernel:        c1c10000 00000000 c1c11cd4 00000000 000001c0 c1c10000 c01079f2 0000000b
Oct 23 21:23:23 localhost kernel:        c01edc4a 00000000 24548924 c01123c4 c01edc4a c1c11cd4 00000000 33323130
Oct 23 21:23:23 localhost kernel: Call Trace:    [<c0119f00>] [<c01079f2>] [<c01123c4>] [<c0185ba9>] [<c0185ba9>]
Oct 23 21:23:23 localhost kernel:   [<c0185ba9>] [<c01167bf>] [<c0185ba9>] [<c0185ba9>] [<c01120b0>] [<c0107470>]
Oct 23 21:23:23 localhost kernel:   [<c0119a10>] [<c0119f00>] [<c01079f2>] [<c01123c4>] [<c019bc12>] [<c0137cab>]
Oct 23 21:23:23 localhost kernel:   [<c018f4ec>] [<c018f8d5>] [<c018fac5>] [<c01120b0>] [<c0107470>] [<c0137103>]
Oct 23 21:23:23 localhost kernel:   [<c01379e8>] [<c0169b7e>] [<c011350b>] [<c016bf5c>] [<c016be00>] [<c010576e>]
Oct 23 21:23:23 localhost kernel:   [<c016be20>]
Oct 23 21:23:23 localhost kernel: Code: 39 b3 98 00 00 00 0f 84 85 02 00 00 8b 5b 50 81 fb 00 80 21


>>EIP; c0119a10 <exit_notify+20/300>   <=====

>>ebx; c54bc000 <END_OF_CODE+14ef0c5/????>
>>esi; c1c10000 <_end+1995fb0/1a32030>
>>esp; c1c11bbc <_end+1997b6c/1a32030>

Trace; c0119f00 <do_exit+210/260>
Trace; c01079f2 <die+72/80>
Trace; c01123c4 <do_page_fault+314/5d0>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c01167bf <__call_console_drivers+5f/70>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c0185ba9 <vt_console_print+59/310>
Trace; c01120b0 <do_page_fault+0/5d0>
Trace; c0107470 <error_code+34/3c>
Trace; c0119a10 <exit_notify+20/300>
Trace; c0119f00 <do_exit+210/260>
Trace; c01079f2 <die+72/80>
Trace; c01123c4 <do_page_fault+314/5d0>
Trace; c019bc12 <do_rw_disk+4b2/5c0>
Trace; c0137cab <create_buffers+6b/e0>
Trace; c018f4ec <ide_wait_stat+bc/130>
Trace; c018f8d5 <start_request+1b5/250>
Trace; c018fac5 <ide_do_request+c5/1c0>
Trace; c01120b0 <do_page_fault+0/5d0>
Trace; c0107470 <error_code+34/3c>
Trace; c0137103 <__insert_into_lru_list+43/60>
Trace; c01379e8 <__refile_buffer+58/70>
Trace; c0169b7e <journal_commit_transaction+105e/11c0>
Trace; c011350b <schedule+15b/240>
Trace; c016bf5c <kjournald+13c/1d0>
Trace; c016be00 <commit_timeout+0/10>
Trace; c010576e <kernel_thread+2e/40>
Trace; c016be20 <kjournald+0/1d0>

Code;  c0119a10 <exit_notify+20/300>
00000000 <_EIP>:
Code;  c0119a10 <exit_notify+20/300>   <=====
   0:   39 b3 98 00 00 00         cmp    %esi,0x98(%ebx)   <=====
Code;  c0119a16 <exit_notify+26/300>
   6:   0f 84 85 02 00 00         je     291 <_EIP+0x291>
Code;  c0119a1c <exit_notify+2c/300>
   c:   8b 5b 50                  mov    0x50(%ebx),%ebx
Code;  c0119a1f <exit_notify+2f/300>
   f:   81 fb 00 80 21 00         cmp    $0x218000,%ebx


1 warning issued.  Results may not be reliable.
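
For reading the "Oops:" values in the reports above (0003, 0002, 0000):
on i386 this number is the page-fault error code, whose low three bits
say whether the page was present, whether the access was a write, and
whether the CPU was in user mode. A minimal sketch of a decoder follows;
the function name decode_oops is hypothetical and only those three bits
are interpreted:

```sh
# decode_oops HEX: decode the low three bits of an i386 page-fault
# error code as printed on the "Oops:" line.
decode_oops() {
    code=$((0x$1))
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then p="protection violation"; else p="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then w="write"; else w="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then m="user mode"; else m="kernel mode"; fi
    printf '%s, %s, %s\n' "$p" "$w" "$m"
}
```

For example, `decode_oops 0003` reports a kernel-mode write that hit a
protection violation, which matches the faulting `decl (%ebx)` in the
first report.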


When I tried to see whether I could trigger the oops with only the 0*
patches, I couldn't compile the kernel. Here is the standard error stream
of 'make dep clean ; make bzImage':

module.c:7:28: linux/rcupdate.h: No such file or directory
module.c: In function `free_module':
module.c:1082: warning: implicit declaration of function `synchronize_kernel'
make[2]: *** [module.o] Error 1
make[1]: *** [first_rule] Error 2
make: *** [_dir_kernel] Error 2

BTW, I heard DaveM mention AMD-only bugs appearing during the 2.4.20-pre
series; I am not sure about the -aa series, though. I am thinking of testing
-aa/radeon/agpgart on my friend's computer, which has an Intel P-III on a
VIA chipset motherboard.

Thanks for your help.
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-23 12:27                           ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-23 12:46                             ` Andrea Arcangeli
  2002-10-23 14:26                               ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-23 12:46 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: linux-kernel

On Wed, Oct 23, 2002 at 10:27:47PM +1000, Srihari Vijayaraghavan wrote:
> module.c:7:28: linux/rcupdate.h: No such file or directory
> module.c: In function `free_module':
> module.c:1082: warning: implicit declaration of function `synchronize_kernel'
> make[2]: *** [module.o] Error 1
> make[1]: *** [first_rule] Error 2
> make: *** [_dir_kernel] Error 2

Ok, please try to back out 2.4.20pre11aa1/00_reduce-module-races-1.
I just moved it into the 20 series. That should fix this bit.
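
When a build stops on a missing header like this, one way to see which
patch in the directory creates the file is to search the "+++" lines of
the diffs. A minimal sketch, assuming uncompressed unified diffs; the
function name find_provider is hypothetical:

```sh
# find_provider DIR PATH: print the names of the patches in DIR whose
# "+++" header line mentions PATH, i.e. patches that create or modify it.
find_provider() {
    for p in "$1"/*; do
        if [ -f "$p" ] && grep -q "^+++ .*$2" "$p"; then
            printf '%s\n' "${p##*/}"
        fi
    done
}
```

For example, `find_provider ../2.4.20pre11aa1 linux/rcupdate.h` lists the
candidate patches to apply before the one that fails to compile.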

Andrea


* Re: 2.4.20pre11aa1
  2002-10-23 12:46                             ` 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-23 14:26                               ` Srihari Vijayaraghavan
  2002-10-23 14:35                                 ` 2.4.20pre11aa1 Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-23 14:26 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

On Wednesday 23 October 2002 22:46, Andrea Arcangeli wrote:
> Ok, please try to back out 2.4.20pre11aa1/00_reduce-module-races-1.
> I just moved it into the 20 series. That should fix this bit.

Yes I did that. I renamed it to _00_reduce-module-races-1, and did the 
patching again.

But that did not help. Here is the current std_err:

exit.c: In function `release_task':
exit.c:44: warning: implicit declaration of function `sched_exit'
shmem.c: In function `shmem_getpage_locked':
shmem.c:560: warning: unused variable `flags'
{standard input}: Assembler messages:
{standard input}:1014: Warning: indirect lcall without `*'
{standard input}:1091: Warning: indirect lcall without `*'
{standard input}:1176: Warning: indirect lcall without `*'
{standard input}:1255: Warning: indirect lcall without `*'
{standard input}:1271: Warning: indirect lcall without `*'
{standard input}:1281: Warning: indirect lcall without `*'
{standard input}:1349: Warning: indirect lcall without `*'
{standard input}:1364: Warning: indirect lcall without `*'
{standard input}:1375: Warning: indirect lcall without `*'
{standard input}:1874: Warning: indirect lcall without `*'
{standard input}:1960: Warning: indirect lcall without `*'
init_task.c:3:34: linux/sched_runqueue.h: No such file or directory
make[1]: *** [init_task.o] Error 1
make: *** [_dir_arch/i386/kernel] Error 2

Thanks.
-- 
Hari
harisri@bigpond.com



* Re: 2.4.20pre11aa1
  2002-10-23 14:26                               ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-23 14:35                                 ` Andrea Arcangeli
  2002-10-25 14:03                                   ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-10-23 14:35 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: linux-kernel

On Thu, Oct 24, 2002 at 12:26:36AM +1000, Srihari Vijayaraghavan wrote:
> Hello Andrea,
> 
> On Wednesday 23 October 2002 22:46, Andrea Arcangeli wrote:
> > Ok, please try to back out 2.4.20pre11aa1/00_reduce-module-races-1.
> > I just moved it into the 20 series. That should fix this bit.
> 
> Yes I did that. I renamed it to _00_reduce-module-races-1, and did the 
> patching again.
> 
> But that did not help. Here is the current std_err:
> 
> exit.c: In function `release_task':
> exit.c:44: warning: implicit declaration of function `sched_exit'
> shmem.c: In function `shmem_getpage_locked':
> shmem.c:560: warning: unused variable `flags'
> {standard input}: Assembler messages:
> {standard input}:1014: Warning: indirect lcall without `*'
> {standard input}:1091: Warning: indirect lcall without `*'
> {standard input}:1176: Warning: indirect lcall without `*'
> {standard input}:1255: Warning: indirect lcall without `*'
> {standard input}:1271: Warning: indirect lcall without `*'
> {standard input}:1281: Warning: indirect lcall without `*'
> {standard input}:1349: Warning: indirect lcall without `*'
> {standard input}:1364: Warning: indirect lcall without `*'
> {standard input}:1375: Warning: indirect lcall without `*'
> {standard input}:1874: Warning: indirect lcall without `*'
> {standard input}:1960: Warning: indirect lcall without `*'
> init_task.c:3:34: linux/sched_runqueue.h: No such file or directory
> make[1]: *** [init_task.o] Error 1
> make: *** [_dir_arch/i386/kernel] Error 2

Try to apply all the scheduler-related patches:

10_sched-o1-hyperthreading-3  20_apm-o1-sched-1  20_sched-o1-fixes-5
21_o1-A4-aa-1 20_rcu-poll-7

Andrea


* Re: 2.4.20pre11aa1
  2002-10-23 14:35                                 ` 2.4.20pre11aa1 Andrea Arcangeli
@ 2002-10-25 14:03                                   ` Srihari Vijayaraghavan
  2002-10-31 10:47                                     ` 2.4.20pre11aa1 Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-25 14:03 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

[I tried to post the reply through groups.google.com, and it looks like it 
didn't get to lkml. :( ]

> try to apply all the scheduler related patches:
>
> 10_sched-o1-hyperthreading-3  20_apm-o1-sched-1  20_sched-o1-fixes-5
> 21_o1-A4-aa-1 20_rcu-poll-7

OK.

I have applied the patches 0* and the following patches in this order:
10_sched-o1-hyperthreading-3
20_apm-o1-sched-1
20_rcu-poll-7
20_sched-o1-fixes-5
21_o1-A4-aa-1
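
The order above is simply the `ls` order of the combined set. A minimal
sketch of building such a list, assuming the patches are already
uncompressed; the function name series_with_extras is hypothetical:

```sh
# series_with_extras DIR EXTRA...: print the 0* patches found in DIR plus
# the explicitly named extras, sorted the way plain `ls` orders them.
series_with_extras() {
    dir=$1
    shift
    {
        for p in "$dir"/0*; do
            if [ -f "$p" ]; then
                printf '%s\n' "${p##*/}"
            fi
        done
        for e in "$@"; do
            printf '%s\n' "$e"
        done
    } | LC_ALL=C sort
}
```

The output can be fed, one name per line, to a `patch -p1` loop so the
extras always land after the 0* patches regardless of the order in which
they were typed.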

The resulting kernel is very stable and it does not crash.

Then I tried patches [01]* plus the extra patches (20_apm-o1-sched-1,
20_rcu-poll-7, 20_sched-o1-fixes-5, 21_o1-A4-aa-1), but I couldn't compile
the kernel.

Here is the current std_err:

inode.c:1468: warning: initialization from incompatible pointer type
In file included from ide.c:149:
/usr/src/01/include/linux/ide.h:333:16: warning: ISO C requires whitespace after the macro name
ide.c: In function `init_hwif_data':
ide.c:270: `ide_disk' undeclared (first use in this function)
ide.c:270: (Each undeclared identifier is reported only once
ide.c:270: for each function it appears in.)
ide.c: In function `ide_geninit':
ide.c:639: `ide_disk' undeclared (first use in this function)
ide.c: In function `do_reset1':
ide.c:791: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_dump_status':
ide.c:973: `ide_disk' undeclared (first use in this function)
ide.c: In function `try_to_flush_leftover_data':
ide.c:1034: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_error':
ide.c:1071: `ide_disk' undeclared (first use in this function)
ide.c: In function `start_request':
ide.c:1373: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_open':
ide.c:2119: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_reinit_drive':
ide.c:2768: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_ioctl':
ide.c:2842: `ide_disk' undeclared (first use in this function)
ide.c: In function `ide_setup':
ide.c:3383: `ide_disk' undeclared (first use in this function)
make[3]: *** [ide.o] Error 1
make[2]: *** [first_rule] Error 2
make[1]: *** [_subdir_ide] Error 2
make: *** [_dir_drivers] Error 2
make: *** Waiting for unfinished jobs....
{standard input}: Assembler messages:
{standard input}:1014: Warning: indirect lcall without `*'
{standard input}:1091: Warning: indirect lcall without `*'
{standard input}:1176: Warning: indirect lcall without `*'
{standard input}:1255: Warning: indirect lcall without `*'
{standard input}:1271: Warning: indirect lcall without `*'
{standard input}:1281: Warning: indirect lcall without `*'
{standard input}:1349: Warning: indirect lcall without `*'
{standard input}:1364: Warning: indirect lcall without `*'
{standard input}:1375: Warning: indirect lcall without `*'
{standard input}:1874: Warning: indirect lcall without `*'
{standard input}:1960: Warning: indirect lcall without `*'

Thanks.
-- 
Hari
harisri@bigpond.com




* Re: 2.4.20pre11aa1
  2002-10-25 14:03                                   ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-10-31 10:47                                     ` Srihari Vijayaraghavan
  2002-11-09  9:34                                       ` Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1] Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-10-31 10:47 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

On Saturday 26 October 2002 00:03, Srihari Vijayaraghavan wrote:
> The resulting kernel is very stable and it does not crash.
>
> Then I tried patches [01]* plus the extra patches (20_apm-o1-sched-1,
> 20_rcu-poll-7, 20_sched-o1-fixes-5, 21_o1-A4-aa-1), but I couldn't compile
> the kernel.

The current status is:

[0]* - compiles fine - works fine
[01]* - couldn't compile
[012]* - compiles fine - crashes

So I believe either 1* or 2* patches are introducing the issue.
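
The narrowing above can be automated as a binary search over prefixes of
the patch list. This is a minimal sketch under two assumptions that the
thread shows do not always hold: every prefix must build, and once a
prefix crashes every longer prefix crashes too. The function name
bisect_series and the is_bad callback are hypothetical:

```sh
# bisect_series LISTFILE IS_BAD: LISTFILE holds one patch name per line in
# apply order. IS_BAD is a command called with a prefix length N; it must
# return 0 if a tree with the first N patches applied shows the problem.
# Prints the patch whose addition first makes the tree bad.
bisect_series() {
    list=$1
    is_bad=$2
    lo=0                              # longest prefix known good (pristine tree)
    hi=$(($(wc -l < "$list")))        # shortest prefix known bad (whole series)
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(((lo + hi) / 2))
        if "$is_bad" "$mid"; then
            hi=$mid
        else
            lo=$mid
        fi
    done
    sed -n "${hi}p" "$list"
}
```

In practice is_bad would rebuild and boot a kernel with the first N
patches applied and report whether the crash reproduces, so each step is
manual; the function only keeps track of which prefix to try next.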

In the mean time I had an opportunity to test -aa on a nice IBM NetVista 
computer, whose configuration is as follows:

00:00.0 Host bridge: Intel Corp. 82815 815 Chipset Host Bridge and Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corp. 82815 CGC [Chipset Graphics Controller] (rev 02)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801BA ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801BA IDE U100 (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801BA/BAM USB (Hub #1) (rev 02)
00:1f.3 SMBus: Intel Corp. 82801BA/BAM SMBus (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801BA/BAM AC'97 Audio (rev 02)
01:08.0 Ethernet controller: Intel Corp. 82801BA/BAM/CA/CAM Ethernet Controller (rev 01)

I can easily reproduce the same issue on that computer too (of course I am 
using CONFIG_AGP_I810 for agpgart support and CONFIG_DRM_I810 for i810 
display card support).

I think this rules out the DRM support for Radeon (or for i810, for that
matter) as the cause; the issue appears to be specific to agpgart in general.

Anyway, I guess we are very close to the problem. If someone can help me
compile -aa with the [01]* patches, I think we can pinpoint the issue.

Thanks for your help and support.
-- 
Hari
harisri@bigpond.com



* Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1]
  2002-10-31 10:47                                     ` 2.4.20pre11aa1 Srihari Vijayaraghavan
@ 2002-11-09  9:34                                       ` Srihari Vijayaraghavan
  2002-11-10  2:50                                         ` Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-11-09  9:34 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

> So I believe either 1* or 2* patches are introducing the issue.

Got it. The 10_x86-fast-pte2 patch is introducing the instability.

I tested this on 2.4.20rc1aa1, though; backing out that patch alone cures
the instability.

I can give the .config and ksymoops of 2.4.20rc1aa1 if needed.

> In the mean time I had an opportunity to test -aa on a nice IBM NetVista
> computer, whose configuration is as follows:

I will verify this finding on that computer too, perhaps on Monday.

Thanks for your help.
-- 
Hari
harisri@bigpond.com



* Re: Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1]
  2002-11-09  9:34                                       ` Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1] Srihari Vijayaraghavan
@ 2002-11-10  2:50                                         ` Andrea Arcangeli
  2002-11-10  3:24                                           ` Srihari Vijayaraghavan
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2002-11-10  2:50 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: linux-kernel

On Sat, Nov 09, 2002 at 08:34:39PM +1100, Srihari Vijayaraghavan wrote:
> Hello Andrea,
> 
> > So I believe either 1* or 2* patches are introducing the issue.
> 
> Got it. The 10_x86-fast-pte2 patch is introducing the instability.

Great job! Many thanks! This reduces the bug a whole lot. I will think
on Monday what could be going wrong with that patch, in the meantime
just try to run (slower ;) with it backed out, to be sure it's really
the culprit. If I had to guess right now, I would say it is most likely
triggering a bug somewhere else rather than containing one itself: the
patch is kind of obviously correct, and it sits in such a heavily
stressed codepath that everybody would reproduce the problem if the patch
were at fault. That is one of the reasons I could never have guessed this
patch was the interesting one for your case without your useful binary
search.

Andrea


* Re: Solved 2.4.20pre11aa1/2.4.20rc1aa1 Agpgart/Radeon crash. [was: Re: 2.4.20pre11aa1]
  2002-11-10  2:50                                         ` Andrea Arcangeli
@ 2002-11-10  3:24                                           ` Srihari Vijayaraghavan
  0 siblings, 0 replies; 25+ messages in thread
From: Srihari Vijayaraghavan @ 2002-11-10  3:24 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello Andrea,

On Sunday 10 November 2002 13:50, Andrea Arcangeli wrote:
> Great job! Many thanks! This reduces the bug a whole lot. I will think
> on Monday what could be going wrong with that patch, in the meantime
> just try to run (slower ;) with it backed out, to be sure it's really

I am running the complete 2.4.20rc1aa1 minus 10_x86-fast-pte-2 at present.
It has been as stable as mainline and as snappy as -aa :).

On a related note, I had to apply 20_rcu-poll-7 to compile the 10* patch(es)
(even for the 10_ext3-o_direct-2 patch), so would it be a good idea to make
it the earliest 10* patch?

Thanks.
-- 
Hari
harisri@bigpond.com

