* 2.6.0-test7 oops in proc_pid_stat
@ 2003-10-09 13:04 Olaf Hering
  2003-10-09 22:04 ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Olaf Hering @ 2003-10-09 13:04 UTC (permalink / raw)
  To: linux-kernel

IBM blade server, 2 CPUs (Intel(R) XEON(TM) CPU 2.40GHz), 512 MB RAM.


Linux version 2.6.0-test7 (olaf@zert152) (gcc version 3.2.2) #2 SMP Thu Oct 9 08:49:29 CEST 2003

Unable to handle kernel NULL pointer dereference at virtual address 0000003c
 printing eip:
c018a322
*pde = 00000000
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c018a322>]    Not tainted
EFLAGS: 00010246
EIP is at proc_pid_stat+0x92/0x510
eax: 00000000   ebx: df2b0d80   ecx: 00000000   edx: c038afcc
esi: 00000000   edi: df2b0d80   ebp: 00000000   esp: ce85de3c
ds: 007b   es: 007b   ss: 0068
Process pstree (pid: 3518, threadinfo=ce85c000 task=dbb38c80)
Stack: df94b900 c034f440 00000dad df6b5bda 00000053 00000d99 00000419 00000419
       0000040d 00000419 00000100 00000086 000000e0 00000106 00000284 00000000
       cf6419b4 cf641940 ce136006 c0187ce8 df2b0d80 cf641940 ce85df38 dffd3820
Call Trace:
 [<c0187ce8>] pid_revalidate+0x28/0xd0
 [<c0170300>] dput+0x30/0x1b0
 [<c0140ac3>] buffered_rmqueue+0xc3/0x150
 [<c0140c00>] __alloc_pages+0xb0/0x350
 [<c0187174>] proc_info_read+0x74/0x160
 [<c015904e>] vfs_read+0xbe/0x130
 [<c01592f2>] sys_read+0x42/0x70
 [<c010b52f>] syscall_call+0x7/0xb

Code: 8b 48 3c 85 c9 74 40 8b 81 98 00 00 00 89 84 24 d4 00 00 00
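
For readers decoding the oops above: the first three bytes of the Code line, 8b 48 3c, are the i386 encoding of mov 0x3c(%eax),%ecx, and eax is 00000000 in the register dump. A NULL base pointer plus a member offset of 0x3c gives exactly the reported fault address 0000003c. The sketch below only illustrates that arithmetic in user-space C. The structure and member names are hypothetical, chosen so that a member lands at offset 0x3c; they are not the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure: 15 32-bit words of other fields (15 * 4 = 0x3c
 * bytes), followed by the member the faulting instruction tried to load. */
struct sketch_struct {
	uint32_t other_fields[15];
	uint32_t loaded_member;
};

/* Compile-time check that the illustrative member really sits at 0x3c. */
static_assert(offsetof(struct sketch_struct, loaded_member) == 0x3c,
	      "loaded_member must be at offset 0x3c");

/* Address the CPU would access for base->loaded_member, computed without
 * dereferencing anything. */
uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct sketch_struct, loaded_member);
}
```

Calling member_address(0) yields 0x3c, which is how a NULL structure pointer turns into a fault a few dozen bytes above address zero rather than at zero itself.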

The config is all static (no modules). I was reading a CD in the foreground,
with 2 rpm builds running in the background.

...
screen  -S cdtest -- sh -c 'for i in `seq 0 420` `seq 0 420` ; do date; umount -v /media/cdrom ; mount -v /media/cdrom ; find /media/cdrom -type f -print0 | xargs -0 --verbose -n1 cat > /dev/null || break ; done &>log'
...

CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_STANDALONE=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_KALLSYMS=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_X86_PC=y
CONFIG_MPENTIUMIII=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_SMP=y
CONFIG_NR_CPUS=4
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_PM=y
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_INTERPRETER=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_ASUS=y
CONFIG_ACPI_TOSHIBA=y
CONFIG_ACPI_BUS=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SYSTEM=y
CONFIG_ACPI_RELAXED_AML=y
CONFIG_APM=y
CONFIG_APM_DO_ENABLE=y
CONFIG_APM_DISPLAY_BLANK=y
CONFIG_APM_ALLOW_INTS=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_HOTPLUG=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_MISC=y
CONFIG_FW_LOADER=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=64000
CONFIG_BLK_DEV_INITRD=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_IDEDISK_STROKE=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_OFFBOARD=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_IDEDMA_ONLYDISK=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_SVWKS=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_AUTO=y
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y
CONFIG_BLK_DEV_SR=y
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_REPORT_LUNS=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_NET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_INET_ECN=y
CONFIG_SYN_COOKIES=y
CONFIG_IPV6_SCTP__=y
CONFIG_NETDEVICES=y
CONFIG_TIGON3=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_EVDEV=y
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_CORE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_NVRAM=y
CONFIG_RTC=y
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_USB=y
CONFIG_USB_DEVICEFS=y
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_FREECOM=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_HP8200e=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_HID=y
CONFIG_USB_HIDINPUT=y
CONFIG_USB_HIDDEV=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_JBD=y
CONFIG_JBD_DEBUG=y
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
CONFIG_FS_POSIX_ACL=y
CONFIG_MINIX_FS=y
CONFIG_AUTOFS_FS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_DEVPTS_FS=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_SUNRPC=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_X86_EXTRA_IRQS=y
CONFIG_X86_FIND_SMP_CONFIG=y
CONFIG_X86_MPPARSE=y
CONFIG_CRC32=y
CONFIG_ZLIB_INFLATE=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_PC=y

-- 
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

* Re: 2.6.0-test7 oops in proc_pid_stat
  2003-10-09 13:04 2.6.0-test7 oops in proc_pid_stat Olaf Hering
@ 2003-10-09 22:04 ` Linus Torvalds
  2003-10-10  6:50   ` 2.6.0-test7 DEBUG_PAGEALLOC oops Mike Galbraith
  2003-10-10  7:28   ` 2.6.0-test7 oops in proc_pid_stat Olaf Hering
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2003-10-09 22:04 UTC (permalink / raw)
  To: Olaf Hering, linux-kernel, Taner Halicioglu

Olaf Hering wrote:
> 
> Linux version 2.6.0-test7 (olaf@zert152) (gcc version 3.2.2) #2 SMP Thu
> Oct 9 08:49:29 CEST 2003
> 
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000003c

Ok, this seems to be due to the move of the job control fields from
the task structure to the signal structure.

That looks like a bad idea, and the best thing to do is likely to just
revert the whole thing.
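
A minimal sketch of the hazard described here, under stated assumptions: once a field moves from the task structure into a separately allocated signal structure, every reader has to follow an extra pointer, and a reader that runs while that pointer is NULL faults. The type and function names below are hypothetical user-space stand-ins, not the kernel's definitions. The assumption that the signal pointer can be NULL for some tasks is inferred from the oops (eax was 0), not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure that now owns the moved field. */
struct signal_sketch {
	int tty_nr;
};

/* Hypothetical stand-in for the task structure; the field is reachable
 * only through the signal pointer, which is assumed to be NULL at times. */
struct task_sketch {
	struct signal_sketch *signal;
};

/* Guarded reader: return the moved field, or fallback when the signal
 * structure is absent. An unguarded task->signal->tty_nr would fault on
 * a NULL signal pointer instead. */
int task_tty_nr(const struct task_sketch *task, int fallback)
{
	assert(task != NULL);
	if (task->signal == NULL)
		return fallback;
	return task->signal->tty_nr;
}
```

This shows only the general pattern a reader would need after such a move. The remedy proposed in this message is to revert the change, not to add checks like this one.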

If you are a BK user, do a "bk changes" to find the ChangeSet that says
"[PATCH] move job control fields from task_struct to", and just do a

        bk cset -xX.XXXX

where X.XXXX is the changeset number in your tree (that will depend on
exactly what else is in your tree).

                Linus

* 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-09 22:04 ` Linus Torvalds
@ 2003-10-10  6:50   ` Mike Galbraith
  2003-10-10 16:52     ` Zwane Mwaikambo
  2003-10-10  7:28   ` 2.6.0-test7 oops in proc_pid_stat Olaf Hering
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2003-10-10  6:50 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 93 bytes --]

Greetings,

Enabling page allocation debugging produced the attached repeatable oops.

	-Mike

[-- Attachment #2: Type: text/plain, Size: 6798 bytes --]

Linux version 2.6.0-test7 (root@mikeg) (gcc version gcc-2.95.3 20010315 (release)) #118 Fri Oct 10 08:20:35 CEST 2003
Video mode to be used for restore is f00
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000007ff0000 (usable)
 BIOS-e820: 0000000007ff0000 - 0000000007ff3000 (ACPI NVS)
 BIOS-e820: 0000000007ff3000 - 0000000008000000 (ACPI data)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
127MB LOWMEM available.
On node 0 totalpages: 32752
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 28656 pages, LIFO batch:6
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.2 present.
Building zonelist for node : 0
Kernel command line: root=/dev/hda6 ro console=ttyS0,115200n8 console=tty0 apm=power-off elevator=as sb=220,5,0,6 mpu401=0x BOOT_IMAGE=260t7vir
Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
Initializing CPU#0
PID hash table entries: 512 (order 9: 4096 bytes)
Detected 499.509 MHz processor.
Console: colour VGA+ 80x25
Memory: 125276k/131008k available (1668k kernel code, 5196k reserved, 663k data, 296k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 985.08 BogoMIPS
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: Intel Pentium III (Katmai) stepping 03
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 499.0050 MHz.
..... host bus clock speed is 99.0809 MHz.
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb1a0, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router VIA [1106/0596] at 0000:00:07.0
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
VFS: Disk quotas dquot_6.5.1
Activating ISA DMA hang workarounds.
pty: 256 Unix98 ptys configured
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected VIA Apollo Pro 133 chipset
agpgart: Maximum main memory to use for agp memory: 94M
agpgart: AGP aperture is 64M @ 0xe0000000
[drm] Initialized r128 2.5.0 20030725 on minor 0
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Linux Tulip driver version 1.1.13 (May 11, 2002)
PCI: Found IRQ 9 for device 0000:00:0f.0
PCI: Sharing IRQ 9 with 0000:00:0c.0
eth0: ADMtek Comet rev 17 at 0xc8823000, 00:04:5A:64:94:34, IRQ 9.
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c596b (rev 11) IDE UDMA66 controller on pci0000:00:07.1
    ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:DMA, hdd:pio
hda: IBM-DJNA-352030, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: MATSHITADVD-ROM SR-8583A, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 39876480 sectors (20416 MB) w/1966KiB Cache, CHS=39560/16/63
 hda: hda1 hda2 hda3 < hda5 hda6 hda7 hda8 >
end_request: I/O error, dev hdc, sector 0
hdc: ATAPI 32X DVD-ROM drive, 512kB Cache, DMA
Uniform CD-ROM driver Revision: 3.12
mice: PS/2 mouse device common for all mice
input: PS/2 Generic Mouse on isa0060/serio1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
i2c /dev entries driver
Advanced Linux Sound Architecture Driver Version 0.9.7 (Thu Sep 25 19:16:36 2003 UTC).
PCI: Found IRQ 9 for device 0000:00:0c.0
PCI: Sharing IRQ 9 with 0000:00:0f.0
ALSA device list:
  #0: Yamaha DS-XG PCI (YMF740C) at 0xec000000, irq 9
NET: Registered protocol family 2
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
NET: Registered protocol family 1
NET: Registered protocol family 17
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 296k freed
Adding 265064k swap on /dev/hda2.  Priority:2 extents:1
blk: queue c7af4df8, I/O limit 4095Mb (mask 0xffffffff)
EXT3 FS on hda6, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Unable to handle kernel paging request at virtual address c034a000
 printing eip:
c0134d5a
*pde = 00102027
*pte = 0034a000
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c0134d5a>]    Not tainted
EFLAGS: 00010002
EIP is at store_stackinfo+0x4e/0x80
eax: 00000000   ebx: c7802f98   ecx: c0301390   edx: c030138c
esi: c0349ffe   edi: 017e0008   ebp: c0349da6   esp: c0349d96
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0348000 task=c02fcbe0)
Stack: c78371dc c11731d8 c7802000 00000060 c0349dd6 c0136aa8 c11731d8 c7802000 
       0000006b c7802f58 c7af4df8 c7fec428 00001000 c0131b6c c7d2ef78 00000086 
       c0349de6 c0131b6c c11731d8 c7802f58 c0349e02 c0131b3e c7802f58 c11731d8 
Call Trace:
 [<c0136aa8>] kmem_cache_free+0x218/0x294
 [<c0131b6c>] mempool_free_slab+0x10/0x14
 [<c0131b6c>] mempool_free_slab+0x10/0x14
 [<c0131b3e>] mempool_free+0x7a/0x84
 [<c01e3984>] __blk_put_request+0x74/0x88
 [<c01e4716>] end_that_request_last+0x62/0x7c
 [<c01f0be3>] ide_end_request+0xf3/0x124
 [<c01f9f68>] default_end_request+0x14/0x18
 [<c0201df8>] ide_dma_intr+0x60/0x98
 [<c01f2024>] ide_intr+0x108/0x17c
 [<c0201d98>] ide_dma_intr+0x0/0x98
 [<c010a423>] handle_IRQ_event+0x2b/0x58
 [<c010a70e>] do_IRQ+0x92/0x130
 [<c0109014>] common_interrupt+0x18/0x20
 [<c010a423>] 

* Re: 2.6.0-test7 oops in proc_pid_stat
  2003-10-09 22:04 ` Linus Torvalds
  2003-10-10  6:50   ` 2.6.0-test7 DEBUG_PAGEALLOC oops Mike Galbraith
@ 2003-10-10  7:28   ` Olaf Hering
  1 sibling, 0 replies; 18+ messages in thread
From: Olaf Hering @ 2003-10-10  7:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Taner Halicioglu

 On Thu, Oct 09, Linus Torvalds wrote:

> Olaf Hering wrote:
> > 
> > Linux version 2.6.0-test7 (olaf@zert152) (gcc version 3.2.2) #2 SMP Thu
> > Oct 9 08:49:29 CEST 2003
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0000003c
> 
> Ok, this seems to be due to the move of the job control fields from
> the task structure to the signal structure.
> 
> That looks like a bad idea, and the best thing to do is likely to just
> revert the whole thing.

I have reverted these two patches and fixed the reject in sched.h; no
oops for 9 hours.

# 03/10/05      akpm@osdl.org[torvalds] 1.1451.4.16
# [PATCH] move job control fields from task_struct to

# 03/10/05      akpm@osdl.org[torvalds] 1.1451.4.17
# [PATCH] fix "compat ioctl consolidation" for "move job

-- 
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-10  6:50   ` 2.6.0-test7 DEBUG_PAGEALLOC oops Mike Galbraith
@ 2003-10-10 16:52     ` Zwane Mwaikambo
  2003-10-11  7:01       ` Mike Galbraith
  0 siblings, 1 reply; 18+ messages in thread
From: Zwane Mwaikambo @ 2003-10-10 16:52 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

On Fri, 10 Oct 2003, Mike Galbraith wrote:

> Greetings,
> 
> Enabling page allocation debugging produced the attached repeatable oops.

There is an open bugzilla entry for this; I'd appreciate it if you could follow 
up there.

Thanks

http://bugzilla.kernel.org/show_bug.cgi?id=973

* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-10 16:52     ` Zwane Mwaikambo
@ 2003-10-11  7:01       ` Mike Galbraith
  2003-10-11  7:03         ` Zwane Mwaikambo
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2003-10-11  7:01 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: linux-kernel

At 12:52 PM 10/10/2003 -0400, Zwane Mwaikambo wrote:
>On Fri, 10 Oct 2003, Mike Galbraith wrote:
>
> > Greetings,
> >
> > Enabling page allocation debugging produced the attached repeatable oops.
>
>There is an open bugzilla for this, i'd appreciate it if you could follow
>up there.
>
>Thanks
>
>http://bugzilla.kernel.org/show_bug.cgi?id=973

403.

(I'll go poke around the source instead)

         -Mike 


* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-11  7:01       ` Mike Galbraith
@ 2003-10-11  7:03         ` Zwane Mwaikambo
  0 siblings, 0 replies; 18+ messages in thread
From: Zwane Mwaikambo @ 2003-10-11  7:03 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

On Sat, 11 Oct 2003, Mike Galbraith wrote:

> At 12:52 PM 10/10/2003 -0400, Zwane Mwaikambo wrote:
> >http://bugzilla.kernel.org/show_bug.cgi?id=973
> 
> 403.
> 
> (i'll go poke around the source instead)

Sorry, someone broke the kernel.org bugzilla URL, but you can still access 
it via:

http://bugme.osdl.org/show_bug.cgi?id=973

* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-12  6:58           ` Manfred Spraul
  2003-10-12  8:52             ` Mike Galbraith
@ 2003-10-12 22:36             ` Thomas Molina
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Molina @ 2003-10-12 22:36 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Mike Galbraith, Zwane Mwaikambo, linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 361 bytes --]

On Sun, 12 Oct 2003, Manfred Spraul wrote:

> Could you try the attached patch?
> It updates the end of stack detection to handle unaligned stacks.


I've attached a rediff of your patch against test7 bitkeeper.  It has the 
kstack_end function moved up above ASSEMBLY as suggested by Mike.  I've 
tested this version and it works for me (tm).  Thanks a bunch.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 2004 bytes --]

diff -ur linux-2.5-tma/arch/i386/kernel/traps.c linux-2.5-tm/arch/i386/kernel/traps.c
--- linux-2.5-tma/arch/i386/kernel/traps.c	2003-10-12 16:16:56.000000000 -0400
+++ linux-2.5-tm/arch/i386/kernel/traps.c	2003-10-12 15:05:48.000000000 -0400
@@ -104,7 +104,7 @@
 #ifdef CONFIG_KALLSYMS
 	printk("\n");
 #endif
-	while (((long) stack & (THREAD_SIZE-1)) != 0) {
+	while (!kstack_end(stack)) {
 		addr = *stack++;
 		if (kernel_text_address(addr)) {
 			printk(" [<%08lx>] ", addr);
@@ -138,7 +138,7 @@
 
 	stack = esp;
 	for(i = 0; i < kstack_depth_to_print; i++) {
-		if (((long) stack & (THREAD_SIZE-1)) == 0)
+		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))
 			printk("\n       ");
diff -ur linux-2.5-tma/include/asm-i386/thread_info.h linux-2.5-tm/include/asm-i386/thread_info.h
--- linux-2.5-tma/include/asm-i386/thread_info.h	2003-10-12 16:16:28.000000000 -0400
+++ linux-2.5-tm/include/asm-i386/thread_info.h	2003-10-12 15:19:09.000000000 -0400
@@ -92,6 +92,16 @@
 #define get_thread_info(ti) get_task_struct((ti)->task)
 #define put_thread_info(ti) put_task_struct((ti)->task)
 
+static inline int kstack_end(void *addr)
+{
+        unsigned long offset = (unsigned long)addr & (THREAD_SIZE-1);
+
+        /* Some APM bios versions misalign the stack */
+        if (offset == 0 || offset > (THREAD_SIZE-sizeof(void*)))
+                        return 1;
+        return 0;
+}
+
 #else /* !__ASSEMBLY__ */
 
 /* how to get the thread information struct from ASM */
diff -ur linux-2.5-tma/mm/slab.c linux-2.5-tm/mm/slab.c
--- linux-2.5-tma/mm/slab.c	2003-10-12 16:24:14.000000000 -0400
+++ linux-2.5-tm/mm/slab.c	2003-10-12 15:05:48.000000000 -0400
@@ -862,7 +862,7 @@
 		unsigned long *sptr = &caller;
 		unsigned long svalue;
 
-		while (((long) sptr & (THREAD_SIZE-1)) != 0) {
+		while (!kstack_end(sptr)) {
 			svalue = *sptr++;
 			if (kernel_text_address(svalue)) {
 				*addr++=svalue;
Only in linux-2.5-tm: stack.patch

* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-12 12:08               ` Thomas Molina
@ 2003-10-12 14:13                 ` Thomas Molina
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Molina @ 2003-10-12 14:13 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Manfred Spraul, Zwane Mwaikambo, linux-kernel

On Sun, 12 Oct 2003, Thomas Molina wrote:

> On Sun, 12 Oct 2003, Mike Galbraith wrote:
> 
> > At 08:58 AM 10/12/2003 +0200, Manfred Spraul wrote:
> > >Could you try the attached patch?
> > >It updates the end of stack detection to handle unaligned stacks.
> > 
> > Works fine.  (modulo moving kstack_end above ASSEMBLY)
> 
> I'm the one with bugzilla 973.  I'm trying the patch with a source tree 
> synced up from bk this morning and having a few problems.  My in-laws are 
> visiting today, so my work on this will be intermittent.  I am interested, 
> however.

Note to self:  next time read the whole message, including the part in 
parentheses.  The patch, modulo Mike's modulo (move the function where I 
was told to move the function), does indeed work fine.  

Testing continues, but thanks!


* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-12  8:52             ` Mike Galbraith
@ 2003-10-12 12:08               ` Thomas Molina
  2003-10-12 14:13                 ` Thomas Molina
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Molina @ 2003-10-12 12:08 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Manfred Spraul, Zwane Mwaikambo, linux-kernel

On Sun, 12 Oct 2003, Mike Galbraith wrote:

> At 08:58 AM 10/12/2003 +0200, Manfred Spraul wrote:
> >Could you try the attached patch?
> >It updates the end of stack detection to handle unaligned stacks.
> 
> Works fine.  (modulo moving kstack_end above ASSEMBLY)

I'm the one with bugzilla 973.  I'm trying the patch with a source tree 
synced up from bk this morning and having a few problems.  My in-laws are 
visiting today, so my work on this will be intermittent.  I am interested, 
however.


* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-12  6:58           ` Manfred Spraul
@ 2003-10-12  8:52             ` Mike Galbraith
  2003-10-12 12:08               ` Thomas Molina
  2003-10-12 22:36             ` Thomas Molina
  1 sibling, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2003-10-12  8:52 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Zwane Mwaikambo, linux-kernel

At 08:58 AM 10/12/2003 +0200, Manfred Spraul wrote:
>Could you try the attached patch?
>It updates the end of stack detection to handle unaligned stacks.

Works fine.  (modulo moving kstack_end above ASSEMBLY)

         Thanks,

         -Mike 


* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-12  5:11         ` Mike Galbraith
@ 2003-10-12  6:58           ` Manfred Spraul
  2003-10-12  8:52             ` Mike Galbraith
  2003-10-12 22:36             ` Thomas Molina
  0 siblings, 2 replies; 18+ messages in thread
From: Manfred Spraul @ 2003-10-12  6:58 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Zwane Mwaikambo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 116 bytes --]

Could you try the attached patch?
It updates the end of stack detection to handle unaligned stacks.
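
To see why the old test fails on an unaligned stack, here is a user-space sketch of the two end-of-stack checks. It assumes THREAD_SIZE is 8192 (matching the ~8191UL mask visible in the disassembly posted in this thread) and models the i386 word size as 4 bytes. The names aligned_only_end, kstack_end_sketch and walk_stop are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u	/* assumed i386 kernel stack size */
#define WORD_SIZE   4u		/* sizeof(void *) on i386 */

typedef int (*end_check_fn)(uint32_t addr);

/* Old test: only an address exactly on a THREAD_SIZE boundary ends the walk. */
int aligned_only_end(uint32_t addr)
{
	return (addr & (THREAD_SIZE - 1)) == 0;
}

/* Patched test, modelled on kstack_end(): also stop when fewer than
 * WORD_SIZE bytes remain before the boundary. */
int kstack_end_sketch(uint32_t addr)
{
	uint32_t offset = addr & (THREAD_SIZE - 1);

	return offset == 0 || offset > THREAD_SIZE - WORD_SIZE;
}

/* Step one word at a time from sp. Return the address at which is_end
 * fires, or 0 if it never fires within max_steps words. */
uint32_t walk_stop(uint32_t sp, end_check_fn is_end, unsigned max_steps)
{
	assert(is_end != 0);
	for (unsigned i = 0; i < max_steps; i++) {
		if (is_end(sp))
			return sp;
		sp += WORD_SIZE;
	}
	return 0;
}
```

Starting from the esp value in Mike's oops (0xc0349d96, which is 2 bytes off word alignment), the old check never sees a zero offset and keeps walking, while the patched check stops at 0xc0349ffe. That address matches esi in the oops and lies two bytes below the faulting address c034a000, which is consistent with a word-sized read straddling the end of the stack.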

--
    Manfred

[-- Attachment #2: patch-end-of-stack --]
[-- Type: text/plain, Size: 1625 bytes --]

// $Header$
// Kernel Version:
//  VERSION = 2
//  PATCHLEVEL = 6
//  SUBLEVEL = 0
//  EXTRAVERSION = -test7
--- 2.6/include/asm-i386/thread_info.h	2003-10-09 21:20:00.000000000 +0200
+++ build-2.6/include/asm-i386/thread_info.h	2003-10-12 08:50:12.000000000 +0200
@@ -101,6 +101,16 @@
 
 #endif
 
+static inline int kstack_end(void *addr)
+{
+	unsigned long offset = (unsigned long)addr & (THREAD_SIZE-1);
+
+	/* Some APM bios versions misalign the stack */
+	if (offset == 0 || offset > (THREAD_SIZE-sizeof(void*)))
+			return 1;
+	return 0;
+}
+
 /*
  * thread information flags
  * - these are process state flags that various assembly files may need to access
--- 2.6/mm/slab.c	2003-10-09 21:23:19.000000000 +0200
+++ build-2.6/mm/slab.c	2003-10-12 08:51:13.000000000 +0200
@@ -862,7 +862,7 @@
 		unsigned long *sptr = &caller;
 		unsigned long svalue;
 
-		while (((long) sptr & (THREAD_SIZE-1)) != 0) {
+		while (!kstack_end(sptr)) {
 			svalue = *sptr++;
 			if (kernel_text_address(svalue)) {
 				*addr++=svalue;
--- 2.6/arch/i386/kernel/traps.c	2003-10-09 21:23:03.000000000 +0200
+++ build-2.6/arch/i386/kernel/traps.c	2003-10-12 08:50:41.000000000 +0200
@@ -104,7 +104,7 @@
 #ifdef CONFIG_KALLSYMS
 	printk("\n");
 #endif
-	while (((long) stack & (THREAD_SIZE-1)) != 0) {
+	while (!kstack_end(stack)) {
 		addr = *stack++;
 		if (kernel_text_address(addr)) {
 			printk(" [<%08lx>] ", addr);
@@ -138,7 +138,7 @@
 
 	stack = esp;
 	for(i = 0; i < kstack_depth_to_print; i++) {
-		if (((long) stack & (THREAD_SIZE-1)) == 0)
+		if (kstack_end(stack))
 			break;
 		if (i && ((i % 8) == 0))
 			printk("\n       ");

* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-11 17:34       ` Manfred Spraul
@ 2003-10-12  5:11         ` Mike Galbraith
  2003-10-12  6:58           ` Manfred Spraul
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2003-10-12  5:11 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Zwane Mwaikambo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

At 07:34 PM 10/11/2003 +0200, Manfred Spraul wrote:
>Mike Galbraith wrote:
>
>>
>>Ok, you want do_IRQ assembler, correct?
>
>No - I need the function that was interrupted by common_interrupt.

Aha. (/me sees light [a wee bit late])

>I found only one valid function pointer in the stack dump above 
>common_interrupt:
>
>0xc0112a13, EBP=0xc0349f88
>
>Could you look it up in your System.map?

It's in apm_bios_call_simple()...

>Which power management do you use? apm or acpi?

...and booting with apm=off cures the explosions.  (uhoh... crud bios?)

         -Mike

[-- Attachment #2: Type: text/plain, Size: 5305 bytes --]

00000134 <apm_bios_call_simple>:
     134:	55                   	push   %ebp
     135:	b8 00 e0 ff ff       	mov    $0xffffe000,%eax
     13a:	89 e5                	mov    %esp,%ebp
     13c:	21 e0                	and    %esp,%eax
     13e:	83 ec 1c             	sub    $0x1c,%esp
     141:	57                   	push   %edi
     142:	56                   	push   %esi
     143:	53                   	push   %ebx
}

/**
 *	apm_bios_call_simple	-	make a simple APM BIOS 32bit call
 *	@func: APM function to invoke
 *	@ebx_in: EBX register value for BIOS call
 *	@ecx_in: ECX register value for BIOS call
 *	@eax: EAX register on return from the BIOS call
 *
 *	Make a BIOS call that does only returns one value, or just status.
 *	If there is an error, then the error code is returned in AH
 *	(bits 8-15 of eax) and this function returns non-zero. This is
 *	used for simpler BIOS operations. This call may hold interrupts
 *	off for a long time on some laptops.
 */

static u8 apm_bios_call_simple(u32 func, u32 ebx_in, u32 ecx_in, u32 *eax)
{
     144:	8b 5d 0c             	mov    0xc(%ebp),%ebx
     147:	8b 4d 10             	mov    0x10(%ebp),%ecx
	u8			error;
	APM_DECL_SEGS
	unsigned long		flags;
	cpumask_t		cpus;
	int			cpu;
	struct desc_struct	save_desc_40;


	cpus = apm_save_cpus();
	
	cpu = get_cpu();
     14a:	ff 40 14             	incl   0x14(%eax)
	save_desc_40 = cpu_gdt_table[cpu][0x40 / 8];
     14d:	a1 40 00 00 00       	mov    0x40,%eax
     152:	8b 15 44 00 00 00    	mov    0x44,%edx
     158:	89 45 f0             	mov    %eax,0xfffffff0(%ebp)
     15b:	89 55 f4             	mov    %edx,0xfffffff4(%ebp)
	cpu_gdt_table[cpu][0x40 / 8] = bad_bios_desc;
     15e:	a1 34 00 00 00       	mov    0x34,%eax
     163:	8b 15 38 00 00 00    	mov    0x38,%edx
     169:	a3 40 00 00 00       	mov    %eax,0x40
     16e:	89 15 44 00 00 00    	mov    %edx,0x44

	local_save_flags(flags);
     174:	9c                   	pushf  
     175:	8f 45 ec             	popl   0xffffffec(%ebp)
	APM_DO_CLI;
     178:	83 3d 20 00 00 00 00 	cmpl   $0x0,0x20
     17f:	74 03                	je     184 <apm_bios_call_simple+0x50>
     181:	fb                   	sti    
     182:	eb 01                	jmp    185 <apm_bios_call_simple+0x51>
     184:	fa                   	cli    
	APM_DO_SAVE_SEGS;
     185:	8c 65 fc             	movl   %fs,0xfffffffc(%ebp)
     188:	8c 6d f8             	movl   %gs,0xfffffff8(%ebp)
	/*
	 * N.B. We do NOT need a cld after the BIOS call
	 * because we always save and restore the flags.
	 */
	__asm__ __volatile__(APM_DO_ZERO_SEGS
     18b:	8b 45 08             	mov    0x8(%ebp),%eax
     18e:	1e                   	push   %ds
     18f:	06                   	push   %es
     190:	31 d2                	xor    %edx,%edx
     192:	8e da                	mov    %edx,%ds
     194:	8e c2                	mov    %edx,%es
     196:	8e e2                	mov    %edx,%fs
     198:	8e ea                	mov    %edx,%gs
     19a:	57                   	push   %edi
     19b:	55                   	push   %ebp
     19c:	2e ff 1d 20 00 00 00 	lcall  *%cs:0x20
     1a3:	0f 92 c3             	setb   %bl <== 0xc0112a13 is HERE
     1a6:	5d                   	pop    %ebp
     1a7:	5f                   	pop    %edi
     1a8:	07                   	pop    %es
     1a9:	1f                   	pop    %ds
     1aa:	89 45 e8             	mov    %eax,0xffffffe8(%ebp)
     1ad:	8b 45 14             	mov    0x14(%ebp),%eax
     1b0:	8b 55 e8             	mov    0xffffffe8(%ebp),%edx
     1b3:	89 10                	mov    %edx,(%eax)
	error = apm_bios_call_simple_asm(func, ebx_in, ecx_in, eax);
	APM_DO_RESTORE_SEGS;
     1b5:	8e 65 fc             	movl   0xfffffffc(%ebp),%fs
     1b8:	8e 6d f8             	movl   0xfffffff8(%ebp),%gs
	local_irq_restore(flags);
     1bb:	ff 75 ec             	pushl  0xffffffec(%ebp)
     1be:	9d                   	popf   
	cpu_gdt_table[smp_processor_id()][0x40 / 8] = save_desc_40;
     1bf:	8b 45 f0             	mov    0xfffffff0(%ebp),%eax
     1c2:	8b 55 f4             	mov    0xfffffff4(%ebp),%edx
     1c5:	a3 40 00 00 00       	mov    %eax,0x40
     1ca:	89 15 44 00 00 00    	mov    %edx,0x44
/* how to get the thread information struct from C */
static inline struct thread_info *current_thread_info(void)
{
	struct thread_info *ti;
	__asm__("andl %%esp,%0; ":"=r" (ti) : "0" (~8191UL));
     1d0:	b8 00 e0 ff ff       	mov    $0xffffe000,%eax
     1d5:	21 e0                	and    %esp,%eax
	put_cpu();
     1d7:	ff 48 14             	decl   0x14(%eax)
#endif

static __inline__ int constant_test_bit(int nr, const volatile unsigned long * addr)
{
	return ((1UL << (nr & 31)) & (((const volatile unsigned int *) addr)[nr >> 5])) != 0;
     1da:	8b 40 08             	mov    0x8(%eax),%eax
     1dd:	a8 08                	test   $0x8,%al
     1df:	74 05                	je     1e6 <apm_bios_call_simple+0xb2>
     1e1:	e8 fc ff ff ff       	call   1e2 <apm_bios_call_simple+0xae>
	apm_restore_cpus(cpus);
	return error;
     1e6:	0f b6 c3             	movzbl %bl,%eax
     1e9:	5b                   	pop    %ebx
     1ea:	5e                   	pop    %esi
     1eb:	5f                   	pop    %edi
     1ec:	89 ec                	mov    %ebp,%esp
     1ee:	5d                   	pop    %ebp
     1ef:	c3                   	ret    


* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-11 15:52     ` Mike Galbraith
@ 2003-10-11 17:34       ` Manfred Spraul
  2003-10-12  5:11         ` Mike Galbraith
  0 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2003-10-11 17:34 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Zwane Mwaikambo, linux-kernel

Mike Galbraith wrote:

>
> Ok, you want do_IRQ assembler, correct?

No - I need the function that was interrupted by common_interrupt.

I found only one valid function pointer in the stack dump above 
common_interrupt:

0xc0112a13, EBP=0xc0349f88

Could you look it up in your System.map?
Which power management do you use? apm or acpi?
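
For reference, looking an address up in System.map amounts to finding the
last symbol whose start address is not above it. A minimal sketch in C,
assuming the map has already been parsed into an array sorted by address;
struct ksym and lookup_symbol are hypothetical names used only for this
illustration, not kernel interfaces:

```c
#include <stddef.h>
#include <stdint.h>

/* One parsed System.map entry (hypothetical layout for this sketch). */
struct ksym {
	uint32_t addr;
	const char *name;
};

/*
 * Binary search for the last entry with addr <= target.
 * The table must be sorted by ascending address.
 * Returns NULL if target lies below the first symbol.
 */
static const struct ksym *lookup_symbol(const struct ksym *tab, size_t n,
					uint32_t target)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].addr <= target)
			lo = mid + 1;	/* candidate, look further right */
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}
```

The difference target - sym->addr is the offset shown in the
symbol+0xoffset form used by the Call Trace lines.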

--
    Manfred



* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-11 12:06   ` Manfred Spraul
@ 2003-10-11 15:52     ` Mike Galbraith
  2003-10-11 17:34       ` Manfred Spraul
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2003-10-11 15:52 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Zwane Mwaikambo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

At 02:06 PM 10/11/2003 +0200, Manfred Spraul wrote:

>I'd increase kstack_depth_to_print to 140. Do not increase it too much, 
>otherwise it will oops due to the misaligned stack.
>Then check the EBP values: They are pushed after the return address. The 
>return addresses are listed in the Call Trace section.
>Example:
>0xc0136aa8 pushes 0xc0349dd6 -> odd.
>0xc0131b6c pushes 0xc0349de6 -> odd.
>
>0xc0131b3e pushes c0349e02 -> odd.
>
>Proper values for EBP are multiples of 4. Once you find where the stack got 
>misaligned, disassemble the offending function (or send me the .o file)

Ok, you want do_IRQ assembler, correct?

fwiw, building with gcc-3.3 didn't help, nor did disabling frame pointers.

         -Mike 

[-- Attachment #2: Type: text/plain, Size: 7563 bytes --]

Unable to handle kernel paging request at virtual address c034a000
 printing eip:
c0134d5a
*pde = 00102027
*pte = 0034a000
Oops: 0000 [#1]
CPU:    0
EIP:    0060:[<c0134d5a>]    Not tainted
EFLAGS: 00010002
EIP is at store_stackinfo+0x4e/0x80
eax: 00000000   ebx: c7436f88   ecx: c0301390   edx: c030138c
esi: c0349ffe   edi: 017e0008   ebp: c0349d46   esp: c0349d36
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0348000 task=c02fcbe0)
Stack: c74b59c4 c1173488 c7436000 00000070 c0349d76 c0136aa8 c1173488 c7436000 
       0000006b c72b1bb4 c72b1bb4 c7fecafc 00001000 c0131b6c c7f75f78 00000086 
       c0349d86 c0131b6c c1173488 c7436f38 c0349da2 c0131b3e c7436f38 c1173488 
       c72b1bb4 c72b1bb4 00000001 c0349db6 c014cf76 c7436f38 c7fecafc c7436f2c 
       c0349dc2 c014d158 c72b1bb4 c0349dda c016548a c72b1bb4 c72b1bb4 00000000 
       c04a070c c0349df6 c014d830 c72b1bb4 00010000 00000000 00010000 c72b1bb4 
       c0349e26 c01e45af c72b1bb4 00010000 00000000 c7467f58 00000080 c04a070c 
       00000000 00000000 00000000 00000000 c0349e3a c01e4697 c7467f58 00000001 
       00010000 c0349e62 c01f0b7d c7467f58 00000001 00000080 c04a070c 00000000 
       c04a070c 00000001 00000296 c0349e76 c01f9f68 c04a070c 00000001 00000080 
       c0349e96 c0201df8 c04a070c 00000001 00000080 c0348000 c7af5ef8 50005ef8 
       c0349eba c01f2024 c04a070c c7b2ca48 04000001 00000000 c0201d98 c04a0660 
       00000292 c0349eda c010a423 0000000e c7af5ef8 c0349f0a c0348000 c0348000 
       0000000e c0349f02 c010a70e 0000000e c0349f0a c7b2ca48 00000000 00000010 
       c0348000 c7b2ca48 c0346bc0 000000c0 c0109014 00000000 00000000 00000000 
       00000010 c0348000 000000c0 00005305 9f280000 9f280000 ffffff0e 00008026 
       000000b8 00000216 803600c0 9f8800b8 2a13c034 0060c011 9f880000 8000c034 
       007bc034 007b0000 9c2c0000 0010fffb 
Call Trace:
 [<c0136aa8>] kmem_cache_free+0x218/0x294
 [<c0131b6c>] mempool_free_slab+0x10/0x14
 [<c0131b6c>] mempool_free_slab+0x10/0x14
 [<c0131b3e>] mempool_free+0x7a/0x84
 [<c014cf76>] bio_destructor+0x36/0x4c
 [<c014d158>] bio_put+0x2c/0x30
 [<c016548a>] mpage_end_io_read+0x6a/0x78
 [<c014d830>] bio_endio+0x50/0x5c
 [<c01e45af>] __end_that_request_first+0xef/0x1c0
 [<c01e4697>] end_that_request_first+0x17/0x1c
 [<c01f0b7d>] ide_end_request+0x8d/0x124
 [<c01f9f68>] default_end_request+0x14/0x18
 [<c0201df8>] ide_dma_intr+0x60/0x98
 [<c01f2024>] ide_intr+0x108/0x17c
 [<c0201d98>] ide_dma_intr+0x0/0x98
 [<c010a423>] handle_IRQ_event+0x2b/0x58
 [<c010a70e>] do_IRQ+0x92/0x130
 [<c0109014>] common_interrupt+0x18/0x20


000004ec <do_IRQ>:
 4ec:	55                   	push   %ebp
 4ed:	89 e5                	mov    %esp,%ebp
 4ef:	83 ec 08             	sub    $0x8,%esp
 4f2:	57                   	push   %edi
 4f3:	56                   	push   %esi
 4f4:	53                   	push   %ebx
 4f5:	be 00 e0 ff ff       	mov    $0xffffe000,%esi
 4fa:	0f b6 7d 2c          	movzbl 0x2c(%ebp),%edi
 4fe:	21 e6                	and    %esp,%esi
 500:	89 fb                	mov    %edi,%ebx
 502:	c1 e3 05             	shl    $0x5,%ebx
 505:	8d 83 00 00 00 00    	lea    0x0(%ebx),%eax
 50b:	89 45 fc             	mov    %eax,0xfffffffc(%ebp)
 50e:	81 46 14 00 00 01 00 	addl   $0x10000,0x14(%esi)
 515:	ff 04 bd 1c 00 00 00 	incl   0x1c(,%edi,4)
 51c:	ff 46 14             	incl   0x14(%esi)
 51f:	8b 83 04 00 00 00    	mov    0x4(%ebx),%eax
 525:	57                   	push   %edi
 526:	8b 40 14             	mov    0x14(%eax),%eax
 529:	ff d0                	call   *%eax
 52b:	8b 83 00 00 00 00    	mov    0x0(%ebx),%eax
 531:	83 c4 04             	add    $0x4,%esp
 534:	24 d7                	and    $0xd7,%al
 536:	c7 45 f8 00 00 00 00 	movl   $0x0,0xfffffff8(%ebp)
 53d:	0c 04                	or     $0x4,%al
 53f:	a8 03                	test   $0x3,%al
 541:	75 0d                	jne    550 <do_IRQ+0x64>
 543:	8b 93 08 00 00 00    	mov    0x8(%ebx),%edx
 549:	24 fb                	and    $0xfb,%al
 54b:	89 55 f8             	mov    %edx,0xfffffff8(%ebp)
 54e:	0c 01                	or     $0x1,%al
 550:	89 83 00 00 00 00    	mov    %eax,0x0(%ebx)
 556:	83 7d f8 00          	cmpl   $0x0,0xfffffff8(%ebp)
 55a:	74 60                	je     5bc <do_IRQ+0xd0>
 55c:	89 f3                	mov    %esi,%ebx
 55e:	89 f6                	mov    %esi,%esi
 560:	ff 4b 14             	decl   0x14(%ebx)
 563:	8b 43 08             	mov    0x8(%ebx),%eax
 566:	a8 08                	test   $0x8,%al
 568:	74 06                	je     570 <do_IRQ+0x84>
 56a:	e8 fc ff ff ff       	call   56b <do_IRQ+0x7f>
 56f:	90                   	nop    
 570:	8b 45 f8             	mov    0xfffffff8(%ebp),%eax
 573:	8d 55 08             	lea    0x8(%ebp),%edx
 576:	50                   	push   %eax
 577:	52                   	push   %edx
 578:	57                   	push   %edi
 579:	e8 fc ff ff ff       	call   57a <do_IRQ+0x8e>
 57e:	83 c4 0c             	add    $0xc,%esp
 581:	ff 43 14             	incl   0x14(%ebx)
 584:	83 3d 04 00 00 00 00 	cmpl   $0x0,0x4
 58b:	75 13                	jne    5a0 <do_IRQ+0xb4>
 58d:	50                   	push   %eax
 58e:	8b 45 fc             	mov    0xfffffffc(%ebp),%eax
 591:	50                   	push   %eax
 592:	57                   	push   %edi
 593:	e8 e0 fd ff ff       	call   378 <note_interrupt>
 598:	83 c4 0c             	add    $0xc,%esp
 59b:	90                   	nop    
 59c:	8d 74 26 00          	lea    0x0(%esi,1),%esi
 5a0:	8b 55 fc             	mov    0xfffffffc(%ebp),%edx
 5a3:	8b 02                	mov    (%edx),%eax
 5a5:	89 c2                	mov    %eax,%edx
 5a7:	a8 04                	test   $0x4,%al
 5a9:	74 0a                	je     5b5 <do_IRQ+0xc9>
 5ab:	83 e2 fb             	and    $0xfffffffb,%edx
 5ae:	8b 45 fc             	mov    0xfffffffc(%ebp),%eax
 5b1:	89 10                	mov    %edx,(%eax)
 5b3:	eb ab                	jmp    560 <do_IRQ+0x74>
 5b5:	24 fe                	and    $0xfe,%al
 5b7:	8b 55 fc             	mov    0xfffffffc(%ebp),%edx
 5ba:	89 02                	mov    %eax,(%edx)
 5bc:	8b 55 fc             	mov    0xfffffffc(%ebp),%edx
 5bf:	8b 42 04             	mov    0x4(%edx),%eax
 5c2:	57                   	push   %edi
 5c3:	8b 40 18             	mov    0x18(%eax),%eax
 5c6:	ff d0                	call   *%eax
 5c8:	83 c4 04             	add    $0x4,%esp
 5cb:	bb 00 e0 ff ff       	mov    $0xffffe000,%ebx
 5d0:	21 e3                	and    %esp,%ebx
 5d2:	ff 4b 14             	decl   0x14(%ebx)
 5d5:	8b 43 08             	mov    0x8(%ebx),%eax
 5d8:	a8 08                	test   $0x8,%al
 5da:	74 05                	je     5e1 <do_IRQ+0xf5>
 5dc:	e8 fc ff ff ff       	call   5dd <do_IRQ+0xf1>
 5e1:	8b 43 14             	mov    0x14(%ebx),%eax
 5e4:	05 01 00 ff ff       	add    $0xffff0001,%eax
 5e9:	89 43 14             	mov    %eax,0x14(%ebx)
 5ec:	a9 00 ff ff 00       	test   $0xffff00,%eax
 5f1:	75 0e                	jne    601 <do_IRQ+0x115>
 5f3:	83 3d 00 00 00 00 00 	cmpl   $0x0,0x0
 5fa:	74 05                	je     601 <do_IRQ+0x115>
 5fc:	e8 fc ff ff ff       	call   5fd <do_IRQ+0x111>
 601:	b8 00 e0 ff ff       	mov    $0xffffe000,%eax
 606:	21 e0                	and    %esp,%eax
 608:	ff 48 14             	decl   0x14(%eax)
 60b:	b8 01 00 00 00       	mov    $0x1,%eax
 610:	8d 65 ec             	lea    0xffffffec(%ebp),%esp
 613:	5b                   	pop    %ebx
 614:	5e                   	pop    %esi
 615:	5f                   	pop    %edi
 616:	89 ec                	mov    %ebp,%esp
 618:	5d                   	pop    %ebp
 619:	c3                   	ret    


* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-11 11:15 ` Mike Galbraith
@ 2003-10-11 12:06   ` Manfred Spraul
  2003-10-11 15:52     ` Mike Galbraith
  0 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2003-10-11 12:06 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Zwane Mwaikambo, linux-kernel

Mike Galbraith wrote:

>>
>>> eax: 00000000   ebx: c7802f98   ecx: c0301390   edx: c030138c
>>> esi: c0349ffe   edi: 017e0008   ebp: c0349da6   esp: c0349d96
>>> ds: 007b   es: 007b   ss: 0068
>>> Process swapper (pid: 0, threadinfo=c0348000 task=c02fcbe0)
>>
>> The esp value is sane, the stack is at 0xc0348000, and the fault is 
>> at 'a000: just behind the end of the stack. 
>
I'm blind. The esp value is the culprit:
It's not 32-bit aligned. Someone misaligned the stack, and thus
    if(stack_ptr & (THREAD_SIZE-1))
didn't notice the end of the stack.
The generated assembly of store_stackinfo is correct:
     1d2:       f7 c6 ff 1f 00 00       test   $0x1fff,%esi
Check sptr against THREAD_SIZE -1
     1d8:       74 21                   je     1fb <store_stackinfo+0x6f>
     1da:       8b 3e                   mov    (%esi),%edi
And load *sptr.
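
A minimal sketch in C of why the test cannot fire on a misaligned pointer.
It assumes THREAD_SIZE is 8192 (matching the $0x1fff mask above) and uses a
hypothetical helper, walk_stops_at, that only models the address arithmetic
of the loop; it does not read any memory:

```c
#include <stdint.h>

#define THREAD_SIZE 8192u	/* assumption: 8 KiB stacks, mask 0x1fff */

/*
 * Step upward from sptr in 4-byte units until the
 * "(sptr & (THREAD_SIZE-1)) == 0" test fires, and return that address.
 * Returns 0 if the walk steps over the stack boundary without the
 * test ever firing.
 */
static uint32_t walk_stops_at(uint32_t sptr)
{
	uint32_t end = (sptr & ~(THREAD_SIZE - 1)) + THREAD_SIZE;

	while ((sptr & (THREAD_SIZE - 1)) != 0) {
		if (sptr > end)
			return 0;	/* boundary was passed unnoticed */
		sptr += 4;
	}
	return sptr;
}
```

Starting from a 4-byte aligned address such as 0xc0349d98, the walk stops
at 0xc034a000. Starting from 0xc0349d96, it visits 0xc0349ffe (the esi
value in the oops) and then 0xc034a002, so the low 13 bits are never zero.
A 4-byte load at 0xc0349ffe already reaches into the page at 0xc034a000,
which is consistent with the reported fault address.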


>> It looks like store_stackinfo accesses memory behind the end of the 
>> stack.
>
>
> Yeah, I'm trying to figure out why.  The below (if dang mailer 
> actually inlines it) kludge allows me to boot, so I suppose I need to 
> ponder addr wrt _stext and _etext.

Wrong direction:  Right now it crashes because it runs over the end of 
the stack.
With your patch applied, the allocated object is too small to hold all 
entries on the stack, and thus store_stackinfo aborts before it runs 
into the next page.

I'd increase kstack_depth_to_print to 140. Do not increase it too much, 
otherwise it will oops due to the misaligned stack.
Then check the EBP values: They are pushed after the return address. The 
return addresses are listed in the Call Trace section.
Example:
0xc0136aa8 pushes 0xc0349dd6 -> odd.
0xc0131b6c pushes 0xc0349de6 -> odd.

0xc0131b3e pushes c0349e02 -> odd.

Proper values for EBP are multiples of 4. Once you find where the stack got 
misaligned, disassemble the offending function (or send me the .o file)
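
The check itself is mechanical. A small sketch in C, assuming the saved
EBP values have been copied by hand from the stack dump into an array in
call order; first_misaligned is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first value that is not a multiple of 4,
 * or -1 if every value is 4-byte aligned.
 */
static int first_misaligned(const uint32_t *ebp, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ebp[i] & 3u)
			return (int)i;
	return -1;
}
```

Fed the three values from the example (0xc0349dd6, 0xc0349de6,
0xc0349e02), it flags the very first entry, since all three end in binary
10. The function worth disassembling is the one at the first frame whose
saved EBP is misaligned while its caller's is still aligned.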


--
    Manfred



* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
  2003-10-11  9:37 2.6.0-test7 DEBUG_PAGEALLOC oops Manfred Spraul
@ 2003-10-11 11:15 ` Mike Galbraith
  2003-10-11 12:06   ` Manfred Spraul
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Galbraith @ 2003-10-11 11:15 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Zwane Mwaikambo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1349 bytes --]

At 11:37 AM 10/11/2003 +0200, Manfred Spraul wrote:
>Mike wrote:
>
>>Unable to handle kernel paging request at virtual address c034a000
>>printing eip:
>>c0134d5a
>>*pde = 00102027
>>*pte = 0034a000
>Fault trying to read from address 0xc034a000: the page is not mapped.
>
>>Oops: 0000 [#1]
>>CPU:    0
>>EIP:    0060:[<c0134d5a>]    Not tainted
>>EFLAGS: 00010002
>>EIP is at store_stackinfo+0x4e/0x80
>In store_stackinfo: the function stores a backtrace of the last 
>kmem_cache_free caller in the object - might be useful, and the memory is 
>not used.
>
>>eax: 00000000   ebx: c7802f98   ecx: c0301390   edx: c030138c
>>esi: c0349ffe   edi: 017e0008   ebp: c0349da6   esp: c0349d96
>>ds: 007b   es: 007b   ss: 0068
>>Process swapper (pid: 0, threadinfo=c0348000 task=c02fcbe0)
>The esp value is sane, the stack is at 0xc0348000, and the fault is at 
>'a000: just behind the end of the stack.
>I assume the faulting line is
>                        svalue = *sptr++;

Exactly.

>It looks like store_stackinfo accesses memory behind the end of the stack.

Yeah, I'm trying to figure out why.  The below (if dang mailer actually 
inlines it) kludge allows me to boot, so I suppose I need to ponder addr 
wrt _stext and _etext.

>Which gcc version do you use? Could you send me mm/slab.o?

gcc-2.95.3.  slab.o coming via private mail.

         -Mike 

[-- Attachment #2: Type: text/plain, Size: 467 bytes --]

--- mm/slab.c.org	Sat Oct 11 12:25:24 2003
+++ mm/slab.c	Sat Oct 11 12:26:02 2003
@@ -864,12 +864,11 @@
 
 		while (((long) sptr & (THREAD_SIZE-1)) != 0) {
 			svalue = *sptr++;
-			if (kernel_text_address(svalue)) {
+			if (kernel_text_address(svalue))
 				*addr++=svalue;
-				size -= sizeof(unsigned long);
-				if (size <= sizeof(unsigned long))
-					break;
-			}
+			size -= sizeof(unsigned long);
+			if (size <= sizeof(unsigned long))
+				break;
 		}
 
 	}


* Re: 2.6.0-test7 DEBUG_PAGEALLOC oops
@ 2003-10-11  9:37 Manfred Spraul
  2003-10-11 11:15 ` Mike Galbraith
  0 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2003-10-11  9:37 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Zwane Mwaikambo, linux-kernel

Mike wrote:

>Unable to handle kernel paging request at virtual address c034a000
> printing eip:
>c0134d5a
>*pde = 00102027
>*pte = 0034a000
>
Fault trying to read from address 0xc034a000: the page is not mapped.

>Oops: 0000 [#1]
>CPU:    0
>EIP:    0060:[<c0134d5a>]    Not tainted
>EFLAGS: 00010002
>EIP is at store_stackinfo+0x4e/0x80
>
In store_stackinfo: the function stores a backtrace of the last 
kmem_cache_free caller in the object - might be useful, and the memory 
is not used.

>eax: 00000000   ebx: c7802f98   ecx: c0301390   edx: c030138c
>esi: c0349ffe   edi: 017e0008   ebp: c0349da6   esp: c0349d96
>ds: 007b   es: 007b   ss: 0068
>Process swapper (pid: 0, threadinfo=c0348000 task=c02fcbe0)
>
The esp value is sane, the stack is at 0xc0348000, and the fault is at 
'a000: just behind the end of the stack.
I assume the faulting line is
                        svalue = *sptr++;

It looks like store_stackinfo accesses memory behind the end of the stack.
Which gcc version do you use? Could you send me mm/slab.o?
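
The stack bounds follow from esp alone. A minimal sketch in C, assuming
8 KiB stacks (the ~8191UL mask that current_thread_info() applies to esp
on i386 in this kernel); stack_base and stack_end are hypothetical helper
names:

```c
#include <stdint.h>

#define THREAD_SIZE 8192u	/* assumption: 8 KiB kernel stacks */

/* Lowest address of the stack area containing esp (thread_info lives here). */
static uint32_t stack_base(uint32_t esp)
{
	return esp & ~(THREAD_SIZE - 1);
}

/* First address past the stack area. */
static uint32_t stack_end(uint32_t esp)
{
	return stack_base(esp) + THREAD_SIZE;
}
```

For esp = 0xc0349d96 this gives a base of 0xc0348000, matching the
threadinfo value in the oops, and an end of 0xc034a000, which is exactly
the faulting address.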

--
    Manfred



Thread overview: 18+ messages
2003-10-09 13:04 2.6.0-test7 oops in proc_pid_stat Olaf Hering
2003-10-09 22:04 ` Linus Torvalds
2003-10-10  6:50   ` 2.6.0-test7 DEBUG_PAGEALLOC oops Mike Galbraith
2003-10-10 16:52     ` Zwane Mwaikambo
2003-10-11  7:01       ` Mike Galbraith
2003-10-11  7:03         ` Zwane Mwaikambo
2003-10-10  7:28   ` 2.6.0-test7 oops in proc_pid_stat Olaf Hering
2003-10-11  9:37 2.6.0-test7 DEBUG_PAGEALLOC oops Manfred Spraul
2003-10-11 11:15 ` Mike Galbraith
2003-10-11 12:06   ` Manfred Spraul
2003-10-11 15:52     ` Mike Galbraith
2003-10-11 17:34       ` Manfred Spraul
2003-10-12  5:11         ` Mike Galbraith
2003-10-12  6:58           ` Manfred Spraul
2003-10-12  8:52             ` Mike Galbraith
2003-10-12 12:08               ` Thomas Molina
2003-10-12 14:13                 ` Thomas Molina
2003-10-12 22:36             ` Thomas Molina
