linux-kernel.vger.kernel.org archive mirror
* Linux 2.2.18pre25
@ 2000-12-07 20:03 Alan Cox
  2000-12-07 23:23 ` Miquel van Smoorenburg
  2000-12-08  0:20 ` Andrea Arcangeli
  0 siblings, 2 replies; 72+ messages in thread
From: Alan Cox @ 2000-12-07 20:03 UTC
  To: linux-kernel


Ok we believe the VM crash looping printing error messages is now fixed.
Marcelo finally figured it out and my 8Mb 486 has been running 2.2.18pre
with that fix and stably[1].

So I figure this is it for 2.2.18, subject to evidence to the contrary

Alan


2.2.18pre25
o	Fix tight loop spinning reporting out of free	(Marcelo Tosatti)
	pages
o	Back out ppa changes causing problems for a	(Tim Waugh)
	few users
o	Set master enable on UHCI USB controllers	(Erik Mouw)
o	RIO DCD fixes				(Patrick van de Lageweg)
o	3c59x.c support for 3c556B			(Andrew Morton)
o	S390 cleanups for loopsperjiffy etc		(Kurt Roeckx)
o	Fix acceleport 4 SMP hangs			(Al Borchers)
o	Fix drivers/char/Makefile buglet		(Chip Salzenberg)
o	PPC syscall table fix				(Chip Salzenberg)
o	Move HID sysctl to avoid clash in 2.4 case	(Tom Rini)
o	Small symbios check condition fix		(Gérard Roudier)
o	Fix Makefile module version check		(Eric Lammerts)
o	Fix DRM build on Sparc 				(Dave Miller)
o	Work around Dallas D4201 PCM8 audio bug		(Thomas Sailer)
o	Fix USB memory leak				(Dan Streetman)
o	Fix ioremap fencepost error			(Chip Salzenberg)


2.2.18pre24
o	Expose put_unused_fd for modules		(Andi Kleen)
o	Fix the ps/2 mouse probe I hope			(me)
o	Fix crash in cosa driver			(Jan Kasprzak)
o	Fix procfs negative seek offset error reporting (HJ Lu)
o	Fix ext2 file limit constraints			(Andrea Arcangeli)
o	Fix lockf corner cases				(Andi Kleen, me)
o	Fix NCPfs date limits				(Igor Zhbanov)
o	Update DRM					(Chip Salzenberg)
o	Fix missing Alpha includes			(Matt Wilson)
o	Fix missing symbols on alpha			(Matt Wilson)
 
2.2.18pre23
o	Fix alpha compile problem			(Herbert Xu)
o	Scan DMI bios data to find broken laptops	(me)
o	Fix megaraid module symbols			(Michael Marxmeier)
o	Fix visor/OHCI problem				(Greg Kroah-Hartman)
o	Fix sysctl_jiffies compile bug			(Tomasz Kłoczko)
o	Init mic input low to avoid feedback		(Pete Zaitcev)
o	Fix typo in acenic headers			(Val Henson)
o	David Woodhouse has moved			(David Woodhouse)
o	Compaq raid driver update			(Charles White)
o	Fix aha1542 scribbles on errors			(Phil Stracchino)
o	Update Advansys driver to v3.3D			(Bob Frey)
o	Fix maestro ioctl locking			(Zach Brown)
o	Formatting cleanup for setup.c			(Dave Jones)
o	Fix FAT32 bugs on Alpha				(Bill Nottingham)

2.2.18pre22
o	Fix HZ assumption in USB hub driver		(Oleg Drokin)
o	Fix ndisc range check on ipv6			(Dave Miller)
o	Clear other fields in qcam VIDIOCGWIN return	(Damion de Soto)
o	Fix sparc64 includes for socket.h		(Solar Designer)
o	ELF platform was misset for Pentium IV		(Mikael Pettersson)
o	ADMTek 985 ident was wrong			(Lee Bradshaw)
o	Fix filemark status test on scsi tape		(Robin Miller)
o	Fix file/block when spacing to tape beginning	(Kai Makisara)
o	Small ISDN documentation fixes			(Kai Germaschewski)
o	Resync icn driver with core isdn tree		(Kai Germaschewski)
o	Fix isdn loopback driver			(Kai Germaschewski)
o	Fix small leaks in lockd			(Trond Myklebust)
o	Add Pentium IV rep nop, ident etc		(Various folks,
							 notably HPA and
							 Linda Wang)
o	Update sparc default config			(Dave Miller)
o	Hopefully properly fix the megaraid problem	(Willy Tarreau, AMI
							 and others)
o	Resync tcp bits with Dave			(Dave Miller)
o	Make cpqarray provide randomness		(Nigel Metheringham)
o	Fix wavefront symbols bug			(Carlos E. Gorges)
o	Fix acenic jumbo handling when flushing ring	(Val Henson)
o	Fix ace_set_mac_addr for littleendian hosts	(Stephen Hack)
o	Fix assorted typos in the kernel		(Andries Brouwer)
o	EEPro100 fixes					(Dragan Stancevic)
o	Fix hisax _setup crash case			(David Woodhouse)
o	Fix small cdrom driver bugs			(Jens Axboe)
o	Fix remaining vmalloc corner cases		(Ben LaHaise)
o	Update USB maintainers				(Greg Kroah-Hartman)
o	Fix matroxfb doc bug				(Pavel Rabel)
o	Fix setscheduler lock inversion 		(Andrew Morton)
o	Fix scsi unload/sg ioctl oops			(Paul Clements)

2.2.18pre21
o	Environment controller update for sparc		(Eric Brower)
o	No italian translation for config.help		(Andrea Ferraris)
o	Fix type error in buz driver			(Pete Zaitcev)
o	Resynchronize Apple PowerMac codebase		(Paul Mackerras & co)
o	Merge powermac tree fixes into usb
o	Powermac input device handling changes
o	Fix console switch fonts
o	S/390 merge					(IBM S/390 folks)
					(Merge grunt work done by Kurt Roeckx)
o	Make knfsd TCP an option 			(me)
o	Drop cisco info packets (0x2000)		(Ivan Passos)
o	Add belkin USB serial cable			(William Greathouse)

2.2.18pre20
o	Fix ide-probe SMP build error			(Ian Morgan)
o	Fix appletalk physical layer ioctl handling	(Andi Kleen)
o	Sparc update					(Dave Miller)
o	Update Stephen Tweedie's contact info		(Stephen Tweedie)
o	Fix typo in esp and scsi_obsolete code		(Dave Miller)
o	Bonding ioctl check fix				(Willy Tarreau)
o	Fix ipv6 procfs bug				(Al Viro)
o	Report PIV in proc as family 15 and uname as	(me)
	model 6 as discussed
o	Redo Intel cache decodes as code not tables	(me)
	and add new ones  (based on updates by
	Asit Mallick & Andrew Ip)
o	Fix CMOS locking in machine_power_off paths	(me)
o	Create build tree symlinks only if insmod is
	new enough not to be confused by it		(Keith Owens)
o	Fix cmsg handling				(Philippe Troin)
o	Tiny xpds driver changes			(Dan Hollis)
o	Fix vmalloc sign bug				(Ben LaHaise)
o	SMBFS fixes/changes for find_next problems and	(Urban Widmark)
	to avoid truncate bug in netapps
o	Fix ntfs translation bug			(Anton Altaparmakov)
o	Fix sparc problem with some soundcards and the	(Jeff Garzik)
	_IOC magic
o	Update ppa driver to v2.05			(Tim Waugh)


2.2.18pre19
o	Fix transproxy socket lookup			(Val Henson)
o	Add ICS1893 PHY to the SiS900 driver		(Lei-Chun Chang)
o	Fix documentation error in matroxfb		(Vsevolod Sipakov)
o	Update IDE floppy maintainer			(Paul Bristow)
o	Fix remaining cmos locking			(Paul Gortmaker)
o	Fix sparc bitfield/compiler bits on sound	(Dave Miller)
o	Update Pegasus USB driver			(Petko Manolov)
o	Networking updates - move divert header		(Andi Kleen)
o	Add ETH_P_ATM* defines				(Matti Aarnio)
o	Fix one more missing GFP_KERNEL/sk->allocation	(Dave Miller)
o	Fix ISDN multilink handler bug			(Kai Germaschewski)
o	Fix ymfpci unload cases				(Kai Germaschewski)

2.2.18pre18
o	Fix off by one in net/ipv4/proc			(Dave Miller)
o	Move the fpu emu patch that got away		(Dave Miller)
o	K6 update for MTRR ability			(Dave Jones)
o	Fix raid1/vm deadlock				(Marcelo Tosatti)
o	Fix usb mouse userspace memory accesses		(David Woodhouse)
o	Fix xpdsl if compiled in (typo)			(Arjan van de Ven)
o	Rio fixes for modem handling. Fix a small (Patrick van de Lageweg)
	generic serial bug
o	IBMtr driver fixes for cable pulls, pcmcia	(Burt Silverman,
	behaviour etc					 Mike Sullivan)
o	Tidy up /dev/microcode messages			(Daniel Roesen)
o	Add arpfilter					(Andi Kleen)
o	IDE floppy updates for clik support, cleanups	(Paul Bristow)
o	Fix irongate handling on Alpha			(Soohoon Lee)
o	Fix HZ=100 assumption in aha152x.c		(me)
o	Fix power management handling in i810 audio	(me)
	(From an ALSA fix by Godmar Back)
o	Put the NFS block default back to 4K		(Trond Myklebust)
o	Fix misleading comment in printk code		(Riley Williams)
o	Fix fbcon scroll back/paste bug			(Herbert Xu)
o	Fix rtc_lock for ide-probe, and hd.c		(Richard Johnson)
o	Backport of 2.4 PR_GET/SET_KEEPCAPS		(Brian Brunswick)
	(from Chris Evans 2.4 code)
o	LRU list corruption fix				(Andrea Arcangeli)
o	Initial gcc 2.96+ support for kernel building	(H J Lu)
	| Not a recommended compiler for production kernels...
o	ALI silence clearing fix			(Ching-Ling Lee)
o	Fix remaining old-style use of copy_strings	(Solar Designer)
o	Better pci_resource_start macro for 2.2		(Jeff Garzik)
o	Fix nbd deadlock				(Marcelo Tosatti)

2.2.18pre17
o	Move a few escaped m68k headers into the right	(me)
	directory
o	Backport 2.4 AF_UNIX garbage collect speedups	(Dave Miller)
o	TCP fixes for NFS 				(Saadia Khan)
o	Fix USB audio hangs				(David Woodhouse)
o	Sparc64 dcache and exec fixes			(Dave Miller)
o	Fix typing crap in divert.h			(Jeff Garzik)
o	Use pkt_type in diverter, add maintainer info	(Dave Miller)
o	Fix obscure NAT problem in FIB code		(Dave Miller)
o	Fix sk->allocation in TCP sendmsg		(Marcelo Tosatti)
o	Elevator fixes					(Andrea Arcangeli)
o	Allow broken_suid on NFS root			(Trond Myklebust)
o	Fix net/ipv6/proc off by one bug		(Dave Miller)
o	Fix AGP oops on Alpha				(Michal Jaegermann)
o	MSR/CPUID init call fixes			(Arjan van de Ven)
o	CS4281 sound hang fixes				(Thomas Woller)
o	AX.25 comment updates, Joerg has moved email	(Joerg Reuter)

2.2.18pre16
o	Finally get the m68k tree merged		(Andrew McPherson
							 and a cast of many)
o	Bring the sparc back in line, make it build 	(Anton Blanchard)
o	USB Bluetooth fixes/docs			(Greg Kroah-Hartman)
o	Fix auth_null credentials bug			(Hai-Pao Fan)
o	Update cpu flag names 				(Dave Jones)
o	Console 'quiet' boot option as in 2.4		(Rusty Russell)
o	Make the sx serial driver work again	(Patrick van de Lageweg)
o	Fix negotiation on the SYM53C1010		(Gérard Roudier)
o	Fix alpha loops per jiffy			(Jay Estabrook)
o	Fix pegasus to work with 2.2 kernels		(Greg Kroah-Hartman)
o	Update plusb driver for 2.2.x			(Eric Ayers, 
							 Deti Fliegl)
o	Fix ohci to use __init				(Greg Kroah-Hartman)
o	/sbin/hotplug support for USB as in 2.4		(Greg Kroah-Hartman)
o	Update ksymoops url				(Keith Owens)
o	Update the changes doc about gcc 		(Petri Kaukasoina)
o	Fix AMD flag naming				(Ulrich Windl)
o	Restore old block size on devices after a
	partition scan (needed for powermac for one)	(Michael Schmitz)
o	Fix GPL naming in SubmittingDrivers		(Mike Harris)
o	NFSv3 server patches merge			(Dave Higgen)
o	CS46xx changes					(Nils Faerber)
o	Fix sys_nanosleep for >4GHz CPU changes		(me)
	(Spotted by Ben Herrenschmidt)
o	Fix pas rev D mixer				(??)
o	Fix multiple spelling errors			(André Dahlqvist)
o	ISDN updates					(Kai Germaschewski)
o	XSpeed DSL driver				(Timothy Lee, 
							 Dan Hollis)
o	IDE multi-lun/single-lun handling		(Jens Axboe)
o	Fix alpha generic trident sound support		(Rich Payne)
o	Fix PPC for loops per jiffy			(Cort Dougan)

2.2.18pre15
o	Default msdos behaviour to old (small) letters	(me)
	| An option 'big' goes with 'small'
o	Fix define collision in cpqfc			(Arjan van de Ven)
o	Fix case where scripts/kwhich isnt executable	(me)
o	Alpha FPU divide fix				(Richard Henderson)
o	Add ADMtek985 to the tulip list			(J Katz)
o	Lose excess ymfpci debugging			(Rob Landley)
o	Fix i2c bus id clash				(Russell King)
o	Update the ARM vidc driver			(Russell King)
o	Update the ARM am79c961a driver			(Russell King)
o	Fix parport_pc build with no PCI		(Russell King)
o	Fix ARM memzero					(Russell King)
o	Update ARM for __init and __setup		(Russell King)
o	Update ARM to loops_per_jiffy			(Russell King)
o	Remove arm ecard debug messages			(Russell King)
o	Fix ARM makefiles				(Russell King)
o	Fix iph5526 driver to use mdelay		(Arjan van de Ven)
o	Fix epca, dtlk, aha152x loops_per_sec bits	(Philipp Rumpf)
o	Fix smp tlb invalidate and bogomip printing	(Philipp Rumpf)
o	Fix NLS warnings				(Arjan van de Ven)
o	Fix wavefront conversion to loops_per_jiffies	(me)
o	Fix an audio problem and a sanyo changer 	(Jens Axboe)
	problem
o	Fix include bug with divert			(me)
	| Alternate fix to Willy Tarreau's
o	Fix Alpha for loops_per_jiffy			(Willy Tarreau)

2.2.18pre14
o	Reorder attributes in drm to work with gcc272	(me)
o	GNU cross compilers are foo-bar-gcc 		(Russell King)
o	Add extra strange pcnet32 ident			(Willy Tarreau)
o	Since no vendor can get which right.. use a	(Miquel van Smoorenburg)
	shell script instead
	| Please nobody tell me this fails in some bash version!
o	Should be using bash not bash2 (escaped debug)	(Petri Kaukasoina)
o	spin_unlock_irq wrong debug mode printk		(Willy Tarreau)
o	Fix pcxx for the loops changes			(Arjan van de Ven)
o	Fix ov511/via-rhine name clash			(Arjan van de Ven)
o	Fix bridge compile with loops_per_sec change	(Mitch Adair)
o	8139too driver added				(Jeff Garzik)

2.2.18pre13
o	Change udelay to use loops_per tick		(Philipp Rumpf)
	| Otherwise we bomb out at 2GHz which isnt far enough
	| away with 1.4/1.6GHz stuff due out RSN
o	Fix drivers using big delays to use mdelay	(me)
o	Fix drivers that used loops_per_sec		(Philipp Rumpf, me)
o	Fix yamaha PCI sound SMP bug			(Arjan van de Ven)
o	Change to preferred USB init fix		(David Rees)
o	Fix rio fix					(Arjan van de Ven)
o	Catch the VT but no mouse case in init/main.c	(Arjan van de Ven)
o	Fix the 'which' compiler stuff			(Horst von Brand,
							 Peter Samuelson)
	| Can someone verify for me this works on Slackware and
	| on Caldera ?
o	Add devfs include. Devfs wont be going into 2.2 (Richard Gooch)
	but this again makes it easier to do 2.2/2.4
	drivers.

2.2.18pre12
o	Fix cyrix MTRR handling bug 			(IIZUKA Daisuke)
o	Fix ymfpci poll					(me, Arjan)
o	Update radio-maestro, add Configure.help	(Adam Tlałka)
o	Fix rio/generic serial build bug		(Marcelo Tosatti)
o	USB build bug fix				(Arjan van de Ven)
o	Fix missing ac97_codec.c return value		(Arjan van de Ven)
o	Fix several warnings				(Arjan van de Ven)
o	Made the PS/2 reconnect behaviour optional	(me)
	| Its now 'psaux-reconnect' on the boot line
o	Allow for newer Hauppauge with 4 ports		(Krischan Jodies)
o	Switch sound drivers from library to object	(Arjan van de Ven)
o	Kill the not working ac97 lock on the 810	(me)
o	Automatically select older compilers for kernel
	builds on Debian and RH				(Arjan van de Ven)
o	Start volumes higher on ac97, teach the driver  (Rui Sousa)
	about 5bit and 6bit codec precision and use
	the mute bit.

2.2.18pre11
o	Kill bogus codec_id assignment			(Linus Torvalds)
o	Update codec init code to handle id right	(me)
o	Fix dead/clashing define for NFS		(Trond Myklebust)
o	Remove the find_vga crap from bttv		(me)
o	Fix return on probe failure for cadet		(Arjan van de Ven)
o	Add missing configure.help stuff from 2.4test	(Alan Ford)
o	Fix inia100/megaraid define clash		(Arjan van de Ven)
o	__xchg marked as taking volatiles		(Arjan van de Ven)
o	Fix vwsnd warning in sound core			(Arjan van de Ven)
o	wdt_pci driver should return -EIO on error	(Arjan van de Ven)
o	Fix init_adfs_fs warning			(Arjan van de Ven)
o	Fix the joystick driver option parsing		(Arjan van de Ven)
o	Update mkdep to handle // commenting		(Mike Klar)
o	Thunderlan driver typo fixes			(Torben Mathiasen)
o	Add KX133/KT133 stuff to the AGP/DRM 		(Jeff Nguyen)
o	Fix multiple card bug in eepro driver		(Aristeu Filho)
o	Initial YMF PCI native driver			(Pete Zaitcev)
	| Based on Jaroslav's ALSA driver and I've tweaked it
	| a bit and maybe broken it 8)
o	Fix procfs unlink bugs				(Willy Tarreau)
o	X.25 bugfix backport				(Henner Eisen)
o	Fix incorrect free_dma on DMAless boxes		(Boria)
o	Fix via audio driver merge			(Nick Lamb)
o	Update plusb driver to 2.4 one			(Greg Kroah-Hartman)
o	Put description info in wacom driver		(Greg Kroah-Hartman)
o	Update both UHCI drivers to match 2.4test	(Greg Kroah-Hartman)
o	Masquerade cleanup/warning fixes		(Horst von Brand)

2.2.18pre10
o	Add printk level to partition printk messages	(me)
o	Fix bluesmoke address report/serialize		(Andrea Arcangeli)
o	Add 2.4pre CPUID/MSR docs to 2.2.18pre		(Adrian Bunk)
o	Update to the 2.4pre via audio driver		(Jeff Garzik)
o	Fix small SMP race in set_current_state		(Andrea Arcangeli)
o	Fix __KERNEL__ checks in sparc headers		(Dave Miller)
o	Fix ADFS root directory bug added in pre9	(Russell King)
o	Trap incorrect swap partition sizes		(Andries Brouwer)
o	Fix nfsroot bootp/dhcp on sparc64		(Dave Miller)
o	Tidy up tcp opt parsing				(Dave Miller)
o	Check range on port range sysctl		(Dave Miller)
o	Back out erroneous i2c.h change			(Arjan van de Ven)
o	Fix trident hangs due to over zealous addition	(Eric Brombaugh)
	of midi support
o	Fix big endian/macro bug in ext2fs		(Andi Kleen)
o	Bring dabusb driver into line with 2.4		(Greg Kroah-Hartman)
o	Bring event drivers into line with 2.4		(Franz Sirl,
							 Greg Kroah-Hartman)
o	Fix usb help texts				(Greg Kroah-Hartman)
o	Generic frame diverter				(Benoit Locher)
o	Bring USB serial back into line with 2.4	(Greg Kroah-Hartman)
o	Fix DVD driver rpc state bug			(Jens Axboe)
o	Fix extra sunrpc printk				(Tim Mann)
o	USB init tidy up				(Greg Kroah-Hartman)
o	Allow PlanB video on generic PPC		(Michel Lanners)
o	Doc fixes/trim cvs logs on isdn drivers		(Kai Germaschewski)
o	USB hid, hub, ibmcam, dsbr100 devices updates	(Greg Kroah-Hartman)
o	Return EAFNOSUPPORT for out of range families
o	Fix SMP locking on floppy driver		(Jonathan Corbet)
o	Add module author info to acm.c			(Greg Kroah-Hartman)
o	Update CREDITS to reflect all the USB guys	(Greg Kroah-Hartman)
 
o	ipfw wrong allocation flag fix			(Rusty Russell)
o	Implement Sun style lockf/nfs cache barriers	(Trond Myklebust)
o	Updated ISI serial driver			(Multitech)
	| You may well need their newer firmware set/loader for the
	| later cards too

2.2.18pre9
o	Fix usb module load oops			(Thomas Sailer)
o	Bring USB boot drivers in line with 2.4t8	(Greg Kroah-Hartman)
o	And USB print drivers				(Greg Kroah-Hartman)
o	And USB Rio driver				(Greg Kroah-Hartman)
o	And USB dc2xx driver				(Greg Kroah-Hartman)
o	And USB mdc800 driver				(Greg Kroah-Hartman)
o	NFSv3 support and NFS updates			(Trond Myklebust and co)
o	Compaq 64bit/66Mhz PCI Fibrechannel driver	(Amy Vanzant-Hodge)
o	Disable microtouch driver (doesnt work in 2.2	(Greg Kroah-Hartman)
	currently)
o	Update ADFS support				(Russell King)
o	Update ARM arch specific code and includes	(Russell King)
o	Update ARM specific drivers 			(Russell King)
o	Use both fast and slow A20 gating on boot	(Kira Brown)
	| if your box doesnt boot I want to know about it...
	| Needed for stuff like the AMD Elan

2.2.18pre8
o	Fix mtrr compile bug				(Peter Blomgren)
o	Alpha PCI boot up fix				(Michal Jaegermann)
o	Fix vt/keyboard dependency in USB config	(Arjan van de Ven)
o	Fix sound hangs on cs4281			(Tom Woller)
o	Fix Alpha vmlinuz.lds				(Andrea Arcangeli)
o	Fix CDROMPLAYTRKIND bug, allow root to open	(Jens Axboe)
	the cd door whenever.
o	Update ov511 to match 2.4			(Greg Kroah-Hartman)
o	Further devio.c fix				(Greg Kroah-Hartman)
o	Update NR_TASKS comment				(Jarkko Kovala)
o	Further sparc64 ioctl translator fixes		(Andi Kleen)

2.2.18pre7
o	Fix the AGP compile in bug			(Arjan van de Ven)
o	Revert old incorrect syncppp state change	(Ivan Passos)
o	Fix i810 rng to actually get built in		(Arjan van de Ven)
o	Megaraid compile fix, joystick, mkiss fixes	(Arjan van de Ven)
o	Kawasaki USB ethernet depends on net		(Arjan van de Ven)
o	Compaq cpqarray update				(Charles White)
o	Fix usb problem with no USB unit found		(Oleg Drokin)
o	Driver for the radio on some maestro cards	(Adam Tlalka)
o	Additional shared map support needed for sparc64(Dave Miller)
o	Fix wdt_pci when compiled in			(me, Arjan van de Ven)
o	Fix usb missing symbol when non modular		(Arjan van de Ven)
o	Identify chip and also handle MTRR for the 	(me)
	Cyrix III
o	Allow binding to all ports multicast		(Andi Kleen)
o	Bring USB docs up to date			(Greg Kroah-Hartman)
o	Bring USB devio up to date			(Greg Kroah-Hartman)
o	pci_resource_len null function for non PCI case	(Arjan van de Ven)
o	Fix synchronous write off end of disk bug	(Jari Ruusu)

2.2.18pre6
o	Fix the IDE PCI not compiling bug		(Dag Wieers)
o	Kill an escaped reference to vger.rutgers	(Dave Miller)
o	Small rtl8139 fixups				(Jeff Garzik)
o	Add USB bluetooth driver			(Greg Kroah-Hartman)
o	Fix oops in visor driver			(Greg Kroah-Hartman)
o	Remove some unneeded ext2 includes,fix a bug	(Andreas Dilger)
	in the UFS code
o	Fix rtc race between timer and rtc irq		(Andrea Arcangeli)
o	Fix slow gettimeofday SMP race			(Andrea Arcangeli)
o	Check lost_ticks in settimeofday to be more	(Andrea Arcangeli)
	precise

2.2.18pre5
o	Added older VIA ide chipsets to the not to be	(me)
	autotuned list
o	Fix crash on boot problem with __setup stuff	(me)
o	Small acenic fix				(Matt Domsch)
o	Fix hfc_pci isdn driver				(Jens David)
o	Fix smbfs configuration problem			(Urban Widmark)
o	Emu10K wrapper/build fixes			(Rui Sousa)
o	Small cleanups					(Arjan van de Ven)
o	Fix sparc32 build bug				(Horst von Brand)
o	Fix quota oops					(Martin Diehl)
o	Add i810 random number driver			(Jeff Garzik)
o	Clear suid bits on ext2 truncate as per SuS	(Andi Kleen)
o	Fix illegal use of section attributes		(Arjan van de Ven)
o	Documentation for nmi watchdog			(Marcelo Tosatti)
o	Fix uninitialised variable warnings		(Arjan van de Ven)
o	Save DR6 condition into the TSS			(Ryan Wallach)
o	Add additional __init's to the kernel	(Andrzej M. Krzysztofowicz)
o	Backport 2.4 wdt_pci driver			(JP Nollman, me)
o	AGP i810 fixes					(Chip Salzenberg)
o	UDMA support for ALI1543 & 1543C IDE devices	(ALI)
o	2.4 MSR/CPUID driver backport			(Dave Jones, 
								H Peter Anvin)
o	Fix incorrect use of kernel v user ptr in NCPfs	(Petr Vandrovec)
o	Updated scsi tape driver			(Kai Makisara)

2.2.18pre4
o	Remove the aacraid driver again, having looked	(me)
	at what is needed to make it acceptable and 
	debug it - Im dumping it back on Adaptec
o	DAC960 update					(Leonard Zubkoff)
o	Add setup vmlinuz.lds changes for Sparc		(Arjan van de Ven)
o	Sparc updates for drm, ioctl and other		(Dave Miller)
o	Megaraid driver update				(Peter Jarrett)
o	Add cd volume 0 to the amp power off on the
	crystal cs46xx					(Bill Nottingham)
o	Fix IPV6 fragment and kfree bugs		(Alexey Kuznetsov)
o	Fix emu10k build bug				(me)
o	Emu10K driver upgrade. Adds emu-aps support	(Rui Sousa)
o	Updated IBM serveraid driver to 4.20		(IBM)
o	Ext2 block handling cleanup from 2.4		(Al Viro)
o	Make the ATI128 driver modular			(Marcelo Tosatti)
o	Fix megaraid build bug with gcc 2.7.2		(Arjan van de Ven)
o	Fix some of the dquot races			(Jan Kara)
o	x86 setup code cleanup				(Dave Jones)
o	Implement 2.4 compatible __setup and __initcall	(Arjan van de Ven)
o	Tidy up smp_call_function stuff			(Keitaro Yosimura)
o	Remove 2.4 compat glue from cs4281 driver	(Marcelo Tosatti)
o	Fix minor bugs in bluesmoke now someone actually
	has a faulty CPU and logs			(me)
o	Fix definition of IPV6_TLV_ROUTERALERT		(Dave Miller)
o	Fix in6_addr, ip_decrease_ttl, other		(Dave Miller)
	minor bits
o	cp932 fixes					(Kazuki Yasumatsu)
o	Updated gdth driver				(Andreas Koepf)
o	Acenic update					(Jes Sorensen)
o	Update USB serial drivers			(Greg Kroah-Hartman)
o	Move pci_resource_len into pci compat		(Marcelo Tosatti)

2.2.18pre3  (versus 2.2.17pre20)
o	Clean up most of the compatibility macros	(me)
	that various people use. I've systematically
	moved the 100% correct ones to the headers
	used in 2.4
o	Fix newly introduced bug in kmem_cache_shrink	(Daniel Roesen)
o	Further updates to symbios drivers		(Gérard Roudier)
o	Remove emu10K warning and mtrr warning		(Daniel Roesen)
o	Fix symbol clash between cs4281 and esssolo1	(Arjan van de Ven)
o	Fix acenic non modular/module build issues	(Arjan van de Ven)
o	Fix bug in alpha csum_partial_copy that could	(Herbert Xu)
	cause spurious EFAULTs
o	Yet another eepro100 variant sighted		(Torben Mathiasen)
o	Minor microcode.c final tweak			(Daniel Roesen)
o	Document that ATIFB is now modular		(Marcelo Tosatti)
o	Parport update					(Tim Waugh)
o	First set of ext2 updates/fixes			(Al Viro)
o	Bring smbfs back into line with 2.2		(Urban Widmark)
	| This should make OS/2 work again
o	Fix S/390 _stext (still doesnt build dasd)	(Kurt Roeckx)
o	Remove unused vars in arch/i386/kernel/bios32.c	(Daniel Roesen)
o	Update the DHCP initrd support			(Chip Salzenberg)
o	Allow opening empty scsi removables like IDE
	with O_NONBLOCK (needed for some ioctls)	(Chip Salzenberg)
o	Back out vibra mixer change
o	Fix error returns in sbni driver		(Dawson Engler)
o	Initial merge of the aacraid driver		(Adaptec)
	| Much deuglification left to be done here
o	Report megaraid: on obscure megaraid error	(Daniel Deimert)
	strings
o	Add another CS4299 id string			(Mulder Tjeerd)

2.2.18pre2  (versus 2.2.17pre20)

o	Fix the compile problems with microcode.c	(Dave Jones, 
							 Daniel Roesen)
o	GDTH driver update 				(Achim Leubner)
o	Fix mathsemu misuse of casting with asm		(??)
o	Make msnd_pinnacle driver build on Alpha
o	Acenic 0.45 fixes				(Chip Salzenberg)
o	Compaq CISS driver (SA 5300)			(Charles White, 
	+ cleanups					 me)
	+ gcc 2.95 fixup
o	Modularise pm2fb and atyfb
o	Upgrade AMI Megaraid driver to 1.09		(AMI)
o	Add DEC HSG80 and COMPAQ 'logical volume' to
	scsi multilun list
o	SK PCI FDDI driver support			(Schneider & Koch)
o	Linux 2.2 USB backport				(Vojtech Pavlik)
	backport 3 + further fixes from the USB list
	+ mm/slab.c fix for cache destroy
o	AGP driver backport				(XFree86, Precision
	DRM driver backport				 Insight, XiG, HJ Lu, 
							 VA Linux, 
							 and others)

2.2.18pre1  (versus 2.2.17pre20)

o	Update symbios/ncr driver to 1.7.0/3.4.0	(Gérard Roudier)
o	Updated ATP870U driver				(ACard)
o	Avoid running tq_scheduler stuff sometimes with	(Andrea Arcangeli)
	interrupts off
o	Further cpu setup updates			(me)
o	IBM MCA scsi driver updates			(Michael Lang)
o	Fix incorrect out of memory handling in bttv	(Dawson Engler)
o	Fix incorrect out of memory handling in buz	(Dawson Engler)
o	Fix incorrect out of memory handling in qpmouse	(Dawson Engler)
o	Fix error handling memory leak in ipddp		(Dawson Engler)
o	Fix error handling memory leak in sdla		(Dawson Engler)
o	Fix error handling memory leak in softoss	(Dawson Engler)
o	Fix error handling memory leak in ixj 		(Dawson Engler)
o	Fix error handling memory leak in ax25		(Dawson Engler)
o	Merge the microcode driver from 2.4 into 2.2	(Tigran Aivazian)
o	Fix skbuff handling bug in the smc9194 driver	(Arnaldo Melo)
o	Make vfat use the same generation rules as	(H. Kawaguchi,
	in windows 9x					 Chip Salzenberg)
o	Fix oops in the CPQ array driver		(Arnaldo Melo)
o	Fix ac97 codec not setting the id field		(Bill Nottingham)
o	Further work on the cs46xx/CD power bits	(me)
o	Synclink updates 				(Paul Fulgham)
o	Synclink init bug fix				(Arnaldo Melo)
o	Handle odd interrupts from toshiba floppies	(Alain Knaff)
o	Fix trident driver build on nautilus Alpha	(Peter Petrakis)
o	Add later sb16 imix support to the sb driver	(Massimo Dal Zotto)
o	Ignore luns that report can be connected, but	(Matt Domsch)
	not currently
o	Fix dereference after kfree in uart401.c	(Dawson Engler)
o	Return correct SuS error code for an unknown	(Herbert Xu)
	socket family
o	Add sub window clipping to the bttv driver	(Thomas Jacob)
o	Fix nfs cache locked messages			(Trond Myklebust)
o	Fix the modutils misdocumentation		(Martin Douda)
o	Remove bogus biosparm code from seagate.c	(Andries Brouwer)
o	Return correct error code on failed fasync set	(Chip Salzenberg)
o	Handle dcc resume with newer irc clients when	(Scottie Shore)
	doing an irc masq

--
Alan Cox <alan@lxorguk.ukuu.org.uk>
Red Hat Kernel Hacker
& Linux 2.2 Maintainer                        Brainbench MVP for TCP/IP
http://www.linux.org.uk/diary                 http://www.brainbench.com


[1] It does have the page aging patch too, but I want to merge that in 
2.2.19pre so we can study any surprises it causes.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Linux 2.2.18pre25
  2000-12-07 20:03 Linux 2.2.18pre25 Alan Cox
@ 2000-12-07 23:23 ` Miquel van Smoorenburg
  2000-12-07 23:41   ` Alan Cox
  2000-12-08  0:20 ` Andrea Arcangeli
  1 sibling, 1 reply; 72+ messages in thread
From: Miquel van Smoorenburg @ 2000-12-07 23:23 UTC
  To: linux-kernel

In article <E1447Fx-0002vA-00@the-village.bc.nu>,
Alan Cox  <alan@lxorguk.ukuu.org.uk> wrote:
>So I figure this is it for 2.2.18, subject to evidence to the contrary

Megaraid still needs fixing. I sent you the patch twice, so have
other people, but it still isn't fixed. The

megaBase &= PCI_BASE_ADDRESS_MEM_MASK;

...

megaBase &= PCI_BASE_ADDRESS_IO_MASK;

is removed by the 2.2.18 version (read the patch) and that breaks
older megaraid cards.

Existing megaraid systems with 2.2.x kernels WILL break with 2.2.18.
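
For context, here is a hedged sketch of what those two masks do. It is not
the megaraid driver's own code. The low bits of a raw PCI base address
register carry flag bits (bit 0 selects I/O versus memory space), and the
masks clear them so that only the address remains. The mask values below
match the definitions in the Linux <linux/pci.h> header; the helper name
bar_to_address() is hypothetical and exists only for this illustration.

```c
/* Values as defined in the Linux <linux/pci.h> header. */
#define PCI_BASE_ADDRESS_SPACE     0x01UL      /* bit 0: 0 = memory, 1 = I/O */
#define PCI_BASE_ADDRESS_SPACE_IO  0x01UL
#define PCI_BASE_ADDRESS_MEM_MASK  (~0x0fUL)   /* clears the 4 memory flag bits */
#define PCI_BASE_ADDRESS_IO_MASK   (~0x03UL)   /* clears the 2 I/O flag bits */

/*
 * Hypothetical helper (an assumption for this sketch, not a kernel
 * function): strip the flag bits from a raw base address register value
 * and return the bare address.
 */
static unsigned long bar_to_address(unsigned long bar)
{
	if ((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO)
		return bar & PCI_BASE_ADDRESS_IO_MASK;   /* I/O port base */
	return bar & PCI_BASE_ADDRESS_MEM_MASK;          /* memory base */
}
```

With a raw I/O value of 0xe401 the helper returns 0xe400. Without the mask,
the flag bit would stay in the port address handed to the driver.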

Mike.

* Re: Linux 2.2.18pre25
  2000-12-07 23:23 ` Miquel van Smoorenburg
@ 2000-12-07 23:41   ` Alan Cox
  2000-12-08  9:47     ` Willy Tarreau
  0 siblings, 1 reply; 72+ messages in thread
From: Alan Cox @ 2000-12-07 23:41 UTC
  To: Miquel van Smoorenburg; +Cc: linux-kernel

> Megaraid still needs fixing. I sent you the patch twice, so have
> other people, but it still isn't fixed. The

I asked people to explain why it was needed. I am still waiting. It is a 
patch that does nothing. I will not put random deep magic into the kernel.

I have no reason to believe the current driver in 2.2.18pre24 does not work,
have you tried that specific kernel ? 



* Re: Linux 2.2.18pre25
  2000-12-07 20:03 Linux 2.2.18pre25 Alan Cox
  2000-12-07 23:23 ` Miquel van Smoorenburg
@ 2000-12-08  0:20 ` Andrea Arcangeli
  2000-12-08  0:27   ` Alan Cox
  2000-12-08 17:02   ` Linux 2.2.18pre25 Martin Kacer
  1 sibling, 2 replies; 72+ messages in thread
From: Andrea Arcangeli @ 2000-12-08  0:20 UTC
  To: Alan Cox; +Cc: linux-kernel

On Thu, Dec 07, 2000 at 08:03:00PM +0000, Alan Cox wrote:
> 
> Ok we believe the VM crash looping printing error messages is now fixed.

Such a bug can't generate crashes. Did you ever reproduce crashes on your 8Mb
486 with 2.2.18pre24?

> Marcelo finally figured it out and my 8Mb 486 has been running 2.2.18pre
> with that fix and stably[1].

diff -urN 2.2.18pre24/mm/filemap.c 2.2.18pre25/mm/filemap.c
--- 2.2.18pre24/mm/filemap.c	Wed Nov 29 19:28:29 2000
+++ 2.2.18pre25/mm/filemap.c	Fri Dec  8 00:41:45 2000
@@ -220,8 +220,10 @@
 			 * throttling.
 			 */
 
-			if (!try_to_free_buffers(page, wait))
+			if (!try_to_free_buffers(page, wait)) { 
+				if(--count < 0) break;
 				goto refresh_clock;
+			}
 			return 1;
 		}
 
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.18pre24aa1/00_account-failed-buffer-tries-1

--- 2.2.17pre19/mm/filemap.c	Tue Aug 22 14:54:13 2000
+++ /tmp/filemap.c	Thu Aug 24 01:05:50 2000
@@ -179,6 +179,8 @@
 		if ((gfp_mask & __GFP_DMA) && !PageDMA(page))
 			continue;
 
+		count--;
+
 		/*
 		 * Is it a page swap page? If so, we want to
 		 * drop it if it is no longer used, even if it
@@ -224,7 +226,7 @@
 			return 1;
 		}
 
-	} while (--count > 0);
+	} while (count > 0);
 	return 0;
 }
 
lftp> pwd
ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.17pre19
								 ^^^^^^^^^^^
lftp> ls -l account-failed-buffer-tries-1 
-rw-r--r--   1 korg     korg          407 Sep  5 22:43 account-failed-buffer-tries-1
					  ^^^^^^
lftp> 

The only difference is that pre25 keeps decreasing `count' for locked, mapped
and out-of-zone pages, which means it will still fail to shrink the cache when
it looks at the unlucky part of the physical memory, while
account-failed-buffer-tries-1 intentionally doesn't decrease `count' in those
cases, to avoid failing when it hits such a region.
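
The effect of the two accounting choices can be illustrated with a small
user-space simulation. This is a sketch under stated assumptions, not the
shrink_mmap() code: the page states, the scan_pages() function, the
charge_skipped flag and the demo_pages array are all hypothetical, and the
scan is limited to one pass over the array purely to keep the example
finite. charge_skipped=1 stands in for the pre25 behaviour (every page looked
at costs one unit of `count'), charge_skipped=0 for the
account-failed-buffer-tries-1 behaviour (locked, mapped and out-of-zone pages
are free to skip, failed buffer tries still cost one unit).

```c
#include <stddef.h>

/* Hypothetical page states for the simulation. */
enum page_state {
	PAGE_SKIPPED,       /* locked, mapped or out-of-zone: cannot be freed */
	PAGE_BUFFERS_BUSY,  /* try_to_free_buffers() would fail on this page */
	PAGE_FREEABLE       /* can be dropped from the cache */
};

/*
 * Scan the pages once with a budget of `count' units.
 * Returns 1 if a freeable page is reached before the budget runs out,
 * 0 otherwise.
 */
static int scan_pages(const enum page_state *pages, size_t npages,
		      int count, int charge_skipped)
{
	size_t i;

	for (i = 0; i < npages && count > 0; i++) {
		switch (pages[i]) {
		case PAGE_SKIPPED:
			if (charge_skipped)
				count--;   /* pre25-style accounting */
			break;
		case PAGE_BUFFERS_BUSY:
			count--;           /* failed buffer try is always charged */
			break;
		case PAGE_FREEABLE:
			return 1;
		}
	}
	return 0;
}

/* An "unlucky" region: four unfreeable pages ahead of the useful ones. */
static const enum page_state demo_pages[] = {
	PAGE_SKIPPED, PAGE_SKIPPED, PAGE_SKIPPED, PAGE_SKIPPED,
	PAGE_BUFFERS_BUSY, PAGE_FREEABLE
};
#define DEMO_NPAGES (sizeof(demo_pages) / sizeof(demo_pages[0]))
```

With a budget of 3, scan_pages(demo_pages, DEMO_NPAGES, 3, 1) gives up inside
the run of skipped pages and returns 0, while
scan_pages(demo_pages, DEMO_NPAGES, 3, 0) reaches the freeable page and
returns 1.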

account-failed-buffer-tries-1 is included in VM-global-7 and it was
described in the 2.2.18pre21aa2 email to l-k (CC'ed you) in date Fri, 17 Nov
2000 18:54:43 +0100:

[..]
00_account-failed-buffer-tries-1

        Account also the failed buffer tries during shrink_mmap. (me)  
        (this is included in the VM-global that I maintain against vanilla
        2.2.x btw)
[..]

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08  0:20 ` Andrea Arcangeli
@ 2000-12-08  0:27   ` Alan Cox
  2000-12-08  0:41     ` Andrea Arcangeli
  2000-12-08  0:44     ` Signal 11 Rainer Mager
  2000-12-08 17:02   ` Linux 2.2.18pre25 Martin Kacer
  1 sibling, 2 replies; 72+ messages in thread
From: Alan Cox @ 2000-12-08  0:27 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Alan Cox, linux-kernel

> Such a bug can't generate crashes. Did you ever reproduce crashes on your 8Mb
> 486 with 2.2.18pre24?

Yes. Every 20 minutes or so, quite reliably. With that change it has yet to
crash (it's actually running that + page aging + another minor tweak so it
doesn't return success on page aging until we have a clump of free pages).

With just the page aging patch it performed way better but still hung.

> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.2/2.2.18pre24aa1/00_account-failed-buffer-tries-1
>

Oh well ;) 
 
> account-failed-buffer-tries-1 is included in VM-global-7 and it was
> described in the 2.2.18pre21aa2 email to l-k (CC'ed you) dated Fri, 17 Nov
> 2000 18:54:43 +0100:

The problem is it's hard to know which of your patches depend on what, and
the complete set is large to say the least.

Alan


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08  0:27   ` Alan Cox
@ 2000-12-08  0:41     ` Andrea Arcangeli
  2000-12-08  0:47       ` Alan Cox
  2000-12-08  0:44     ` Signal 11 Rainer Mager
  1 sibling, 1 reply; 72+ messages in thread
From: Andrea Arcangeli @ 2000-12-08  0:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Fri, Dec 08, 2000 at 12:27:58AM +0000, Alan Cox wrote:
> The problem is it's hard to know which of your patches depend on what, and
> the complete set is large to say the least.

That's why I use a `proposed' directory that only contains patches that can be
applied to your tree, in this case it was:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/proposed/v2.2/2.2.18pre2/VM-global-2.2.18pre2-6.bz2

(note: the above is outdated, so of course it's no longer suggested for
inclusion)

I submitted most of the not-feature-oriented stuff at pre2 time and I plan to
re-submit after 2.2.18 is released.

Andrea

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Signal 11
  2000-12-08  0:27   ` Alan Cox
  2000-12-08  0:41     ` Andrea Arcangeli
@ 2000-12-08  0:44     ` Rainer Mager
  2000-12-08  1:05       ` Jeff V. Merkey
                         ` (4 more replies)
  1 sibling, 5 replies; 72+ messages in thread
From: Rainer Mager @ 2000-12-08  0:44 UTC (permalink / raw)
  To: linux-kernel

Hi all,

	I've searched around for an answer to this with no real luck yet. If anyone
has some ideas I'd be very grateful.

	I recently upgraded to a new machine. It is running RedHat 6.2 Linux (with
a SMP 2.4.0test[8-11] kernel) and has a Matrox G400 in it. X is 4.0.1.
Anyway, about once every 2-3 days X will spontaneously die and the only info
I get back is that it was because of signal 11.
	I've heard that signal 11 can be related to bad hardware, most often
memory, but I've done a good bit of testing on this and the system seems ok.
What I did was to run the VA Linux Cerberos(sp?) test for 15 hours+ with no
errors. Actually this only worked when running from the console. When
running from X the machine locked up (although no signal 11).
	The only info I've gotten back from the XFree86 mailing lists so far is
that there are known and wide spread problems with SMP and these types of
problems. Can anyone comment on this? Are there known SMP problems? What is
the current resolution plan?


Thanks,

--Rainer


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08  0:41     ` Andrea Arcangeli
@ 2000-12-08  0:47       ` Alan Cox
  2000-12-08  1:27         ` Linus Torvalds
  0 siblings, 1 reply; 72+ messages in thread
From: Alan Cox @ 2000-12-08  0:47 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Alan Cox, linux-kernel

> (note: the above is outdated, so of course it's no longer suggested for
> inclusion)
> 
> I submitted most of the not-feature-oriented stuff at pre2 time and I plan to
> re-submit after 2.2.18 is released.

Excellent. I've been trying to avoid VM fixes for 2.2.18 to stop stuff getting
muddled together and hard to debug. Running with page aging convinces me that
for 2.2.19 we badly need to sort some of the VM issues out, and make it faster than
2.4test 8)


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  0:44     ` Signal 11 Rainer Mager
@ 2000-12-08  1:05       ` Jeff V. Merkey
  2000-12-08  1:09       ` Michel LESPINASSE
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-08  1:05 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel


I have previously reported this error (about three months ago) on 2.4
with XFree 3.3.6.  If you are running RedHat 6.2, then you are running
this X Server.  It also shows up on Caldera's 2.4 eDesktop.  It appears
to be something with glibc versions <= 2.1 on 2.4.  I also see it with
secure shell 1.2.27 on 2.4.  I've also seen it on RH 7.0 on 2.4 kernels,
but only with SSH.

Jeff

Rainer Mager wrote:
> 
> Hi all,
> 
>         I've searched around for an answer to this with no real luck yet. If anyone
> has some ideas I'd be very grateful.
> 
>         I recently upgraded to a new machine. It is running RedHat 6.2 Linux (with
> a SMP 2.4.0test[8-11] kernel) and has a Matrox G400 in it. X is 4.0.1.
> Anyway, about once every 2-3 days X will spontaneously die and the only info
> I get back is that it was because of signal 11.
>         I've heard that signal 11 can be related to bad hardware, most often
> memory, but I've done a good bit of testing on this and the system seems ok.
> What I did was to run the VA Linux Cerberos(sp?) test for 15 hours+ with no
> errors. Actually this only worked when running from the console. When
> running from X the machine locked up (although no signal 11).
>         The only info I've gotten back from the XFree86 mailing lists so far is
> that there are known and widespread problems with SMP and these types of
> problems. Can anyone comment on this? Are there known SMP problems? What is
> the current resolution plan?
> 
> Thanks,
> 
> --Rainer
> 

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  0:44     ` Signal 11 Rainer Mager
  2000-12-08  1:05       ` Jeff V. Merkey
@ 2000-12-08  1:09       ` Michel LESPINASSE
  2000-12-08  2:14         ` Rainer Mager
  2000-12-08  1:20       ` Andi Kleen
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 72+ messages in thread
From: Michel LESPINASSE @ 2000-12-08  1:09 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel

On Fri, Dec 08, 2000 at 09:44:29AM +0900, Rainer Mager wrote:

> 	I've heard that signal 11 can be related to bad hardware, most
> often memory, but I've done a good bit of testing on this and the
> system seems ok.  What I did was to run the VA Linux Cerberos(sp?)
> test for 15 hours+ with no errors. Actually this only worked when
> running from the console. When running from X the machine locked up
> (although no signal 11).

Don't be so quick to dismiss the "bad hardware" possibility. It is
really quite common these days. And, some cases of bad hardware are
not detected using simple tests like memtest86. (I'm not sure exactly
what cerberos does, do you have a link for it ?).

My recommendation would be to take a big source tree (say, a bit
bigger than the amount of RAM you have), and run repeated
tar+untar+diff -ru passes on it for 48 hours or so. If your hardware
runs OK, diff should not report any inconsistencies. I found this test
to be quite reliable at detecting hardware problems. If you have several
disk controllers, run one instance of the test on each of
them. Additionally you could run a background task to keep the CPU at
100% - a simple while(1) loop would do.

-- 
Michel "Walken" LESPINASSE
Of course I think I'm right. If I thought I was wrong, I'd change my mind.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  0:44     ` Signal 11 Rainer Mager
  2000-12-08  1:05       ` Jeff V. Merkey
  2000-12-08  1:09       ` Michel LESPINASSE
@ 2000-12-08  1:20       ` Andi Kleen
  2000-12-08  1:24         ` Jeff V. Merkey
  2000-12-08  1:58       ` Richard B. Johnson
  2000-12-08  9:46       ` David Woodhouse
  4 siblings, 1 reply; 72+ messages in thread
From: Andi Kleen @ 2000-12-08  1:20 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel

On Fri, Dec 08, 2000 at 09:44:29AM +0900, Rainer Mager wrote:
> 	I recently upgraded to a new machine. It is running RedHat 6.2 Linux (with
> a SMP 2.4.0test[8-11] kernel) and has a Matrox G400 in it. X is 4.0.1.
> Anyway, about once every 2-3 days X will spontaneously die and the only info
> I get back is that it was because of signal 11.
> 	I've heard that signal 11 can be related to bad hardware, most often
> memory, but I've done a good bit of testing on this and the system seems ok.
> What I did was to run the VA Linux Cerberos(sp?) test for 15 hours+ with no
> errors. Actually this only worked when running from the console. When
> running from X the machine locked up (although no signal 11).
> 	The only info I've gotten back from the XFree86 mailing lists so far is
> that there are known and widespread problems with SMP and these types of
> problems. Can anyone comment on this? Are there known SMP problems? What is
> the current resolution plan?

signal 11 just means that the program crashed with a segmentation fault. 

Sounds like a X Server bug. You should probably contact XFree86, not
linux-kernel


-Andi

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  1:20       ` Andi Kleen
@ 2000-12-08  1:24         ` Jeff V. Merkey
  2000-12-08  1:40           ` Andi Kleen
  2000-12-08  2:28           ` davej
  0 siblings, 2 replies; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-08  1:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Rainer Mager, linux-kernel


Andi,

It's related to some change in 2.4 vs. 2.2.  There are other programs
affected other than X; SSH also gets spurious signal 11s now and again
with 2.4 and glibc <= 2.1 and it does not occur on 2.2.

Jeff

Andi Kleen wrote:
> 
> On Fri, Dec 08, 2000 at 09:44:29AM +0900, Rainer Mager wrote:
> >       I recently upgraded to a new machine. It is running RedHat 6.2 Linux (with
> > a SMP 2.4.0test[8-11] kernel) and has a Matrox G400 in it. X is 4.0.1.
> > Anyway, about once every 2-3 days X will spontaneously die and the only info
> > I get back is that it was because of signal 11.
> >       I've heard that signal 11 can be related to bad hardware, most often
> > memory, but I've done a good bit of testing on this and the system seems ok.
> > What I did was to run the VA Linux Cerberos(sp?) test for 15 hours+ with no
> > errors. Actually this only worked when running from the console. When
> > running from X the machine locked up (although no signal 11).
> >       The only info I've gotten back from the XFree86 mailing lists so far is
> > that there are known and widespread problems with SMP and these types of
> > problems. Can anyone comment on this? Are there known SMP problems? What is
> > the current resolution plan?
> 
> signal 11 just means that the program crashed with a segmentation fault.
> 
> Sounds like a X Server bug. You should probably contact XFree86, not
> linux-kernel
> 
> -Andi

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08  0:47       ` Alan Cox
@ 2000-12-08  1:27         ` Linus Torvalds
  0 siblings, 0 replies; 72+ messages in thread
From: Linus Torvalds @ 2000-12-08  1:27 UTC (permalink / raw)
  To: linux-kernel

In article <E144Bhf-0003GN-00@the-village.bc.nu>,
Alan Cox  <alan@lxorguk.ukuu.org.uk> wrote:
>
>Excellent. I've been trying to avoid VM fixes for 2.2.18 to stop stuff getting
>muddled together and hard to debug. Running with page aging convinces me that
>for 2.2.19 we badly need to sort some of the VM issues out, and make it faster than
>2.4test 8)

Ahh.. The challenge is out!

You and me. Mano a mano. 

		Linus

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  1:24         ` Jeff V. Merkey
@ 2000-12-08  1:40           ` Andi Kleen
  2000-12-08  1:43             ` Jeff V. Merkey
  2000-12-08  2:28           ` davej
  1 sibling, 1 reply; 72+ messages in thread
From: Andi Kleen @ 2000-12-08  1:40 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Andi Kleen, Rainer Mager, linux-kernel

On Thu, Dec 07, 2000 at 06:24:34PM -0700, Jeff V. Merkey wrote:
> 
> Andi,
> 
> It's related to some change in 2.4 vs. 2.2.  There are other programs
> affected other than X; SSH also gets spurious signal 11s now and again
> with 2.4 and glibc <= 2.1 and it does not occur on 2.2.

So have you enabled core dumps and actually looked at the core dumps 
of the programs using gdb to see where they crashed ? 



-Andi


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  1:40           ` Andi Kleen
@ 2000-12-08  1:43             ` Jeff V. Merkey
  2000-12-08  1:55               ` Jeff V. Merkey
  2000-12-08 19:20               ` Dr. Kelsey Hudson
  0 siblings, 2 replies; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-08  1:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Rainer Mager, linux-kernel



Andi Kleen wrote:
> 
> On Thu, Dec 07, 2000 at 06:24:34PM -0700, Jeff V. Merkey wrote:
> >
> > Andi,
> >
> > It's related to some change in 2.4 vs. 2.2.  There are other programs
> > affected other than X; SSH also gets spurious signal 11s now and again
> > with 2.4 and glibc <= 2.1 and it does not occur on 2.2.
> 
> So have you enabled core dumps and actually looked at the core dumps
> of the programs using gdb to see where they crashed ?

Yes.  I can only get the SSH crash when I am running remotely from the
house over the internet, and it only shows up when running a build in
jobserver mode (parallel build).  The X problem seems related as well,
since it's (usually) triggered by Netscape spawning off a forked process.
I will attempt to recreate it tonight, and post the core dump file.

Jeff 





> 
> -Andi

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  1:43             ` Jeff V. Merkey
@ 2000-12-08  1:55               ` Jeff V. Merkey
  2000-12-08 19:20               ` Dr. Kelsey Hudson
  1 sibling, 0 replies; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-08  1:55 UTC (permalink / raw)
  To: Andi Kleen, Rainer Mager, linux-kernel



"Jeff V. Merkey" wrote:
> 
> > So have you enabled core dumps and actually looked at the core dumps
> > of the programs using gdb to see where they crashed ?
> 
> Yes.  I can only get the SSH crash when I am running remotely from the
> house over the internet, and it only shows up when running a build in
> jobserver mode (parallel build).  The X problem seems related as well,
> since it's (usually) triggered by Netscape spawning off a forked process.
> I will attempt to recreate it tonight, and post the core dump file.

BTW.  Were I to wager a guess, I would guess it's related to the paging
problems in 2.4 when a process gets cloned, since every time I have seen
it, it happens when a child process gets forked then accesses the cloned
data from the parent.  In the previous core dumps, it always puked right
after a call to fork() when the child process attempted to WRITE (not
read) data in the program.

Jeff

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  0:44     ` Signal 11 Rainer Mager
                         ` (2 preceding siblings ...)
  2000-12-08  1:20       ` Andi Kleen
@ 2000-12-08  1:58       ` Richard B. Johnson
  2000-12-08  2:04         ` Peter Samuelson
  2000-12-08  9:46       ` David Woodhouse
  4 siblings, 1 reply; 72+ messages in thread
From: Richard B. Johnson @ 2000-12-08  1:58 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel

On Fri, 8 Dec 2000, Rainer Mager wrote:

> Hi all,
> 
> 	I've searched around for an answer to this with no real luck yet. If anyone
> has some ideas I'd be very grateful.

Signal 11 just means that you "seg-faulted". This is usually caused
by a coding error. However, if you have tools (like the C compiler)
that have been running fine, but start to seg-fault, this points to
the very real possibility of a hardware error.

Modern RAM (with no error correction), running outside of its
timing specifications, is often the culprit. Even power supplies can
cause this problem. All you need is a single-bit error in a pointer's
value and -- signal 11.

Also, a bad opcode fetched from RAM with an error traps to
the same handler.

Do:

char main[]={0xff,0xff,0xff,0xff};


Compile and run this (it will compile!). You will see what
bad opcodes will do.



Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  1:58       ` Richard B. Johnson
@ 2000-12-08  2:04         ` Peter Samuelson
  2000-12-08 16:36           ` Matthew Vanecek
  2000-12-08 19:36           ` Dr. Kelsey Hudson
  0 siblings, 2 replies; 72+ messages in thread
From: Peter Samuelson @ 2000-12-08  2:04 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Rainer Mager, linux-kernel


[Dick Johnson]
> Do:
> 
> char main[]={0xff,0xff,0xff,0xff};

Oh come on, at least pick an *interesting* invalid opcode:

  char main[]={0xf0,0x0f,0xc0,0xc8};	/* try also on NT (: */

Peter

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Signal 11
  2000-12-08  1:09       ` Michel LESPINASSE
@ 2000-12-08  2:14         ` Rainer Mager
  0 siblings, 0 replies; 72+ messages in thread
From: Rainer Mager @ 2000-12-08  2:14 UTC (permalink / raw)
  To: linux-kernel

Hi all,

	Thanks for all the input so far. Regarding this...

> (I'm not sure exactly what cerberos does, do you have a link for it ?).

The official name is "Cerberus Test Control System" aka CTCS. I don't know
the official site but a search for this should reveal something. Anyway it
is a pretty comprehensive test that includes multiple kernel compiles,
memory tests, disk test, etc, etc. Like I said, I ran this for more than 15
hours with no problems.

Well, actually, I did notice that if I run CTCS from within X then it
freezes up after a few minutes. This appears to happen when/because of
extreme swapping.


Aside from the above I've also run repeated kernel compiles (more than 50
times) with 'make -j bzImage' and had no problems; all outputs were
identical.

So given these tests, I'm reasonably confident the core hardware is ok. I
suppose it is possible there's some iffy bits in the G400's VRAM (but
wouldn't that just result in screen artifacts?). I will admit that I haven't
yet tried swapping RAM or any other system components.


Any other ideas?


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  1:24         ` Jeff V. Merkey
  2000-12-08  1:40           ` Andi Kleen
@ 2000-12-08  2:28           ` davej
  2000-12-08  3:13             ` Jeff V. Merkey
                               ` (2 more replies)
  1 sibling, 3 replies; 72+ messages in thread
From: davej @ 2000-12-08  2:28 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Rainer Mager, Linux Kernel Mailing List

On Thu, 7 Dec 2000, Jeff V. Merkey wrote:

> It's related to some change in 2.4 vs. 2.2.  There are other programs
> affected other than X; SSH also gets spurious signal 11s now and again
> with 2.4 and glibc <= 2.1 and it does not occur on 2.2.

<AOL>

I've begun to get a bit paranoid about my K6-2 500 box.

Various processes have been getting random signals after heavy CPU usage.
Playing an MPEG movie, kernel compile, or even just some small apps
compiling sometimes. Just for the record, this isn't an OOM situation,
I've watched this box with half its memory free or in buffers left
unattended, and suddenly a compile will just die.

I replaced the CPU with a brand new K6-2. Problem remained.
Next suspect was faulty RAM. Despite having passed a memtest, I
swapped out the DIMMs for some known good ones.
Suspecting cooling problems, I added some case fans.
Next came a bigger power supply. Still the problems.
The latest last ditch attempt to make this box stable has been
to attach the biggest fan I could find that would fit a socket 7 CPU.

And still the problems are there.
The only remaining suspect would be a flaky motherboard.
But then comes the real killer : This box is rock solid under 2.2

*boggle*

I'm not sure exactly when this started, but I think I first noticed
it around test5 or so, but didn't suspect the kernel at the time.

I've tried kernels compiled with everything from 2.91.66 when this
was a Redhat box, to gcc 2.95.2 (from Debian woody) when I installed
debian on it.  If this is a compiler bug, it's one that no compiler
I've tried seems to be immune from.

regards,

Davej.

-- 
| Dave Jones <davej@suse.de>  http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  2:28           ` davej
@ 2000-12-08  3:13             ` Jeff V. Merkey
  2000-12-08  3:25               ` davej
  2000-12-08 13:52             ` Alan Cox
  2000-12-15  0:11             ` lamont
  2 siblings, 1 reply; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-08  3:13 UTC (permalink / raw)
  To: davej; +Cc: Rainer Mager, Linux Kernel Mailing List


Dave,

I think there may be a case when a process forks, that the MMU or some
other subsystem is either not setting the page bits correctly, or
mapping in a bad page.  It's a LEVEL I bug in 2.4 if this is the case,
BTW.  In core dumps (I've looked at 2 of them from SSH) it barfs right
after executing fork() or one of the exec functions and at some places
in the code where there's not any obvious coding bugs.  Looks like some
type of mapping problem.  I reported it three months ago, but it was
pretty much ignored.

Linus needs to add this one to the pre-12 list -- looks like some type
of mapping bug.

Jeff

davej@suse.de wrote:
> 
> On Thu, 7 Dec 2000, Jeff V. Merkey wrote:
> 
> > It's related to some change in 2.4 vs. 2.2.  There are other programs
> > affected other than X; SSH also gets spurious signal 11s now and again
> > with 2.4 and glibc <= 2.1 and it does not occur on 2.2.
> 
> <AOL>
> 
> I've begun to get a bit paranoid about my K6-2 500 box.
> 
> Various processes have been getting random signals after heavy CPU usage.
> Playing an MPEG movie, kernel compile, or even just some small apps
> compiling sometimes. Just for the record, this isn't an OOM situation,
> I've watched this box with half its memory free or in buffers left
> unattended, and suddenly a compile will just die.
> 
> I replaced the CPU with a brand new K6-2. Problem remained.
> Next suspect was faulty RAM. Despite having passed a memtest, I
> swapped out the DIMMs for some known good ones.
> Suspecting cooling problems, I added some case fans.
> Next came a bigger power supply. Still the problems.
> The latest last ditch attempt to make this box stable has been
> to attach the biggest fan I could find that would fit a socket 7 CPU.
> 
> And still the problems are there.
> The only remaining suspect would be a flaky motherboard.
> But then comes the real killer : This box is rock solid under 2.2
> 
> *boggle*
> 
> I'm not sure exactly when this started, but I think I first noticed
> it around test5 or so, but didn't suspect the kernel at the time.
> 
> I've tried kernels compiled with everything from 2.91.66 when this
> was a Redhat box, to gcc 2.95.2 (from Debian woody) when I installed
> debian on it.  If this is a compiler bug, it's one that no compiler
> I've tried seems to be immune from.
> 
> regards,
> 
> Davej.
> 
> --
> | Dave Jones <davej@suse.de>  http://www.suse.de/~davej
> | SuSE Labs

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  3:13             ` Jeff V. Merkey
@ 2000-12-08  3:25               ` davej
  2000-12-08 16:44                 ` Matthew Vanecek
  2000-12-08 19:43                 ` Dr. Kelsey Hudson
  0 siblings, 2 replies; 72+ messages in thread
From: davej @ 2000-12-08  3:25 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Rainer Mager, Linux Kernel Mailing List

On Thu, 7 Dec 2000, Jeff V. Merkey wrote:

> I think there may be a case when a process forks, that the MMU or some
> other subsystem is either not setting the page bits correctly, or
> mapping in a bad page.  It's a LEVEL I bug in 2.4 if this is the case,
> BTW.  In core dumps (I've looked at 2 of them from SSH) it barfs right
> after executing fork() or one of the exec functions and at some places
> in the code where there's not any obvious coding bugs.  Looks like some
> type of mapping problem.  I reported it three months ago, but it was
> pretty much ignored.
> 
> Linus needs to add this one to the pre-12 list -- looks like some type
> of mapping bug.

Now that you mention it, every app that has bombed has been the type
that forks a lot. MpegTV, gtv, and make spring to mind. All apps drive
the CPU load up quite a lot, which was why I initially suspected
overheating. I don't see it on my other 2.4 boxes though which is
suspicious. But they don't get as much of a beating as this, which was
up until last week my main workstation.

regards,

Dave.

-- 
| Dave Jones <davej@suse.de>  http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  0:44     ` Signal 11 Rainer Mager
                         ` (3 preceding siblings ...)
  2000-12-08  1:58       ` Richard B. Johnson
@ 2000-12-08  9:46       ` David Woodhouse
  2000-12-08 14:06         ` Alan Cox
                           ` (2 more replies)
  4 siblings, 3 replies; 72+ messages in thread
From: David Woodhouse @ 2000-12-08  9:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Rainer Mager, linux-kernel, Mark Vojkovich


ak@suse.de said:
>  Sounds like a X Server bug. You should probably contact XFree86, not
> linux-kernel

I quote from the X devel list, which perhaps I shouldn't do but this is hardly 
NDA'd stuff:

On Mon 20 Nov 2000, mvojkovich@valinux.com said:
>   I have seen random crashes on dual P3 BX boards (Tyan) and dual Xeon
> GX boards (Intel).  XFree86 core dumps indicate that it happens in
> random places, in old as dirt software rendering code that has nothing
> wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> would say that this is definitely a kernel problem. 

XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
kernels - even on my BP6¹. The random crashes started to happen when I
upgraded my distribution² - and are only seen by people using 2.4. So I
suspect that it's the combination of glibc and kernel which is triggering
it.

--
dwmw2

¹ And the BP6 still falls over less frequently than the dual P3 I use at 
work.
² RH7. Don't start.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-07 23:41   ` Alan Cox
@ 2000-12-08  9:47     ` Willy Tarreau
  2000-12-08 14:08       ` Alan Cox
  2000-12-08 18:12       ` Philipp Rumpf
  0 siblings, 2 replies; 72+ messages in thread
From: Willy Tarreau @ 2000-12-08  9:47 UTC (permalink / raw)
  To: Alan Cox; +Cc: Miquel van Smoorenburg, linux-kernel

> I asked people to explain why it was needed. I am still waiting. It is a
> patch that does nothing. I will not put random deep magic into the
> kernel.

Alan, I replied to you a few weeks ago (pre20 times) when you asked me why
I was sending you this patch. (perhaps you didn't receive my email). What I 
observed was that my netraid card had a 0xXXXX8 base address and the patch
aligned that address to 16 bytes :

|Bus  0, device   2, function  1:
|  Unknown class: Intel OEM MegaRAID Controller (rev 5).
|    Medium devsel.  Fast back-to-back capable.  BIST capable.  IRQ 10.  Master
Capable.  Latency=64.  
|    Prefetchable 32 bit memory at 0xf0000000 [0xf0000008].

as you see, the board is found at 0xf0000008, but used aligned to 0xf0000000.

my server currently works with that patch, but I'm sure it won't boot anymore
if I apply this 2.2.18pre25 alone. 

just in case, here it is again.

Cheers,
Willy

--- 18pre/drivers/scsi/megaraid.c       Wed Nov  8 16:02:45 2000
+++ 18pre+megaraid/drivers/scsi/megaraid.c      Fri Nov 10 12:03:05 2000
@@ -1920,10 +1920,14 @@
 
     pciIdx++;
 
-    if (flag & BOARD_QUARTZ)
+    if (flag & BOARD_QUARTZ) {
+       megaBase &= PCI_BASE_ADDRESS_IO_MASK;
        megaBase = (long) ioremap (megaBase, 128);
-    else
+    }
+    else {
+       megaBase &= PCI_BASE_ADDRESS_MEM_MASK;
        megaBase += 0x10;
+    }
 
     /* Initialize SCSI Host structure */
     host = scsi_register (pHostTmpl, sizeof (mega_host_config));
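
A note on the masking in the patch above, offered as a sketch rather than as
the driver's actual logic. It assumes the bracketed value in the /proc/pci
output (0xf0000008) is the raw base address register, whose low bits carry
flags instead of address bits. The macro names match the kernel's PCI
register definitions; the 32-bit values and the helpers bar_base() and
bar_is_prefetchable() are illustrative only.

```c
#include <stdint.h>

/* Flag bits and masks, named after the kernel's PCI register macros. */
#define PCI_BASE_ADDRESS_SPACE_IO      0x01u    /* bit 0 set: I/O port BAR */
#define PCI_BASE_ADDRESS_MEM_PREFETCH  0x08u    /* bit 3 set: prefetchable */
#define PCI_BASE_ADDRESS_MEM_MASK      (~0x0fu) /* memory BAR: drop 4 bits */
#define PCI_BASE_ADDRESS_IO_MASK       (~0x03u) /* I/O BAR: drop 2 bits */

/* Hypothetical helper: strip the flag bits from a raw 32-bit BAR value. */
static uint32_t bar_base(uint32_t raw)
{
    if (raw & PCI_BASE_ADDRESS_SPACE_IO)
        return raw & PCI_BASE_ADDRESS_IO_MASK;
    return raw & PCI_BASE_ADDRESS_MEM_MASK;
}

/* Hypothetical helper: 1 for a prefetchable memory BAR, 0 otherwise. */
static int bar_is_prefetchable(uint32_t raw)
{
    return !(raw & PCI_BASE_ADDRESS_SPACE_IO) &&
           (raw & PCI_BASE_ADDRESS_MEM_PREFETCH) != 0;
}
```

Under that assumption, bar_base(0xf0000008) gives 0xf0000000, and the 8 is
the prefetchable flag that /proc/pci reports as "Prefetchable".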


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  2:28           ` davej
  2000-12-08  3:13             ` Jeff V. Merkey
@ 2000-12-08 13:52             ` Alan Cox
  2000-12-15  0:11             ` lamont
  2 siblings, 0 replies; 72+ messages in thread
From: Alan Cox @ 2000-12-08 13:52 UTC (permalink / raw)
  To: davej; +Cc: Jeff V. Merkey, Rainer Mager, Linux Kernel Mailing List

> Various processes have been getting random signals after heavy CPU usage.
> Playing an MPEG movie, kernel compile, or even just some small apps
> compiling sometimes. Just for the record, this isn't an OOM situation,
> I've watched this box with half its memory free or in buffers left
> unattended, and suddenly a compile will just die.

This is consistent with page cache corruption in memory. We definitely had
that in older 2.4test kernels. I saw this building stuff on Linux parisc
and it was because some page of gcc had randomly decided to become something
different. Since that was test6 I didn't figure it important 8)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  9:46       ` David Woodhouse
@ 2000-12-08 14:06         ` Alan Cox
  2000-12-09 19:01           ` Matthew Vanecek
  2000-12-11  0:58           ` Signal 11 Rainer Mager
  2000-12-08 16:21         ` Horst von Brand
  2000-12-08 19:34         ` Mark Vojkovich
  2 siblings, 2 replies; 72+ messages in thread
From: Alan Cox @ 2000-12-08 14:06 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andi Kleen, Rainer Mager, linux-kernel, Mark Vojkovich

> > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > would say that this is definitely a kernel problem.
> 
> XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> kernels - even on my BP6¹. The random crashes started to happen when I
> upgraded my distribution² - and are only seen by people using 2.4. So I
> suspect that it's the combination of glibc and kernel which is triggering
> it.

Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
table updating race help ?

Alan


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08  9:47     ` Willy Tarreau
@ 2000-12-08 14:08       ` Alan Cox
  2000-12-08 16:07         ` Miquel van Smoorenburg
  2000-12-08 18:12       ` Philipp Rumpf
  1 sibling, 1 reply; 72+ messages in thread
From: Alan Cox @ 2000-12-08 14:08 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Alan Cox, Miquel van Smoorenburg, linux-kernel

> my server currently works with that patch, but I'm sure it won't boot anymore
> if I apply this 2.2.18pre25 alone. 

Some days I don't know why I bother

> just in case, here it is again.

It doesn't even apply

> 


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08 14:08       ` Alan Cox
@ 2000-12-08 16:07         ` Miquel van Smoorenburg
  2000-12-08 17:08           ` Alan Cox
  0 siblings, 1 reply; 72+ messages in thread
From: Miquel van Smoorenburg @ 2000-12-08 16:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Willy Tarreau, linux-kernel

According to Alan Cox:
> > my server currently works with that patch, but I'm sure it won't boot anymore
> > if I apply this 2.2.18pre25 alone. 
> 
> Some days I don't know why I bother

Bad day, Alan? ;)

> > just in case, here it is again.
> It doesn't even apply

Hmm, it did apply for me. Do newer versions of patch have the -l option
on by default?

Anyway. I just threw together a testmachine with a megaraid card.
With 2.2.18pre18, it doesn't boot. With 2.2.18pre18 + Willy's patch,
it does boot.

And with 2.2.18pre25 without any extra patches, it magically works.

So I took the plunge and compiled 2.2.18pre25 on the production
machine with the megaraid. And well, it's coming up as I write this.

I see that another patch _has_ been applied between pre18 and pre25
that took out some forward/backwards-compat logic with LINUX_VERSION_CODE
magic (beneath /* Read the base port and IRQ from PCI */). And
reading the patch, it makes sense. It probably does about the same
as Willy's patch, but the "right" way by using pci_resource_start()
which the one in pre18 only did for kernels > 2.3.0

So, it looks like pre25 has a working megaraid driver. Thanks Alan.

Mike.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  9:46       ` David Woodhouse
  2000-12-08 14:06         ` Alan Cox
@ 2000-12-08 16:21         ` Horst von Brand
  2000-12-08 19:34         ` Mark Vojkovich
  2 siblings, 0 replies; 72+ messages in thread
From: Horst von Brand @ 2000-12-08 16:21 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andi Kleen, Rainer Mager, linux-kernel, Mark Vojkovich


David Woodhouse <dwmw2@infradead.org> said:

[...]

> I quote from the X devel list, which perhaps I shouldn't do but this is
> hardly NDA'd stuff:

> On Mon 20 Nov 2000, mvojkovich@valinux.com said:
> >   I have seen random crashes on dual P3 BX boards (Tyan) and dual Xeon
> > GX boards (Intel).  XFree86 core dumps indicate that it happens in
> > random places, in old as dirt software rendering code that has nothing
> > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > would say that this is definitely a kernel problem. 

> XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> kernels - even on my BP6¹. The random crashes started to happen when I
> upgraded my distribution² - and are only seen by people using 2.4. So I
> suspect that it's the combination of glibc and kernel which is triggering
> it.

I get regular segfaults and random lockups trying to build CVS GCCs and
kernels since I updated RH 7 to glibc-2.2-5. P3, sr440bx mobo (UP),
2.2.18preX kernels; previously rock solid. Might be that the mains voltage
here tends to be out of whack, but I doubt it.
-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Vin~a del Mar, Chile                               +56 32 672616


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  2:04         ` Peter Samuelson
@ 2000-12-08 16:36           ` Matthew Vanecek
  2000-12-08 16:49             ` Richard B. Johnson
  2000-12-08 19:36           ` Dr. Kelsey Hudson
  1 sibling, 1 reply; 72+ messages in thread
From: Matthew Vanecek @ 2000-12-08 16:36 UTC (permalink / raw)
  To: Peter Samuelson; +Cc: Richard B. Johnson, Rainer Mager, linux-kernel

Peter Samuelson wrote:
> 
> [Dick Johnson]
> > Do:
> >
> > char main[]={0xff,0xff,0xff,0xff};
> 
> Oh come on, at least pick an *interesting* invalid opcode:
> 
>   char main[]={0xf0,0x0f,0xc0,0xc8};    /* try also on NT (: */
> 

me2v@reliant DRFDecoder $ ./op
Illegal instruction (core dumped)

Is that the expected behavior?

-- 
Matthew Vanecek
perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
********************************************************************************
For 93 million miles, there is nothing between the sun and my shadow
except me.
I'm always getting in the way of something...

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  3:25               ` davej
@ 2000-12-08 16:44                 ` Matthew Vanecek
  2000-12-08 19:43                 ` Dr. Kelsey Hudson
  1 sibling, 0 replies; 72+ messages in thread
From: Matthew Vanecek @ 2000-12-08 16:44 UTC (permalink / raw)
  To: Linux Kernel Mailing List

davej@suse.de wrote:
> 
> On Thu, 7 Dec 2000, Jeff V. Merkey wrote:
> 
> > I think there may be a case when a process forks, that the MMU or some
> > other subsystem is either not setting the page bits correctly, or
> > mapping in a bad page.  It's a LEVEL I bug in 2.4 if this is the case,
> > BTW.  In core dumps (I've looked at 2 of them from SSH) it barfs right
> > after executing fork() or one of the exec functions and at some places
> > in the code where there's not any obvious coding bugs.  Looks like some
> > type of mapping problem.  I reported it three months ago, but it was
> > pretty much ignored.
> >
> > Linus needs to add this one to the pre-12 list -- looks like some type
> > of mapping bug.
> 
> Now that you mention it, every app that has bombed has been the type
> that forks a lot. MpegTV, gtv, and make spring to mind. All apps drive
> the CPU load up quite a lot, which was why I initially suspected
> overheating. I don't see it on my other 2.4 boxes though which is
> suspicious. But they don't get as much of a beating as this, which was
> up until last week my main workstation.
> 
> regards,
> 
> Dave.
> 

I've noticed the same problem, and it occasionally happens with XFree86
4.0.1, as well.  Hopefully we've established that this is not the
hardware issue which gcc people are so fond of pushing sig 11s on (even
in the face of overwhelming evidence to the contrary).  It would be good
to have this put on a current to-do list and looked into.

-- 
Matthew Vanecek
perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
********************************************************************************
For 93 million miles, there is nothing between the sun and my shadow
except me.
I'm always getting in the way of something...

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08 16:36           ` Matthew Vanecek
@ 2000-12-08 16:49             ` Richard B. Johnson
  2000-12-08 17:40               ` Peter Samuelson
  0 siblings, 1 reply; 72+ messages in thread
From: Richard B. Johnson @ 2000-12-08 16:49 UTC (permalink / raw)
  To: Matthew Vanecek; +Cc: Peter Samuelson, Rainer Mager, linux-kernel

On Fri, 8 Dec 2000, Matthew Vanecek wrote:

> Peter Samuelson wrote:
> > 
> > [Dick Johnson]
> > > Do:
> > >
> > > char main[]={0xff,0xff,0xff,0xff};
> > 
> > Oh come on, at least pick an *interesting* invalid opcode:
> > 
> >   char main[]={0xf0,0x0f,0xc0,0xc8};    /* try also on NT (: */
> > 
> 
> me2v@reliant DRFDecoder $ ./op
> Illegal instruction (core dumped)
> 
> Is that the expected behavior?

Yep. And on early Pentiums, the ones with the "f00f" bug, it
would lock the machine tighter than a witches crotch. Ooops,
not politically correct.... It would allow user-mode code
to halt the machine.

Here is code that just quietly returns to the runtime code
that called it:

char main[]={0x90, 0x90, 0xc3};

FYI, if the .data section was not executable, you couldn't do
this. You would have to use some __asm__ stuff to put it in
the .text section. But, this is an interesting example of
how you can create code that the compiler refuses to generate.

It's easier to use assembly, though.....

Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08  0:20 ` Andrea Arcangeli
  2000-12-08  0:27   ` Alan Cox
@ 2000-12-08 17:02   ` Martin Kacer
  2000-12-08 17:20     ` Alan Cox
  2000-12-08 18:08     ` Andrea Arcangeli
  1 sibling, 2 replies; 72+ messages in thread
From: Martin Kacer @ 2000-12-08 17:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox, Andrea Arcangeli

On Thu, 7 Dec 2000, Alan Cox wrote:

# Ok we believe the VM crash looping printing error messages is now fixed.
# Marcelo finally figured it out and my 8Mb 486 has been running 2.2.18pre
# with that fix and stably[1].

   Unfortunately, I don't think it is fixed. We maintain a heavily loaded
FTP/Samba server here (120+ active connections with very long data
transfers in rush hours) and it had the "VM: do_try_to_free_pages failed"
problem since 2.2.17 was first installed (there was FreeBSD before that).

   We applied 2.2.18pre25 patch yesterday hoping it could solve it. The
only difference is that the server reached several hours uptime instead of
40 minutes (with pre24). After two hours of load between 6.00 and 15.00
the console was flooded with those unpopular messages ("VM: ..."). The
system was taken down by generation of these messages so quickly, that
even none of the messages appeared in syslog! No response to Ctrl-Alt-Del,
of course... :-( Just thrashing...


On Fri, 8 Dec 2000, Andrea Arcangeli wrote:

# > Ok we believe the VM crash looping printing error messages is now fixed.
# Such bug can't generate crashes. Did you ever reproduce crashes on your 8Mb
# 486 with 2.2.18pre24?

   Our bug can generate them. :-( Maybe it's a different one? ;-)


   Is there any chance to get rid of these VMM failures?

   Sorry if I've missed something important recently mentioned here. I had
not enough time to follow the lk list carefully. Is there any reliable
solution?

   It seems we need to return back to 2.2.13 for some time. :-(
   Martin.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08 16:07         ` Miquel van Smoorenburg
@ 2000-12-08 17:08           ` Alan Cox
  0 siblings, 0 replies; 72+ messages in thread
From: Alan Cox @ 2000-12-08 17:08 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: Alan Cox, Willy Tarreau, linux-kernel

> > Some days I don't know why I bother
> Bad day, Alan? ;)

Umm no but having people _keep_ sending you do nothing patches gets
annoying after a while ;)

> reading the patch, it makes sense. It probably does about the same
> as Willy's patch, but the "right" way by using pci_resource_start()
> which the one in pre18 only did for kernels > 2.3.0

Looking over the change set, I suspect what actually happened is that someone
fixed pci_resource_start(), and that fixed the megaraid driver


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08 17:02   ` Linux 2.2.18pre25 Martin Kacer
@ 2000-12-08 17:20     ` Alan Cox
  2000-12-08 17:36       ` Martin Kacer
  2000-12-08 18:08     ` Andrea Arcangeli
  1 sibling, 1 reply; 72+ messages in thread
From: Alan Cox @ 2000-12-08 17:20 UTC (permalink / raw)
  To: Martin Kacer; +Cc: linux-kernel, Alan Cox, Andrea Arcangeli

>    We applied 2.2.18pre25 patch yesterday hoping it could solve it. The
> only difference is that the server reached several hours uptime instead of
> 40 minutes (with pre24). After two hours of load between 6.00 and 15.00
> the console was flooded with those unpopular messages ("VM: ..."). The
> system was taken down by generation of these messages so quickly, that
> even none of the messages appeared in syslog! No response to Ctrl-Alt-Del,
> of course... :-( Just thrashing...
> 
>    Our bug can generate them. :-( Maybe it's a different one? ;-)

Quite possibly.

>    Is there any chance to get rid of these VMM failures?

By finding them. Are you confident you are not running out of memory?
Presumably since 2.2.13 works you are 8)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08 17:20     ` Alan Cox
@ 2000-12-08 17:36       ` Martin Kacer
  0 siblings, 0 replies; 72+ messages in thread
From: Martin Kacer @ 2000-12-08 17:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Andrea Arcangeli

On Fri, 8 Dec 2000, Alan Cox wrote:

# >    Is there any chance to get rid of these VMM failures?
# By finding them.

   :-) I am not so familiar with MM in Linux. :^(
   And do not have enough time for intensive study...
   Although I would probably like that work...

# Are you confident you are not running out of memory.

   Well, almost sure. This is the log with load records:

                                                       (according to /proc/meminfo)
                              FTP users  SMB users   load   free mem    free swap
Fri Dec  8 14:35:05 CET 2000         61         35   6.17    3068 kB    128932 kB
Fri Dec  8 14:40:04 CET 2000         59         36   5.05    2280 kB    130320 kB
Fri Dec  8 14:45:03 CET 2000         59         36   5.97    2896 kB    131448 kB
Fri Dec  8 14:50:03 CET 2000         59         35   6.59    2908 kB    133140 kB
Fri Dec  8 14:55:04 CET 2000         53         36   8.82    2380 kB    133952 kB
Fri Dec  8 15:00:03 CET 2000         53         40   6.42    2728 kB    135064 kB
Fri Dec  8 15:05:03 CET 2000         48         39   5.47    2264 kB    135684 kB
Fri Dec  8 15:10:03 CET 2000         48         41   3.90    3204 kB    135928 kB
Fri Dec  8 15:15:03 CET 2000         51         41   5.93    2628 kB    135700 kB
Fri Dec  8 15:20:03 CET 2000         50         45   6.50    2124 kB    135828 kB
Fri Dec  8 15:25:03 CET 2000         56         44   7.92    2192 kB    136080 kB
Fri Dec  8 15:30:03 CET 2000         49         45  10.89    2072 kB    136176 kB
Fri Dec  8 15:35:03 CET 2000         51         42   6.32    2960 kB    136156 kB
Fri Dec  8 15:40:04 CET 2000         54         44   6.92    2364 kB    136220 kB
Fri Dec  8 15:45:03 CET 2000         54         44   6.63    2852 kB    136348 kB
Fri Dec  8 15:50:04 CET 2000         53         46   3.63    2248 kB    136420 kB
Fri Dec  8 15:55:03 CET 2000         59         48   6.51    3060 kB    136312 kB
(crashed during the next 5 minutes)

   Doesn't seem to have consumed all of swap space.
   I will try to determine more info the next time - I promise...
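
   For anyone collecting a similar log, the two memory columns can be pulled
out of /proc/meminfo text with something like the sketch below. The helper
name meminfo_kb() is hypothetical, and it assumes the labelled "MemFree:" and
"SwapFree:" lines are present in the text it is given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find a line starting with "<key>:" in
 * meminfo-style text and return its value in kB, or -1 if the
 * key is not present.
 */
static long meminfo_kb(const char *text, const char *key)
{
    size_t keylen;
    const char *line;

    assert(text != NULL && key != NULL);
    keylen = strlen(key);

    for (line = text; line != NULL && *line != '\0'; ) {
        long value;
        const char *next = strchr(line, '\n');

        /* Require the colon right after the key so "Mem" != "MemFree". */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':' &&
            sscanf(line + keylen + 1, "%ld", &value) == 1)
            return value;

        line = next ? next + 1 : NULL;
    }
    return -1;
}
```

   Reading the file into a buffer and calling meminfo_kb(buf, "MemFree") and
meminfo_kb(buf, "SwapFree") every few minutes would give the same two
columns as the log above.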

# Presumably since 2.2.13 works you are 8)

   I didn't say it worked. It had worked a long time ago.
   It is still not tested now. Unfortunately, due to the absence of raid0
module the bootup process destroyed our 140GB partition. It will take some
time to make the system running again. :-(

   Thanks for your answer anyway...
   Martin.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08 16:49             ` Richard B. Johnson
@ 2000-12-08 17:40               ` Peter Samuelson
  0 siblings, 0 replies; 72+ messages in thread
From: Peter Samuelson @ 2000-12-08 17:40 UTC (permalink / raw)
  To: root; +Cc: Matthew Vanecek, Rainer Mager, linux-kernel


[Dick Johnson]
> > >   char main[]={0xf0,0x0f,0xc0,0xc8};    /* try also on NT (: */
> > me2v@reliant DRFDecoder $ ./op
> > Illegal instruction (core dumped)
> 
> Yep. And on early Pentiums, the ones with the "f00f" bug, it would
> lock the machine tighter than a witches crotch. Ooops, not
> politically correct.... It would allow user-mode code to halt the
> machine.

...Until Linux 2.0.34 or so (can't remember the exact version number)
which had the workaround for this bug, about a week after the bug was
discovered.

And I was reminded in private mail that the correct lockup sequence is
actually

  char main[]={0xf0,0x0f,0xc7,0xc8};

where the 0xc8 can be anything from 0xc8 to 0xcf.
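
The byte pattern described above can be written down as a small check. This
is only a sketch: is_f00f_sequence() is a hypothetical name, and the function
just tests bytes, it does not execute them. For context, F0 is the LOCK
prefix and 0F C7 with a second byte of C8 to CF encodes CMPXCHG8B with a
register operand, which is not a valid form of that instruction.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the four bytes at p are
 * F0 0F C7 followed by a byte in the range C8..CF, else 0.
 */
static int is_f00f_sequence(const unsigned char *p)
{
    assert(p != NULL);
    return p[0] == 0xf0 && p[1] == 0x0f && p[2] == 0xc7 &&
           p[3] >= 0xc8 && p[3] <= 0xcf;
}
```

By this check the earlier {0xf0,0x0f,0xc0,0xc8} does not match, because its
third byte is 0xc0 and not 0xc7.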

Peter

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08 17:02   ` Linux 2.2.18pre25 Martin Kacer
  2000-12-08 17:20     ` Alan Cox
@ 2000-12-08 18:08     ` Andrea Arcangeli
  2000-12-08 18:30       ` Martin Kacer
  1 sibling, 1 reply; 72+ messages in thread
From: Andrea Arcangeli @ 2000-12-08 18:08 UTC (permalink / raw)
  To: Martin Kacer; +Cc: linux-kernel, Alan Cox

On Fri, Dec 08, 2000 at 06:02:57PM +0100, Martin Kacer wrote:
>    Is there any chance to get rid of these VMM failures?

You should apply this patch on top of 2.2.18pre25:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre25/VM-global-2.2.18pre25-7.bz2

>    It seems we need to return back to 2.2.13 for some time. :-(

Definitely no, you only need to apply the above collection of bugfixes.

Andrea

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08  9:47     ` Willy Tarreau
  2000-12-08 14:08       ` Alan Cox
@ 2000-12-08 18:12       ` Philipp Rumpf
  1 sibling, 0 replies; 72+ messages in thread
From: Philipp Rumpf @ 2000-12-08 18:12 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Alan Cox, Miquel van Smoorenburg, linux-kernel

On Fri, Dec 08, 2000 at 10:47:46AM +0100, Willy Tarreau wrote:
> |Bus  0, device   2, function  1:
> |  Unknown class: Intel OEM MegaRAID Controller (rev 5).
> |    Medium devsel.  Fast back-to-back capable.  BIST capable.  IRQ 10.  Master
> Capable.  Latency=64.  
> |    Prefetchable 32 bit memory at 0xf0000000 [0xf0000008].
> 
> as you see, the board is found at 0xf0000008, but used aligned to 0xf0000000.

No.  It's found at 0xf0000000, and has 8 bytes of MMIO space.

> my server currently works with that patch, but I'm sure it won't boot anymore
> if I apply this 2.2.18pre25 alone. 

"I'm sure" meaning "I didn't test it" ?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08 18:08     ` Andrea Arcangeli
@ 2000-12-08 18:30       ` Martin Kacer
  2000-12-08 23:55         ` Alan Cox
  0 siblings, 1 reply; 72+ messages in thread
From: Martin Kacer @ 2000-12-08 18:30 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Alan Cox

On Fri, 8 Dec 2000, Andrea Arcangeli wrote:

# On Fri, Dec 08, 2000 at 06:02:57PM +0100, Martin Kacer wrote:
# >    Is there any chance to get rid of these VMM failures?
# You should apply this patch on top of 2.2.18pre25:
# ftp://.../VM-global-2.2.18pre25-7.bz2

   Well, I've found that VM-global patch before, of course. Until now, the
last version was against pre18. Since I do not know the exact rules for
including new things into Alan's tree, I thought that VM-global patch was
already included in pre24. Sorry for my lack of experience. ;-)) I should
have checked it.
   As I wrote before, I had no time recently to follow the mailing list
carefully and I didn't know exactly what VM-global patch is.

# >    It seems we need to return back to 2.2.13 for some time. :-(
# Definitely no, you only need to apply the above collection of bugfixes.

   Ok, I can try it, at least.
   I will let you know about results.

   Martin.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  1:43             ` Jeff V. Merkey
  2000-12-08  1:55               ` Jeff V. Merkey
@ 2000-12-08 19:20               ` Dr. Kelsey Hudson
  1 sibling, 0 replies; 72+ messages in thread
From: Dr. Kelsey Hudson @ 2000-12-08 19:20 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Andi Kleen, Rainer Mager, linux-kernel

Don't post the core file... It's system-dependent and really won't do
anyone but yourself a shred of good.

On Thu, 7 Dec 2000, Jeff V. Merkey wrote:

> 
> 
> Andi Kleen wrote:
> > 
> > On Thu, Dec 07, 2000 at 06:24:34PM -0700, Jeff V. Merkey wrote:
> > >
> > > Andi,
> > >
> > > It's related to some change in 2.4 vs. 2.2.  There are other programs
> > > affected other than X, SSH also gets spurious signal 11's now and again
> > > with 2.4 and glibc <= 2.1 and it does not occur on 2.2.
> > 
> > So have you enabled core dumps and actually looked at the core dumps
> > of the programs using gdb to see where they crashed ?
> 
> Yes.  I can only get the SSH crash when I am running remotely from the
> house over the internet, and it only shows then when running a build in
> jobserver mode (parallel build).  The X problem seems related as well,
> since it's related to (usually) NetScape spawning off a forked process. 
> I will attempt to recreate tonight, and post the core dump file.  
> 
> Jeff 
> 
> 
> 
> 
> 
> > 
> > -Andi

-- 
 Kelsey Hudson                                           khudson@ctica.com 
 Software Engineer
 Compendium Technologies, Inc                               (619) 725-0771
---------------------------------------------------------------------------     


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  9:46       ` David Woodhouse
  2000-12-08 14:06         ` Alan Cox
  2000-12-08 16:21         ` Horst von Brand
@ 2000-12-08 19:34         ` Mark Vojkovich
  2000-12-08 23:16           ` Jeff V. Merkey
  2 siblings, 1 reply; 72+ messages in thread
From: Mark Vojkovich @ 2000-12-08 19:34 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andi Kleen, Rainer Mager, linux-kernel



On Fri, 8 Dec 2000, David Woodhouse wrote:

>
> ak@suse.de said:
> >  Sounds like a X Server bug. You should probably contact XFree86, not
> > linux-kernel
>
> I quote from the X devel list, which perhaps I shouldn't do but this is hardly
> NDA'd stuff:
>
> On Mon 20 Nov 2000, mvojkovich@valinux.com said:
> >   I have seen random crashes on dual P3 BX boards (Tyan) and dual Xeon
> > GX boards (Intel).  XFree86 core dumps indicate that it happens in
> > random places, in old as dirt software rendering code that has nothing
> > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > would say that this is definitely a kernel problem.
>
> XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> kernels - even on my BP6¹. The random crashes started to happen when I
> upgraded my distribution² - and are only seen by people using 2.4. So I
> suspect that it's the combination of glibc and kernel which is triggering
> it.

   Some additional data points.  It goes away on UP 2.4 kernels.
Also, I can't recall seeing this problem on IA64.  Maybe it's still
there on IA64 and I just haven't been trying hard enough to crash
it, but my current impression is that the problem doesn't exist on IA64.

  Hmmm...  IA64 is a static server.  I don't hear of people having
problems on 3.3.6 servers either.  I'm wondering if a non-loader
4.0 server would have problems on IA32 with a 2.4 kernel.  That's
something for people to try.


				Mark.

>
> --
> dwmw2
>
> ¹ And the BP6 still falls over less frequently than the dual P3 I use at
> work.
> ² RH7. Don't start.
>
>
>


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  2:04         ` Peter Samuelson
  2000-12-08 16:36           ` Matthew Vanecek
@ 2000-12-08 19:36           ` Dr. Kelsey Hudson
  1 sibling, 0 replies; 72+ messages in thread
From: Dr. Kelsey Hudson @ 2000-12-08 19:36 UTC (permalink / raw)
  To: Peter Samuelson; +Cc: Richard B. Johnson, Rainer Mager, linux-kernel

On Thu, 7 Dec 2000, Peter Samuelson wrote:

> 
> [Dick Johnson]
> > Do:
> > 
> > char main[]={0xff,0xff,0xff,0xff};
> 
> Oh come on, at least pick an *interesting* invalid opcode:
> 
>   char main[]={0xf0,0x0f,0xc0,0xc8};	/* try also on NT (: */

What's funny, is that this actually executes on SPARC hardware, but
immediately segfaults. On Intel hardware though, you get a message similar
to:

zsh: illegal hardware instruction (core dumped)  a.out

I wrote relatively the same program in college. It exploits the F0 0F bug
found in early Pentium hardware.

 Kelsey Hudson                                           khudson@ctica.com 
 Software Engineer
 Compendium Technologies, Inc                               (619) 725-0771
---------------------------------------------------------------------------     


* Re: Signal 11
  2000-12-08  3:25               ` davej
  2000-12-08 16:44                 ` Matthew Vanecek
@ 2000-12-08 19:43                 ` Dr. Kelsey Hudson
  1 sibling, 0 replies; 72+ messages in thread
From: Dr. Kelsey Hudson @ 2000-12-08 19:43 UTC (permalink / raw)
  To: davej; +Cc: Jeff V. Merkey, Rainer Mager, Linux Kernel Mailing List

On Fri, 8 Dec 2000 davej@suse.de wrote:

> On Thu, 7 Dec 2000, Jeff V. Merkey wrote:
> 
> > I think there may be a case when a process forks, that the MMU or some
> > other subsystem is either not setting the page bits correctly, or
> > mapping in a bad page.  It's a LEVEL I bug in 2.4 if this is the case,
> > BTW.  In core dumps (I've looked at 2 of them from SSH) it barfs right
> > after executing fork() or one of the exec functions and at some places
> > in the code where there's not any obvious coding bugs.  Looks like some
> > type of mapping problem.  I reported it three months ago, but it was
> > pretty much ignored.
> > 
> > Linus needs to add this one to the pre-12 list -- looks like some type
> > of mapping bug.
> 
> Now that you mention it, every app that has bombed has been the type
> that forks a lot. MpegTV, gtv, and make spring to mind. All apps drive
> the CPU load up quite a lot, which was why I initially suspected
> overheating. I don't see it on my other 2.4 boxes though which is
> suspicious. But they don't get as much of a beating as this, which was
> up until last week my main workstation.

Just to add some input and insight on here, I loaded the system down with
some FFT algorithms, and then ran an 8-way kernel compile. The machine in
question is a dual P3/600 with 512MB RAM, 2.4.0-test11. The load
skyrocketed to a mere 13.6. xmms was still running, didn't skip even
once. The FFT algorithms didn't bitch at all. Neither did the kernel
compile. In fact, it compiled without a hitch...

I dunno what to say about these boxes that segfault all the
time... Probably just bad hardware somewhere along the lines.

 Kelsey Hudson                                           khudson@ctica.com 
 Software Engineer
 Compendium Technologies, Inc                               (619) 725-0771
---------------------------------------------------------------------------     


* Re: Signal 11
  2000-12-08 23:16           ` Jeff V. Merkey
@ 2000-12-08 22:24             ` David Woodhouse
  2000-12-09  0:56               ` Jeff V. Merkey
  0 siblings, 1 reply; 72+ messages in thread
From: David Woodhouse @ 2000-12-08 22:24 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Mark Vojkovich, Andi Kleen, Rainer Mager, linux-kernel

On Fri, 8 Dec 2000, Jeff V. Merkey wrote:

> I have not seen it on UP systems either.  I only see it on SMP systems.
> After trying very hard last night, I was able to get my 4 x PPro system to
> do it with 2.4.0-12.  It seems related to loading in some way.  If you
> have more than two processors, the loading is less since there's more
> processors, and for whatever reason, it makes it harder to produce
> whatever race condition is causing it.  I can get it to happen
> pretty easily on a 2 x PII system.

Can you reproduce it with bcrl's patch below:

Index: mm/memory.c
===================================================================
RCS file: /net/passion/inst/cvs/linux/mm/memory.c,v
retrieving revision 1.2.2.40
diff -u -r1.2.2.40 memory.c
--- mm/memory.c	2000/12/05 13:33:39	1.2.2.40
+++ mm/memory.c	2000/12/08 22:24:09
@@ -860,6 +860,7 @@
 	/*
 	 * Ok, we need to copy. Oh, well..
 	 */
+	set_pte(page_table, pte);
 	spin_unlock(&mm->page_table_lock);
 	new_page = page_cache_alloc();
 	if (!new_page)
@@ -870,6 +871,12 @@
 	 * Re-check the pte - we dropped the lock
 	 */
 	if (pte_same(*page_table, pte)) {
+		/* We are changing the pte, so get rid of the old
+		 * one to avoid races with the hardware, this really
+		 * only affects the accessed bit here.
+		 */
+		pte = ptep_get_and_clear(page_table);
+
 		if (PageReserved(old_page))
 			++mm->rss;
 		break_cow(vma, old_page, new_page, address, page_table);
@@ -1216,12 +1223,14 @@
 		return do_swap_page(mm, vma, address, pte, pte_to_swp_entry(entry), write_access);
 	}

+	entry = ptep_get_and_clear(pte);
 	if (write_access) {
 		if (!pte_write(entry))
 			return do_wp_page(mm, vma, address, pte, entry);

 		entry = pte_mkdirty(entry);
 	}
+
 	entry = pte_mkyoung(entry);
 	establish_pte(vma, address, pte, entry);
 	spin_unlock(&mm->page_table_lock);


-- 
dwmw2



* Re: Signal 11
  2000-12-08 19:34         ` Mark Vojkovich
@ 2000-12-08 23:16           ` Jeff V. Merkey
  2000-12-08 22:24             ` David Woodhouse
  0 siblings, 1 reply; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-08 23:16 UTC (permalink / raw)
  To: Mark Vojkovich; +Cc: David Woodhouse, Andi Kleen, Rainer Mager, linux-kernel

On Fri, Dec 08, 2000 at 11:34:51AM -0800, Mark Vojkovich wrote:
> 
> 
> On Fri, 8 Dec 2000, David Woodhouse wrote:
> 
>    Some additional data points.  It goes away on UP 2.4 kernels.
> Also, I can't recall seeing this problem on IA64.  Maybe it's still
> there on IA64 and I just haven't been trying hard enough to crash
> it, but my current impression is that the problem doesn't exist on IA64.
> 
>   Hmmm...  IA64 is a static server.  I don't hear of people having
> problems on 3.3.6 servers either.  I'm wondering if a non-loader
> 4.0 server would have problems on IA32 with a 2.4 kernel.  That's
> something for people to try.
> 
> 
> 				Mark.


I have not seen it on UP systems either.  I only see it on SMP systems.  
After trying very hard last night, I was able to get my 4 x PPro system to 
do it with 2.4.0-12.  It seems related to loading in some way.  If you 
have more than two processors, the loading is less since there's more 
processors, and for whatever reason, it makes it harder to produce
whatever race condition is causing it.  I can get it to happen 
pretty easily on a 2 x PII system.

:-)

Jeff



> 
> >
> > --
> > dwmw2
> >
> > ¹ And the BP6 still falls over less frequently than the dual P3 I use at
> > work.
> > ² RH7. Don't start.
> >
> >
> >
> 

* Re: Linux 2.2.18pre25
  2000-12-08 18:30       ` Martin Kacer
@ 2000-12-08 23:55         ` Alan Cox
  0 siblings, 0 replies; 72+ messages in thread
From: Alan Cox @ 2000-12-08 23:55 UTC (permalink / raw)
  To: Martin Kacer; +Cc: Andrea Arcangeli, linux-kernel, Alan Cox

>    Well, I've found that VM-global patch before, of course. Until now, the
> last version was against pre18. Since I do not know the exact rules for
> including new things into Alan's tree, I thought that VM-global patch was
> already included in pre24. Sorry for my lack of experience. ;-)) I should
> have checked it.
>    As I wrote before, I had no time recently to follow the mailing list
> carefully and I didn't know exactly what VM-global patch is.
> 
> # >    It seems we need to return back to 2.2.13 for some time. :-(
> # Definitely no, you only need to apply the above collection of bugfixes.
> 
>    Ok, I can try it, at least.
>    I will let you know about results.

VM-global is currently on my 2.2.19pre pile of stuff. I'm monitoring a few
cases with interest before I commit to that decision, however.


* Re: Signal 11
  2000-12-08 22:24             ` David Woodhouse
@ 2000-12-09  0:56               ` Jeff V. Merkey
  0 siblings, 0 replies; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-09  0:56 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Mark Vojkovich, Andi Kleen, Rainer Mager, linux-kernel



I'll try.

Jeff


On Fri, Dec 08, 2000 at 10:24:55PM +0000, David Woodhouse wrote:
> On Fri, 8 Dec 2000, Jeff V. Merkey wrote:
> 
> > I have not seen it on UP systems either.  I only see it on SMP systems.
> > After trying very hard last night, I was able to get my 4 x PPro system to
> > do it with 2.4.0-12.  It seems related to loading in some way.  If you
> > have more than two processors, the loading is less since there's more
> > processors, and for whatever reason, it makes it harder to produce
> > whatever race condition is causing it.  I can get it to happen
> > pretty easily on a 2 x PII system.
> 
> Can you reproduce it with bcrl's patch below:
> 
> Index: mm/memory.c
> ===================================================================
> RCS file: /net/passion/inst/cvs/linux/mm/memory.c,v
> retrieving revision 1.2.2.40
> diff -u -r1.2.2.40 memory.c
> --- mm/memory.c	2000/12/05 13:33:39	1.2.2.40
> +++ mm/memory.c	2000/12/08 22:24:09
> @@ -860,6 +860,7 @@
>  	/*
>  	 * Ok, we need to copy. Oh, well..
>  	 */
> +	set_pte(page_table, pte);
>  	spin_unlock(&mm->page_table_lock);
>  	new_page = page_cache_alloc();
>  	if (!new_page)
> @@ -870,6 +871,12 @@
>  	 * Re-check the pte - we dropped the lock
>  	 */
>  	if (pte_same(*page_table, pte)) {
> +		/* We are changing the pte, so get rid of the old
> +		 * one to avoid races with the hardware, this really
> +		 * only affects the accessed bit here.
> +		 */
> +		pte = ptep_get_and_clear(page_table);
> +
>  		if (PageReserved(old_page))
>  			++mm->rss;
>  		break_cow(vma, old_page, new_page, address, page_table);
> @@ -1216,12 +1223,14 @@
>  		return do_swap_page(mm, vma, address, pte,
> pte_to_swp_entry(entry), write_access);
>  	}
> 
> +	entry = ptep_get_and_clear(pte);
>  	if (write_access) {
>  		if (!pte_write(entry))
>  			return do_wp_page(mm, vma, address, pte, entry);
> 
>  		entry = pte_mkdirty(entry);
>  	}
> +
>  	entry = pte_mkyoung(entry);
>  	establish_pte(vma, address, pte, entry);
>  	spin_unlock(&mm->page_table_lock);
> 
> 
> -- 
> dwmw2
> 

* Re: Signal 11
  2000-12-08 14:06         ` Alan Cox
@ 2000-12-09 19:01           ` Matthew Vanecek
  2000-12-09 19:20             ` davej
  2000-12-11  0:58           ` Signal 11 Rainer Mager
  1 sibling, 1 reply; 72+ messages in thread
From: Matthew Vanecek @ 2000-12-09 19:01 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:
> 
> > > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > > would say that this is definitely a kernel problem.
> >
> > XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> > kernels - even on my BP6¹. The random crashes started to happen when I
> > upgraded my distribution² - and are only seen by people using 2.4. So I
> > suspect that it's the combination of glibc and kernel which is triggering
> > it.
> 
> Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
> table updating race help ?
> 
> Alan

Where are his fixes at?  I don't seem to see any of his posts in the
archives.
-- 
Matthew Vanecek
perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
********************************************************************************
For 93 million miles, there is nothing between the sun and my shadow
except me.
I'm always getting in the way of something...

* Re: Signal 11
  2000-12-09 19:01           ` Matthew Vanecek
@ 2000-12-09 19:20             ` davej
  2000-12-09 23:31               ` Matthew Vanecek
  0 siblings, 1 reply; 72+ messages in thread
From: davej @ 2000-12-09 19:20 UTC (permalink / raw)
  To: Matthew Vanecek; +Cc: linux-kernel

On Sat, 9 Dec 2000, Matthew Vanecek wrote:

> > Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
> > table updating race help ?
> > Alan
> 
> Where are his fixes at?  I don't seem to see any of his posts in the
> archives.

dwmw2 posted one such patch earlier this week :-

http://www.lib.uaa.alaska.edu/linux-kernel/archive/2000-Week-49/0856.html

regards,

Davej.

-- 
| Dave Jones <davej@suse.de>  http://www.suse.de/~davej
| SuSE Labs


* Re: Signal 11
  2000-12-09 19:20             ` davej
@ 2000-12-09 23:31               ` Matthew Vanecek
  2000-12-11  1:31                 ` OOPS when using 4GB memory setting Rainer Mager
  0 siblings, 1 reply; 72+ messages in thread
From: Matthew Vanecek @ 2000-12-09 23:31 UTC (permalink / raw)
  To: linux-kernel

davej@suse.de wrote:
> 
> On Sat, 9 Dec 2000, Matthew Vanecek wrote:
> 
> > > Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
> > > table updating race help ?
> > > Alan
> >
> > Where are his fixes at?  I don't seem to see any of his posts in the
> > archives.
> 
> dwmw2 posted one such patch earlier this week :-
> 
> http://www.lib.uaa.alaska.edu/linux-kernel/archive/2000-Week-49/0856.html
> 
> regards,
> 

I saw that.  I thought it was a patch to try to "reproduce it", as
opposed to fixing it.  Is it truly a fix, and is it applicable for UP
kernels?
-- 
Matthew Vanecek
perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
********************************************************************************
For 93 million miles, there is nothing between the sun and my shadow
except me.
I'm always getting in the way of something...

* RE: Signal 11
  2000-12-08 14:06         ` Alan Cox
  2000-12-09 19:01           ` Matthew Vanecek
@ 2000-12-11  0:58           ` Rainer Mager
  2000-12-11  9:05             ` Rainer Mager
  1 sibling, 1 reply; 72+ messages in thread
From: Rainer Mager @ 2000-12-11  0:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

I just applied the said patch and will report my results. Note that I have
never been able to reliably, on-demand reproduce this so give me a few days
to see what happens.

--Rainer


-----Original Message-----
From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk]
Sent: Friday, December 08, 2000 11:07 PM
To: David Woodhouse
Cc: Andi Kleen; Rainer Mager; linux-kernel@vger.kernel.org; Mark Vojkovich
Subject: Re: Signal 11


> > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > would say that this is definitely a kernel problem.
>
> XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> kernels - even on my BP6¹. The random crashes started to happen when I
> upgraded my distribution² - and are only seen by people using 2.4. So I
> suspect that it's the combination of glibc and kernel which is triggering
> it.

Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
table updating race help ?

Alan


* OOPS when using 4GB memory setting
  2000-12-09 23:31               ` Matthew Vanecek
@ 2000-12-11  1:31                 ` Rainer Mager
  0 siblings, 0 replies; 72+ messages in thread
From: Rainer Mager @ 2000-12-11  1:31 UTC (permalink / raw)
  To: linux-kernel

Hi all,

	About 1 month back I reported a problem with getting oopses when running
a kernel compiled with the 4GB memory setting. Since then I've finally
managed to get the ksymoops results. Where should I post them?

	To review:

	My machine has 1GB RAM. If I build a 2.4.0test11 kernel (or test8, 9, or 10;
I haven't tried earlier) and choose the 1GB memory setting then only 900504 K is
detected (but everything runs stably). If I choose the 4GB memory setting
then the full 1 GB is detected but I get oopses. I can reliably force an oops
by mounting a samba drive and then accessing it (via ls for example).


--Rainer


* RE: Signal 11
  2000-12-11  0:58           ` Signal 11 Rainer Mager
@ 2000-12-11  9:05             ` Rainer Mager
  2000-12-11 13:33               ` Mike Galbraith
  2000-12-11 14:14               ` Signal 11 davej
  0 siblings, 2 replies; 72+ messages in thread
From: Rainer Mager @ 2000-12-11  9:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Well, I just had a Signal 11 even with the patch. What can I do to help
figure this out?


Thanks,

--Rainer

-----Original Message-----
From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk]
Sent: Friday, December 08, 2000 11:07 PM
To: David Woodhouse
Cc: Andi Kleen; Rainer Mager; linux-kernel@vger.kernel.org; Mark Vojkovich
Subject: Re: Signal 11


> > wrong with it.  I've only seen this under 2.3.x/2.4 SMP kernels.  I
> > would say that this is definitely a kernel problem.
>
> XFree86 3.9 and XFree86 4 were rock solid for a _long_ time on 2.[34]
> kernels - even on my BP6¹. The random crashes started to happen when I
> upgraded my distribution² - and are only seen by people using 2.4. So I
> suspect that it's the combination of glibc and kernel which is triggering
> it.

Have any of the folks seeing it checked if Ben LaHaise's fixes for the page
table updating race help ?

Alan


* RE: Signal 11
  2000-12-11  9:05             ` Rainer Mager
@ 2000-12-11 13:33               ` Mike Galbraith
  2000-12-11 23:24                 ` Rainer Mager
  2000-12-11 14:14               ` Signal 11 davej
  1 sibling, 1 reply; 72+ messages in thread
From: Mike Galbraith @ 2000-12-11 13:33 UTC (permalink / raw)
  To: Rainer Mager; +Cc: Alan Cox, linux-kernel

On Mon, 11 Dec 2000, Rainer Mager wrote:

> Well, I just had a Signal 11 even with the patch. What can I do to help
> figure this out?

Is init permanently running after you see a couple of these?

	-Mike


* RE: Signal 11
  2000-12-11  9:05             ` Rainer Mager
  2000-12-11 13:33               ` Mike Galbraith
@ 2000-12-11 14:14               ` davej
  1 sibling, 0 replies; 72+ messages in thread
From: davej @ 2000-12-11 14:14 UTC (permalink / raw)
  To: Rainer Mager; +Cc: Alan Cox, Linux Kernel Mailing List, Linus Torvalds

On Mon, 11 Dec 2000, Rainer Mager wrote:

> Well, I just had a Signal 11 even with the patch. What can I do to help
> figure this out?

My troublesome box finally seems to be stable. It's been up for the
last two days whilst under quite heavy loads without problems.
Previously, it would be lucky to last an hour.
The change? I disabled DRM & AGPGart.
With them both disabled, I get no problems at all. No Sig11's,
No Sig4's, No lockups.

This box has a Voodoo3 3000 AGP..

01:00.0 VGA compatible controller: 3Dfx Interactive, Inc. Voodoo 3 (rev 01)

And is running on an MVP3 chipset....

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]

This box does display the same problem with IRQ routing that I've
got on my Athlon box...

PCI: Using IRQ router VIA [1106/0586] at 00:07.0
PCI: Assigned IRQ 11 for device 00:08.0
PCI: The same IRQ used for device 01:00.0
IRQ routing conflict in pirq table! Try 'pci=autoirq'

(00:08.0 is an SBLive)

A related problem ?
As I mentioned in an earlier mail `autoirq' is an unknown option.

The Athlon box has similar messages, but it happens with even
more devices..

They both do the same with the various PCI options 'nobios' etc,
and changing PnP OS in the BIOS makes no difference either.

regards,

Davej.

-- 
| Dave Jones <davej@suse.de>  http://www.suse.de/~davej
| SuSE Labs


* RE: Signal 11
  2000-12-11 13:33               ` Mike Galbraith
@ 2000-12-11 23:24                 ` Rainer Mager
  2000-12-13  0:22                   ` Signal 11 - the continuing saga Rainer Mager
  0 siblings, 1 reply; 72+ messages in thread
From: Rainer Mager @ 2000-12-11 23:24 UTC (permalink / raw)
  To: linux-kernel

(This message contains a number of related replies.)

> From: Mike Galbraith [mailto:mikeg@wen-online.de]
> Is init permanently running after you see a couple of these?

No, that is, after 23 hours up time it has used only 6 seconds CPU time
(according to top).

That reminds me that I should repeat that my signal 11 problem has (so far)
only caused X to die. The OS remains up and stable.


> From: davej@suse.de [mailto:davej@suse.de]
> My troublesome box finally seems to be stable.[...]I disabled DRM
> & AGPGart. With them both disabled, I get no problems at all.
> No Sig11's, No Sig4's, No lockups.
>
> This box has a Voodoo3 3000 AGP..

I suppose I can try this too. My box has a Matrox G400. BTW, what is DRM?
Direct Rendering something?


> From: CMA [mailto:cma@mclink.it]
> Did you already try to selectively disable L1 and L2 caches (if
> your box has both) and see what happens?

I'll look into this as well. Anyone have any pointers on how to do this? I
have a Tyan Tiger 133 with Award BIOS if this helps/matters.

Even if this setting does make a difference, what does this tell me/us? I
don't consider running the box with disabled cache(s) a viable solution.



Thanks all and keep those suggestions coming.

--Rainer


* RE: Signal 11 - the continuing saga
  2000-12-11 23:24                 ` Rainer Mager
@ 2000-12-13  0:22                   ` Rainer Mager
  2000-12-13  2:17                     ` Jeff V. Merkey
  2000-12-13 12:10                     ` R: " CMA
  0 siblings, 2 replies; 72+ messages in thread
From: Rainer Mager @ 2000-12-13  0:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox

Hi again,

	Ok, I just upgraded to 2.4.0test12 (although I don't think there was any
work in 12 that directly addresses this signal 11 problem). When compiling
the new kernel I chose to disable AGPGart and DRM as suggested by
davej@suse.de. I will report later if this makes any difference.

	On another, possibly related note, I'm getting some really weird behavior
with a Java program. The only reason I mention it here is because it dies
with our old friend Signal 11. Anyway, please bear with the description
below.
	I have a tiny bash script that launches a Java swing app. If I run my
script from an xterm (or gnome-terminal or whatever) then it starts up fine.
If, however, I try to launch it from my gnome taskbar's menu then it dies
with signal 11 (the Java log is available upon request). This seems to be
100% consistent, since I noticed it yesterday, even across reboots.
Interestingly, the same behavior occurs if I try to run the program from
within JBuilder 4.
	So, is this related to the larger signal 11 problems?


	What else can I do regarding these issues to help fix it? Would a core dump
help anyone? I'd really like to contribute somehow but I need some
direction.


--Rainer

> From: CMA [mailto:cma@mclink.it]
> Did you already try to selectively disable L1 and L2 caches (if
> your box has both) and see what happens?

Anyone know how to do this?


* RE: Signal 11 - the continuing saga
  2000-12-13  2:17                     ` Jeff V. Merkey
@ 2000-12-13  1:45                       ` Rainer Mager
  2000-12-13  4:29                         ` Mike Galbraith
  2000-12-13  3:17                       ` Linus Torvalds
  1 sibling, 1 reply; 72+ messages in thread
From: Rainer Mager @ 2000-12-13  1:45 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: linux-kernel, Alan Cox

Thanks for the info...

> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Jeff V. Merkey
> > 	So, is this related to the larger signal 11 problems?
>
> There's a corruption bug in the page cache somewhere, and it's 100%
> reproducible.  Finding it will be tough....

Ok, granted this will be tough but is anyone even actively working on it?
What can I do to help?



> > Anyone know how to do [disable L1 and L2 caches]?
>
> Usually this is performed in the BIOS setup.  You can also disable L1
> with a sequence of instructions that write to the CR0 register on Intel
> and flip a bit, but in doing this you have to execute a WBINVD (write
> back and invalidate) instruction to flush out the cache.  BIOS setup is
> probably simpler.  Disabling L1 will make the machine slower
> than molasses, BTW, and if this bug is race related (they always
> are) it won't help much in running it down.

Aha, just as I suspected. My BIOS doesn't appear to support this. You seem
to be saying that doing so won't really contribute anything anyway so I will
hold off for now.



--Rainer


* Re: Signal 11 - the continuing saga
  2000-12-13  0:22                   ` Signal 11 - the continuing saga Rainer Mager
@ 2000-12-13  2:17                     ` Jeff V. Merkey
  2000-12-13  1:45                       ` Rainer Mager
  2000-12-13  3:17                       ` Linus Torvalds
  2000-12-13 12:10                     ` R: " CMA
  1 sibling, 2 replies; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-13  2:17 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel, Alan Cox

On Wed, Dec 13, 2000 at 09:22:55AM +0900, Rainer Mager wrote:
> Hi again,
> 
> 	Ok, I just upgraded to 2.4.0test12 (although I don't think there was any
> work in 12 that directly addresses this signal 11 problem). When compiling
> the new kernel I chose to disable AGPGart and DRM as suggested by
> davej@suse.de. I will report later if this makes any difference.
> 
> 	On another, possibly related note, I'm getting some really weird behavior
> with a Java program. The only reason I mention it here is because it dies
> with our old friend Signal 11. Anyway, please bear with the description
> below.
> 	I have a tiny bash script that launches a Java swing app. If I run my
> script from an xterm (or gnome-terminal or whatever) then it starts up fine.
> If, however, I try to launch it from my gnome taskbar's menu then it dies
> with signal 11 (the Java log is available upon request). This seems to be
> 100% consistent, since I noticed it yesterday, even across reboots.
> Interestingly, the same behavior occurs if I try to run the program from
> within JBuilder 4.
> 	So, is this related to the larger signal 11 problems?

There's a corruption bug in the page cache somewhere, and it's 100%
reproducible.  Finding it will be tough....

> 
> 
> 	What else can I do regarding these issues to help fix it? Would a core dump
> help anyone? I'd really like to contribute somehow but I need some
> direction.
> 
> 
> --Rainer
> 
> > From: CMA [mailto:cma@mclink.it]
> > Did you already try to selectively disable L1 and L2 caches (if
> > your box has both) and see what happens?
> 
> Anyone know how to do this?

Usually this is performed in the BIOS setup.  You can also disable L1
with a sequence of instructions that write to the CR0 register on Intel
and flip a bit, but in doing this you have to execute a WBINVD (write
back and invalidate) instruction to flush out the cache.  BIOS setup is
probably simpler.  Disabling L1 will make the machine slower
than molasses, BTW, and if this bug is race related (they always
are) it won't help much in running it down.

Jeff

> 

* Re: Signal 11 - the continuing saga
  2000-12-13  2:17                     ` Jeff V. Merkey
  2000-12-13  1:45                       ` Rainer Mager
@ 2000-12-13  3:17                       ` Linus Torvalds
  2000-12-13  9:34                         ` Rainer Mager
  2000-12-13 17:43                         ` Jeff V. Merkey
  1 sibling, 2 replies; 72+ messages in thread
From: Linus Torvalds @ 2000-12-13  3:17 UTC (permalink / raw)
  To: linux-kernel

In article <20001212191719.A12420@vger.timpanogas.org>,
Jeff V. Merkey <jmerkey@vger.timpanogas.org> wrote:
>On Wed, Dec 13, 2000 at 09:22:55AM +0900, Rainer Mager wrote:
>> 	I have a tiny bash script that launches a Java swing app. If I run my
>> script from an xterm (or gnome-terminal or whatever) then it starts up fine.
>> If, however, I try to launch it from my gnome taskbar's menu then it dies
>> with signal 11 (the Java log is available upon request). This seems to be
>> 100% consistent, since I noticed it yesterday, even across reboots.
>> Interestingly, the same behavior occurs if I try to run the program from
>> within JBuilder 4.
>> 	So, is this related to the larger signal 11 problems?
>
>There's a corruption bug in the page cache somewhere, and it's 100%
>reproducible.  Finding it will be tough....

Unlikely. If the actual program data was corrupted, it would SIGSEGV
regardless of how it's executed.

I'd guess that the program has a bug, and depending on the arguments and
environment (especially the latter will be different), it shows up or
not. Things like not having a LOCALE set in either case or similar.

		Linus
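
One way to test the environment theory above is to dump the environment from both launch paths (for example with `env > file`) and compare the two dumps. The following is a minimal sketch in Python, not anything posted in this thread: the helper names `parse_env_dump` and `diff_env` and the sample variable values are hypothetical, and the parser assumes one NAME=value pair per line.

```python
def parse_env_dump(text):
    """Parse the output of `env` (one NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that are not NAME=value pairs
            env[name] = value
    return env


def diff_env(working, failing):
    """Return the variables whose values differ between two environments.

    Each result maps a name to (working_value, failing_value);
    None means the variable is unset on that side.
    """
    changed = {}
    for name in sorted(set(working) | set(failing)):
        a, b = working.get(name), failing.get(name)
        if a != b:
            changed[name] = (a, b)
    return changed


# Hypothetical dumps: one taken in the xterm, one from the panel menu launch.
xterm_env = parse_env_dump("HOME=/home/user\nLANG=en_US\nTERM=xterm\n")
menu_env = parse_env_dump("HOME=/home/user\n")

for name, (a, b) in diff_env(xterm_env, menu_env).items():
    print(f"{name}: xterm={a!r} menu={b!r}")
```

Any variable reported here is only a candidate; setting or unsetting it by hand in the working launch path is what would confirm that it triggers the crash.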

^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Signal 11 - the continuing saga
  2000-12-13  1:45                       ` Rainer Mager
@ 2000-12-13  4:29                         ` Mike Galbraith
  2000-12-13  9:34                           ` Rainer Mager
  0 siblings, 1 reply; 72+ messages in thread
From: Mike Galbraith @ 2000-12-13  4:29 UTC (permalink / raw)
  To: Rainer Mager; +Cc: Jeff V. Merkey, linux-kernel, Alan Cox

On Wed, 13 Dec 2000, Rainer Mager wrote:

> Thanks for the info...
> 
> > [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Jeff V. Merkey
> > > 	So, is this related to the larger signal 11 problems?
> >
> > There's a corruption bug in the page cache somewhere, and it's 100%
> > reproducible.  Finding it will be tough....
> 
> Ok, granted this will be tough but is anyone even actively working on it?
> What can I do to help?

If you want, I can extract IKD.. which happens to have a trap in place
for this (because I have a 100% reproducible swap-related SIGSEGV that
I'm trying to figure out). 

If you're interested, let me know and I'll extract it (quite large) and
send it along with instructions on how to do the trap.

	-Mike


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Signal 11 - the continuing saga
  2000-12-13  4:29                         ` Mike Galbraith
@ 2000-12-13  9:34                           ` Rainer Mager
  2000-12-13 15:40                             ` Mike Galbraith
  0 siblings, 1 reply; 72+ messages in thread
From: Rainer Mager @ 2000-12-13  9:34 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel, Alan Cox

Mike et al,

	I have no idea what IKD is and I don't know what to do with any results I
might find BUT I'd be happy to do this if it will help. Please pass on the
info with the instructions. Who should I report the results to?



--Rainer

> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Mike Galbraith
> If you want, I can extract IKD.. which happens to have a trap in place
> for this (because I have a 100% reproducible swap-related SIGSEGV that
> I'm trying to figure out).
>
> If you're interested, let me know and I'll extract it (quite large) and
> send it along with instructions on how to do the trap.
>
> 	-Mike


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Signal 11 - the continuing saga
  2000-12-13  3:17                       ` Linus Torvalds
@ 2000-12-13  9:34                         ` Rainer Mager
  2000-12-13 17:43                         ` Jeff V. Merkey
  1 sibling, 0 replies; 72+ messages in thread
From: Rainer Mager @ 2000-12-13  9:34 UTC (permalink / raw)
  To: Linus Torvalds, linux-kernel

Give that man a cigar....it was an env var (not LOCALE but LANG). I'd
actually checked this but I didn't think that made a difference in my case.

Thanks Linus, now can you fix the larger signal 11 problem?

--Rainer


> [mailto:linux-kernel-owner@vger.kernel.org]On Behalf Of Linus Torvalds
> I'd guess that the program has a bug, and depending on the arguments and
> environment (especially the latter will be different), it shows up or
> not. Things like not having a LOCALE set in either case or similar.
>
> 		Linus


^ permalink raw reply	[flat|nested] 72+ messages in thread

* R: Signal 11 - the continuing saga
  2000-12-13  0:22                   ` Signal 11 - the continuing saga Rainer Mager
  2000-12-13  2:17                     ` Jeff V. Merkey
@ 2000-12-13 12:10                     ` CMA
  1 sibling, 0 replies; 72+ messages in thread
From: CMA @ 2000-12-13 12:10 UTC (permalink / raw)
  To: 'Rainer Mager'; +Cc: linux-kernel

>> From: CMA [mailto:cma@mclink.it]
>> Did you already try to selectively disable L1 and L2 caches (if
>> your box has both) and see what happens?
>
>Anyone know how to do this?

If you own a P6-class machine (sorry, but I didn't find your hardware specs
in previous messages), you should be able to enter setup and disable L1
and/or L2, usually under "advanced setup".
If you disable L1, the machine will be *much* slower.
If you disable L2, you will notice it under heavy load.
Most of the time, sig 11 is due to the (on-chip) L1 cache overheating. Just
checking that the CPU cooling fan is properly seated and spinning solves
the problem.
Regards.
Dr. Eng. Mauro Tassinari
www.c-m-a.it


^ permalink raw reply	[flat|nested] 72+ messages in thread

* RE: Signal 11 - the continuing saga
  2000-12-13  9:34                           ` Rainer Mager
@ 2000-12-13 15:40                             ` Mike Galbraith
  0 siblings, 0 replies; 72+ messages in thread
From: Mike Galbraith @ 2000-12-13 15:40 UTC (permalink / raw)
  To: Rainer Mager; +Cc: linux-kernel, Alan Cox

On Wed, 13 Dec 2000, Rainer Mager wrote:

> Mike et al,
> 
> 	I have no idea what IKD is and I don't know what to do with any results I
> might find BUT I'd be happy to do this if it will help. Please pass on the
> info with the instructions. Who should I report the results to?

IKD is a debugging toolkit.  The trap I have set up freezes the kernel
trace buffer at SIGSEGV time.  From there you have to read it backward
looking for problems. (which isn't particularly easy).  I was thinking
you wanted to roll your shirt sleeves up and maybe this would help ;-)  

If you want it, and do a trace, I'd be very interested in the last
couple of schedules to compare to my traces.  It's not something you
can just run and report though.

	-Mike


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11 - the continuing saga
  2000-12-13  3:17                       ` Linus Torvalds
  2000-12-13  9:34                         ` Rainer Mager
@ 2000-12-13 17:43                         ` Jeff V. Merkey
  1 sibling, 0 replies; 72+ messages in thread
From: Jeff V. Merkey @ 2000-12-13 17:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Tue, Dec 12, 2000 at 07:17:41PM -0800, Linus Torvalds wrote:
> In article <20001212191719.A12420@vger.timpanogas.org>,
> Jeff V. Merkey <jmerkey@vger.timpanogas.org> wrote:
> >On Wed, Dec 13, 2000 at 09:22:55AM +0900, Rainer Mager wrote:
> >> 	I have a tiny bash script that launches a Java swing app. If I run my
> >> script from an xterm (or gnome-terminal or whatever) then it starts up fine.
> >> If, however, I try to launch it from my gnome taskbar's menu then it dies
> >> with signal 11 (the Java log is available upon request). This seems to be
> >> 100% consistent, since I noticed it yesterday, even across reboots.
> >> Interestingly, the same behavior occurs if I try to run the program from
> >> within JBuilder 4.
> >> 	So, is this related to the larger signal 11 problems?
> >
> >There's a corruption bug in the page cache somewhere, and it's 100%
> >reproducible.  Finding it will be tough....
> 
> Unlikely. If the actual program data was corrupted, it would SIGSEGV
> regardless of how it's executed.
> 
> I'd guess that the program has a bug, and depending on the arguments and
> environment (especially the latter will be different), it shows up or
> not. Things like not having a LOCALE set in either case or similar.
> 
> 		Linus

Linus,

I agree that there may be some problem in the code above -- the question is
what has changed to make this behavior emerge?  I see it with a host of 
programs (ssh, make, netscape) -- true, all are userspace.  Time permitting, 
I may attempt to track this down in ssh and make in jobserver mode.  It
may be related to some interaction that changed underneath.

Jeff



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Signal 11
  2000-12-08  2:28           ` davej
  2000-12-08  3:13             ` Jeff V. Merkey
  2000-12-08 13:52             ` Alan Cox
@ 2000-12-15  0:11             ` lamont
  2 siblings, 0 replies; 72+ messages in thread
From: lamont @ 2000-12-15  0:11 UTC (permalink / raw)
  To: davej; +Cc: Linux Kernel Mailing List


I had tons of problems with K6III/450s in ASUS P5A motherboards with
various kinds of 128MB SIMMs.  There were multiple different symptoms,
including just sig11s on compiles, corrupted input (leading to syntax
error) in compiles, and corrupted input in the buffer cache (same crash
over and over, but dd if=/dev/hda of=/dev/null bs=1024k count=128 fixed
it).  Swapping the memory would sometimes get rid of the problem, but then
it would come back weeks-months later.

I once saw a bizarre problem in a Tyan dual-processor PIII/500 box with
2x256MB ECC RAM: one of the ECC RAM sticks was bad, and repeated
kernel compiles would hang after about 24 hours.  Strange problem, but
in troubleshooting it I found that the problem followed this stick of RAM
around to different machines.  I blamed the RAM but don't understand what
the underlying problem was...

On Fri, 8 Dec 2000 davej@suse.de wrote:
> On Thu, 7 Dec 2000, Jeff V. Merkey wrote:
> 
> > It's related to some change in 2.4 vs. 2.2.  There are other programs
> > affected other than X, SSH also gets spurious signal 11's now and again
> > with 2.4 and glibc <= 2.1 and it does not occur on 2.2.
> 
> <AOL>
> 
> I've begun to get a bit paranoid about my K6-2 500 box.
> 
> Various processes have been getting random signals after heavy CPU usage.
> Playing an MPEG movie, kernel compile, or even just some small apps
> compiling sometimes. Just for the record, this isn't an OOM situation,
> I've watched this box with half its memory free or in buffers left
> unattended, and suddenly a compile will just die.
> 
> I replaced the CPU with a brand new K6-2. Problem remained.
> Next suspect was faulty RAM. Despite having passed a memtest, I
> swapped out the DIMMs for some known good ones.
> Suspecting cooling problems, I added some case fans.
> Next came a bigger power supply. Still the problems.
> The latest last ditch attempt to make this box stable has been
> to attach the biggest fan I could find that would fit a socket 7 CPU.
> 
> And still the problems are there.
> The only remaining suspect would be a flaky motherboard.
> But then comes the real killer : This box is rock solid under 2.2
> 
> *boggle*
> 
> I'm not sure exactly when this started, but I think I first noticed
> it around test5 or so, but didn't suspect the kernel at the time.
> 
> I've tried kernels compiled with everything from 2.91.66 when this
> was a Redhat box, to gcc 2.95.2 (from Debian woody) when I installed
> debian on it.  If this is a compiler bug, it's one that no compiler
> I've tried seems to be immune from.
> 
> regards,
> 
> Davej.
> 
> 


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
@ 2000-12-08 20:12 willy tarreau
  0 siblings, 0 replies; 72+ messages in thread
From: willy tarreau @ 2000-12-08 20:12 UTC (permalink / raw)
  To: Philipp Rumpf, Willy Tarreau
  Cc: Alan Cox, Miquel van Smoorenburg, linux-kernel

> "I'm sure" meaning "I didn't test it" ?

Absolutely. I believed that the driver was *exactly*
the same as in the previous release, which didn't boot and
needed the fix, but another fix has been applied and
corrected it. Now I think it will work with a clean
2.2.18pre25. Anyway, I left a kernel compile behind me
this evening, so I'll confirm this on Monday as soon
as I can reboot the server on a pre25.

Cheers,
Willy



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
@ 2000-12-08 20:03 willy tarreau
  0 siblings, 0 replies; 72+ messages in thread
From: willy tarreau @ 2000-12-08 20:03 UTC (permalink / raw)
  To: Alan Cox, Miquel van Smoorenburg; +Cc: Alan Cox, Willy Tarreau, linux-kernel

> > Bad day, Alan? ;)
> Umm no but having people _keep_ sending you do
> nothing patches gets annoying after a while ;)

Please accept all my apologies, Alan. When I quickly
sent you the last patch, I didn't notice that some
other broken code had been removed, which I discovered
later, back home, after comparing 2.2.18pre2[15]
(Miquel noticed this too).

Next time, I'll spend a little more of my time on
carefully reading the patch before resending an old
useless one.

Cheers,
Willy



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
  2000-12-08 15:16 willy tarreau
@ 2000-12-08 17:06 ` Alan Cox
  0 siblings, 0 replies; 72+ messages in thread
From: Alan Cox @ 2000-12-08 17:06 UTC (permalink / raw)
  To: willy tarreau
  Cc: Alan Cox, Willy Tarreau, Miquel van Smoorenburg, linux-kernel

> as soon as I can reboot it, I promise I will test the
> kernel with and without the patch to be really sure.
> but before that, if people who have problems with
> megaraid/netraid could give it a try, that would be
> cool. Also, it would be nice if people for which the
> normal megaraid driver works would accept to check
> this doesn't break anything.

Your patch changes the mask on both IO and memory ports to be MEM mask, which
is obviously incorrect. It won't actually bite you because all the masking
has already been done by pci_resource_start() so you are masking already
zero bits.
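
The point about masking bits that are already zero can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the megaraid driver or the kernel's code: the two mask values mirror PCI_BASE_ADDRESS_IO_MASK and PCI_BASE_ADDRESS_MEM_MASK from the kernel's PCI headers, while `resource_start_model` and all BAR values are hypothetical stand-ins for what pci_resource_start() hands back.

```python
# 32-bit versions of the kernel's BAR masks: I/O BARs keep their flags in
# the low 2 bits, memory BARs in the low 4 bits.
PCI_BASE_ADDRESS_IO_MASK = ~0x03 & 0xFFFFFFFF
PCI_BASE_ADDRESS_MEM_MASK = ~0x0F & 0xFFFFFFFF


def resource_start_model(raw_bar):
    """Hypothetical model: strip the flag bits, choosing the mask by BAR type."""
    if raw_bar & 0x1:  # bit 0 set means I/O space
        return raw_bar & PCI_BASE_ADDRESS_IO_MASK
    return raw_bar & PCI_BASE_ADDRESS_MEM_MASK


raw_io_bar = 0x0000E401   # hypothetical I/O BAR: base 0xE400 plus the I/O flag
raw_mem_bar = 0xFEBF0008  # hypothetical prefetchable memory BAR

io_base = resource_start_model(raw_io_bar)
mem_base = resource_start_model(raw_mem_bar)

# Masking the already-clean addresses again with the MEM mask is a no-op
# here, because the low four bits of both bases are already zero.
assert io_base & PCI_BASE_ADDRESS_MEM_MASK == io_base
assert mem_base & PCI_BASE_ADDRESS_MEM_MASK == mem_base

# Why it is still the wrong mask for I/O: a (hypothetical) I/O base that is
# only 4-byte aligned would lose address bits under the MEM mask.
small_io_base = 0xE404
assert small_io_base & PCI_BASE_ADDRESS_IO_MASK == small_io_base
assert small_io_base & PCI_BASE_ADDRESS_MEM_MASK != small_io_base
```

The last two assertions are an added illustration of why using the MEM mask for an I/O port is incorrect in general, even though it is harmless when the base happens to be 16-byte aligned.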


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: Linux 2.2.18pre25
@ 2000-12-08 15:16 willy tarreau
  2000-12-08 17:06 ` Alan Cox
  0 siblings, 1 reply; 72+ messages in thread
From: willy tarreau @ 2000-12-08 15:16 UTC (permalink / raw)
  To: Alan Cox, Willy Tarreau; +Cc: Alan Cox, Miquel van Smoorenburg, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 988 bytes --]

> It doesn't even apply

Sorry Alan, I think it's because I had to copy/paste it
with my mouse under X into my browser (I don't have
SMTP access here at work), and it applies here with a
-12 lines offset...

Here it is attached for 2.2.18pre25, but since the RAID
server is running now (under 2.2.18pre20+patch), I
won't be able to test it till next week. I'm fairly
confident, though, since it will do the same as the
one which currently allows this server to boot.

As soon as I can reboot it, I promise I will test the
kernel with and without the patch to be really sure.
But before that, if people who have problems with
megaraid/netraid could give it a try, that would be
cool. Also, it would be nice if people for whom the
normal megaraid driver works would check that this
doesn't break anything.

Regards,
Willy



[-- Attachment #2: patch-megaraid-fix --]
[-- Type: application/x-unknown, Size: 674 bytes --]

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2000-12-15  0:42 UTC | newest]

Thread overview: 72+ messages
2000-12-07 20:03 Linux 2.2.18pre25 Alan Cox
2000-12-07 23:23 ` Miquel van Smoorenburg
2000-12-07 23:41   ` Alan Cox
2000-12-08  9:47     ` Willy Tarreau
2000-12-08 14:08       ` Alan Cox
2000-12-08 16:07         ` Miquel van Smoorenburg
2000-12-08 17:08           ` Alan Cox
2000-12-08 18:12       ` Philipp Rumpf
2000-12-08  0:20 ` Andrea Arcangeli
2000-12-08  0:27   ` Alan Cox
2000-12-08  0:41     ` Andrea Arcangeli
2000-12-08  0:47       ` Alan Cox
2000-12-08  1:27         ` Linus Torvalds
2000-12-08  0:44     ` Signal 11 Rainer Mager
2000-12-08  1:05       ` Jeff V. Merkey
2000-12-08  1:09       ` Michel LESPINASSE
2000-12-08  2:14         ` Rainer Mager
2000-12-08  1:20       ` Andi Kleen
2000-12-08  1:24         ` Jeff V. Merkey
2000-12-08  1:40           ` Andi Kleen
2000-12-08  1:43             ` Jeff V. Merkey
2000-12-08  1:55               ` Jeff V. Merkey
2000-12-08 19:20               ` Dr. Kelsey Hudson
2000-12-08  2:28           ` davej
2000-12-08  3:13             ` Jeff V. Merkey
2000-12-08  3:25               ` davej
2000-12-08 16:44                 ` Matthew Vanecek
2000-12-08 19:43                 ` Dr. Kelsey Hudson
2000-12-08 13:52             ` Alan Cox
2000-12-15  0:11             ` lamont
2000-12-08  1:58       ` Richard B. Johnson
2000-12-08  2:04         ` Peter Samuelson
2000-12-08 16:36           ` Matthew Vanecek
2000-12-08 16:49             ` Richard B. Johnson
2000-12-08 17:40               ` Peter Samuelson
2000-12-08 19:36           ` Dr. Kelsey Hudson
2000-12-08  9:46       ` David Woodhouse
2000-12-08 14:06         ` Alan Cox
2000-12-09 19:01           ` Matthew Vanecek
2000-12-09 19:20             ` davej
2000-12-09 23:31               ` Matthew Vanecek
2000-12-11  1:31                 ` OOPS when using 4GB memory setting Rainer Mager
2000-12-11  0:58           ` Signal 11 Rainer Mager
2000-12-11  9:05             ` Rainer Mager
2000-12-11 13:33               ` Mike Galbraith
2000-12-11 23:24                 ` Rainer Mager
2000-12-13  0:22                   ` Signal 11 - the continuing saga Rainer Mager
2000-12-13  2:17                     ` Jeff V. Merkey
2000-12-13  1:45                       ` Rainer Mager
2000-12-13  4:29                         ` Mike Galbraith
2000-12-13  9:34                           ` Rainer Mager
2000-12-13 15:40                             ` Mike Galbraith
2000-12-13  3:17                       ` Linus Torvalds
2000-12-13  9:34                         ` Rainer Mager
2000-12-13 17:43                         ` Jeff V. Merkey
2000-12-13 12:10                     ` R: " CMA
2000-12-11 14:14               ` Signal 11 davej
2000-12-08 16:21         ` Horst von Brand
2000-12-08 19:34         ` Mark Vojkovich
2000-12-08 23:16           ` Jeff V. Merkey
2000-12-08 22:24             ` David Woodhouse
2000-12-09  0:56               ` Jeff V. Merkey
2000-12-08 17:02   ` Linux 2.2.18pre25 Martin Kacer
2000-12-08 17:20     ` Alan Cox
2000-12-08 17:36       ` Martin Kacer
2000-12-08 18:08     ` Andrea Arcangeli
2000-12-08 18:30       ` Martin Kacer
2000-12-08 23:55         ` Alan Cox
2000-12-08 15:16 willy tarreau
2000-12-08 17:06 ` Alan Cox
2000-12-08 20:03 willy tarreau
2000-12-08 20:12 willy tarreau
