* PROBLEM: memory corrupting bug, bisected to 6dda9d55
@ 2010-10-09  9:57 pacman
  2010-10-11 12:52 ` Christoph Lameter
  2010-10-11 14:30 ` Mel Gorman
  0 siblings, 2 replies; 40+ messages in thread
From: pacman @ 2010-10-09  9:57 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Mel Gorman, Christoph Lameter, KOSAKI Motohiro,
	Yinghai Lu, linux-kernel, linux-mm

(What a big Cc: list... scripts/get_maintainer.pl made me do it.)

This will be a long story with a weak conclusion, sorry about that, but it's
been a long bug-hunt.

With recent kernels I've seen a bug that appears to corrupt random 4-byte
chunks of memory. It's not easy to reproduce. It seems to happen only once
per boot, pretty quickly after userspace has gotten started, and sometimes it
doesn't happen at all.

Symptoms that I have seen multiple times include:
#1. Oops during modprobe usbcore (in apply_relocate_add)
#2. (more frequent than #1) e2fsck dies of SIGSEGV or SIGILL

I gdb'ed one of the e2fsck crashes and found that the SIGILL was indeed an
illegal instruction. A single instruction had been replaced by 4 seemingly
random bytes which did not form a valid instruction. So I began doing an md5
check of e2fsck and its dependent libs on every boot.

This made detection easier, as I found that about 50% of the time, booting a
bad kernel would cause an md5 mismatch in /lib/libe2p.so.2.3. None of this
corruption was actually present on disk. I was always able to boot my old
known-good kernel and md5 all the suspect files, and they were always fine.

Using that test procedure, all the bad kernels showed the symptom on the
second boot, and all the good kernels had 6 consecutive boots without any
trouble. The git bisect ended here:

  commit 6dda9d55bf545013597724bf0cd79d01bd2bd944
  Author: Corrado Zoccolo <czoccolo@gmail.com>

      page allocator: reduce fragmentation in buddy allocator by adding buddies that are merging to the tail of the free lists

   mm/page_alloc.c |   30 +++++++++++++++++++++++++-----
   1 files changed, 25 insertions(+), 5 deletions(-)

which is way back before 2.6.35-rc1.

Since this is code that has obviously been tested by a lot of people and
hasn't hurt most of them, I figure it must be very sensitive to hardware
and/or kernel config options. I also considered the possibility of a compiler
bug. Most of my testing was done with gcc 4.3.2, but I also tried 4.4.2 and
that didn't make a difference.

This is all happening on Pegasos2 (32-bit PPC).

The latest kernel I've confirmed the bug on was 2.6.35.7. The bad commit
reverts cleanly on top of 2.6.35.7, and that results in a good kernel as
expected. (I can't test the latest Linus git tree until I solve the unrelated
bug that has apparently killed the keyboard driver.)

Can someone familiar with the code take a fresh look at 6dda9d55 and spot a
bug? If not, what should I try next?

Here's .config (comment lines filtered out with grep '^[^\#]' .config):

CONFIG_PPC_BOOK3S_32=y
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_PPC32=y
CONFIG_WORD_SIZE=32
CONFIG_MMU=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_IRQ_PER_CPU=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_PPC=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_OF=y
CONFIG_OF=y
CONFIG_PPC_UDBG_16550=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_DTC=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_TREE_RCU=y
CONFIG_RCU_FANOUT=32
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_PERF_COUNTERS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
CONFIG_HAVE_OPROFILE=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_SLOW_WORK=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_BLOCK=y
CONFIG_LBDAF=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_INLINE_SPIN_UNLOCK=y
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
CONFIG_INLINE_READ_UNLOCK=y
CONFIG_INLINE_READ_UNLOCK_IRQ=y
CONFIG_INLINE_WRITE_UNLOCK=y
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
CONFIG_PPC_CHRP=y
CONFIG_PPC_NATIVE=y
CONFIG_PPC_OF_BOOT_TRAMPOLINE=y
CONFIG_MPIC=y
CONFIG_PPC_I8259=y
CONFIG_PPC_RTAS=y
CONFIG_RTAS_ERROR_LOGGING=y
CONFIG_PPC_RTAS_DAEMON=y
CONFIG_RTAS_PROC=y
CONFIG_PPC_MPC106=y
CONFIG_HIGHMEM=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_HZ_250=y
CONFIG_HZ=250
CONFIG_SCHED_HRTICK=y
CONFIG_PREEMPT_NONE=y
CONFIG_BINFMT_ELF=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_HAS_WALK_MEMORY=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_SPARSE_IRQ=y
CONFIG_MAX_ACTIVE_REGIONS=32
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_PPC_4K_PAGES=y
CONFIG_FORCE_MAX_ZONEORDER=11
CONFIG_PROC_DEVICETREE=y
CONFIG_EXTRA_TARGETS=""
CONFIG_SECCOMP=y
CONFIG_ISA_DMA_API=y
CONFIG_ZONE_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_PPC_INDIRECT_PCI=y
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_SYSCALL=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_LOWMEM_SIZE=0x30000000
CONFIG_PAGE_OFFSET=0xc0000000
CONFIG_KERNEL_START=0xc0000000
CONFIG_PHYSICAL_START=0x00000000
CONFIG_TASK_SIZE=0xc0000000
CONFIG_NET=y
CONFIG_PACKET=m
CONFIG_UNIX=m
CONFIG_XFRM=y
CONFIG_XFRM_USER=m
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_FIB_HASH=y
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CT_ACCT=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CT_PROTO_GRE=m
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HL=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_RATEEST=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_HL=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=m
CONFIG_NF_DEFRAG_IPV4=m
CONFIG_NF_CONNTRACK_IPV4=m
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_NF_NAT=m
CONFIG_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_NF_NAT_SNMP_BASIC=m
CONFIG_NF_NAT_PROTO_GRE=m
CONFIG_NF_NAT_FTP=m
CONFIG_NF_NAT_IRC=m
CONFIG_NF_NAT_TFTP=m
CONFIG_NF_NAT_AMANDA=m
CONFIG_NF_NAT_PPTP=m
CONFIG_NF_NAT_H323=m
CONFIG_NF_NAT_SIP=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_SCTP=m
CONFIG_SCTP_HMAC_MD5=y
CONFIG_STP=m
CONFIG_GARP=m
CONFIG_BRIDGE=m
CONFIG_BRIDGE_IGMP_SNOOPING=y
CONFIG_VLAN_8021Q=m
CONFIG_VLAN_8021Q_GVRP=y
CONFIG_LLC=m
CONFIG_LLC2=m
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
CONFIG_NET_SCH_DRR=m
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_SCH_FIFO=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m
CONFIG_EXTRA_FIRMWARE=""
CONFIG_OF_FLATTREE=y
CONFIG_OF_DYNAMIC=y
CONFIG_OF_DEVICE=y
CONFIG_OF_I2C=y
CONFIG_OF_MDIO=m
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_PC_SUPERIO=y
CONFIG_PARPORT_1284=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=m
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
CONFIG_MISC_DEVICES=y
CONFIG_HAVE_IDE=y
CONFIG_IDE=y
CONFIG_IDE_XFER_MODE=y
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
CONFIG_IDE_GD=y
CONFIG_IDE_GD_ATA=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
CONFIG_IDE_PROC_FS=y
CONFIG_BLK_DEV_IDEDMA_SFF=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_SCSI_MOD=m
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_SG=m
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_WAIT_SCAN=m
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_LOWLEVEL=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_RAID6_PQ=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_BLK_DEV_DM=m
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_IEEE1394=m
CONFIG_IEEE1394_OHCI1394=m
CONFIG_IEEE1394_PCILYNX=m
CONFIG_IEEE1394_SBP2=m
CONFIG_IEEE1394_ETH1394_ROM_ENTRY=y
CONFIG_IEEE1394_ETH1394=m
CONFIG_IEEE1394_RAWIO=m
CONFIG_IEEE1394_VIDEO1394=m
CONFIG_IEEE1394_DV1394=m
CONFIG_I2O=m
CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_CONFIG=m
CONFIG_I2O_BLOCK=m
CONFIG_I2O_SCSI=m
CONFIG_I2O_PROC=m
CONFIG_MACINTOSH_DRIVERS=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_BONDING=m
CONFIG_TUN=m
CONFIG_VETH=m
CONFIG_PHYLIB=m
CONFIG_MARVELL_PHY=m
CONFIG_DAVICOM_PHY=m
CONFIG_QSEMI_PHY=m
CONFIG_LXT_PHY=m
CONFIG_CICADA_PHY=m
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
CONFIG_NET_PCI=y
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
CONFIG_NETDEV_1000=y
CONFIG_VIA_VELOCITY=m
CONFIG_MV643XX_ETH=m
CONFIG_USB_CATC=m
CONFIG_USB_PEGASUS=m
CONFIG_USB_RTL8150=m
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_CDCETHER=m
CONFIG_USB_NET_DM9601=m
CONFIG_USB_NET_ZAURUS=m
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPPOE=m
CONFIG_SLHC=m
CONFIG_NET_FC=y
CONFIG_INPUT=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_EVDEV=m
CONFIG_INPUT_EVBUG=m
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=m
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
CONFIG_INPUT_UINPUT=m
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=m
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_EMU10K1=m
CONFIG_GAMEPORT_FM801=m
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_8250=m
CONFIG_SERIAL_8250_PCI=m
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_CORE=m
CONFIG_SERIAL_OF_PLATFORM=m
CONFIG_UNIX98_PTYS=y
CONFIG_PRINTER=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_NVRAM=y
CONFIG_GEN_RTC=y
CONFIG_GEN_RTC_X=y
CONFIG_DEVPORT=y
CONFIG_RAMOOPS=m
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_VIAPRO=m
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_SENSORS_VT8231=m
CONFIG_WATCHDOG=y
CONFIG_SOFT_WATCHDOG=m
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m
CONFIG_USBPCWATCHDOG=m
CONFIG_SSB_POSSIBLE=y
CONFIG_MEDIA_SUPPORT=y
CONFIG_VIDEO_DEV=m
CONFIG_VIDEO_V4L2_COMMON=m
CONFIG_VIDEO_V4L1_COMPAT=y
CONFIG_VIDEO_MEDIA=m
CONFIG_IR_CORE=y
CONFIG_VIDEO_IR=y
CONFIG_MEDIA_ATTACH=y
CONFIG_MEDIA_TUNER=m
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_CAPTURE_DRIVERS=y
CONFIG_VIDEO_FIXED_MINOR_RANGES=y
CONFIG_V4L_USB_DRIVERS=y
CONFIG_USB_VIDEO_CLASS=m
CONFIG_USB_VIDEO_CLASS_INPUT_EVDEV=y
CONFIG_USB_GSPCA=m
CONFIG_AGP=m
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
CONFIG_VIDEO_OUTPUT_CONTROL=m
CONFIG_FB=y
CONFIG_FB_DDC=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_MACMODES=y
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y
CONFIG_FB_OF=y
CONFIG_FB_RADEON=y
CONFIG_FB_RADEON_I2C=y
CONFIG_FB_RADEON_BACKLIGHT=y
CONFIG_FB_ATY128=y
CONFIG_FB_ATY128_BACKLIGHT=y
CONFIG_FB_ATY=y
CONFIG_FB_ATY_CT=y
CONFIG_FB_ATY_GENERIC_LCD=y
CONFIG_FB_ATY_GX=y
CONFIG_FB_ATY_BACKLIGHT=y
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_DISPLAY_SUPPORT=y
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
CONFIG_LOGO_LINUX_VGA16=y
CONFIG_SOUND=m
CONFIG_SOUND_OSS_CORE=y
CONFIG_SOUND_OSS_CORE_PRECLAIM=y
CONFIG_SOUND_PRIME=m
CONFIG_SOUND_OSS=m
CONFIG_SOUND_YM3812=m
CONFIG_HID_SUPPORT=y
CONFIG_HID=m
CONFIG_HIDRAW=y
CONFIG_USB_HID=m
CONFIG_USB_HIDDEV=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=m
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_HCD_PPC_OF=y
CONFIG_USB_OHCI_HCD=m
CONFIG_USB_OHCI_HCD_PPC_OF_BE=y
CONFIG_USB_OHCI_HCD_PPC_OF=y
CONFIG_USB_OHCI_HCD_PCI=y
CONFIG_USB_OHCI_BIG_ENDIAN_DESC=y
CONFIG_USB_OHCI_BIG_ENDIAN_MMIO=y
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
CONFIG_USB_ACM=m
CONFIG_USB_PRINTER=m
CONFIG_USB_TMC=m
CONFIG_USB_STORAGE=m
CONFIG_USB_STORAGE_DATAFAB=m
CONFIG_USB_STORAGE_FREECOM=m
CONFIG_USB_STORAGE_ISD200=m
CONFIG_USB_MDC800=m
CONFIG_USB_MICROTEK=m
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
CONFIG_USB_SERIAL_BELKIN=m
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m
CONFIG_USB_SERIAL_CP210X=m
CONFIG_USB_SERIAL_CYPRESS_M8=m
CONFIG_USB_SERIAL_EMPEG=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_SERIAL_VISOR=m
CONFIG_USB_SERIAL_IPAQ=m
CONFIG_USB_SERIAL_IR=m
CONFIG_USB_SERIAL_GARMIN=m
CONFIG_USB_SERIAL_IPW=m
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
CONFIG_USB_SERIAL_PL2303=m
CONFIG_USB_SERIAL_HP4X=m
CONFIG_USB_SERIAL_SAFE=m
CONFIG_USB_SERIAL_SAFE_PADDED=y
CONFIG_USB_SERIAL_CYBERJACK=m
CONFIG_USB_SERIAL_OMNINET=m
CONFIG_USB_RIO500=m
CONFIG_USB_LEGOTOWER=m
CONFIG_USB_LCD=m
CONFIG_USB_LED=m
CONFIG_USB_CYTHERM=m
CONFIG_USB_IDMOUSE=m
CONFIG_USB_TEST=m
CONFIG_MMC=m
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_BOUNCE=y
CONFIG_MMC_SDHCI=m
CONFIG_AUXDISPLAY=y
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT3_FS=y
CONFIG_EXT3_DEFAULTS_TO_ORDERED=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
CONFIG_EXT4_FS=m
CONFIG_EXT4_FS_XATTR=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_JBD=y
CONFIG_JBD2=m
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=m
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
CONFIG_REISERFS_FS_SECURITY=y
CONFIG_FS_POSIX_ACL=y
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
CONFIG_PRINT_QUOTA_WARNING=y
CONFIG_QUOTA_TREE=m
CONFIG_QFMT_V2=m
CONFIG_QUOTACTL=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_GENERIC_ACL=y
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
CONFIG_FSCACHE_HISTOGRAM=y
CONFIG_CACHEFILES=m
CONFIG_CACHEFILES_HISTOGRAM=y
CONFIG_ISO9660_FS=m
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_CRAMFS=y
CONFIG_SQUASHFS=m
CONFIG_SQUASHFS_XATTRS=y
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
CONFIG_MINIX_FS=m
CONFIG_ROMFS_FS=m
CONFIG_ROMFS_BACKED_BY_BLOCK=y
CONFIG_ROMFS_ON_BLOCK=y
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFS_FSCACHE=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_CODA_FS=m
CONFIG_PARTITION_ADVANCED=y
CONFIG_AMIGA_PARTITION=y
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=m
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
CONFIG_LIBCRC32C=m
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_LZO_COMPRESS=m
CONFIG_LZO_DECOMPRESS=m
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y
CONFIG_GENERIC_ATOMIC64=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
CONFIG_MAGIC_SYSRQ=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_DEBUG_LIST=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_BRANCH_PROFILE_NONE=y
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_PPC_WERROR=y
CONFIG_PRINT_STACK_DEPTH=64
CONFIG_BOOTX_TEXT=y
CONFIG_KEYS=y
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_ASYNC_PQ=m
CONFIG_ASYNC_RAID6_RECOV=m
CONFIG_CRYPTO=y
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG=m
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_MANAGER_TESTS=y
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=m
CONFIG_CRYPTO_SEQIV=m
CONFIG_CRYPTO_CBC=m
CONFIG_CRYPTO_CTR=m
CONFIG_CRYPTO_CTS=m
CONFIG_CRYPTO_ECB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m
CONFIG_CRYPTO_CRC32C=m
CONFIG_CRYPTO_GHASH=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD128=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
CONFIG_CRYPTO_RMD320=m
CONFIG_CRYPTO_SHA1=m
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_SALSA20=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=m
CONFIG_CRYPTO_LZO=m
CONFIG_CRYPTO_ANSI_CPRNG=m

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-09  9:57 PROBLEM: memory corrupting bug, bisected to 6dda9d55 pacman
@ 2010-10-11 12:52 ` Christoph Lameter
  2010-10-11 14:30 ` Mel Gorman
  1 sibling, 0 replies; 40+ messages in thread
From: Christoph Lameter @ 2010-10-11 12:52 UTC (permalink / raw)
  To: pacman
  Cc: linux-mm, Andrew Morton, Mel Gorman, KOSAKI Motohiro, Yinghai Lu,
	linux-kernel

The contents of those scribbles may reveal something. Are these 4 bytes a
pointer? If so at what memory area are they pointing? A page struct?





* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-09  9:57 PROBLEM: memory corrupting bug, bisected to 6dda9d55 pacman
  2010-10-11 12:52 ` Christoph Lameter
@ 2010-10-11 14:30 ` Mel Gorman
  2010-10-11 20:35   ` pacman
  2010-10-11 21:00   ` Andrew Morton
  1 sibling, 2 replies; 40+ messages in thread
From: Mel Gorman @ 2010-10-11 14:30 UTC (permalink / raw)
  To: pacman
  Cc: linux-mm, Andrew Morton, Christoph Lameter, KOSAKI Motohiro,
	Yinghai Lu, linux-kernel

On Sat, Oct 09, 2010 at 04:57:18AM -0500, pacman@kosh.dhis.org wrote:
> (What a big Cc: list... scripts/get_maintainer.pl made me do it.)
> 
> This will be a long story with a weak conclusion, sorry about that, but it's
> been a long bug-hunt.
> 
> With recent kernels I've seen a bug that appears to corrupt random 4-byte
> chunks of memory. It's not easy to reproduce. It seems to happen only once
> per boot, pretty quickly after userspace has gotten started, and sometimes it
> doesn't happen at all.
> 

A corruption of 4 bytes could be consistent with a pointer value being
written to an incorrect location.

> Symptoms that I have seen multiple times include:
> #1. Oops during modprobe usbcore (in apply_relocate_add)
> #2. (more frequent than #1) e2fsck dies of SIGSEGV or SIGILL
> 
> I gdb'ed one of the e2fsck crashes and found that the SIGILL was indeed an
> illegal instruction. A single instruction had been replaced by 4 seemingly
> random bytes which did not form a valid instruction. So I began doing an md5
> check of e2fsck and its dependent libs on every boot.
> 
> This made detection easier, as I found that about 50% of the time, booting a
> bad kernel would cause an md5 mismatch in /lib/libe2p.so.2.3. None of this
> corruption was actually present on disk. I was always able to boot my old
> known-good kernel and md5 all the suspect files, and they were always fine.
> 
> Using that test procedure, all the bad kernels showed the symptom on the
> second boot, and all the good kernels had 6 consecutive boots without any
> trouble. The git bisect ended here:
> 
>   commit 6dda9d55bf545013597724bf0cd79d01bd2bd944
>   Author: Corrado Zoccolo <czoccolo@gmail.com>
> 
>       page allocator: reduce fragmentation in buddy allocator by adding buddies that are merging to the tail of the free lists
> 
>    mm/page_alloc.c |   30 +++++++++++++++++++++++++-----
>    1 files changed, 25 insertions(+), 5 deletions(-)
> 
> which is way back before 2.6.35-rc1.
> 
> Since this is code that has obviously been tested by a lot of people and
> hasn't hurt most of them, I figure it must be very sensitive to hardware
> and/or kernel config options. I also considered the possibility of a compiler
> bug. Most of my testing was done with gcc 4.3.2, but I also tried 4.4.2 and
> that didn't make a difference.
> 
> This is all happening on Pegasos2 (32-bit PPC).
> 
> The latest kernel I've confirmed the bug on was 2.6.35.7. The bad commit
> reverts cleanly on top of 2.6.35.7, and that results in a good kernel as
> expected. (I can't test the latest Linus git tree until I solve the unrelated
> bug that has apparently killed the keyboard driver.)
> 
> Can someone familiar with the code take a fresh look at 6dda9d55 and spot a
> bug? If not, what should I try next?
> 

I think there is a slight bug but not one that would cause corruption.

	if ((order < MAX_ORDER-1) && pfn_valid_within(page_to_pfn(buddy))) {

That looks like it can result in checking the buddy for an order-(MAX_ORDER-1)
page which is a bit bogus. Thing is, it should be harmless because there
isn't an unusual write made. In case it's some weird compiler optimisation
though, could you try this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 502a882..5b0eb8c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -530,7 +530,7 @@ static inline void __free_one_page(struct page *page,
 	 * so it's less likely to be used soon and more likely to be merged
 	 * as a higher order page
 	 */
-	if ((order < MAX_ORDER-1) && pfn_valid_within(page_to_pfn(buddy))) {
+	if ((order < MAX_ORDER-2) && pfn_valid_within(page_to_pfn(buddy))) {
 		struct page *higher_page, *higher_buddy;
 		combined_idx = __find_combined_index(page_idx, order);
 		higher_page = page + combined_idx - page_idx;


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-11 14:30 ` Mel Gorman
@ 2010-10-11 20:35   ` pacman
  2010-10-11 21:00   ` Andrew Morton
  1 sibling, 0 replies; 40+ messages in thread
From: pacman @ 2010-10-11 20:35 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel

Mel Gorman writes:
> 
> A corruption of 4 bytes could be consistent with a pointer value being
> written to an incorrect location.

The memory scribbles that I've looked at in detail and written down have
been 0x86520000, 0xea5b0000, and 0x1d5f0000. They don't look very pointerish,
although the low 2 bytes being 0 in all 3 cases is an intriguing pattern.
That may not matter, though, because...

> 
> I think there is a slight bug but not one that would cause corruption.
> 
> 	if ((order < MAX_ORDER-1) && pfn_valid_within(page_to_pfn(buddy))) {

I think you found it. Think harder about how it might cause corruption.
Applying your suggested patch really seems to have fixed it. Starting from
v2.6.36-rc7-69-g6b0cd00 I applied your patch, booted 6 times, all clean.
Reverted your patch, booted once, and /sbin/e2fsck failed its md5sum check.
Sent a copy of the "bad" /sbin/e2fsck to another machine, rebooted with an
old good kernel, reapplied your patch to the new kernel, and got 6 more good
boots.

The bad copy of e2fsck differs from the good one in 2 separate locations,
each 4 bytes wide. The bogus values are the 0xea5b0000 and 0x1d5f0000 which I
mentioned already.

> That looks like it can result in checking the buddy for an order-(MAX_ORDER-1)
> page which is a bit bogus. Thing is, it should be harmless because there
> isn't an unusual write made. In case it's some weird compiler optimisation
> though, could you try this?
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 502a882..5b0eb8c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -530,7 +530,7 @@ static inline void __free_one_page(struct page *page,
>  	 * so it's less likely to be used soon and more likely to be merged
>  	 * as a higher order page
>  	 */
> -	if ((order < MAX_ORDER-1) && pfn_valid_within(page_to_pfn(buddy))) {
> +	if ((order < MAX_ORDER-2) && pfn_valid_within(page_to_pfn(buddy))) {
>  		struct page *higher_page, *higher_buddy;
>  		combined_idx = __find_combined_index(page_idx, order);
>  		higher_page = page + combined_idx - page_idx;
> 

It doesn't look like there are any optimization tricks involved. I did a
"make mm/page_alloc.s" before and after your patch, and the difference is
simply this:

--- mm/page_alloc.s.6b0cd00	2010-10-11 14:03:03.000000000 -0500
+++ mm/page_alloc.s.6b0cd00+mel	2010-10-11 14:03:49.000000000 -0500
@@ -3885,7 +3885,7 @@
 .L523:
 	mr 11,28	 # page_idx, page_idx.2227
 .L526:
-	cmplwi 7,29,9	 #, tmp222, order
+	cmplwi 7,29,8	 #, tmp222, order
 	lwz 0,0(30)	 #* page, tmp220
 	stw 29,12(30)	 # <variable>.D.6650.D.6646.private, order
 	oris 0,0,0x8	 #, tmp221, tmp220,
@@ -4337,7 +4337,7 @@
 	add 30,31,11	 # buddy, page, tmp197
 	ble+ 7,.L578	 #
 .L575:
-	cmplwi 7,27,9	 #, tmp226, order
+	cmplwi 7,27,8	 #, tmp226, order
 	lwz 0,0(31)	 #* page, tmp224
 	stw 27,12(31)	 # <variable>.D.6650.D.6646.private, order
 	oris 0,0,0x8	 #, tmp225, tmp224,

-- 
Alan Curry


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-11 14:30 ` Mel Gorman
  2010-10-11 20:35   ` pacman
@ 2010-10-11 21:00   ` Andrew Morton
  2010-10-13 14:40     ` Mel Gorman
  1 sibling, 1 reply; 40+ messages in thread
From: Andrew Morton @ 2010-10-11 21:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: pacman, linux-mm, Christoph Lameter, KOSAKI Motohiro, Yinghai Lu,
	linux-kernel, linuxppc-dev

(cc linuxppc-dev@lists.ozlabs.org)

On Mon, 11 Oct 2010 15:30:22 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> On Sat, Oct 09, 2010 at 04:57:18AM -0500, pacman@kosh.dhis.org wrote:
> > (What a big Cc: list... scripts/get_maintainer.pl made me do it.)
> > 
> > This will be a long story with a weak conclusion, sorry about that, but it's
> > been a long bug-hunt.
> > 
> > With recent kernels I've seen a bug that appears to corrupt random 4-byte
> > chunks of memory. It's not easy to reproduce. It seems to happen only once
> > per boot, pretty quickly after userspace has gotten started, and sometimes it
> > doesn't happen at all.
> > 
> 
> A corruption of 4 bytes could be consistent with a pointer value being
> written to an incorrect location.

It's corruption of user memory, which is unusual.  I'd be wondering if
there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
previously the corruption was hitting something harmless.  Something
like a missed CPU cache writeback or invalidate operation.

How sensitive/vulnerable is PPC32 to such things?


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-11 21:00   ` Andrew Morton
@ 2010-10-13 14:40     ` Mel Gorman
  2010-10-13 17:52       ` pacman
  2010-10-18 20:59       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 40+ messages in thread
From: Mel Gorman @ 2010-10-13 14:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: pacman, linux-mm, Christoph Lameter, KOSAKI Motohiro, Yinghai Lu,
	linux-kernel, linuxppc-dev

On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> (cc linuxppc-dev@lists.ozlabs.org)
> 
> On Mon, 11 Oct 2010 15:30:22 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Sat, Oct 09, 2010 at 04:57:18AM -0500, pacman@kosh.dhis.org wrote:
> > > (What a big Cc: list... scripts/get_maintainer.pl made me do it.)
> > > 
> > > This will be a long story with a weak conclusion, sorry about that, but it's
> > > been a long bug-hunt.
> > > 
> > > With recent kernels I've seen a bug that appears to corrupt random 4-byte
> > > chunks of memory. It's not easy to reproduce. It seems to happen only once
> > > per boot, pretty quickly after userspace has gotten started, and sometimes it
> > > doesn't happen at all.
> > > 
> > 
> > A corruption of 4 bytes could be consistent with a pointer value being
> > written to an incorrect location.
> 
> It's corruption of user memory, which is unusual.  I'd be wondering if
> there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
> previously the corruption was hitting something harmless.  Something
> like a missed CPU cache writeback or invalidate operation.
> 

This seems somewhat plausible, although it's hard to tell for sure. But
let's say we had the following situation in memory:

[<----MAX_ORDER_NR_PAGES---->][<----MAX_ORDER_NR_PAGES---->]
INITRD                        memmap array

initrd gets freed and someone else very early in boot gets allocated in
there. Let's further guess that the struct pages in the memmap area are
managing the page frame where the INITRD was, because it makes the situation
slightly easier to trigger. As pages get freed in the memmap array, we could
reference memory where initrd used to be, but the physical memory is mapped
at two virtual addresses.

CPU A							CPU B
							Reads kernelspace virtual (gets cache line)
Writes userspace virtual (gets different cache line)
							IO, writes buffer destined for userspace (via cache line)
Cache line eviction, writeback to memory

This is somewhat contrived but I can see how it might happen even on one
CPU particularly if the L1 cache is virtual and is loose about checking
physical tags.

> How sensitive/vulnerable is PPC32 to such things?
> 

I cannot tell you specifically, but if the above scenario is in any way
plausible, I believe it would depend on what sort of L1 cache the CPU
has. Maybe this particular version has a virtual cache with no physical
tagging and is depending on the OS not to make virtual aliasing mistakes.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-13 14:40     ` Mel Gorman
@ 2010-10-13 17:52       ` pacman
  2010-10-18 11:33         ` Mel Gorman
  2010-10-18 20:59       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-13 17:52 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linux-kernel, linuxppc-dev

Mel Gorman writes:
> 
> On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> > 
> > It's corruption of user memory, which is unusual.  I'd be wondering if
> > there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
> > previously the corruption was hitting something harmless.  Something
> > like a missed CPU cache writeback or invalidate operation.
> > 
> 
> This seems somewhat plausible, although it's hard to tell for sure. But
> let's say we had the following situation in memory:
> 
> [<----MAX_ORDER_NR_PAGES---->][<----MAX_ORDER_NR_PAGES---->]
> INITRD                        memmap array

I don't use initrd, so this isn't exactly what happened here. But it could be
close. Let me throw out some more information and see if it triggers any
ideas.

First, I tried a new test after seeing the corruption happen:
# md5sum /sbin/e2fsck ; echo 1 > /proc/sys/vm/drop_caches ; md5sum /sbin/e2fsck
And got 2 different answers. The second answer was the correct one.

Since applying the suggested patch which changed MAX_ORDER-1 to MAX_ORDER-2,
I've been trying to isolate exactly when the corruption happens. Since I
don't know much about kernel code, my main method is stuffing the area full
of printk's.

First I duplicated the affected function __free_one_page, since it's inlined
at 2 different places, so I could apply the patch to just one of them. This
proved that the problem is happening when called from free_one_page.

The patch which fixes (or at least covers up) the bug will only matter when
order==MAX_ORDER-2, otherwise everything is the same. So I added a lot of
printk's to show what's happening when order==MAX_ORDER-2. I found that, very
repeatably, 126 such instances occur during boot, and 61 of them pass the
page_is_buddy(higher_page, higher_buddy, order + 1) test, causing them to
call list_add_tail.

Next, since the bug appears when this code decides to call list_add_tail,
I made my own wrapper for list_add_tail, which allowed me to force some of
the calls to do list_add instead. Eventually I found that of the 61 calls,
the last one makes the difference. Allowing the first 60 calls to go through
to list_add_tail, and switching the last one to list_add, the symptom goes
away.
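
A minimal user-space sketch of that kind of wrapper. It re-implements just enough of the kernel's circular list_head to run outside the kernel; debug_list_add_tail() and force_head_on_call are hypothetical names, not kernel API. Every call goes to the tail except the Nth, which is redirected to the head, mirroring the experiment of letting the first 60 calls through and switching the 61st.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

/* Link new between two entries that are known to be adjacent. */
static void __list_add(struct list_head *new, struct list_head *prev,
                       struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

/* Head insertion: new becomes the first entry, so it is taken first. */
static void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);
}

/* Tail insertion: new becomes the last entry, so it is taken last. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
    __list_add(new, head->prev, head);
}

/* Hypothetical knob: 1-based call number to redirect to the head; 0 = never. */
static unsigned int force_head_on_call;
static unsigned int tail_calls;

static void debug_list_add_tail(struct list_head *new, struct list_head *head)
{
    if (++tail_calls == force_head_on_call)
        list_add(new, head);
    else
        list_add_tail(new, head);
}
```

Since the allocator takes pages from the head of a free list, flipping one call like this changes which block gets handed out first, which is all the experiment needs.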

dump_stack() for that last call gave me a backtrace like this:
[c0303e80] [c0008124] show_stack+0x4c/0x144 (unreliable)
[c0303ec0] [c0068a84] free_one_page+0x28c/0x5b0
[c0303f20] [c0069588] __free_pages_ok+0xf8/0x120
[c0303f40] [c02d28c8] free_all_bootmem_core+0xf0/0x1f8
[c0303f70] [c02d29fc] free_all_bootmem+0x2c/0x6c
[c0303f90] [c02cc7dc] mem_init+0x70/0x2ac
[c0303fc0] [c02c66a4] start_kernel+0x150/0x27c
[c0303ff0] [00003438] 0x3438

And this might be interesting: the PFN of the page being added in that
critical 61st call is 130048, which exactly matches the number of available
pages:

  free_area_init_node: node 0, pgdat c02fee6c, node_mem_map c0330000
    DMA zone: 1024 pages used for memmap
    DMA zone: 0 pages reserved
    DMA zone: 130048 pages, LIFO batch:31
  Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048

Suspicious?

If 130048 is added to the head of the order==MAX_ORDER-2 free list, there's
no symptom. Add it to the tail, and the corruption appears.
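
A worked check of those numbers, assuming 4 KiB pages, MAX_ORDER 11 and a DMA zone that spans all of RAM: adding the 1024 memmap pages back to the 130048 zone pages gives 131072 frames (512 MiB), so PFN 130048 is also where the last order-10 (1024-page) block of RAM begins, covering the last 4 MiB. On this machine both quantities happen to be 1024 pages. The macro names are made up for the example.

```c
#include <assert.h>

#define PAGE_SHIFT    12        /* assumption: 4 KiB pages */
#define TOP_ORDER     10        /* assumption: MAX_ORDER - 1, with MAX_ORDER 11 */
#define ZONE_PAGES    130048UL  /* "DMA zone: 130048 pages" */
#define MEMMAP_PAGES  1024UL    /* "DMA zone: 1024 pages used for memmap" */
#define RAM_PAGES     (ZONE_PAGES + MEMMAP_PAGES)  /* "0 pages reserved" */

/* 131072 frames of 4 KiB is exactly 512 MiB. */
_Static_assert(RAM_PAGES == 131072UL, "frame count");
_Static_assert((RAM_PAGES << PAGE_SHIFT) == (512UL << 20), "512 MiB");

/* The last order-10 block of RAM starts at PFN 130048 ... */
_Static_assert(RAM_PAGES - (1UL << TOP_ORDER) == 130048UL, "last top-order block");

/* ... and 1024 memmap pages for 131072 frames is 32 bytes per struct page. */
_Static_assert((MEMMAP_PAGES << PAGE_SHIFT) / RAM_PAGES == 32UL, "struct page size");

/* Physical address of the first byte of a page frame. */
static unsigned long pfn_to_phys(unsigned long pfn)
{
    return pfn << PAGE_SHIFT;
}
```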

That's all I know so far.

-- 
Alan Curry


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-13 17:52       ` pacman
@ 2010-10-18 11:33         ` Mel Gorman
  2010-10-18 19:10           ` pacman
  2010-10-18 19:37           ` Andrew Morton
  0 siblings, 2 replies; 40+ messages in thread
From: Mel Gorman @ 2010-10-18 11:33 UTC (permalink / raw)
  To: pacman; +Cc: Andrew Morton, linux-mm, linux-kernel, linuxppc-dev

On Wed, Oct 13, 2010 at 12:52:05PM -0500, pacman@kosh.dhis.org wrote:
> Mel Gorman writes:
> > 
> > On Mon, Oct 11, 2010 at 02:00:39PM -0700, Andrew Morton wrote:
> > > 
> > > It's corruption of user memory, which is unusual.  I'd be wondering if
> > > there was a pre-existing bug which 6dda9d55bf545013597 has exposed -
> > > previously the corruption was hitting something harmless.  Something
> > > like a missed CPU cache writeback or invalidate operation.
> > > 
> > 
> > This seems somewhat plausible, although it's hard to tell for sure. But
> > let's say we had the following situation in memory:
> > 
> > [<----MAX_ORDER_NR_PAGES---->][<----MAX_ORDER_NR_PAGES---->]
> > INITRD                        memmap array
> 
> I don't use initrd, so this isn't exactly what happened here. But it could be
> close. Let me throw out some more information and see if it triggers any
> ideas.
> 

Ok.

> First, I tried a new test after seeing the corruption happen:
> # md5sum /sbin/e2fsck ; echo 1 > /proc/sys/vm/drop_caches ; md5sum /sbin/e2fsck
> And got 2 different answers. The second answer was the correct one.
> 
> Since applying the suggested patch which changed MAX_ORDER-1 to MAX_ORDER-2,
> I've been trying to isolate exactly when the corruption happens. Since I
> don't know much about kernel code, my main method is stuffing the area full
> of printk's.
> 
> First I duplicated the affected function __free_one_page, since it's inlined
> at 2 different places, so I could apply the patch to just one of them. This
> proved that the problem is happening when called from free_one_page.
> 
> The patch which fixes (or at least covers up) the bug will only matter when
> order==MAX_ORDER-2, otherwise everything is the same. So I added a lot of
> printk's to show what's happening when order==MAX_ORDER-2. I found that, very
> repeatably, 126 such instances occur during boot, and 61 of them pass the
> page_is_buddy(higher_page, higher_buddy, order + 1) test, causing them to
> call list_add_tail.
> 
> Next, since the bug appears when this code decides to call list_add_tail,
> I made my own wrapper for list_add_tail, which allowed me to force some of
> the calls to do list_add instead. Eventually I found that of the 61 calls,
> the last one makes the difference. Allowing the first 60 calls to go through
> to list_add_tail, and switching the last one to list_add, the symptom goes
> away.
> 
> dump_stack() for that last call gave me a backtrace like this:
> [c0303e80] [c0008124] show_stack+0x4c/0x144 (unreliable)
> [c0303ec0] [c0068a84] free_one_page+0x28c/0x5b0
> [c0303f20] [c0069588] __free_pages_ok+0xf8/0x120
> [c0303f40] [c02d28c8] free_all_bootmem_core+0xf0/0x1f8
> [c0303f70] [c02d29fc] free_all_bootmem+0x2c/0x6c
> [c0303f90] [c02cc7dc] mem_init+0x70/0x2ac
> [c0303fc0] [c02c66a4] start_kernel+0x150/0x27c
> [c0303ff0] [00003438] 0x3438
> 
> And this might be interesting: the PFN of the page being added in that
> critical 61st call is 130048, which exactly matches the number of available
> pages:
> 
>   free_area_init_node: node 0, pgdat c02fee6c, node_mem_map c0330000
>     DMA zone: 1024 pages used for memmap
>     DMA zone: 0 pages reserved
>     DMA zone: 130048 pages, LIFO batch:31
>   Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048
> 
> Suspicious?
> 

A bit, but I still don't know why it would cause corruption. Maybe this is still
a caching issue and the difference in timing between list_add and list_add_tail
is enough to hide the bug. It's also possible there are some registers
ioremapped after the memmap array and reading them is causing some
problem.

Andrew, what is the right thing to do here? Should we flail around looking
for explanations as to why the bug causes a user buffer corruption and possibly
never get an answer, or go with this patch, preferably before 2.6.36 is released?

==== CUT HERE ====
mm, page-allocator: Do not check the state of a non-existent buddy during free

There is a bug in commit [6dda9d55: page allocator: reduce fragmentation
in buddy allocator by adding buddies that are merging to the tail of the
free lists] that means a buddy at order MAX_ORDER is checked for
merging. A page of this order never exists, so at times an effectively
random piece of memory is being checked.

Alan Curry has reported that this is causing memory corruption in userspace
data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32). It is not clear
why this is happening. It could be a cache coherency problem where pages
mapped in both user and kernel space are getting different cache lines due
to the bad read from kernel space (http://lkml.org/lkml/2010/10/13/179). It
could also be that there are some special registers being io-remapped at
the end of the memmap array and that a read has special meaning on them.
Compiler bugs have been ruled out because the assembly before and after
the patch looks relatively harmless.

This patch fixes the problem by ensuring we are not reading a possibly
invalid location of memory. It's not clear why the read causes
corruption but one way or the other it is a buggy read.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8cfa9c..93cef41 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -530,7 +530,7 @@ static inline void __free_one_page(struct page *page,
 	 * so it's less likely to be used soon and more likely to be merged
 	 * as a higher order page
 	 */
-	if ((order < MAX_ORDER-1) && pfn_valid_within(page_to_pfn(buddy))) {
+	if ((order < MAX_ORDER-2) && pfn_valid_within(page_to_pfn(buddy))) {
 		struct page *higher_page, *higher_buddy;
 		combined_idx = __find_combined_index(page_idx, order);
 		higher_page = page + combined_idx - page_idx;


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 11:33         ` Mel Gorman
@ 2010-10-18 19:10           ` pacman
  2010-10-18 21:10             ` Benjamin Herrenschmidt
  2010-10-18 19:37           ` Andrew Morton
  1 sibling, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-18 19:10 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linux-kernel, linuxppc-dev

Mel Gorman writes:
> 
> A bit, but I still don't know why it would cause corruption. Maybe this is still
> a caching issue and the difference in timing between list_add and list_add_tail
> is enough to hide the bug. It's also possible there are some registers
> ioremapped after the memmap array and reading them is causing some
> problem.

I've been doing a lot more tests and I'm sure that 6dda9d55 is not really
responsible. It just happens to provoke the bug in my particular setup.
Whatever it is, it's very sensitive to small changes.

At the end of free_all_bootmem, the free list for order 9 has 4 entries.
Which one is at the head of the list depends on whether 6dda9d55 is applied
or not. If page number 130048 is at the head of the list, it gets used fairly
soon, and everything's fine. The alternative is that page number 64512 is at
the head of the list, so it gets used fairly soon, and corruption occurs.

> 
> Andrew, what is the right thing to do here? Should we flail around looking
> for explanations as to why the bug causes a user buffer corruption and possibly
> never get an answer, or go with this patch, preferably before 2.6.36 is released?

I've been flailing around quite a bit. Here's my latest result:

Since I can view the corruption with md5sum /sbin/e2fsck, I know it's in a
clean cached page. So I made an extra copy of /sbin/e2fsck, which won't be
loaded into memory during boot. So now after the corruption happens, I can
  cmp -l /sbin/e2fsck good-e2fsck
for a quick look at the changed bytes. Much easier than provoking a segfault
under gdb.

Then I got really creative and wrote a cmp replacement which mmaps the files
and reports the physical addresses from /proc/self/pagemap of the pages that
don't match. And the consistent result is that physical pages 64604 and 64609
(both in the range of the order=9 64512) have wrong contents. And the
corruption is always a single word 128 bytes after the start of the page.
Physical addresses 0x0fc5c080 and 0x0fc61080 are hit every time.
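
A sketch of the PFN lookup such a tool needs, assuming the documented /proc/PID/pagemap format: one 64-bit entry per virtual page, bit 63 set when the page is present, and the PFN in bits 0-54. The helper names are hypothetical, not the tool described above. On kernels much newer than the one in this thread, the PFN field reads as zero unless the reader has CAP_SYS_ADMIN.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (UINT64_C(1) << 63)        /* page is resident in RAM */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54: page frame number */

/* PFN from a pagemap entry; 0 when the page is not present. */
static uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Read the pagemap entry covering addr. pagemap_fd is an open descriptor
 * for /proc/self/pagemap. Returns 0 on success, -1 on failure. */
static int pagemap_entry(int pagemap_fd, const void *addr, uint64_t *entry)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    off_t off = (off_t)(((uint64_t)(uintptr_t)addr / page_size) * sizeof(*entry));

    return pread(pagemap_fd, entry, sizeof(*entry), off) == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/* Compare two equally sized mappings page by page; for each page of
 * "suspect" that differs, print the physical address of the first bad byte. */
static void report_mismatches(int pagemap_fd, const unsigned char *suspect,
                              const unsigned char *good, size_t len)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t off = 0; off < len; off += page_size) {
        size_t n = len - off < page_size ? len - off : page_size;
        uint64_t entry;

        if (memcmp(suspect + off, good + off, n) == 0)
            continue;
        if (pagemap_entry(pagemap_fd, suspect + off, &entry) != 0)
            continue;

        size_t i = 0;
        while (suspect[off + i] == good[off + i])
            i++;  /* memcmp() reported a difference, so this stops before n */
        /* A PFN of 0 means "not present" or "hidden", so phys is then meaningless. */
        printf("file offset %#zx: pfn %llu, phys %#llx\n", off + i,
               (unsigned long long)pagemap_pfn(entry),
               (unsigned long long)(pagemap_pfn(entry) * page_size + i));
    }
}
```

To use it, mmap() both files read-only, open /proc/self/pagemap, and pass the two mappings to report_mismatches(). The memcmp() faults the suspect pages in first, so their entries should be present when they are looked up.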

The values of the corrupted words, observed in 5 consecutive boots, were:
  at 0fc5c080   at 0fc61080
  -----------   -----------
  c3540000      92510000
  565c0000      23590000
  c85b0000      97580000
  d15f0000      9e5c0000
  d95b0000      a8580000

The low 16 bits are all 0 and the upper 16 bits seem randomly distributed.
But look at the differences:

  c3540000 - 92510000 = 31030000
  565c0000 - 23590000 = 33030000
  c85b0000 - 97580000 = 31030000
  d15f0000 - 9e5c0000 = 33030000
  d95b0000 - a8580000 = 31030000

This means something... but I don't know what.
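
The subtraction above can be reproduced mechanically. This sketch recomputes each boot's difference modulo 2^32 from the five samples in the table and checks that the low 16 bits of every corrupted word are zero; the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Corrupted words observed in one boot (values from the table above). */
struct boot_sample {
    uint32_t at_0fc5c080;
    uint32_t at_0fc61080;
};

static const struct boot_sample samples[] = {
    { 0xc3540000u, 0x92510000u },
    { 0x565c0000u, 0x23590000u },
    { 0xc85b0000u, 0x97580000u },
    { 0xd15f0000u, 0x9e5c0000u },
    { 0xd95b0000u, 0xa8580000u },
};

#define NSAMPLES (sizeof(samples) / sizeof(samples[0]))

/* Difference between the two corrupted words, wrapping modulo 2^32. */
static uint32_t sample_delta(const struct boot_sample *s)
{
    return s->at_0fc5c080 - s->at_0fc61080;
}

/* True when the low 16 bits of w are all zero. */
static int low_half_zero(uint32_t w)
{
    return (w & 0xffffu) == 0;
}
```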

In a completely different method of investigation, I went back a few stable
kernels, got 2.6.33.7 and applied 6dda9d55 to it, thinking that if 6dda9d55
only reveals a pre-existing bug, I could bisect it using 6dda9d55 as a
bug-revealing assistant. The bug appeared when running 2.6.33.7 with 6dda9d55
applied. That was discouraging.

> This patch fixes the problem by ensuring we are not reading a possibly
> invalid location of memory. It's not clear why the read causes
> corruption but one way or the other it is a buggy read.

At least that part of the explanation is wrong. Where's the buggy read?
The action taken by the 6dda9d55 version of __free_one_page looks perfectly
legitimate to me. Page numbers:

[129024       ] [130048       ]   order=10
[129024 129536] [130048 130560]   order=9

130048 is being freed. 130560 is not free. 129024 (the higher_buddy) is
already free at order=10. So 130048 is being pushed to the tail of the free
list, on the speculation that 130560 might soon be free and then the whole
thing will form an order=11 free page, the only problem being that order=11
is too high so that later merge will never happen. It's not useful, and maybe
not conceptually valid to say that 129024 is the buddy of 130048, but it is
an existing page, and the only way it wouldn't be is if the total memory size
was not a multiple of 1<<(MAX_ORDER-1) pages.
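
The index arithmetic behind that diagram can be sketched in user space. This assumes MAX_ORDER 11 and mirrors what __find_combined_index() and the buddy lookup do in this era's __free_one_page() (clear or flip bit "order" of a page index taken relative to its 2^MAX_ORDER-aligned block); the function names below are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>

#define MAX_ORDER 11  /* assumption: default for this configuration */

/* Index of pfn within its 2^MAX_ORDER-page aligned block. */
static unsigned long block_idx(unsigned long pfn)
{
    return pfn & ((1UL << MAX_ORDER) - 1);
}

/* Buddy of an order-"order" block: flip bit "order" of the index. */
static unsigned long buddy_idx(unsigned long idx, unsigned int order)
{
    return idx ^ (1UL << order);
}

/* Start of the order+1 block containing idx: clear bit "order". */
static unsigned long combined_idx(unsigned long idx, unsigned int order)
{
    return idx & ~(1UL << order);
}

/* PFN of the higher-order buddy inspected when pfn is freed at "order". */
static unsigned long higher_buddy_pfn(unsigned long pfn, unsigned int order)
{
    unsigned long base = pfn - block_idx(pfn);
    unsigned long comb = combined_idx(block_idx(pfn), order);

    return base + buddy_idx(comb, order + 1);
}
```

For 130048 freed at order 9 this gives buddy 130560 and higher buddy 129024, matching the diagram. Because the XOR only flips bits below MAX_ORDER, the inspected page always lies in the same 2^MAX_ORDER-page block as the page being freed.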

-- 
Alan Curry


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 11:33         ` Mel Gorman
  2010-10-18 19:10           ` pacman
@ 2010-10-18 19:37           ` Andrew Morton
  2010-10-18 21:02             ` Benjamin Herrenschmidt
  2010-10-18 21:55             ` Thomas Gleixner
  1 sibling, 2 replies; 40+ messages in thread
From: Andrew Morton @ 2010-10-18 19:37 UTC (permalink / raw)
  To: Mel Gorman; +Cc: pacman, linux-mm, linux-kernel, linuxppc-dev

On Mon, 18 Oct 2010 12:33:31 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> A bit, but I still don't know why it would cause corruption. Maybe this is still
> a caching issue and the difference in timing between list_add and list_add_tail
> is enough to hide the bug. It's also possible there are some registers
> ioremapped after the memmap array and reading them is causing some
> problem.
> 
> Andrew, what is the right thing to do here? Should we flail around looking
> for explanations as to why the bug causes a user buffer corruption and possibly
> never get an answer, or go with this patch, preferably before 2.6.36 is released?

Well, you've spotted a bug so I'd say we fix it asap.

It's a bit of a shame that we lose the only known way of reproducing a
different bug, but presumably that will come back and bite someone else
one day, and we'll fix it then :(




* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-13 14:40     ` Mel Gorman
  2010-10-13 17:52       ` pacman
@ 2010-10-18 20:59       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-18 20:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, linuxppc-dev, linux-kernel, linux-mm, pacman,
	KOSAKI Motohiro, Christoph Lameter, Yinghai Lu

On Wed, 2010-10-13 at 15:40 +0100, Mel Gorman wrote:
> 
> This is somewhat contrived but I can see how it might happen even on one
> CPU particularly if the L1 cache is virtual and is loose about checking
> physical tags.
> 
> > How sensitive/vulnerable is PPC32 to such things?
> > 
> 
> I can not tell you specifically but if the above scenario is in any way
> plausible, I believe it would depend on what sort of L1 cache the CPU
> has. Maybe this particular version has a virtual cache with no physical
> tagging and is depending on the OS not to make virtual aliasing mistakes.

Nah, ppc doesn't have problems with cache aliases; it all looks
physically tagged to the programmer (though there are subtleties, none
that explains the reported behaviour).

Looks like real memory corruption to me.

Cheers,
Ben.




* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 19:37           ` Andrew Morton
@ 2010-10-18 21:02             ` Benjamin Herrenschmidt
  2010-10-18 21:55             ` Thomas Gleixner
  1 sibling, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-18 21:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, linux-mm, pacman, linuxppc-dev, linux-kernel

On Mon, 2010-10-18 at 12:37 -0700, Andrew Morton wrote:
> Well, you've spotted a bug so I'd say we fix it asap.
> 
> It's a bit of a shame that we lose the only known way of reproducing a
> different bug, but presumably that will come back and bite someone else
> one day, and we'll fix it then :(

Well, I can always revert that and run some experiments here, provided I
can reproduce the problem at all ...

Cheers,
Ben.




* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 19:10           ` pacman
@ 2010-10-18 21:10             ` Benjamin Herrenschmidt
  2010-10-18 21:33               ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-18 21:10 UTC (permalink / raw)
  To: pacman; +Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel

On Mon, 2010-10-18 at 14:10 -0500, pacman@kosh.dhis.org wrote:

> I've been flailing around quite a bit. Here's my latest result:
> 
> Since I can view the corruption with md5sum /sbin/e2fsck, I know it's in a
> clean cached page. So I made an extra copy of /sbin/e2fsck, which won't be
> loaded into memory during boot. So now after the corruption happens, I can
>   cmp -l /sbin/e2fsck good-e2fsck
> for a quick look at the changed bytes. Much easier than provoking a segfault
> under gdb.
> 
> Then I got really creative and wrote a cmp replacement which mmaps the files
> and reports the physical addresses from /proc/self/pagemap of the pages that
> don't match. And the consistent result is that physical pages 64604 and 64609
> (both in the range of the order=9 64512) have wrong contents. And the
> corruption is always a single word 128 bytes after the start of the page.
> Physical addresses 0x0fc5c080 and 0x0fc61080 are hit every time.

 .../...

You can do something fun... like a timer interrupt that peeks at those
physical addresses from the linear mapping, for example, and try to find
out "when" they get set to the wrong value (you should observe the load
from disk, then the corruption, unless they end up being loaded
incorrectly, i.e. a DMA coherency problem?) ...
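
The "load first, then corruption" sequence can be tracked with a small state machine fed one fresh read of the watched word per timer tick (or per kernel entry). This is a user-space sketch; struct word_watch, watch_step() and the expected value are hypothetical, and in the kernel the sampled pointer would come from the linear mapping (for lowmem, something like phys_to_virt()).

```c
#include <assert.h>
#include <stdint.h>

enum watch_state {
    WATCH_WAITING,    /* correct data not seen yet (page not loaded) */
    WATCH_LOADED,     /* correct data present: the disk load finished */
    WATCH_CORRUPTED,  /* the word changed after the load */
};

struct word_watch {
    uint32_t good;           /* value the word should hold (from a good copy) */
    enum watch_state state;
    uint32_t bad;            /* first wrong value seen after the load */
    unsigned long bad_tick;  /* sample number at which it was seen */
};

/* Feed one sample of the watched word; returns the updated state. */
static enum watch_state watch_step(struct word_watch *w, uint32_t sample,
                                   unsigned long tick)
{
    switch (w->state) {
    case WATCH_WAITING:
        if (sample == w->good)
            w->state = WATCH_LOADED;
        break;
    case WATCH_LOADED:
        if (sample != w->good) {
            w->state = WATCH_CORRUPTED;
            w->bad = sample;
            w->bad_tick = tick;
        }
        break;
    case WATCH_CORRUPTED:
        break;  /* keep the first corruption that was recorded */
    }
    return w->state;
}
```

A legitimate reuse of the page (for example after the page cache drops it) would also be reported as a change, so a hit is a hint about timing rather than proof of a stray write.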

From there, you might be able to close in on the culprit a bit more; for
example, try using the DABR register to set data access breakpoints
shortly before the corruption spot. AFAIK, on those old 32-bit CPUs, you
can set whether you want it to break on a real or a virtual address.

You can also sprinkle tests for the page content through the code if
that doesn't work to try to "close in" on the culprit (for example if
it's a case of stray DMA, like a network driver bug or such).

Cheers,
Ben.


> The values of the corrupted words, observed in 5 consecutive boots, were:
>   at 0fc5c080   at 0fc61080
>   -----------   -----------
>   c3540000      92510000
>   565c0000      23590000
>   c85b0000      97580000
>   d15f0000      9e5c0000
>   d95b0000      a8580000
> 
> The low 16 bits are all 0 and the upper 16 bits seem randomly distributed.
> But look at the differences:
> 
>   c3540000 - 92510000 = 31030000
>   565c0000 - 23590000 = 33030000
>   c85b0000 - 97580000 = 31030000
>   d15f0000 - 9e5c0000 = 33030000
>   d95b0000 - a8580000 = 31030000
> 
> This means something... but I don't know what.
> 
> In a completely different method of investigation, I went back a few stable
> kernels, got 2.6.33.7 and applied 6dda9d55 to it, thinking that if 6dda9d55
> only reveals a pre-existing bug, I could bisect it using 6dda9d55 as a
> bug-revealing assistant. The bug appeared when running 2.6.33.7 with 6dda9d55
> applied. That was discouraging.
> 
> > This patch fixes the problem by ensuring we are not reading a possibly
> > invalid location of memory. It's not clear why the read causes
> > corruption but one way or the other it is a buggy read.
> 
> At least that part of the explanation is wrong. Where's the buggy read?
> The action taken by the 6dda9d55 version of __free_one_page looks perfectly
> legitimate to me. Page numbers:
> 
> [129024       ] [130048       ]   order=10
> [129024 129536] [130048 130560]   order=9
> 
> 130048 is being freed. 130560 is not free. 129024 (the higher_buddy) is
> already free at order=10. So 130048 is being pushed to the tail of the free
> list, on the speculation that 130560 might soon be free and then the whole
> thing will form an order=11 free page, the only problem being that order=11
> is too high so that later merge will never happen. It's not useful, and maybe
> not conceptually valid to say that 129024 is the buddy of 130048, but it is
> an existing page, and the only way it wouldn't be is if the total memory size
> was not a multiple of 1<<(MAX_ORDER-1) pages.
> 
> -- 
> Alan Curry
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev




* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 21:10             ` Benjamin Herrenschmidt
@ 2010-10-18 21:33               ` pacman
  2010-10-19 10:16                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-18 21:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel

Benjamin Herrenschmidt writes:
> 
> You can do something fun... like a timer interrupt that peeks at those
> physical addresses from the linear mapping, for example, and try to find
> out "when" they get set to the wrong value (you should observe the load
> from disk, then the corruption, unless they end up being loaded
> incorrectly, i.e. a DMA coherency problem?) ...

I'm headed toward something like that. Maybe not a timer, maybe a "check it
every time the kernel is entered". But first I have to work out exactly when
the disk load completes so I know when to start checking.

> 
> From there, you might be able to close in on the culprit a bit more; for
> example, try using the DABR register to set data access breakpoints
> shortly before the corruption spot. AFAIK, on those old 32-bit CPUs, you
> can set whether you want it to break on a real or a virtual address.

I thought of that, but as far as I can tell, this CPU doesn't have DABR.
/proc/cpuinfo
processor	: 0
cpu		: 7447/7457
clock		: 999.999990MHz
revision	: 1.1 (pvr 8002 0101)
bogomips	: 66.66
timebase	: 33333333
platform	: CHRP
model		: Pegasos2
machine		: CHRP Pegasos2
Memory		: 512 MB

My next thought was: right after the correct value appears in memory, unmap
the page from the kernel and let it Oops when it tries to write there. Then I
found out that the kernel is using BATs instead of page tables for its own
view of memory. Booting with "nobats" completely changes the memory usage
pattern (probably because it's allocating a lot of pages to hold PTEs that it
didn't need before).

> 
> You can also sprinkle tests for the page content through the code if
> that doesn't work to try to "close in" on the culprit (for example if
> it's a case of stray DMA, like a network driver bug or such).

No network drivers are loaded when this happens.

-- 
Alan Curry


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 19:37           ` Andrew Morton
  2010-10-18 21:02             ` Benjamin Herrenschmidt
@ 2010-10-18 21:55             ` Thomas Gleixner
  2010-10-19 16:24               ` Helmut Grohne
  1 sibling, 1 reply; 40+ messages in thread
From: Thomas Gleixner @ 2010-10-18 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, pacman, linux-mm, LKML, linuxppc-dev, Helmut Grohne

On Mon, 18 Oct 2010, Andrew Morton wrote:

> On Mon, 18 Oct 2010 12:33:31 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > A bit, but I still don't know why it would cause corruption. Maybe this is still
> > a caching issue and the difference in timing between list_add and list_add_tail
> > is enough to hide the bug. It's also possible there are some registers
> > ioremapped after the memmap array and reading them is causing some
> > problem.
> > 
> > Andrew, what is the right thing to do here? Should we flail around looking
> > for explanations as to why the bug causes a user buffer corruption and possibly
> > never get an answer, or go with this patch, preferably before 2.6.36 is released?
> 
> Well, you've spotted a bug so I'd say we fix it asap.
> 
> It's a bit of a shame that we lose the only known way of reproducing a
> different bug, but presumably that will come back and bite someone else
> one day, and we'll fix it then :(

I might be completely one off as usual, but this thing reminds me of a
bug I stared at yesterday night:

    http://permalink.gmane.org/gmane.linux.kernel/1049605

Reporter Cc'ed

Thanks,

	tglx


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 21:33               ` pacman
@ 2010-10-19 10:16                 ` Benjamin Herrenschmidt
  2010-10-19 18:10                   ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-19 10:16 UTC (permalink / raw)
  To: pacman; +Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel


> > From there, you might be able to close in on the culprit a bit more; for
> > example, try using the DABR register to set data access breakpoints
> > shortly before the corruption spot. AFAIK, on those old 32-bit CPUs, you
> > can set whether you want it to break on a real or a virtual address.
> 
> I thought of that, but as far as I can tell, this CPU doesn't have DABR.
> /proc/cpuinfo
> processor	: 0
> cpu		: 7447/7457
> clock		: 999.999990MHz
> revision	: 1.1 (pvr 8002 0101)
> bogomips	: 66.66
> timebase	: 33333333
> platform	: CHRP
> model		: Pegasos2
> machine		: CHRP Pegasos2
> Memory		: 512 MB

AFAIK, the 7447 is just a derivative of the 7450 design which -does-
have a DABR ... Unless it's broken :-)

> My next thought was: right after the correct value appears in memory, unmap
> the page from the kernel and let it Oops when it tries to write there. Then I
> found out that the kernel is using BATs instead of page tables for its own
> view of memory. Booting with "nobats" completely changes the memory usage
> pattern (probably because it's allocating a lot of pages to hold PTEs that it
> didn't need before).

Right. And that hides the problem I suppose ?

> > You can also sprinkle tests for the page content through the code if
> > that doesn't work to try to "close in" on the culprit (for example if
> > it's a case of stray DMA, like a network driver bug or such).
> 
> No network drivers are loaded when this happens.

Ok.

Cheers,
Ben.



* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-18 21:55             ` Thomas Gleixner
@ 2010-10-19 16:24               ` Helmut Grohne
  2010-10-19 16:42                 ` Thomas Gleixner
  0 siblings, 1 reply; 40+ messages in thread
From: Helmut Grohne @ 2010-10-19 16:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Mel Gorman, pacman, linux-mm, LKML, linuxppc-dev

On Mon, Oct 18, 2010 at 11:55:44PM +0200, Thomas Gleixner wrote:
> I might be completely one off as usual, but this thing reminds me of a
> bug I stared at yesterday night:

This problem is completely unrelated. My problem was caused by using
binutils-gold.

Helmut

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-19 16:24               ` Helmut Grohne
@ 2010-10-19 16:42                 ` Thomas Gleixner
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Gleixner @ 2010-10-19 16:42 UTC (permalink / raw)
  To: Helmut Grohne
  Cc: Andrew Morton, Mel Gorman, pacman, linux-mm, LKML, linuxppc-dev

On Tue, 19 Oct 2010, Helmut Grohne wrote:

> On Mon, Oct 18, 2010 at 11:55:44PM +0200, Thomas Gleixner wrote:
> > I might be completely one off as usual, but this thing reminds me of a
> > bug I stared at yesterday night:
> 
> This problem is completely unrelated. My problem was caused by using
> binutils-gold.

Ok, thanks for the update. One thing less to worry about :)

Thanks,

	tglx

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-19 10:16                 ` Benjamin Herrenschmidt
@ 2010-10-19 18:10                   ` pacman
  2010-10-19 20:47                     ` Segher Boessenkool
  2010-10-19 20:58                     ` PROBLEM: memory corrupting bug, bisected to 6dda9d55 Benjamin Herrenschmidt
  0 siblings, 2 replies; 40+ messages in thread
From: pacman @ 2010-10-19 18:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel

Benjamin Herrenschmidt writes:
> > 
> > I thought of that, but as far as I can tell, this CPU doesn't have DABR.
> 
> AFAIK, the 7447 is just a derivative of the 7450 design which -does-
> have a DABR ... Unless it's broken :-)

Hmm. gdb resorts to single-stepping when I set a watchpoint while debugging
some userspace program, which I assumed was caused by lack of hardware
watchpoint support. But that's not important right now.

I made a new discovery. During a test boot while looking at the usual symptom
of a corrupted page cache, I ran md5sum /sbin/e2fsck twice and got 2
different results, neither one of them correct. The third time, yet another
different result. A few dozen more times, a few dozen more unique results. I
had somehow managed to get a usable interactive shell while corruption was
ongoing.

So then I ran
  dd if=/dev/mem bs=4 count=1 skip=$((0xfc5c080/4)) | od -t x4
a few times very fast, plucking the first affected word directly out of
memory by its physical address. The result:

The low 16 bits are always zero as before. The high 16 bits are a counter,
being incremented at about 1000Hz (as close as I could measure with a crude
shell script. 1024Hz would also be within the margin of error). And it's
little-endian.
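The pattern described above (zero low half, counting high half, bytes swapped) is what you would expect if a device stores a 16-bit little-endian counter followed by a 16-bit zero pad, and a big-endian CPU then reads those four bytes back as one native word, which is what `od -t x4` does on PowerPC. A minimal C sketch of that byte-order arithmetic, assuming that layout; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes as a little-endian device would store them: a 16-bit counter
 * (low byte first) followed by a 16-bit zero pad. */
static void store_le16_counter(uint8_t out[4], uint16_t counter)
{
    out[0] = (uint8_t)(counter & 0xff);   /* counter low byte  */
    out[1] = (uint8_t)(counter >> 8);     /* counter high byte */
    out[2] = 0;                           /* pad               */
    out[3] = 0;                           /* pad               */
}

/* How a big-endian CPU sees the same four bytes as one 32-bit word. */
static uint32_t read_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Recover the counter from the big-endian view: it sits in the high
 * 16 bits of the word, with its two bytes swapped. */
static uint16_t counter_from_be_word(uint32_t word)
{
    uint16_t hi = (uint16_t)(word >> 16);
    return (uint16_t)((hi >> 8) | (hi << 8));
}
```

For example, a counter value of 0x1234 is stored as the bytes 34 12 00 00 and reads back on a big-endian CPU as 0x34120000: zero low half, byte-swapped counter in the high half. At one increment per millisecond, a 16-bit counter wraps roughly every 65.5 seconds.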

While I was watching this happen, there were only 5 or 6 userspace processes
running, and 3 of them were shells. So I doubt that anything in userspace was
doing it. It went on for a few minutes before I exited the interactive shell
and allowed the boot to continue, while keeping an extra shell running on
tty2 to continue making observations. It stopped incrementing almost
immediately.

So what type of driver, firmware, or hardware bug puts a 16-bit 1000Hz timer
in memory, and does it in little-endian instead of the CPU's native byte
order? And why does it stop doing it some time during the early init scripts,
shortly after the root filesystem fsck?

I have not yet attempted to repeat the experiment. If it is repeatable, I'll
probe more deeply into those init scripts later. I'm looking hard at
/etc/rcS.d/S11hwclock.sh

-- 
Alan Curry

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-19 18:10                   ` pacman
@ 2010-10-19 20:47                     ` Segher Boessenkool
  2010-10-19 21:02                       ` Benjamin Herrenschmidt
  2010-10-19 20:58                     ` PROBLEM: memory corrupting bug, bisected to 6dda9d55 Benjamin Herrenschmidt
  1 sibling, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2010-10-19 20:47 UTC (permalink / raw)
  To: pacman
  Cc: Benjamin Herrenschmidt, Mel Gorman, linux-mm, Andrew Morton,
	linuxppc-dev, linux-kernel

> I made a new discovery.

And this nails it :-)

> So then I ran
>   dd if=/dev/mem bs=4 count=1 skip=$((0xfc5c080/4)) | od -t x4
> a few times very fast, plucking the first affected word directly out of
> memory by its physical address. The result:
>
> The low 16 bits are always zero as before. The high 16 bits are a counter,
> being incremented at about 1000Hz (as close as I could measure with a crude
> shell script. 1024Hz would also be within the margin of error). And it's
> little-endian.

> So what type of driver, firmware, or hardware bug puts a 16-bit 1000Hz timer
> in memory, and does it in little-endian instead of the CPU's native byte
> order? And why does it stop doing it some time during the early init scripts,
> shortly after the root filesystem fsck?

It looks like it is the frame counter in a USB OHCI HCCA.
16-bit, 1kHz update, offset x'80 in a page.

So either the kernel forgot to call quiesce on it, or the firmware
doesn't implement that, or the firmware messed up some other way.
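This identification can be checked against the HCCA (Host Controller Communications Area) layout in the OHCI specification: a 256-byte, 256-byte-aligned block whose first 128 bytes are the 32-entry interrupt table, followed by the 16-bit frame number and a 16-bit pad at offset 0x80. A C sketch of that layout; the struct and helper names are illustrative, loosely modelled on the kernel's `struct ohci_hcca`, and the 4-byte spare at the end only pads the block to 256 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HCCA as laid out by the OHCI spec. The controller writes all fields
 * little-endian, whatever the CPU byte order. */
struct hcca {
    uint32_t int_table[32];   /* 0x00: periodic (interrupt) schedule heads  */
    uint16_t frame_no;        /* 0x80: updated by the controller every 1 ms */
    uint16_t pad1;            /* 0x82: set to zero when frame_no is updated */
    uint32_t done_head;       /* 0x84: head of the completed-transfer list  */
    uint8_t  reserved[116];   /* 0x88: reserved for the controller          */
    uint8_t  spare[4];        /* 0xfc: pads the block out to 256 bytes      */
};

_Static_assert(offsetof(struct hcca, frame_no) == 0x80, "frame_no at 0x80");
_Static_assert(sizeof(struct hcca) == 256, "HCCA is 256 bytes");

/* Because the HCCA is 256-byte aligned, a stray frame-number write lands
 * at a physical address whose low byte is 0x80. */
static int looks_like_hcca_frame_no(uint64_t phys)
{
    return (phys & 0xff) == offsetof(struct hcca, frame_no);
}

/* Base of the HCCA that would contain this address. */
static uint64_t hcca_base(uint64_t phys)
{
    return phys & ~(uint64_t)0xff;
}
```

Applied to the address from the dd experiment, 0xfc5c080 has 0x80 as its low byte, which fits an HCCA based at 0xfc5c000.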


Segher


* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-19 18:10                   ` pacman
  2010-10-19 20:47                     ` Segher Boessenkool
@ 2010-10-19 20:58                     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-19 20:58 UTC (permalink / raw)
  To: pacman; +Cc: Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel

On Tue, 2010-10-19 at 13:10 -0500, pacman@kosh.dhis.org wrote:
> 
> So what type of driver, firmware, or hardware bug puts a 16-bit 1000Hz timer
> in memory, and does it in little-endian instead of the CPU's native byte
> order? And why does it stop doing it some time during the early init scripts,
> shortly after the root filesystem fsck?
> 
> I have not yet attempted to repeat the experiment. If it is repeatable, I'll
> probe more deeply into those init scripts later. I'm looking hard at
> /etc/rcS.d/S11hwclock.sh

Stinks of USB...

Ben.



* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-19 20:47                     ` Segher Boessenkool
@ 2010-10-19 21:02                       ` Benjamin Herrenschmidt
  2010-10-20  3:23                         ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-19 21:02 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: pacman, Mel Gorman, linux-mm, Andrew Morton, linuxppc-dev, linux-kernel

On Tue, 2010-10-19 at 22:47 +0200, Segher Boessenkool wrote:
> 
> It looks like it is the frame counter in a USB OHCI HCCA.
> 16-bit, 1kHz update, offset x'80 in a page.
> 
> So either the kernel forgot to call quiesce on it, or the firmware
> doesn't implement that, or the firmware messed up some other way.

I vote for the FW being on crack. Wouldn't be the first time with
Pegasos.

It's an OHCI or an UHCI in there ?

Can you try in prom_init.c changing the prom_close_stdin() function to
also close "stdout" ? 

         if (prom_getprop(_prom->chosen, "stdin", &val, sizeof(val)) > 0)
                 call_prom("close", 1, 0, val);
+        if (prom_getprop(_prom->chosen, "stdout", &val, sizeof(val)) > 0)
+               call_prom("close", 1, 0, val);

See if that makes a difference ?

Last option would be to manually turn the thing off with MMIO in yet-another
pegasos workaround in prom_init.c.

Cheers,
Ben.



* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-19 21:02                       ` Benjamin Herrenschmidt
@ 2010-10-20  3:23                         ` pacman
  2010-10-20 10:32                           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-20  3:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Segher Boessenkool, Mel Gorman, linux-mm, Andrew Morton,
	linuxppc-dev, linux-kernel

Benjamin Herrenschmidt writes:
> 
> On Tue, 2010-10-19 at 22:47 +0200, Segher Boessenkool wrote:
> > 
> > It looks like it is the frame counter in a USB OHCI HCCA.
> > 16-bit, 1kHz update, offset x'80 in a page.
> > 
> > So either the kernel forgot to call quiesce on it, or the firmware
> > doesn't implement that, or the firmware messed up some other way.
> 
> I vote for the FW being on crack. Wouldn't be the first time with
> Pegasos.
> 
> It's an OHCI or an UHCI in there ?

There's one of each... UHCI on the motherboard, OHCI on a card in a PCI
expansion slot. They shipped the ODW with the extra controller on an
expansion card since the on-board UHCI doesn't do USB 2.0.

And that OHCI controller does appear to be the culprit. The 2 affected
addresses tick at 1000Hz until ohci-hcd is modprobe'd, then they stop.

I think the mm people can consider this closed. 6dda9d55 didn't do anything
but expose a problem which has been here all along. I'll drop them from the
Cc list in any further messages.

> 
> Can you try in prom_init.c changing the prom_close_stdin() function to
> also close "stdout" ? 
> 
>          if (prom_getprop(_prom->chosen, "stdin", &val, sizeof(val)) > 0)
>                  call_prom("close", 1, 0, val);
> +        if (prom_getprop(_prom->chosen, "stdout", &val, sizeof(val)) > 0)
> +               call_prom("close", 1, 0, val);
> 
> See if that makes a difference ?

Huge difference. With no stdout to print to, the kernel seems to freeze up.
Or at least it loses the console. The last message it prints is "Device tree
struct 0x00933000 -> 0x00957000" then there's just nothing. I waited a while
for the console to come on but it didn't.

The diff fragment above applied inside prom_close_stdin, but there are some
prom_printf calls after prom_close_stdin. Calling prom_printf after closing
stdout sounds like it could be bad. If I moved it down below all the
prom_printf's, it would be after the "quiesce" call. Would that be acceptable
(or even interesting as an experiment)? Does a close need a quiesce after it?

-- 
Alan Curry

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-20  3:23                         ` pacman
@ 2010-10-20 10:32                           ` Benjamin Herrenschmidt
  2010-10-20 18:33                             ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-20 10:32 UTC (permalink / raw)
  To: pacman
  Cc: Segher Boessenkool, Mel Gorman, linux-mm, Andrew Morton,
	linuxppc-dev, linux-kernel

On Tue, 2010-10-19 at 22:23 -0500, pacman@kosh.dhis.org wrote:
> The diff fragment above applied inside prom_close_stdin, but there are some
> prom_printf calls after prom_close_stdin. Calling prom_printf after closing
> stdout sounds like it could be bad. If I moved it down below all the
> prom_printf's, it would be after the "quiesce" call. Would that be acceptable
> (or even interesting as an experiment)? Does a close need a quiesce after it?

Just try :-) "quiesce" is something that afaik only apple ever
implemented anyways. It uses hooks inside their OF to shut down all
drivers that do bus master (among other HW sanitization tasks).

Cheers,
Ben.



* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-20 10:32                           ` Benjamin Herrenschmidt
@ 2010-10-20 18:33                             ` pacman
  2010-10-20 20:56                               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-20 18:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Segher Boessenkool, linuxppc-dev, linux-kernel

Benjamin Herrenschmidt writes:
> 
> On Tue, 2010-10-19 at 22:23 -0500, pacman@kosh.dhis.org wrote:
> > The diff fragment above applied inside prom_close_stdin, but there are some
> > prom_printf calls after prom_close_stdin. Calling prom_printf after closing
> > stdout sounds like it could be bad. If I moved it down below all the
> > prom_printf's, it would be after the "quiesce" call. Would that be acceptable
> > (or even interesting as an experiment)? Does a close need a quiesce after it?
> 
> Just try :-) "quiesce" is something that afaik only apple ever
> implemented anyways. It uses hooks inside their OF to shut down all
> drivers that do bus master (among other HW sanitization tasks).

I booted a version with a prom_close_stdout after the last prom_debug. It
didn't have any effect. That 1000Hz clock was still ticking.

-- 
Alan Curry

* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-20 18:33                             ` pacman
@ 2010-10-20 20:56                               ` Benjamin Herrenschmidt
  2010-10-22  9:15                                 ` pacman
  2010-10-27  8:57                                 ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) pacman
  0 siblings, 2 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-20 20:56 UTC (permalink / raw)
  To: pacman; +Cc: Segher Boessenkool, linuxppc-dev, linux-kernel

On Wed, 2010-10-20 at 13:33 -0500, pacman@kosh.dhis.org wrote:
> > Just try :-) "quiesce" is something that afaik only apple ever
> > implemented anyways. It uses hooks inside their OF to shut down all
> > drivers that do bus master (among other HW sanitization tasks).
> 
> I booted a version with a prom_close_stdout after the last prom_debug. It
> didn't have any effect. That 1000Hz clock was still ticking. 

Ok so you'll have to make up a "workaround" in prom_init that looks for
OHCI's in the device-tree and disable them.

Check if the OHCI node has some existing f-code words you can use for
that with "dev /path-to-ohci words" in OF for example. If not, you may
need to use the low level register accessors. Use OF client interface
"interpret" to run forth code from C.

Cheers,
Ben.



* Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55
  2010-10-20 20:56                               ` Benjamin Herrenschmidt
@ 2010-10-22  9:15                                 ` pacman
  2010-10-27  8:57                                 ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) pacman
  1 sibling, 0 replies; 40+ messages in thread
From: pacman @ 2010-10-22  9:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Segher Boessenkool, linuxppc-dev, linux-kernel

Benjamin Herrenschmidt writes:
> 
> On Wed, 2010-10-20 at 13:33 -0500, pacman@kosh.dhis.org wrote:
> > > Just try :-) "quiesce" is something that afaik only apple ever
> > > implemented anyways. It uses hooks inside their OF to shut down all
> > > drivers that do bus master (among other HW sanitization tasks).
> > 
> > I booted a version with a prom_close_stdout after the last prom_debug. It
> > didn't have any effect. That 1000Hz clock was still ticking. 
> 
> Ok so you'll have to make up a "workaround" in prom_init that looks for
> OHCI's in the device-tree and disable them.

I'm a long way from understanding how to do that.

> 
> Check if the OHCI node has some existing f-code words you can use for
> that with "dev /path-to-ohci words" in OF for example. If not, you may

Nothing there but open close decode-unit encode-unit

> need to use the low level register accessors. Use OF client interface
> "interpret" to run forth code from C.

Here are the major problems:

1. How do I locate all usb nodes in the device tree?

2. How do I know if a particular usb node is OHCI?

3. Knowing that a node is OHCI, how do I know where its control registers
are? I'm sure this is calculated from the "reg" property but I don't see how.

4. Knowing where the control registers are, how do I access them? Do I need
to request a virt-to-phys mapping or can I assume that it's already mapped,
or that the "rl!" command will do the right thing with a physical address?

5. Which control register should I use to tell the OHCI to be quiet? Just do
a general reset, or is there something that specifically turns off the
counter that's been causing the trouble?
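For question 5, the operational registers in the OHCI specification suggest one plausible sequence: mask interrupts through HcInterruptDisable, then clear the list-enable bits and the HostControllerFunctionalState field in HcControl, which puts the controller into UsbReset and, assuming it follows the spec, should stop it from updating the HCCA frame number. (Setting HostControllerReset in HcCommandStatus is the other option the spec offers.) The sketch below works on a plain array standing in for the controller's 4 KiB MMIO window so the register values can be checked without hardware. Offsets and bit positions follow the OHCI 1.0a spec; the helper names are hypothetical, and real firmware or kernel code would need proper MMIO accessors (the registers are little-endian) rather than array stores:

```c
#include <assert.h>
#include <stdint.h>

/* OHCI operational register offsets, in bytes from the BAR0 base. */
enum {
    HC_CONTROL      = 0x04,
    HC_INTR_DISABLE = 0x14,
};

/* HcControl bits. */
#define OHCI_CTRL_PLE   (1u << 2)   /* periodic list enable               */
#define OHCI_CTRL_IE    (1u << 3)   /* isochronous enable                 */
#define OHCI_CTRL_CLE   (1u << 4)   /* control list enable                */
#define OHCI_CTRL_BLE   (1u << 5)   /* bulk list enable                   */
#define OHCI_CTRL_HCFS  (3u << 6)   /* functional state; 00 = UsbReset    */
/* HcInterruptDisable bit. */
#define OHCI_INTR_MIE   (1u << 31)  /* writing 1 masks all interrupts     */

/* Stand-in for the MMIO window (hypothetical, for illustration only). */
struct fake_ohci {
    uint32_t regs[0x1000 / 4];
};

static uint32_t reg_read(const struct fake_ohci *hc, unsigned off)
{
    return hc->regs[off / 4];
}

static void reg_write(struct fake_ohci *hc, unsigned off, uint32_t val)
{
    hc->regs[off / 4] = val;
}

/* Mask interrupts, stop all list processing and drop to UsbReset,
 * leaving the other HcControl bits as they were. */
static void ohci_quiesce(struct fake_ohci *hc)
{
    uint32_t ctrl = reg_read(hc, HC_CONTROL);

    reg_write(hc, HC_INTR_DISABLE, OHCI_INTR_MIE);
    ctrl &= ~(OHCI_CTRL_PLE | OHCI_CTRL_IE | OHCI_CTRL_CLE |
              OHCI_CTRL_BLE | OHCI_CTRL_HCFS);
    reg_write(hc, HC_CONTROL, ctrl);
}
```

Starting from HcControl = 0x2bc (UsbOperational with all four lists enabled and bit 9 set), the sequence leaves 0x200: only the untouched bit 9 remains, and HCFS reads 00 (UsbReset).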

-- 
Alan Curry

* Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)
  2010-10-20 20:56                               ` Benjamin Herrenschmidt
  2010-10-22  9:15                                 ` pacman
@ 2010-10-27  8:57                                 ` pacman
  2010-10-27 10:13                                   ` Olaf Hering
  2010-10-27 13:27                                   ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) Benjamin Herrenschmidt
  1 sibling, 2 replies; 40+ messages in thread
From: pacman @ 2010-10-27  8:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel, matt

Benjamin Herrenschmidt writes:
> 
> Ok so you'll have to make up a "workaround" in prom_init that looks for
> OHCI's in the device-tree and disable them.
> 
> Check if the OHCI node has some existing f-code words you can use for
> that with "dev /path-to-ohci words" in OF for example. If not, you may
> need to use the low level register accessors. Use OF client interface
> "interpret" to run forth code from C.

I responded with a long list of reasons that I'm not qualified to do that
work myself:
|Here are the major problems:
|
|1. How do I locate all usb nodes in the device tree?
|
|2. How do I know if a particular usb node is OHCI?
|
|3. Knowing that a node is OHCI, how do I know where its control registers
|are? I'm sure this is calculated from the "reg" property but I don't see how.
|
|4. Knowing where the control registers are, how do I access them? Do I need
|to request a virt-to-phys mapping or can I assume that it's already mapped,
|or that the "rl!" command will do the right thing with a physical address?
|
|5. Which control register should I use to tell the OHCI to be quiet? Just do
|a general reset, or is there something that specifically turns off the
|counter that's been causing the trouble?

Since then, the silence has been deafening.

My assumption now is that this is not ever getting fixed. I'm certainly not
able to fix it. I'm not even a kernel programmer! I got far enough to
diagnose the cause just with the "add more printk's and boot it again"
technique. Hundreds of reboots trying to figure it out. I was a conscientious
bug-reporter, I thought.

I could pull the PCI card and be done with it. I never used those USB ports
anyway. But after all the suffering I went through to find this bug... the
crashing e2fsck's and consequent filesystem corruption... I hate the idea of
surrendering to it. There are possibly other affected users who I'd be
abandoning to suffer similarly in the future.

For the last week I've studied OpenFirmware as hard as I can. I read the spec
cover to cover. And the USB annex, and the PCI annex. But I'm still lost in
all the different address formats.

I took my best guess on how to handle this problem, and ran with it, ending
up with a 97-line Forth script, and that was just to get a virtual address,
not to actually do anything with it, and it used a hardcoded device path. But
it didn't work, all I got was an "invalid pointer" error. I made another
guess at something that wasn't documented anywhere (the fact that this stuff
is insufficiently documented is the one thing I can state with complete
confidence!) and out came a successful translation to a virtual address: 0.

If I'm the only one fighting this bug, the bug wins.

-- 
Alan Curry

* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)
  2010-10-27  8:57                                 ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) pacman
@ 2010-10-27 10:13                                   ` Olaf Hering
  2010-10-27 21:04                                     ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, pacman
  2010-10-27 13:27                                   ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) Benjamin Herrenschmidt
  1 sibling, 1 reply; 40+ messages in thread
From: Olaf Hering @ 2010-10-27 10:13 UTC (permalink / raw)
  To: pacman; +Cc: Benjamin Herrenschmidt, linuxppc-dev, linux-kernel

On Wed, Oct 27, pacman@kosh.dhis.org wrote:

> |1. How do I locate all usb nodes in the device tree?
> |
> |2. How do I know if a particular usb node is OHCI?

In the installed system, run 'lspci | grep -i usb', this gives the pci
bus numbers.  Then run 'find /sys -name devspec', and look for the bus
numbers from the lspci output.  Each devspec file contains the firmware
path.  The ohci node may have subdirectories. Run 'words' in each of
them at the firmware prompt. Perhaps there is one to shut down the
controller?

I just noticed that older firmware did not have a node for ohci; newer ones
may have a /pci@80000000/usb@5 node.

Good luck.

Olaf

* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)
  2010-10-27  8:57                                 ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) pacman
  2010-10-27 10:13                                   ` Olaf Hering
@ 2010-10-27 13:27                                   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-10-27 13:27 UTC (permalink / raw)
  To: pacman; +Cc: linuxppc-dev, linux-kernel, matt


> Since then, the silence has been deafening.
> 
> My assumption now is that this is not ever getting fixed. I'm certainly not
> able to fix it. I'm not even a kernel programmer! I got far enough to
> diagnose the cause just with the "add more printk's and boot it again"
> technique. Hundreds of reboots trying to figure it out. I was a conscientious
> bug-reporter, I thought.

I'm happy to help you fix it but I'm travelling at the moment and won't
have much time for a couple of weeks.

Cheers,
Ben.

> I could pull the PCI card and be done with it. I never used those USB ports
> anyway. But after all the suffering I went through to find this bug... the
> crashing e2fsck's and consequent filesystem corruption... I hate the idea of
> surrendering to it. There are possibly other affected users who I'd be
> abandoning to suffer similarly in the future.
> 
> For the last week I've studied OpenFirmware as hard as I can. I read the spec
> cover to cover. And the USB annex, and the PCI annex. But I'm still lost in
> all the different address formats.
> 
> I took my best guess on how to handle this problem, and ran with it, ending
> up with a 97-line Forth script, and that was just to get a virtual address,
> not to actually do anything with it, and it used a hardcoded device path. But
> it didn't work, all I got was an "invalid pointer" error. I made another
> guess at something that wasn't documented anywhere (the fact that this stuff
> is insufficiently documented is the one thing I can state with complete
> confidence!) and out came a successful translation to a virtual address: 0.
> 
> If I'm the only one fighting this bug, the bug wins.
> 



* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-27 10:13                                   ` Olaf Hering
@ 2010-10-27 21:04                                     ` pacman
  2010-10-27 22:05                                       ` Segher Boessenkool
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-27 21:04 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev, linux-kernel

Olaf Hering writes:
> 
> On Wed, Oct 27, pacman@kosh.dhis.org wrote:
> 
> > |1. How do I locate all usb nodes in the device tree?
> > |
> > |2. How do I know if a particular usb node is OHCI?
> 
> In the installed system, run 'lspci | grep -i usb', this gives the pci
> bus numbers.  Then run 'find /sys -name devspec', and look for the bus

Once the system is running, I have no problem figuring it out. What I meant
was how do I write some code to identify OHCI devices correctly, from within
the limited environment of the Forth interpreter, which will work in the
general case.

I already know that /pci@80000000/usb@5 and /pci@80000000/usb@5,1 are the
problem nodes on my machine. And I've learned enough about OF to do a full
recursive device tree search to find the USB nodes, so the first question is
answered.

But the UHCI and OHCI nodes look very much alike in the OF properties. "name"
is just "usb" and there's no "compatible".

The big question that I'm still stumbling over is how to access the device
registers. The "reg" property looks like this:
             phys                 size
 -------------------------- -----------------
 00002800 00000000 00000000 00000000 00000000
 02002810 00000000 00000000 00000000 00001000
so I take the second group of 5 words, which should be the device registers,
and try to map it to a virtual address. The members are unpacked on the stack
like this:
  00000000 00000000 02002810 00000000 00001000
which looks like this stack diagram from OF spec:
  map-in ( phys.lo ... phys.hi size -- virt )
and the method call goes like this:
  " map-in" $call-parent
The result: "invalid pointer". But I notice it only popped 4 items. I think
maybe the "size" for map-in is not the same as the "size" found in the reg
property. Maybe #size-cells applies in one place but not the other. Thanks
for not documenting that! Try again:
  00000000 00000000 02002810 00001000 " map-in" $call-parent
This one doesn't complain, but leaves me a 0 on the stack as its answer. The
OHCI registers have been mapped to virtual address 0? Doesn't seem likely.

-- 
Alan Curry

* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-27 21:04                                     ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, pacman
@ 2010-10-27 22:05                                       ` Segher Boessenkool
  2010-10-27 22:58                                         ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2010-10-27 22:05 UTC (permalink / raw)
  To: pacman; +Cc: Olaf Hering, linuxppc-dev, linux-kernel

>> > |1. How do I locate all usb nodes in the device tree?
>> > |
>> > |2. How do I know if a particular usb node is OHCI?

You look for compatible "usb-ohci".

But this doesn't help you.  You do not know yet if the
problem happens for all usb-ohci; for example, it could be
that you have the console output device on usb; or as another
example, it could be that this firmware leaves all pci devices
in some active state.

So as I see it you have only two options:

1) Figure out what exactly is going on;
or 2) make the kernel shut down all pci devices early (either
in actual kernel code, or in an OF boot script).

> The big question that I'm still stumbling over is how to access the device
> registers. The "reg" property looks like this:

You should look at "assigned-addresses", not "reg".  Well,
you first need to look at "reg" to figure out what entry
in "assigned-addresses" to use.


Segher


* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-27 22:05                                       ` Segher Boessenkool
@ 2010-10-27 22:58                                         ` pacman
  2010-10-27 23:33                                           ` Segher Boessenkool
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-27 22:58 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Olaf Hering, linuxppc-dev, linux-kernel

Segher Boessenkool writes:
> 
> >> > |1. How do I locate all usb nodes in the device tree?
> >> > |
> >> > |2. How do I know if a particular usb node is OHCI?
> 
> You look for compatible "usb-ohci".

There is no "compatible" there. I can probably use class-code since the
parent is a PCI bus.
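Using class-code works because the PCI class code encodes the controller type directly: base class 0x0C (serial bus), subclass 0x03 (USB), and a programming-interface byte of 0x00 for UHCI, 0x10 for OHCI or 0x20 for EHCI. A small C sketch of that test, assuming the property holds the 24-bit class code as a single cell; the enum and function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

enum usb_hc_kind { HC_NOT_USB, HC_UHCI, HC_OHCI, HC_EHCI, HC_OTHER_USB };

/* Classify a 24-bit PCI class code (base class, subclass, prog-if),
 * such as the value of a PCI node's "class-code" property. */
static enum usb_hc_kind classify_usb_hc(uint32_t class_code)
{
    uint32_t base_sub = (class_code >> 8) & 0xffff;  /* 0x0c03 = serial bus/USB */
    uint32_t prog_if  = class_code & 0xff;           /* selects the HC type     */

    if (base_sub != 0x0c03)
        return HC_NOT_USB;
    switch (prog_if) {
    case 0x00: return HC_UHCI;
    case 0x10: return HC_OHCI;
    case 0x20: return HC_EHCI;
    default:   return HC_OTHER_USB;
    }
}
```

The same two comparisons carry over to Forth once the property value has been decoded into a number.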

> 
> But this doesn't help you.  You do not know yet if the
> problem happens for all usb-ohci; for example, it could be
> that you have the console output device on usb; or as another
> example, it could be that this firmware leaves all pci devices
> in some active state.
> 
> So as I see it you have only two options:
> 
> 1) Figure out what exactly is going on;

I thought we were past that. The startup sequence leaves the device in a bad
state (writing 1000 times per second to memory that the kernel believes is
not in use), so it needs to be given a reset command before the kernel tries
to use that memory.

> > The big question that I'm still stumbling over is how to access the device
> > registers. The "reg" property looks like this:
> 
> You should look at "assigned-addresses", not "reg".  Well,
> you first need to look at "reg" to figure out what entry
> in "assigned-addresses" to use.

The properties look like this:

/pci@80000000/usb@5/assigned-addresses
 02002810 00000000 80000000 00000000 00001000
/pci@80000000/usb@5/reg
 00002800 00000000 00000000 00000000 00000000
 02002810 00000000 00000000 00000000 00001000

I'm not sure how I'm supposed to know which entry from "reg" is the right
one. I've been guessing that it's the second one, since that one matches the
only entry in "assigned-addresses". It's supposed to go the other direction?

-- 
Alan Curry

* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-27 22:58                                         ` pacman
@ 2010-10-27 23:33                                           ` Segher Boessenkool
  2010-10-28  1:11                                             ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2010-10-27 23:33 UTC (permalink / raw)
  To: pacman; +Cc: Segher Boessenkool, Olaf Hering, linuxppc-dev, linux-kernel

>> 1) Figure out what exactly is going on;
>
> I thought we were past that.

We are not.

> The startup sequence leaves the device in a bad
> state (writing 1000 times per second to memory that the kernel believes is
> not in use), so it needs to be given a reset command before the kernel tries
> to use that memory.

The question now is what causes the firmware to do that, and then
what is the best way to stop it from doing that.

>> > The big question that I'm still stumbling over is how to access the device
>> > registers. The "reg" property looks like this:
>>
>> You should look at "assigned-addresses", not "reg".  Well,
>> you first need to look at "reg" to figure out what entry
>> in "assigned-addresses" to use.

Ignore this part, I was confused.

> The properties look like this:
>
> /pci@80000000/usb@5/assigned-addresses
>  02002810 00000000 80000000 00000000 00001000

Lovely, incorrect data (it should start with 82002810, i.e.,
not relocatable -- it is already an assigned address!).

This means: 32-bit MMIO address space for bus 0 dev 5 fn 0,
first BAR; assigned to address 80000000; size is 1000.
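For reference, here is how that first cell (phys.hi) breaks down, as a small C sketch. It assumes the standard OF PCI bus binding layout npt000ss bbbbbbbb dddddfff rrrrrrrr; the struct and function names are made up for illustration. Decoding 02002810 gives space 2 (32-bit MMIO), bus 0, device 5, function 0, register 0x10 (BAR0), with the n bit clear, which is exactly the "relocatable" flag that should not be there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a PCI phys.hi cell (OF PCI binding: npt000ss bbbbbbbb dddddfff rrrrrrrr). */
struct pci_phys_hi {
    bool relocatable;   /* n bit clear => relocatable address */
    bool prefetchable;  /* p bit */
    bool aliased;       /* t bit */
    unsigned space;     /* ss: 0 config, 1 I/O, 2 32-bit MMIO, 3 64-bit MMIO */
    unsigned bus, dev, fn, reg;
};

static struct pci_phys_hi decode_phys_hi(uint32_t hi)
{
    struct pci_phys_hi d;

    d.relocatable  = !(hi & 0x80000000u);
    d.prefetchable = (hi >> 30) & 1;
    d.aliased      = (hi >> 29) & 1;
    d.space        = (hi >> 24) & 0x3;
    d.bus          = (hi >> 16) & 0xff;
    d.dev          = (hi >> 11) & 0x1f;
    d.fn           = (hi >> 8) & 0x7;
    d.reg          = hi & 0xff;   /* config offset of the BAR */
    return d;
}
```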

You could try a boot script like this:


dev /pci
0 ffff04 DO 0 i config-w! -100 +LOOP
device-end


which should disable all PCI devices on all busses, on that
PCI host bus (it disables every device behind pci-pci bridges
separately, as long as every such bridge has a higher secondary
bus number than primary bus number; if you only want to disable
everything on the root bus (which should be sufficient), use
ff04 instead of ffff04).
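The address arithmetic behind that loop, as a C sketch assuming the usual OF PCI config-address layout (bus<<16 | dev<<11 | fn<<8 | reg); the helper names are hypothetical and nothing is actually written. Each step of -100 (hex) moves to the previous function, and offset 04 is the PCI Command register, so storing 0 there clears memory-space, I/O-space and bus-master enable. Starting at ffff04 visits 256 buses x 32 devices x 8 functions; ff04 visits only bus 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* OF PCI config address: bus<<16 | dev<<11 | fn<<8 | register offset. */
static uint32_t pci_cfg_addr(unsigned bus, unsigned dev, unsigned fn,
                             unsigned reg)
{
    return ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
           ((uint32_t)fn << 8) | reg;
}

/* Count the addresses "0 <start> DO 0 i config-w! -100 +LOOP" visits:
 * start, start - 0x100, ..., down to 04 (bus 0, dev 0, fn 0). */
static size_t count_command_writes(uint32_t start)
{
    size_t n = 0;
    for (int64_t i = start; i >= 0; i -= 0x100)
        n++;
    return n;
}

/* Would the loop starting at 'start' write to config address 'addr'? */
static bool loop_visits(uint32_t start, uint32_t addr)
{
    return addr <= start && (start - addr) % 0x100 == 0;
}
```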


Segher



* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-27 23:33                                           ` Segher Boessenkool
@ 2010-10-28  1:11                                             ` pacman
  2010-10-28 19:50                                               ` Segher Boessenkool
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-28  1:11 UTC
  To: Segher Boessenkool; +Cc: linuxppc-dev, linux-kernel

Segher Boessenkool writes:
> 
> >> 1) Figure out what exactly is going on;
> >
> > I thought we were past that.
> 
> We are not.
> 
> > The startup sequence leaves the device in a bad
> > state (writing 1000 times per second to memory that the kernel believes is
> > not in use), so it needs to be given a reset command before the kernel
> > tries to use that memory.
> 
> The question now is what causes the firmware to do that, and then
> what is the best way to stop it from doing that.

As far as I can tell, it turns on the host controller during the global
probe, which is not wrong because USB devices could theoretically be used for
booting, or for console display. Then it never turns off the host controller
because someone forgot to put in the code to turn it off.

It's not easy to figure out exactly where that should have been done. Turning
off the host controller too soon would rule out booting from USB, but leaving
it running while the OS is starting up has caused a major problem.

So is it wrong to leave the host controller enabled when the OS is booted? If
not, then the error must be in the communication of which memory addresses
are in use by OF. I've got a node /memory@0 whose "available" property looks
like this:
 00000000 00400000
 00584000 0007c000
 0092a1d8 00004e28
 00a2f000 005d1000
 01800000 0e3fd000
 0fbffab4 0000054c
From that list, it looks to me like OF is telling the kernel that it should
not attempt to use any address above 0xfbffab4+0x54c == 0xfc00000. The
addresses being written to by the OHCI controller are 0xfc5c080 and
0xfc61080. If the kernel is staying within the "available" list, there won't
be a problem.
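A quick C check of that arithmetic against the "available" list quoted above, treating each line as a (base, size) pair. The table contents come straight from the property dump; the function names are made up for illustration. It only shows what OF reports, not what the kernel is actually allowed to do with the memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_range { uint32_t base, size; };

/* /memory@0 "available" property as (base, size) pairs. */
static const struct mem_range avail[] = {
    { 0x00000000u, 0x00400000u },
    { 0x00584000u, 0x0007c000u },
    { 0x0092a1d8u, 0x00004e28u },
    { 0x00a2f000u, 0x005d1000u },
    { 0x01800000u, 0x0e3fd000u },
    { 0x0fbffab4u, 0x0000054cu },
};

#define N_AVAIL (sizeof(avail) / sizeof(avail[0]))

/* True if addr lies inside one of the available ranges. */
static bool addr_available(uint32_t addr)
{
    for (size_t i = 0; i < N_AVAIL; i++)
        if (addr >= avail[i].base && addr - avail[i].base < avail[i].size)
            return true;
    return false;
}

/* Highest end address (exclusive) of any available range. */
static uint32_t top_of_available(void)
{
    uint32_t top = 0;
    for (size_t i = 0; i < N_AVAIL; i++) {
        uint32_t end = avail[i].base + avail[i].size;
        if (end > top)
            top = end;
    }
    return top;
}
```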

Later, when the kernel decides it's done using OF, what's supposed to happen?
It closes stdin, but that doesn't help here since the offending device is a
bus node, not an input node. It looks to me like the kernel makes the
assumption that all devices other than stdin and stdout will have been
deactivated already when the kernel starts, and that this assumption has
been violated. Who is wrong, from the perspective of the OF standard, the
assumer or the violator?

Then there's the "quiesce" call, which I don't understand at all since it's
not mentioned in any of the specification documents I've been able to find.
It's been mentioned as an Apple-only thing. Seems like it would be a good
name for a "make all the devices stop puking on the RAM" function. Since the
OF spec doesn't include this function, they must not have thought it was
necessary.

> > /pci@80000000/usb@5/assigned-addresses
> >  02002810 00000000 80000000 00000000 00001000
> 
> Lovely, incorrect data (it should start with 82002810, i.e.,
> not relocatable -- it is already an assigned address!).

Now you see how I have trouble relating the docs to the reality...

> 
> This means: 32-bit MMIO address space for bus 0 dev 5 fn 0,
> first BAR; assigned to address 80000000; size is 1000.

But "address 80000000" is a physical address (I think), so do I need to do a
map-in on it before using it?

> 
> You could try a boot script like this:
> 
> 
> dev /pci
> 0 ffff04 DO 0 i config-w! -100 +LOOP
> device-end
> 
> 
> which should disable all PCI devices on all busses, on that

Almost all of my devices are under that PCI node. What will I prove by
disabling them?

-- 
Alan Curry


* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-28  1:11                                             ` pacman
@ 2010-10-28 19:50                                               ` Segher Boessenkool
  2010-10-28 21:07                                                 ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2010-10-28 19:50 UTC
  To: pacman; +Cc: Segher Boessenkool, linuxppc-dev, linux-kernel

> So is it wrong to leave the host controller enabled when the OS is booted?

Yes.  Or, rather, there should be some way for the client to turn off
all dma and interrupt activity; if the client closes the ihandles in
"/chosen", and perhaps calls "quiesce", that should be enough.



> If not, then the error must be in the communication of which memory
> addresses are in use by OF. I've got a node /memory@0 whose "available"
> property looks like this:
>  00000000 00400000
>  00584000 0007c000
>  0092a1d8 00004e28
>  00a2f000 005d1000
>  01800000 0e3fd000
>  0fbffab4 0000054c
> From that list, it looks to me like OF is telling the kernel that it
> should not attempt to use any address above 0xfbffab4+0x54c == 0xfc00000.

The client is allowed to "take over" all memory, if it doesn't call OF
after doing so.  This won't work if some device scribbles on it, as
you have seen.

> Later, when the kernel decides it's done using OF, what's supposed to
> happen? It closes stdin, but that doesn't help here since the offending
> device is a bus node, not an input node. It looks to me like the kernel
> makes the assumption that all devices other than stdin and stdout will
> have been deactivated already when the kernel starts, and that this
> assumption has been violated. Who is wrong, from the perspective of the
> OF standard, the assumer or the violator?

The violator.

>> Lovely, incorrect data (it should start with 82002810, i.e.,
>> not relocatable -- it is already an assigned address!).
>
> Now you see how I have trouble relating the docs to the reality...

Yeah :-(

>> This means: 32-bit MMIO address space for bus 0 dev 5 fn 0,
>> first BAR; assigned to address 80000000; size is 1000.
>
> But "address 80000000" is a physical address (I think), so do I need to
> do a map-in on it before using it?

Yes.

>> You could try a boot script like this:
>>
>>
>> dev /pci
>> 0 ffff04 DO 0 i config-w! -100 +LOOP
>> device-end
>>
>>
>> which should disable all PCI devices on all busses, on that
>
> Almost all of my devices are under that PCI node. What will I prove by
> disabling them?

You should put it after "load", and before "go".

It should give you a working system; it's a sledgehammer workaround.


Segher



* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-28 19:50                                               ` Segher Boessenkool
@ 2010-10-28 21:07                                                 ` pacman
  2010-10-29  0:16                                                   ` Segher Boessenkool
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-10-28 21:07 UTC
  To: Segher Boessenkool; +Cc: linuxppc-dev, linux-kernel

Segher Boessenkool writes:
> 
> > So is it wrong to leave the host controller enabled when the OS is booted?
> 
> Yes.  Or, rather, there should be some way for the client to turn off
> all dma and interrupt activity; if the client closes the ihandles in
> "/chosen", and perhaps calls "quiesce", that should be enough.

Sounds good to me, I only wish someone had written down what "quiesce" means.

> >
> > Almost all of my devices are under that PCI node. What will I prove by
> > disabling them?
> 
> You should put it after "load", and before "go".
> 
> It should give you a working system; it's a sledgehammer workaround.

I can do it a little more gracefully than that. This works to deactivate the
problem devices manually:

  1 lbflip 80000000 8 + rl!
  1 lbflip 80001000 8 + rl!

where 80000000 and 80001000 have been obtained from
/pci@80000000/usb@5/assigned-addresses and
/pci@80000000/usb@5,1/assigned-addresses; 8 is the offset of the
HcCommandStatus register; and the 1 bit is HostControllerReset (HCR).
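What lbflip does here, as a C sketch (the function and macro names are made up for illustration). On the big-endian CPU, rl! stores the most significant byte first, so the value 1 has to be byte-swapped to 0x01000000 for the little-endian OHCI register to see the HCR bit.

```c
#include <stdint.h>

#define OHCI_HCCOMMANDSTATUS 0x08u  /* register offset within the BAR */
#define OHCI_HCR             0x01u  /* HostControllerReset, bit 0 */

/* C equivalent of the Forth word lbflip: reverse the four bytes of a quadlet. */
static uint32_t lbflip(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```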

Now I'm just trying to find the more correct way of doing it, without
hardcoded addresses. That'll be something like this:

  search the device tree for OHCI nodes
  for each OHCI node
    get assigned-addresses
    map-in
    set HCR
    wait for acknowledgement
    map-out

which can be done any time before the quiesce call, since that marks the
point where the kernel assumes that there are no devices writing to memory.
Sound good?
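The "set HCR, wait for acknowledgement" part of that outline, as a hosted C sketch. It assumes the register window has already been mapped in, and relies on the OHCI rule that writing 0 to the HcCommandStatus command bits has no effect, so HCR can be written alone. The delay hook, helper names and the fake controller are hypothetical; the fake only exists to exercise the polling logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OHCI_HCCOMMANDSTATUS 0x08u /* byte offset of HcCommandStatus */
#define OHCI_HCR             0x01u /* HostControllerReset, bit 0 */

/* Convert between CPU order and the little-endian order of PCI registers. */
static uint32_t le32_conv(uint32_t v)
{
    const uint16_t probe = 1;
    if (*(const uint8_t *)&probe == 1)
        return v;                          /* little-endian CPU: no swap */
    return (v >> 24) | ((v >> 8) & 0xff00u) |
           ((v << 8) & 0xff0000u) | (v << 24);
}

/* Hypothetical delay hook, e.g. roughly 1 us per call. */
typedef void (*delay_fn)(void *ctx);

/* Set HCR, then poll until the controller clears it or max_polls runs out.
 * Returns true if the controller acknowledged the reset. */
static bool ohci_reset(volatile uint32_t *regs, delay_fn delay, void *ctx,
                       unsigned max_polls)
{
    volatile uint32_t *cmdstatus = regs + OHCI_HCCOMMANDSTATUS / 4;

    *cmdstatus = le32_conv(OHCI_HCR);      /* write-1-to-set */
    for (unsigned i = 0; i < max_polls; i++) {
        if (!(le32_conv(*cmdstatus) & OHCI_HCR))
            return true;
        if (delay)
            delay(ctx);
    }
    return !(le32_conv(*cmdstatus) & OHCI_HCR);
}

/* Test scaffold only: a fake controller that clears HCR after N delays. */
struct fake_ohci {
    uint32_t regs[4];
    unsigned delays_until_ack;             /* 0 means "never acknowledge" */
};

static void fake_ohci_delay(void *ctx)
{
    struct fake_ohci *f = ctx;
    if (f->delays_until_ack && --f->delays_until_ack == 0)
        f->regs[OHCI_HCCOMMANDSTATUS / 4] &= ~le32_conv(OHCI_HCR);
}
```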

-- 
Alan Curry


* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-28 21:07                                                 ` pacman
@ 2010-10-29  0:16                                                   ` Segher Boessenkool
  2010-11-05  6:43                                                     ` pacman
  0 siblings, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2010-10-29  0:16 UTC
  To: pacman; +Cc: Segher Boessenkool, linuxppc-dev, linux-kernel

>> > Almost all of my devices are under that PCI node. What will I prove by
>> > disabling them?
>>
>> You should put it after "load", and before "go".
>>
>> It should give you a working system; it's a sledgehammer workaround.
>
> I can do it a little more gracefully than that. This works to deactivate
> the problem devices manually:
>
>   1 lbflip 80000000 8 + rl!
>   1 lbflip 80001000 8 + rl!
>
> where 80000000 and 80001000 have been obtained from
> /pci@80000000/usb@5/assigned-addresses and
> /pci@80000000/usb@5,1/assigned-addresses; 8 is the offset of the
> HcCommandStatus register; and the 1 bit is HostControllerReset (HCR).
>
> Now I'm just trying to find the more correct way of doing it, without
> hardcoded addresses. That'll be something like this:
>
>   search the device tree for OHCI nodes
>   for each OHCI node
>     get assigned-addresses
>     map-in
>     set HCR
>     wait for acknowledgement
>     map-out

As you noted, your firmware does not show which usb host controllers
are OHCI and which are not.  It has a lot of other problems as well.
Also, it's a lot of code to do things this way.  Which is why I suggested
the "heavy handed" workaround: it is simple and should work on even the
most broken OF implementations.

To figure out which host controllers are OHCI, you'll need to look
at the PCI class code (0c0310 for OHCI), since your OF doesn't want
to tell you.
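Telling the controller types apart from the class code, as a C sketch: base class 0x0c (serial bus), subclass 0x03 (USB), and the programming interface byte selects UHCI (0x00), OHCI (0x10) or EHCI (0x20). The enum and function names are made up for illustration.

```c
#include <stdint.h>

enum usb_hc_kind { HC_NOT_USB, HC_UHCI, HC_OHCI, HC_EHCI, HC_OTHER_USB };

/* class-code = base class << 16 | subclass << 8 | programming interface */
static enum usb_hc_kind classify_usb_hc(uint32_t class_code)
{
    if (((class_code >> 8) & 0xffff) != 0x0c03)  /* serial bus / USB */
        return HC_NOT_USB;
    switch (class_code & 0xff) {
    case 0x00: return HC_UHCI;
    case 0x10: return HC_OHCI;
    case 0x20: return HC_EHCI;
    default:   return HC_OTHER_USB;
    }
}
```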

> Sound good?

Sounds like it should work, yes.


Segher



* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-10-29  0:16                                                   ` Segher Boessenkool
@ 2010-11-05  6:43                                                     ` pacman
  2010-11-29  5:44                                                       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 40+ messages in thread
From: pacman @ 2010-11-05  6:43 UTC
  To: linuxppc-dev; +Cc: Segher Boessenkool, linux-kernel

Segher Boessenkool writes:
> 
> > Now I'm just trying to find the more correct way of doing it, without
> > hardcoded addresses. That'll be something like this:
> >
> >   search the device tree for OHCI nodes
> >   for each OHCI node
> >     get assigned-addresses
> >     map-in
> >     set HCR
> >     wait for acknowledgement
> >     map-out
 
> Sounds like it should work, yes.
> 

I have a mostly-finished patch to do the above. I'll include it below, but
first a few words about why it's only mostly finished.

The other Pegasos workarounds are in fixup_device_tree_chrp, and I don't see
anything like an "if(machine_is_pegasos)" around them. What keeps them from
being erroneously run on other CHRP-type machines? I made this patch mainly
by copying pieces of other functions from prom_init.c, but couldn't find the
"test for Pegasos before running a Pegasos workaround" piece.

Another issue is, since the firmware doesn't give me a "compatible" property
with the details of the controller, I just have to assume that it's
little-endian. I'm not sure if that's clean, since the real ohci driver
supports both endiannesses, with at least 3 different Kconfig options(!) to
choose between them.

Then there's the volatile which I guess is supposed to be replaced by
something else, but I don't know what the something else is. I believe this
usage is extremely close to what volatile was meant for.

Finally, when I updated to a more recent upstream kernel to test the patch, I
found that an intervening commit (3df7169e73fc1d71a39cffeacc969f6840cdf52b,
OHCI: work around for nVidia shutdown problem) has had a major effect
on the appearance of my bug.

Before that change, the window in which the bug could strike was from the end
of prom_init (when the kernel believes that devices are quiescent) to the
initialization of the ohci-hcd driver (which actually quietens the device, or
at least directs its scribbling to a properly allocated page). After the
change, the window ends at some point early in the PCI bus setup. That's a
window so small that with a new kernel, I can't provoke a symptom even if I
try.

Mostly-finished patch:

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 941ff4d..a14f21b 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2237,6 +2237,81 @@ static void __init fixup_device_tree_chrp(void)
 		}
 	}
 }
+
+/*
+ * Pegasos firmware doesn't quiesce OHCI controllers, so do it manually
+ */
+static void __init pegasos_quiesce(void)
+{
+	phandle node, parent_node;
+	ihandle parent_ih;
+	int rc;
+	char type[16], *path;
+	u32 prop[5], map_size;
+	prom_arg_t ohci_virt;
+
+	for (node = 0; prom_next_node(&node); ) {
+		memset(type, 0, sizeof(type));
+		prom_getprop(node, "device_type", type, sizeof(type));
+		if (strcmp(type, RELOC("usb")) != 0)
+			continue;
+
+		/* Parent should be a PCI bus (so class-code makes sense).
+		   class-code should be 0x0C0310 */
+		parent_node = call_prom("parent", 1, 1, node);
+		if (!parent_node)
+			continue;
+		rc = prom_getprop(node, "class-code", prop, sizeof(u32));
+		if (rc != sizeof(u32) || prop[0] != 0x0c0310)
+			continue;
+
+		rc = prom_getprop(node, "assigned-addresses",
+				  prop, 5*sizeof(u32));
+		if (rc != 5*sizeof(u32))
+			continue;
+
+		/* Open the parent and call map-in */
+
+		/* It seems OF doesn't null-terminate the path :-( */
+		path = RELOC(prom_scratch);
+		memset(path, 0, PROM_SCRATCH_SIZE);
+
+		if (call_prom("package-to-path", 3, 1, parent_node,
+			      path, PROM_SCRATCH_SIZE-1) == PROM_ERROR)
+			continue;
+		parent_ih = call_prom("open", 1, 1, path);
+
+		/* Get the OHCI node's pathname, for printing later */
+		memset(path, 0, PROM_SCRATCH_SIZE);
+		call_prom("package-to-path", 3, 1, node,
+			  path, PROM_SCRATCH_SIZE-1);
+
+		map_size = prop[4];
+		if (call_prom_ret("call-method", 6, 2, &ohci_virt,
+				  ADDR("map-in"), parent_ih,
+				  map_size, prop[0], prop[1], prop[2]) == 0) {
+			prom_printf("resetting OHCI device %s...", path);
+
+			/* Set HostControllerReset (==1) in HcCommandStatus,
+			 * located at offset 8 in the register area. The <<24
+			 * is because the CPU is big-endian and the device is
+			 * little-endian. */
+			*(volatile u32 *)(ohci_virt + 8) |= (1<<24);
+
+			/* controller should acknowledge by zeroing the bit
+			 * within 10us. waiting 1ms should be plenty. */
+			call_prom("interpret", 1, 1, ADDR("1 ms"));
+			if (*(volatile u32 *)(ohci_virt + 8) & (1<<24))
+				prom_printf("failed\n");
+			else
+				prom_printf("done\n");
+
+			call_prom("call-method", 4, 1, ADDR("map-out"),
+				  parent_ih, map_size, ohci_virt);
+		}
+		call_prom("close", 1, 0, parent_ih);
+	}
+}
 #else
 #define fixup_device_tree_chrp()
 #endif
@@ -2642,6 +2717,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
 	 * devices etc...
 	 */
 	prom_printf("Calling quiesce...\n");
+	pegasos_quiesce();
 	call_prom("quiesce", 0, 0);
 
 	/*

-- 
Alan Curry


* Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,
  2010-11-05  6:43                                                     ` pacman
@ 2010-11-29  5:44                                                       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 40+ messages in thread
From: Benjamin Herrenschmidt @ 2010-11-29  5:44 UTC
  To: pacman; +Cc: linuxppc-dev, linux-kernel


> I have a mostly-finished patch to do the above. I'll include it below, but
> first a few words about why it's only mostly finished.
> 
> The other Pegasos workarounds are in fixup_device_tree_chrp, and I don't see
> anything like an "if(machine_is_pegasos)" around them. What keeps them from
> being erroneously run on other CHRP-type machines? I made this patch mainly
> by copying pieces of other functions from prom_init.c, but couldn't find the
> "test for Pegasos before running a Pegasos workaround" piece.

Probably bcs the condition they test for really only happens on
pegasos ? :-)

I agree it's a bit gross tho.

The "ranges" property fixup is pretty harmless in any case. The other
fixup might be worth moving to a separate pegasos-only function in which
you would test for a pegasos properly and add your own stuff.
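One possible shape for that Pegasos test, as a C sketch. It assumes, as the CHRP platform setup code does, that a Pegasos identifies itself with a root-node "model" property starting with "Pegasos"; whether that is the right test inside prom_init is an assumption, and the function name is hypothetical. In prom_init the string would come from prom_getprop() on the root node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical guard for Pegasos-only fixups: match the root "model" prefix. */
static bool model_is_pegasos(const char *model)
{
    return model != NULL && strncmp(model, "Pegasos", 7) == 0;
}
```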

> Another issue is, since the firmware doesn't give me a "compatible" property
> with the details of the controller, I just have to assume that it's
> little-endian. I'm not sure if that's clean, since the real ohci driver
> supports both endiannesses, with at least 3 different Kconfig options(!) to
> choose between them.

If it's PCI it's LE or somebody needs to be shot :-)

> Then there's the volatile which I guess is supposed to be replaced by
> something else, but I don't know what the something else is. I believe this
> usage is extremely close to what volatile was meant for.

Yeah, it's fine, just add something like that on the next line:

 asm volatile("eieio" : : : "memory");
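That suggestion wrapped into a store helper, as a C sketch using GCC/Clang inline asm; the helper name is made up. On PowerPC it issues eieio after the store so the register write is ordered before later MMIO accesses. On other architectures it falls back to a compiler-only barrier, which is NOT an I/O ordering barrier and only lets the sketch build and run elsewhere.

```c
#include <stdint.h>

/* Store to an MMIO register, then order it before later accesses. */
static inline void mmio_write32_ordered(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ __volatile__("eieio" : : : "memory");
#else
    __asm__ __volatile__("" : : : "memory");   /* compiler barrier only */
#endif
}
```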

> Finally, when I updated to a more recent upstream kernel to test the patch, I
> found that an intervening commit (3df7169e73fc1d71a39cffeacc969f6840cdf52b,
> OHCI: work around for nVidia shutdown problem) has had a major effect
> on the appearance of my bug.
> 
> Before that change, the window in which the bug could strike was from the end
> of prom_init (when the kernel believes that devices are quiescent) to the
> initialization of the ohci-hcd driver (which actually quietens the device, or
> at least directs its scribbling to a properly allocated page). After the
> change, the window ends at some point early in the PCI bus setup. That's a
> window so small that with a new kernel, I can't provoke a symptom even if I
> try.

Right but it's very fishy, ie, it may still be DMA'ing and god knows
where ... you may or may not get lucky. I'd rather you do a proper
fixup :-)

Cheers,
Ben.





Thread overview: 40+ messages
2010-10-09  9:57 PROBLEM: memory corrupting bug, bisected to 6dda9d55 pacman
2010-10-11 12:52 ` Christoph Lameter
2010-10-11 14:30 ` Mel Gorman
2010-10-11 20:35   ` pacman
2010-10-11 21:00   ` Andrew Morton
2010-10-13 14:40     ` Mel Gorman
2010-10-13 17:52       ` pacman
2010-10-18 11:33         ` Mel Gorman
2010-10-18 19:10           ` pacman
2010-10-18 21:10             ` Benjamin Herrenschmidt
2010-10-18 21:33               ` pacman
2010-10-19 10:16                 ` Benjamin Herrenschmidt
2010-10-19 18:10                   ` pacman
2010-10-19 20:47                     ` Segher Boessenkool
2010-10-19 21:02                       ` Benjamin Herrenschmidt
2010-10-20  3:23                         ` pacman
2010-10-20 10:32                           ` Benjamin Herrenschmidt
2010-10-20 18:33                             ` pacman
2010-10-20 20:56                               ` Benjamin Herrenschmidt
2010-10-22  9:15                                 ` pacman
2010-10-27  8:57                                 ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) pacman
2010-10-27 10:13                                   ` Olaf Hering
2010-10-27 21:04                                     ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, pacman
2010-10-27 22:05                                       ` Segher Boessenkool
2010-10-27 22:58                                         ` pacman
2010-10-27 23:33                                           ` Segher Boessenkool
2010-10-28  1:11                                             ` pacman
2010-10-28 19:50                                               ` Segher Boessenkool
2010-10-28 21:07                                                 ` pacman
2010-10-29  0:16                                                   ` Segher Boessenkool
2010-11-05  6:43                                                     ` pacman
2010-11-29  5:44                                                       ` Benjamin Herrenschmidt
2010-10-27 13:27                                   ` Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55) Benjamin Herrenschmidt
2010-10-19 20:58                     ` PROBLEM: memory corrupting bug, bisected to 6dda9d55 Benjamin Herrenschmidt
2010-10-18 19:37           ` Andrew Morton
2010-10-18 21:02             ` Benjamin Herrenschmidt
2010-10-18 21:55             ` Thomas Gleixner
2010-10-19 16:24               ` Helmut Grohne
2010-10-19 16:42                 ` Thomas Gleixner
2010-10-18 20:59       ` Benjamin Herrenschmidt
