* 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
From: Bruno Prémont
Date: 2011-04-23 20:44 UTC
To: xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner
Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6290 bytes --]

Hi,

Running 2.6.39-rc3+ and now again 2.6.39-rc4+ (I have not tested -rc1 or -rc2), I've hit a "dying machine" where processes writing to disk end up in D state.

From the occurrence with -rc3+ I don't have logs, as those never hit the disk; for -rc4+ I have the following (the sysrq+t output was too big, and what I have of it is missing a dozen kernel tasks - if needed, please ask):

The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84 (plus one patch of mine to stop the disk on reboot). The full dmesg is available if needed; the kernel config is attached (selected options only).

In case there is something I should do at the next occurrence, please tell me. Unfortunately I have no trigger for it and it does not happen very often.

Thanks,
Bruno

[    0.000000] Linux version 2.6.39-rc4-00120-g73b5b55 (kbuild@neptune) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #12 Thu Apr 21 19:28:45 CEST 2011
[32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds.
[32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[32040.120077] flush-8:0 D 00000000 4908 1665 2 0x00000000
[32040.120099] f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000
[32040.120118] f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0
[32040.120135] f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94
[32040.120153] Call Trace:
[32040.120220] [<f889d7f8>] ? xfs_bmap_search_multi_extents+0x88/0xe0 [xfs]
[32040.120239] [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110
[32040.120294] [<f88c88ca>] ?
xlog_space_left+0x2a/0xc0 [xfs] [32040.120346] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs] [32040.120359] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0 [32040.120411] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs] [32040.120464] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs] [32040.120516] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs] [32040.120571] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs] [32040.120625] [<f88c560a>] xfs_iomap_write_allocate+0x27a/0x3a0 [xfs] [32040.120680] [<f88df192>] xfs_map_blocks+0x1d2/0x210 [xfs] [32040.120733] [<f88dfbde>] xfs_vm_writepage+0x2be/0x4e0 [xfs] [32040.120749] [<c107ab4b>] __writepage+0xb/0x30 [32040.120760] [<c107bb06>] write_cache_pages+0x156/0x350 [32040.120771] [<c107ab40>] ? set_page_dirty+0x60/0x60 [32040.120784] [<c107bd3b>] generic_writepages+0x3b/0x60 [32040.120836] [<f88def91>] xfs_vm_writepages+0x21/0x30 [xfs] [32040.120847] [<c107bd77>] do_writepages+0x17/0x30 [32040.120859] [<c10c1667>] writeback_single_inode+0x77/0x1d0 [32040.120869] [<c10c1c36>] writeback_sb_inodes+0x96/0x130 [32040.120880] [<c10c21ae>] writeback_inodes_wb+0x6e/0xf0 [32040.120890] [<c10c2442>] wb_writeback+0x212/0x260 [32040.120901] [<c10c2655>] wb_do_writeback+0x1c5/0x1d0 [32040.120912] [<c10c26d1>] bdi_writeback_thread+0x71/0x130 [32040.120923] [<c10c2660>] ? wb_do_writeback+0x1d0/0x1d0 [32040.120936] [<c10458c4>] kthread+0x74/0x80 [32040.120946] [<c1045850>] ? kthreadd+0xc0/0xc0 [32040.120959] [<c13332f6>] kernel_thread_helper+0x6/0xd [32040.120989] INFO: task kworker/0:2:6126 blocked for more than 120 seconds. [32040.120996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[32040.121003] kworker/0:2 D e05a3dfc 6400 6126 2 0x00000000 [32040.121024] e05a3e6c 00000046 00002925 e05a3dfc c104b7f1 00000296 1b84e819 e05a3e04 [32040.121042] e05a3e1c e05a3e6c 00000004 f4502ec0 8c876f48 00001cfb f4502ec0 2d2b9b93 [32040.121059] 0000189c dd283f80 00000000 0016e360 e05a3e50 c10086c7 e05a3ee0 0009201a [32040.121077] Call Trace: [32040.121089] [<c104b7f1>] ? sched_clock_tick+0x51/0x80 [32040.121105] [<c10086c7>] ? time_cpufreq_notifier+0x57/0x120 [32040.121117] [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110 [32040.121169] [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs] [32040.121220] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs] [32040.121231] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0 [32040.121283] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs] [32040.121335] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs] [32040.121402] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs] [32040.121455] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs] [32040.121506] [<f88bafe1>] xfs_fs_log_dummy+0x41/0x90 [xfs] [32040.121559] [<f88e9f12>] xfs_sync_worker+0x62/0x70 [xfs] [32040.121570] [<c104075c>] process_one_work+0xfc/0x320 [32040.121621] [<f88e9eb0>] ? xfs_syncd_init+0xb0/0xb0 [xfs] [32040.121633] [<c1042206>] worker_thread+0x106/0x330 [32040.121643] [<c1042100>] ? manage_workers+0x340/0x340 [32040.121654] [<c10458c4>] kthread+0x74/0x80 [32040.121664] [<c1045850>] ? kthreadd+0xc0/0xc0 [32040.121675] [<c13332f6>] kernel_thread_helper+0x6/0xd [32040.121685] INFO: task tar:9048 blocked for more than 120 seconds. [32040.121692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [32040.121699] tar D e4907dac 5540 9048 9039 0x00000000 [32040.121720] e4907db4 00200082 00000000 e4907dac f88c88ca f4a58d24 00000002 00000000 [32040.121737] e4907d64 e4907db4 00000004 e0ee2310 e349aa9d 00001cf9 e0ee2310 e4907d94 [32040.121755] 008d71a4 00000000 00001fbd 00947400 f606d880 0000402e 00000000 e4907db4 [32040.121772] Call Trace: [32040.121823] [<f88c88ca>] ? 
xlog_space_left+0x2a/0xc0 [xfs] [32040.121876] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs] [32040.121929] [<f88c94f2>] xlog_grant_log_space+0xf2/0x240 [xfs] [32040.121941] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0 [32040.121992] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs] [32040.122047] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs] [32040.122100] [<f88dae3d>] xfs_create+0x14d/0x520 [xfs] [32040.122152] [<f88e69da>] xfs_vn_mknod+0x9a/0x180 [xfs] [32040.122204] [<f88e6ad8>] xfs_vn_mkdir+0x18/0x20 [xfs] [32040.122216] [<c10adf5e>] vfs_mkdir+0x6e/0xa0 [32040.122266] [<f88e6ac0>] ? xfs_vn_mknod+0x180/0x180 [xfs] [32040.122278] [<c10b0928>] sys_mkdirat+0xc8/0xe0 [32040.122292] [<c108e46f>] ? remove_vma+0x3f/0x60 [32040.122303] [<c108f23c>] ? do_munmap+0x20c/0x2f0 [32040.122314] [<c10b0960>] sys_mkdir+0x20/0x30 [32040.122325] [<c1332dd0>] sysenter_do_call+0x12/0x26 [-- Attachment #2: config --] [-- Type: application/octet-stream, Size: 18269 bytes --] CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_DEFAULT_IDLE=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y 
CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx" CONFIG_KTIME_SCALAR=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_CONSTRUCTORS=y CONFIG_HAVE_IRQ_WORK=y CONFIG_IRQ_WORK=y CONFIG_EXPERIMENTAL=y CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_KERNEL_GZIP=y CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_HAVE_GENERIC_HARDIRQS=y CONFIG_GENERIC_HARDIRQS=y CONFIG_HAVE_SPARSE_IRQ=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_TINY_RCU=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_CGROUPS=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_NAMESPACES=y CONFIG_SCHED_AUTOGROUP=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_PCSPKR_PLATFORM=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_EVENTS=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_PCI_QUIRKS=y CONFIG_SLUB_DEBUG=y CONFIG_SLUB=y CONFIG_HAVE_OPROFILE=y CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_HAVE_OPTPROBES=y CONFIG_HAVE_ARCH_TRACEHOOK=y CONFIG_HAVE_DMA_ATTRS=y CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y CONFIG_HAVE_DMA_API_DEBUG=y CONFIG_HAVE_HW_BREAKPOINT=y CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y CONFIG_HAVE_USER_RETURN_NOTIFIER=y CONFIG_HAVE_PERF_EVENTS_NMI=y CONFIG_HAVE_ARCH_JUMP_LABEL=y CONFIG_HAVE_GENERIC_DMA_COHERENT=y CONFIG_SLABINFO=y 
CONFIG_RT_MUTEXES=y CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_BLOCK=y CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_CFQ=y CONFIG_DEFAULT_IOSCHED="cfq" CONFIG_INLINE_SPIN_UNLOCK=y CONFIG_INLINE_SPIN_UNLOCK_IRQ=y CONFIG_INLINE_READ_UNLOCK=y CONFIG_INLINE_READ_UNLOCK_IRQ=y CONFIG_INLINE_WRITE_UNLOCK=y CONFIG_INLINE_WRITE_UNLOCK_IRQ=y CONFIG_FREEZER=y CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y CONFIG_SCHED_OMIT_FRAME_POINTER=y CONFIG_NO_BOOTMEM=y CONFIG_MPENTIUMM=y CONFIG_X86_INTERNODE_CACHE_SHIFT=6 CONFIG_X86_CMPXCHG=y CONFIG_CMPXCHG_LOCAL=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_XADD=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_TSC=y CONFIG_X86_CMPXCHG64=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=5 CONFIG_X86_DEBUGCTLMSR=y CONFIG_CPU_SUP_INTEL=y CONFIG_CPU_SUP_CYRIX_32=y CONFIG_CPU_SUP_AMD=y CONFIG_CPU_SUP_CENTAUR=y CONFIG_CPU_SUP_TRANSMETA_32=y CONFIG_CPU_SUP_UMC_32=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y CONFIG_DMI=y CONFIG_NR_CPUS=1 CONFIG_PREEMPT_VOLUNTARY=y CONFIG_X86_UP_APIC=y CONFIG_X86_UP_IOAPIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y CONFIG_X86_MCE=y CONFIG_X86_MCE_INTEL=y CONFIG_X86_MCE_THRESHOLD=y CONFIG_X86_THERMAL_VECTOR=y CONFIG_VM86=y CONFIG_MICROCODE=m CONFIG_MICROCODE_INTEL=y CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=m CONFIG_X86_CPUID=m CONFIG_HIGHMEM4G=y CONFIG_PAGE_OFFSET=0xC0000000 CONFIG_HIGHMEM=y CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ILLEGAL_POINTER_VALUE=0 CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y CONFIG_SPARSEMEM_STATIC=y CONFIG_HAVE_MEMBLOCK=y CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_COMPACTION=y 
CONFIG_MIGRATION=y CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y CONFIG_DEFAULT_MMAP_MIN_ADDR=4096 CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y CONFIG_NEED_PER_CPU_KM=y CONFIG_HIGHPTE=y CONFIG_X86_CHECK_BIOS_CORRUPTION=y CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y CONFIG_X86_RESERVE_LOW=64 CONFIG_MTRR=y CONFIG_X86_PAT=y CONFIG_ARCH_USES_PG_UNCACHED=y CONFIG_HZ_100=y CONFIG_HZ=100 CONFIG_SCHED_HRTICK=y CONFIG_KEXEC=y CONFIG_PHYSICAL_START=0x1000000 CONFIG_PHYSICAL_ALIGN=0x100000 CONFIG_CMDLINE_BOOL=y CONFIG_CMDLINE="" CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_SUSPEND=y CONFIG_SUSPEND_FREEZER=y CONFIG_PM_SLEEP=y CONFIG_PM_RUNTIME=y CONFIG_PM=y CONFIG_PM_DEBUG=y CONFIG_PM_ADVANCED_DEBUG=y CONFIG_CAN_PM_TRACE=y CONFIG_PM_TRACE=y CONFIG_PM_TRACE_RTC=y CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_AC=y CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_VIDEO=y CONFIG_ACPI_FAN=y CONFIG_ACPI_DOCK=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_BLACKLIST_YEAR=0 CONFIG_ACPI_DEBUG=y CONFIG_X86_PM_TIMER=y CONFIG_ACPI_HED=y CONFIG_ACPI_APEI=y CONFIG_ACPI_APEI_GHES=y CONFIG_CPU_FREQ=y CONFIG_CPU_FREQ_TABLE=y CONFIG_CPU_FREQ_STAT=m CONFIG_CPU_FREQ_STAT_DETAILS=y CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y CONFIG_CPU_FREQ_GOV_PERFORMANCE=y CONFIG_CPU_FREQ_GOV_USERSPACE=y CONFIG_CPU_FREQ_GOV_ONDEMAND=y CONFIG_X86_ACPI_CPUFREQ=y CONFIG_X86_SPEEDSTEP_CENTRINO=m CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE=y CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y CONFIG_INTEL_IDLE=y CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCI_DOMAINS=y CONFIG_ARCH_SUPPORTS_MSI=y CONFIG_PCI_MSI=y CONFIG_PCI_IOAPIC=y CONFIG_PCI_LABEL=y CONFIG_ISA_DMA_API=y CONFIG_ISA=y CONFIG_AMD_NB=y CONFIG_PCCARD=m CONFIG_PCMCIA=m CONFIG_PCMCIA_LOAD_CIS=y CONFIG_CARDBUS=y CONFIG_YENTA=m CONFIG_YENTA_O2=y CONFIG_YENTA_RICOH=y CONFIG_YENTA_TI=y 
CONFIG_YENTA_ENE_TUNE=y CONFIG_YENTA_TOSHIBA=y CONFIG_PCMCIA_PROBE=y CONFIG_PCCARD_NONSTATIC=y CONFIG_BINFMT_ELF=y CONFIG_HAVE_AOUT=y CONFIG_HAVE_ATOMIC_IOMAP=y CONFIG_HAVE_TEXT_POKE_SMP=y CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y CONFIG_SYN_COOKIES=y CONFIG_INET_TUNNEL=m CONFIG_INET_LRO=y CONFIG_INET_DIAG=y CONFIG_INET_TCP_DIAG=y CONFIG_TCP_CONG_CUBIC=y CONFIG_DEFAULT_TCP_CONG="cubic" CONFIG_IPV6=y CONFIG_IPV6_PRIVACY=y CONFIG_IPV6_SIT=m CONFIG_IPV6_NDISC_NODETYPE=y CONFIG_IPV6_MULTIPLE_TABLES=y CONFIG_IPV6_SUBTREES=y CONFIG_NETFILTER=y CONFIG_NETFILTER_ADVANCED=y CONFIG_NETFILTER_NETLINK=y CONFIG_NETFILTER_NETLINK_LOG=y CONFIG_NETFILTER_XTABLES=y CONFIG_NETFILTER_XT_TARGET_NFLOG=y CONFIG_NETFILTER_XT_TARGET_TCPMSS=y CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=y CONFIG_NETFILTER_XT_MATCH_DEVGROUP=y CONFIG_NETFILTER_XT_MATCH_IPRANGE=y CONFIG_NETFILTER_XT_MATCH_LIMIT=y CONFIG_NETFILTER_XT_MATCH_MAC=y CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y CONFIG_NETFILTER_XT_MATCH_OWNER=y CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y CONFIG_IP_NF_IPTABLES=y CONFIG_IP_NF_FILTER=y CONFIG_IP_NF_TARGET_REJECT=y CONFIG_IP_NF_TARGET_LOG=y CONFIG_IP_NF_MANGLE=y CONFIG_IP6_NF_IPTABLES=y CONFIG_IP6_NF_MATCH_EUI64=y CONFIG_IP6_NF_TARGET_LOG=y CONFIG_IP6_NF_FILTER=y CONFIG_IP6_NF_TARGET_REJECT=y CONFIG_IP6_NF_MANGLE=y CONFIG_VLAN_8021Q=m CONFIG_IRDA=m CONFIG_IRLAN=m CONFIG_IRCOMM=m CONFIG_IRDA_ULTRA=y CONFIG_IRTTY_SIR=m CONFIG_NSC_FIR=m CONFIG_FIB_RULES=y CONFIG_WIRELESS=y CONFIG_WIRELESS_EXT=y CONFIG_WEXT_CORE=y CONFIG_WEXT_PROC=y CONFIG_WEXT_SPY=y CONFIG_WEXT_PRIV=y CONFIG_CFG80211=m CONFIG_CFG80211_DEFAULT_PS=y CONFIG_WIRELESS_EXT_SYSFS=y CONFIG_LIB80211=m CONFIG_LIB80211_CRYPT_WEP=m CONFIG_LIB80211_CRYPT_CCMP=m CONFIG_LIB80211_CRYPT_TKIP=m CONFIG_RFKILL=m CONFIG_RFKILL_INPUT=y CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_FIRMWARE_IN_KERNEL=y 
CONFIG_EXTRA_FIRMWARE="" CONFIG_ARCH_NO_SYSDEV_OPS=y CONFIG_PNP=y CONFIG_ISAPNP=y CONFIG_PNPACPI=y CONFIG_BLK_DEV=y CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_NBD=m CONFIG_CDROM_PKTCDVD=m CONFIG_CDROM_PKTCDVD_BUFFERS=8 CONFIG_ATA_OVER_ETH=y CONFIG_MISC_DEVICES=y CONFIG_HAVE_IDE=y CONFIG_SCSI_MOD=y CONFIG_SCSI=y CONFIG_SCSI_DMA=y CONFIG_SCSI_PROC_FS=y CONFIG_BLK_DEV_SD=y CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_SCAN_ASYNC=y CONFIG_SCSI_WAIT_SCAN=m CONFIG_ATA=y CONFIG_ATA_VERBOSE_ERROR=y CONFIG_ATA_ACPI=y CONFIG_ATA_SFF=y CONFIG_ATA_BMDMA=y CONFIG_ATA_PIIX=y CONFIG_FIREWIRE=m CONFIG_FIREWIRE_OHCI=m CONFIG_FIREWIRE_OHCI_DEBUG=y CONFIG_FIREWIRE_SBP2=m CONFIG_NETDEVICES=y CONFIG_TUN=m CONFIG_VETH=m CONFIG_MII=y CONFIG_PHYLIB=m CONFIG_BROADCOM_PHY=m CONFIG_NET_ETHERNET=y CONFIG_NET_PCI=y CONFIG_B44=y CONFIG_B44_PCI_AUTOSELECT=y CONFIG_B44_PCICORE_AUTOSELECT=y CONFIG_B44_PCI=y CONFIG_WLAN=y CONFIG_IPW2100=m CONFIG_IPW2100_MONITOR=y CONFIG_LIBIPW=m CONFIG_NETCONSOLE=y CONFIG_NETCONSOLE_DYNAMIC=y CONFIG_NETPOLL=y CONFIG_NET_POLL_CONTROLLER=y CONFIG_INPUT=y CONFIG_INPUT_POLLDEV=m CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1400 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1050 CONFIG_INPUT_EVDEV=y CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y CONFIG_MOUSE_PS2_ALPS=y CONFIG_MOUSE_PS2_LOGIPS2PP=y CONFIG_MOUSE_PS2_SYNAPTICS=y CONFIG_MOUSE_PS2_LIFEBOOK=y CONFIG_MOUSE_PS2_TRACKPOINT=y CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=m CONFIG_SERIO=y CONFIG_SERIO_I8042=y CONFIG_SERIO_SERPORT=m CONFIG_SERIO_LIBPS2=y CONFIG_VT=y CONFIG_CONSOLE_TRANSLATIONS=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_VT_HW_CONSOLE_BINDING=y CONFIG_UNIX98_PTYS=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_NVRAM=m CONFIG_HPET=y CONFIG_HPET_MMAP=y CONFIG_DEVPORT=y CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y CONFIG_I2C_COMPAT=y CONFIG_I2C_CHARDEV=m CONFIG_I2C_HELPER_AUTO=y CONFIG_I2C_ALGOBIT=y 
CONFIG_I2C_I801=m CONFIG_I2C_SCMI=m CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y CONFIG_POWER_SUPPLY=y CONFIG_HWMON=y CONFIG_SENSORS_CORETEMP=m CONFIG_THERMAL=y CONFIG_THERMAL_HWMON=y CONFIG_SSB_POSSIBLE=y CONFIG_SSB=y CONFIG_SSB_SPROM=y CONFIG_SSB_PCIHOST_POSSIBLE=y CONFIG_SSB_PCIHOST=y CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y CONFIG_SSB_DRIVER_PCICORE=y CONFIG_MFD_SUPPORT=y CONFIG_MEDIA_SUPPORT=y CONFIG_VIDEO_DEV=m CONFIG_VIDEO_V4L2_COMMON=m CONFIG_VIDEO_MEDIA=m CONFIG_MEDIA_ATTACH=y CONFIG_MEDIA_TUNER=m CONFIG_MEDIA_TUNER_SIMPLE=m CONFIG_MEDIA_TUNER_TDA8290=m CONFIG_MEDIA_TUNER_TDA827X=m CONFIG_MEDIA_TUNER_TDA18271=m CONFIG_MEDIA_TUNER_TDA9887=m CONFIG_MEDIA_TUNER_TEA5761=m CONFIG_MEDIA_TUNER_TEA5767=m CONFIG_MEDIA_TUNER_MT20XX=m CONFIG_MEDIA_TUNER_XC2028=m CONFIG_MEDIA_TUNER_XC5000=m CONFIG_MEDIA_TUNER_MC44S803=m CONFIG_VIDEO_V4L2=m CONFIG_VIDEO_CAPTURE_DRIVERS=y CONFIG_VIDEO_HELPER_CHIPS_AUTO=y CONFIG_V4L_USB_DRIVERS=y CONFIG_USB_VIDEO_CLASS=m CONFIG_USB_VIDEO_CLASS_INPUT_EVDEV=y CONFIG_USB_PWC=m CONFIG_USB_PWC_INPUT_EVDEV=y CONFIG_AGP=y CONFIG_AGP_INTEL=y CONFIG_VGA_ARB=y CONFIG_VGA_ARB_MAX_GPUS=2 CONFIG_DRM=y CONFIG_DRM_KMS_HELPER=y CONFIG_DRM_I915=y CONFIG_VIDEO_OUTPUT_CONTROL=y CONFIG_FB=y CONFIG_FIRMWARE_EDID=y CONFIG_FB_BOOT_VESA_SUPPORT=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y CONFIG_FB_SYS_FILLRECT=m CONFIG_FB_SYS_COPYAREA=m CONFIG_FB_SYS_IMAGEBLIT=m CONFIG_FB_SYS_FOPS=m CONFIG_FB_DEFERRED_IO=y CONFIG_FB_MODE_HELPERS=y CONFIG_FB_TILEBLITTING=y CONFIG_FB_VESA=y CONFIG_BACKLIGHT_LCD_SUPPORT=y CONFIG_LCD_CLASS_DEVICE=y CONFIG_BACKLIGHT_CLASS_DEVICE=y CONFIG_VGA_CONSOLE=y CONFIG_DUMMY_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y CONFIG_LOGO=y CONFIG_LOGO_LINUX_CLUT224=y CONFIG_SOUND=y CONFIG_SOUND_OSS_CORE=y CONFIG_SND=m CONFIG_SND_TIMER=m CONFIG_SND_PCM=m CONFIG_SND_HWDEP=m CONFIG_SND_RAWMIDI=m CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=m CONFIG_SND_PCM_OSS=m 
CONFIG_SND_PCM_OSS_PLUGINS=y CONFIG_SND_HRTIMER=m CONFIG_SND_VMASTER=y CONFIG_SND_DMA_SGBUF=y CONFIG_SND_AC97_CODEC=m CONFIG_SND_DRIVERS=y CONFIG_SND_AC97_POWER_SAVE=y CONFIG_SND_AC97_POWER_SAVE_DEFAULT=60 CONFIG_SND_PCI=y CONFIG_SND_INTEL8X0=m CONFIG_SND_INTEL8X0M=m CONFIG_SND_USB=y CONFIG_SND_USB_AUDIO=m CONFIG_AC97_BUS=m CONFIG_HID_SUPPORT=y CONFIG_HID=m CONFIG_HIDRAW=y CONFIG_USB_HID=m CONFIG_USB_HIDDEV=y CONFIG_HID_A4TECH=m CONFIG_HID_APPLE=m CONFIG_HID_BELKIN=m CONFIG_HID_CHERRY=m CONFIG_HID_CHICONY=m CONFIG_HID_CYPRESS=m CONFIG_HID_EZKEY=m CONFIG_HID_KYE=m CONFIG_HID_TWINHAN=m CONFIG_HID_KENSINGTON=m CONFIG_HID_LOGITECH=m CONFIG_HID_MICROSOFT=m CONFIG_HID_MONTEREY=m CONFIG_HID_PICOLCD=m CONFIG_HID_PICOLCD_FB=y CONFIG_HID_PICOLCD_BACKLIGHT=y CONFIG_HID_PICOLCD_LCD=y CONFIG_HID_PICOLCD_LEDS=y CONFIG_USB_SUPPORT=y CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB_ARCH_HAS_EHCI=y CONFIG_USB=m CONFIG_USB_ANNOUNCE_NEW_DEVICES=y CONFIG_USB_SUSPEND=y CONFIG_USB_EHCI_HCD=m CONFIG_USB_UHCI_HCD=m CONFIG_USB_STORAGE=m CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_FTDI_SIO=m CONFIG_USB_SERIAL_PL2303=m CONFIG_NEW_LEDS=y CONFIG_LEDS_CLASS=y CONFIG_RTC_LIB=y CONFIG_RTC_CLASS=y CONFIG_RTC_HCTOSYS=y CONFIG_RTC_HCTOSYS_DEVICE="rtc0" CONFIG_RTC_INTF_SYSFS=y CONFIG_RTC_INTF_PROC=y CONFIG_RTC_INTF_DEV=y CONFIG_RTC_DRV_CMOS=y CONFIG_X86_PLATFORM_DEVICES=y CONFIG_ACPI_WMI=y CONFIG_FIRMWARE_MEMMAP=y CONFIG_DMIID=y CONFIG_DMI_SYSFS=y CONFIG_EXT2_FS=m CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y CONFIG_EXT2_FS_SECURITY=y CONFIG_FS_MBCACHE=m CONFIG_REISERFS_FS=y CONFIG_REISERFS_PROC_INFO=y CONFIG_REISERFS_FS_XATTR=y CONFIG_REISERFS_FS_POSIX_ACL=y CONFIG_REISERFS_FS_SECURITY=y CONFIG_XFS_FS=m CONFIG_XFS_QUOTA=y CONFIG_XFS_POSIX_ACL=y CONFIG_FS_POSIX_ACL=y CONFIG_EXPORTFS=y CONFIG_FILE_LOCKING=y CONFIG_FSNOTIFY=y CONFIG_INOTIFY_USER=y CONFIG_QUOTACTL=y CONFIG_GENERIC_ACL=y CONFIG_ISO9660_FS=m CONFIG_JOLIET=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m 
CONFIG_VFAT_FS=m CONFIG_FAT_DEFAULT_CODEPAGE=850 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" CONFIG_NTFS_FS=m CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_PROC_SYSCTL=y CONFIG_PROC_PAGE_MONITOR=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_CONFIGFS_FS=y CONFIG_MISC_FILESYSTEMS=y CONFIG_HFS_FS=m CONFIG_HFSPLUS_FS=m CONFIG_SQUASHFS=m CONFIG_SQUASHFS_XATTR=y CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3 CONFIG_PSTORE=y CONFIG_NETWORK_FILESYSTEMS=y CONFIG_NFS_FS=m CONFIG_NFS_V3=y CONFIG_NFS_V3_ACL=y CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_NFS_ACL_SUPPORT=m CONFIG_NFS_COMMON=y CONFIG_SUNRPC=m CONFIG_CIFS=m CONFIG_CIFS_STATS=y CONFIG_CIFS_STATS2=y CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y CONFIG_PARTITION_ADVANCED=y CONFIG_MSDOS_PARTITION=y CONFIG_EFI_PARTITION=y CONFIG_NLS=y CONFIG_NLS_DEFAULT="utf8" CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_CODEPAGE_850=m CONFIG_NLS_ISO8859_1=y CONFIG_NLS_ISO8859_15=m CONFIG_NLS_UTF8=y CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_PRINTK_TIME=y CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4 CONFIG_ENABLE_WARN_DEPRECATED=y CONFIG_FRAME_WARN=1024 CONFIG_MAGIC_SYSRQ=y CONFIG_UNUSED_SYMBOLS=y CONFIG_DEBUG_FS=y CONFIG_DEBUG_KERNEL=y CONFIG_LOCKUP_DETECTOR=y CONFIG_HARDLOCKUP_DETECTOR=y CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0 CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0 CONFIG_DETECT_HUNG_TASK=y CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0 CONFIG_SCHED_DEBUG=y CONFIG_SCHEDSTATS=y CONFIG_TIMER_STATS=y CONFIG_STACKTRACE=y CONFIG_DEBUG_BUGVERBOSE=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_REDUCED=y CONFIG_DEBUG_MEMORY_INIT=y CONFIG_ARCH_WANT_FRAME_POINTERS=y CONFIG_FRAME_POINTER=y CONFIG_LATENCYTOP=y CONFIG_SYSCTL_SYSCALL_CHECK=y CONFIG_USER_STACKTRACE_SUPPORT=y CONFIG_HAVE_FUNCTION_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y CONFIG_HAVE_DYNAMIC_FTRACE=y CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y CONFIG_HAVE_SYSCALL_TRACEPOINTS=y CONFIG_HAVE_C_RECORDMCOUNT=y CONFIG_TRACING_SUPPORT=y 
CONFIG_HAVE_ARCH_KGDB=y CONFIG_HAVE_ARCH_KMEMCHECK=y CONFIG_STRICT_DEVMEM=y CONFIG_X86_VERBOSE_BOOTUP=y CONFIG_EARLY_PRINTK=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_RODATA=y CONFIG_DEBUG_SET_MODULE_RONX=y CONFIG_DOUBLEFAULT=y CONFIG_HAVE_MMIOTRACE_SUPPORT=y CONFIG_IO_DELAY_TYPE_0X80=0 CONFIG_IO_DELAY_TYPE_0XED=1 CONFIG_IO_DELAY_TYPE_UDELAY=2 CONFIG_IO_DELAY_TYPE_NONE=3 CONFIG_IO_DELAY_0X80=y CONFIG_DEFAULT_IO_DELAY_TYPE=0 CONFIG_OPTIMIZE_INLINING=y CONFIG_DEFAULT_SECURITY_DAC=y CONFIG_DEFAULT_SECURITY="" CONFIG_CRYPTO=y CONFIG_CRYPTO_ALGAPI=m CONFIG_CRYPTO_ALGAPI2=m CONFIG_CRYPTO_AEAD2=m CONFIG_CRYPTO_BLKCIPHER=m CONFIG_CRYPTO_BLKCIPHER2=m CONFIG_CRYPTO_HASH=m CONFIG_CRYPTO_HASH2=m CONFIG_CRYPTO_RNG2=m CONFIG_CRYPTO_PCOMP2=m CONFIG_CRYPTO_MANAGER=m CONFIG_CRYPTO_MANAGER2=m CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y CONFIG_CRYPTO_WORKQUEUE=m CONFIG_CRYPTO_ECB=m CONFIG_CRYPTO_HMAC=m CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=m CONFIG_CRYPTO_MICHAEL_MIC=m CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_ARC4=m CONFIG_HAVE_KVM=y CONFIG_BITREVERSE=y CONFIG_GENERIC_FIND_FIRST_BIT=y CONFIG_GENERIC_FIND_NEXT_BIT=y CONFIG_GENERIC_FIND_LAST_BIT=y CONFIG_CRC_CCITT=m CONFIG_CRC_ITU_T=m CONFIG_CRC32=y CONFIG_ZLIB_INFLATE=m CONFIG_HAS_IOMEM=y CONFIG_HAS_IOPORT=y CONFIG_HAS_DMA=y CONFIG_NLATTR=y
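The hung-task watchdog and sysrq facility referenced in the log above can be driven from userspace; a minimal sketch for anyone trying to capture the same diagnostics (standard procfs knobs, values illustrative, requires root):

```shell
# Enable all sysrq functions so task dumps can be triggered from userspace
echo 1 > /proc/sys/kernel/sysrq

# Equivalent of sysrq+t on the console: dump the state and stack
# of every task to the kernel log
echo t > /proc/sysrq-trigger

# Lower the hung-task threshold from the default 120s so stalled
# writers are reported sooner (0 disables the watchdog entirely)
echo 60 > /proc/sys/kernel/hung_task_timeout_secs
```

Booting with a larger kernel log buffer (e.g. `log_buf_len=1M` on the kernel command line) helps keep a full sysrq+t dump from being truncated, which is presumably why part of the trace above was lost.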
CONFIG_MIGRATION=y CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y CONFIG_DEFAULT_MMAP_MIN_ADDR=4096 CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y CONFIG_NEED_PER_CPU_KM=y CONFIG_HIGHPTE=y CONFIG_X86_CHECK_BIOS_CORRUPTION=y CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y CONFIG_X86_RESERVE_LOW=64 CONFIG_MTRR=y CONFIG_X86_PAT=y CONFIG_ARCH_USES_PG_UNCACHED=y CONFIG_HZ_100=y CONFIG_HZ=100 CONFIG_SCHED_HRTICK=y CONFIG_KEXEC=y CONFIG_PHYSICAL_START=0x1000000 CONFIG_PHYSICAL_ALIGN=0x100000 CONFIG_CMDLINE_BOOL=y CONFIG_CMDLINE="" CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_SUSPEND=y CONFIG_SUSPEND_FREEZER=y CONFIG_PM_SLEEP=y CONFIG_PM_RUNTIME=y CONFIG_PM=y CONFIG_PM_DEBUG=y CONFIG_PM_ADVANCED_DEBUG=y CONFIG_CAN_PM_TRACE=y CONFIG_PM_TRACE=y CONFIG_PM_TRACE_RTC=y CONFIG_ACPI=y CONFIG_ACPI_SLEEP=y CONFIG_ACPI_AC=y CONFIG_ACPI_BATTERY=y CONFIG_ACPI_BUTTON=y CONFIG_ACPI_VIDEO=y CONFIG_ACPI_FAN=y CONFIG_ACPI_DOCK=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_BLACKLIST_YEAR=0 CONFIG_ACPI_DEBUG=y CONFIG_X86_PM_TIMER=y CONFIG_ACPI_HED=y CONFIG_ACPI_APEI=y CONFIG_ACPI_APEI_GHES=y CONFIG_CPU_FREQ=y CONFIG_CPU_FREQ_TABLE=y CONFIG_CPU_FREQ_STAT=m CONFIG_CPU_FREQ_STAT_DETAILS=y CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y CONFIG_CPU_FREQ_GOV_PERFORMANCE=y CONFIG_CPU_FREQ_GOV_USERSPACE=y CONFIG_CPU_FREQ_GOV_ONDEMAND=y CONFIG_X86_ACPI_CPUFREQ=y CONFIG_X86_SPEEDSTEP_CENTRINO=m CONFIG_X86_SPEEDSTEP_CENTRINO_TABLE=y CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y CONFIG_INTEL_IDLE=y CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCI_DOMAINS=y CONFIG_ARCH_SUPPORTS_MSI=y CONFIG_PCI_MSI=y CONFIG_PCI_IOAPIC=y CONFIG_PCI_LABEL=y CONFIG_ISA_DMA_API=y CONFIG_ISA=y CONFIG_AMD_NB=y CONFIG_PCCARD=m CONFIG_PCMCIA=m CONFIG_PCMCIA_LOAD_CIS=y CONFIG_CARDBUS=y CONFIG_YENTA=m CONFIG_YENTA_O2=y CONFIG_YENTA_RICOH=y CONFIG_YENTA_TI=y 
CONFIG_YENTA_ENE_TUNE=y CONFIG_YENTA_TOSHIBA=y CONFIG_PCMCIA_PROBE=y CONFIG_PCCARD_NONSTATIC=y CONFIG_BINFMT_ELF=y CONFIG_HAVE_AOUT=y CONFIG_HAVE_ATOMIC_IOMAP=y CONFIG_HAVE_TEXT_POKE_SMP=y CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y CONFIG_SYN_COOKIES=y CONFIG_INET_TUNNEL=m CONFIG_INET_LRO=y CONFIG_INET_DIAG=y CONFIG_INET_TCP_DIAG=y CONFIG_TCP_CONG_CUBIC=y CONFIG_DEFAULT_TCP_CONG="cubic" CONFIG_IPV6=y CONFIG_IPV6_PRIVACY=y CONFIG_IPV6_SIT=m CONFIG_IPV6_NDISC_NODETYPE=y CONFIG_IPV6_MULTIPLE_TABLES=y CONFIG_IPV6_SUBTREES=y CONFIG_NETFILTER=y CONFIG_NETFILTER_ADVANCED=y CONFIG_NETFILTER_NETLINK=y CONFIG_NETFILTER_NETLINK_LOG=y CONFIG_NETFILTER_XTABLES=y CONFIG_NETFILTER_XT_TARGET_NFLOG=y CONFIG_NETFILTER_XT_TARGET_TCPMSS=y CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=y CONFIG_NETFILTER_XT_MATCH_DEVGROUP=y CONFIG_NETFILTER_XT_MATCH_IPRANGE=y CONFIG_NETFILTER_XT_MATCH_LIMIT=y CONFIG_NETFILTER_XT_MATCH_MAC=y CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y CONFIG_NETFILTER_XT_MATCH_OWNER=y CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y CONFIG_IP_NF_IPTABLES=y CONFIG_IP_NF_FILTER=y CONFIG_IP_NF_TARGET_REJECT=y CONFIG_IP_NF_TARGET_LOG=y CONFIG_IP_NF_MANGLE=y CONFIG_IP6_NF_IPTABLES=y CONFIG_IP6_NF_MATCH_EUI64=y CONFIG_IP6_NF_TARGET_LOG=y CONFIG_IP6_NF_FILTER=y CONFIG_IP6_NF_TARGET_REJECT=y CONFIG_IP6_NF_MANGLE=y CONFIG_VLAN_8021Q=m CONFIG_IRDA=m CONFIG_IRLAN=m CONFIG_IRCOMM=m CONFIG_IRDA_ULTRA=y CONFIG_IRTTY_SIR=m CONFIG_NSC_FIR=m CONFIG_FIB_RULES=y CONFIG_WIRELESS=y CONFIG_WIRELESS_EXT=y CONFIG_WEXT_CORE=y CONFIG_WEXT_PROC=y CONFIG_WEXT_SPY=y CONFIG_WEXT_PRIV=y CONFIG_CFG80211=m CONFIG_CFG80211_DEFAULT_PS=y CONFIG_WIRELESS_EXT_SYSFS=y CONFIG_LIB80211=m CONFIG_LIB80211_CRYPT_WEP=m CONFIG_LIB80211_CRYPT_CCMP=m CONFIG_LIB80211_CRYPT_TKIP=m CONFIG_RFKILL=m CONFIG_RFKILL_INPUT=y CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_FIRMWARE_IN_KERNEL=y 
CONFIG_EXTRA_FIRMWARE="" CONFIG_ARCH_NO_SYSDEV_OPS=y CONFIG_PNP=y CONFIG_ISAPNP=y CONFIG_PNPACPI=y CONFIG_BLK_DEV=y CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_NBD=m CONFIG_CDROM_PKTCDVD=m CONFIG_CDROM_PKTCDVD_BUFFERS=8 CONFIG_ATA_OVER_ETH=y CONFIG_MISC_DEVICES=y CONFIG_HAVE_IDE=y CONFIG_SCSI_MOD=y CONFIG_SCSI=y CONFIG_SCSI_DMA=y CONFIG_SCSI_PROC_FS=y CONFIG_BLK_DEV_SD=y CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_SCAN_ASYNC=y CONFIG_SCSI_WAIT_SCAN=m CONFIG_ATA=y CONFIG_ATA_VERBOSE_ERROR=y CONFIG_ATA_ACPI=y CONFIG_ATA_SFF=y CONFIG_ATA_BMDMA=y CONFIG_ATA_PIIX=y CONFIG_FIREWIRE=m CONFIG_FIREWIRE_OHCI=m CONFIG_FIREWIRE_OHCI_DEBUG=y CONFIG_FIREWIRE_SBP2=m CONFIG_NETDEVICES=y CONFIG_TUN=m CONFIG_VETH=m CONFIG_MII=y CONFIG_PHYLIB=m CONFIG_BROADCOM_PHY=m CONFIG_NET_ETHERNET=y CONFIG_NET_PCI=y CONFIG_B44=y CONFIG_B44_PCI_AUTOSELECT=y CONFIG_B44_PCICORE_AUTOSELECT=y CONFIG_B44_PCI=y CONFIG_WLAN=y CONFIG_IPW2100=m CONFIG_IPW2100_MONITOR=y CONFIG_LIBIPW=m CONFIG_NETCONSOLE=y CONFIG_NETCONSOLE_DYNAMIC=y CONFIG_NETPOLL=y CONFIG_NET_POLL_CONTROLLER=y CONFIG_INPUT=y CONFIG_INPUT_POLLDEV=m CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1400 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1050 CONFIG_INPUT_EVDEV=y CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y CONFIG_MOUSE_PS2_ALPS=y CONFIG_MOUSE_PS2_LOGIPS2PP=y CONFIG_MOUSE_PS2_SYNAPTICS=y CONFIG_MOUSE_PS2_LIFEBOOK=y CONFIG_MOUSE_PS2_TRACKPOINT=y CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=m CONFIG_SERIO=y CONFIG_SERIO_I8042=y CONFIG_SERIO_SERPORT=m CONFIG_SERIO_LIBPS2=y CONFIG_VT=y CONFIG_CONSOLE_TRANSLATIONS=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_VT_HW_CONSOLE_BINDING=y CONFIG_UNIX98_PTYS=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_NVRAM=m CONFIG_HPET=y CONFIG_HPET_MMAP=y CONFIG_DEVPORT=y CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y CONFIG_I2C_COMPAT=y CONFIG_I2C_CHARDEV=m CONFIG_I2C_HELPER_AUTO=y CONFIG_I2C_ALGOBIT=y 
CONFIG_I2C_I801=m CONFIG_I2C_SCMI=m CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y CONFIG_POWER_SUPPLY=y CONFIG_HWMON=y CONFIG_SENSORS_CORETEMP=m CONFIG_THERMAL=y CONFIG_THERMAL_HWMON=y CONFIG_SSB_POSSIBLE=y CONFIG_SSB=y CONFIG_SSB_SPROM=y CONFIG_SSB_PCIHOST_POSSIBLE=y CONFIG_SSB_PCIHOST=y CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y CONFIG_SSB_DRIVER_PCICORE=y CONFIG_MFD_SUPPORT=y CONFIG_MEDIA_SUPPORT=y CONFIG_VIDEO_DEV=m CONFIG_VIDEO_V4L2_COMMON=m CONFIG_VIDEO_MEDIA=m CONFIG_MEDIA_ATTACH=y CONFIG_MEDIA_TUNER=m CONFIG_MEDIA_TUNER_SIMPLE=m CONFIG_MEDIA_TUNER_TDA8290=m CONFIG_MEDIA_TUNER_TDA827X=m CONFIG_MEDIA_TUNER_TDA18271=m CONFIG_MEDIA_TUNER_TDA9887=m CONFIG_MEDIA_TUNER_TEA5761=m CONFIG_MEDIA_TUNER_TEA5767=m CONFIG_MEDIA_TUNER_MT20XX=m CONFIG_MEDIA_TUNER_XC2028=m CONFIG_MEDIA_TUNER_XC5000=m CONFIG_MEDIA_TUNER_MC44S803=m CONFIG_VIDEO_V4L2=m CONFIG_VIDEO_CAPTURE_DRIVERS=y CONFIG_VIDEO_HELPER_CHIPS_AUTO=y CONFIG_V4L_USB_DRIVERS=y CONFIG_USB_VIDEO_CLASS=m CONFIG_USB_VIDEO_CLASS_INPUT_EVDEV=y CONFIG_USB_PWC=m CONFIG_USB_PWC_INPUT_EVDEV=y CONFIG_AGP=y CONFIG_AGP_INTEL=y CONFIG_VGA_ARB=y CONFIG_VGA_ARB_MAX_GPUS=2 CONFIG_DRM=y CONFIG_DRM_KMS_HELPER=y CONFIG_DRM_I915=y CONFIG_VIDEO_OUTPUT_CONTROL=y CONFIG_FB=y CONFIG_FIRMWARE_EDID=y CONFIG_FB_BOOT_VESA_SUPPORT=y CONFIG_FB_CFB_FILLRECT=y CONFIG_FB_CFB_COPYAREA=y CONFIG_FB_CFB_IMAGEBLIT=y CONFIG_FB_SYS_FILLRECT=m CONFIG_FB_SYS_COPYAREA=m CONFIG_FB_SYS_IMAGEBLIT=m CONFIG_FB_SYS_FOPS=m CONFIG_FB_DEFERRED_IO=y CONFIG_FB_MODE_HELPERS=y CONFIG_FB_TILEBLITTING=y CONFIG_FB_VESA=y CONFIG_BACKLIGHT_LCD_SUPPORT=y CONFIG_LCD_CLASS_DEVICE=y CONFIG_BACKLIGHT_CLASS_DEVICE=y CONFIG_VGA_CONSOLE=y CONFIG_DUMMY_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE=y CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y CONFIG_FONT_8x8=y CONFIG_FONT_8x16=y CONFIG_LOGO=y CONFIG_LOGO_LINUX_CLUT224=y CONFIG_SOUND=y CONFIG_SOUND_OSS_CORE=y CONFIG_SND=m CONFIG_SND_TIMER=m CONFIG_SND_PCM=m CONFIG_SND_HWDEP=m CONFIG_SND_RAWMIDI=m CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=m CONFIG_SND_PCM_OSS=m 
CONFIG_SND_PCM_OSS_PLUGINS=y CONFIG_SND_HRTIMER=m CONFIG_SND_VMASTER=y CONFIG_SND_DMA_SGBUF=y CONFIG_SND_AC97_CODEC=m CONFIG_SND_DRIVERS=y CONFIG_SND_AC97_POWER_SAVE=y CONFIG_SND_AC97_POWER_SAVE_DEFAULT=60 CONFIG_SND_PCI=y CONFIG_SND_INTEL8X0=m CONFIG_SND_INTEL8X0M=m CONFIG_SND_USB=y CONFIG_SND_USB_AUDIO=m CONFIG_AC97_BUS=m CONFIG_HID_SUPPORT=y CONFIG_HID=m CONFIG_HIDRAW=y CONFIG_USB_HID=m CONFIG_USB_HIDDEV=y CONFIG_HID_A4TECH=m CONFIG_HID_APPLE=m CONFIG_HID_BELKIN=m CONFIG_HID_CHERRY=m CONFIG_HID_CHICONY=m CONFIG_HID_CYPRESS=m CONFIG_HID_EZKEY=m CONFIG_HID_KYE=m CONFIG_HID_TWINHAN=m CONFIG_HID_KENSINGTON=m CONFIG_HID_LOGITECH=m CONFIG_HID_MICROSOFT=m CONFIG_HID_MONTEREY=m CONFIG_HID_PICOLCD=m CONFIG_HID_PICOLCD_FB=y CONFIG_HID_PICOLCD_BACKLIGHT=y CONFIG_HID_PICOLCD_LCD=y CONFIG_HID_PICOLCD_LEDS=y CONFIG_USB_SUPPORT=y CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB_ARCH_HAS_OHCI=y CONFIG_USB_ARCH_HAS_EHCI=y CONFIG_USB=m CONFIG_USB_ANNOUNCE_NEW_DEVICES=y CONFIG_USB_SUSPEND=y CONFIG_USB_EHCI_HCD=m CONFIG_USB_UHCI_HCD=m CONFIG_USB_STORAGE=m CONFIG_USB_SERIAL=m CONFIG_USB_SERIAL_FTDI_SIO=m CONFIG_USB_SERIAL_PL2303=m CONFIG_NEW_LEDS=y CONFIG_LEDS_CLASS=y CONFIG_RTC_LIB=y CONFIG_RTC_CLASS=y CONFIG_RTC_HCTOSYS=y CONFIG_RTC_HCTOSYS_DEVICE="rtc0" CONFIG_RTC_INTF_SYSFS=y CONFIG_RTC_INTF_PROC=y CONFIG_RTC_INTF_DEV=y CONFIG_RTC_DRV_CMOS=y CONFIG_X86_PLATFORM_DEVICES=y CONFIG_ACPI_WMI=y CONFIG_FIRMWARE_MEMMAP=y CONFIG_DMIID=y CONFIG_DMI_SYSFS=y CONFIG_EXT2_FS=m CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y CONFIG_EXT2_FS_SECURITY=y CONFIG_FS_MBCACHE=m CONFIG_REISERFS_FS=y CONFIG_REISERFS_PROC_INFO=y CONFIG_REISERFS_FS_XATTR=y CONFIG_REISERFS_FS_POSIX_ACL=y CONFIG_REISERFS_FS_SECURITY=y CONFIG_XFS_FS=m CONFIG_XFS_QUOTA=y CONFIG_XFS_POSIX_ACL=y CONFIG_FS_POSIX_ACL=y CONFIG_EXPORTFS=y CONFIG_FILE_LOCKING=y CONFIG_FSNOTIFY=y CONFIG_INOTIFY_USER=y CONFIG_QUOTACTL=y CONFIG_GENERIC_ACL=y CONFIG_ISO9660_FS=m CONFIG_JOLIET=y CONFIG_UDF_FS=m CONFIG_UDF_NLS=y CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m 
CONFIG_VFAT_FS=m CONFIG_FAT_DEFAULT_CODEPAGE=850 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" CONFIG_NTFS_FS=m CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y CONFIG_PROC_SYSCTL=y CONFIG_PROC_PAGE_MONITOR=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_CONFIGFS_FS=y CONFIG_MISC_FILESYSTEMS=y CONFIG_HFS_FS=m CONFIG_HFSPLUS_FS=m CONFIG_SQUASHFS=m CONFIG_SQUASHFS_XATTR=y CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3 CONFIG_PSTORE=y CONFIG_NETWORK_FILESYSTEMS=y CONFIG_NFS_FS=m CONFIG_NFS_V3=y CONFIG_NFS_V3_ACL=y CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_NFS_ACL_SUPPORT=m CONFIG_NFS_COMMON=y CONFIG_SUNRPC=m CONFIG_CIFS=m CONFIG_CIFS_STATS=y CONFIG_CIFS_STATS2=y CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y CONFIG_PARTITION_ADVANCED=y CONFIG_MSDOS_PARTITION=y CONFIG_EFI_PARTITION=y CONFIG_NLS=y CONFIG_NLS_DEFAULT="utf8" CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_CODEPAGE_850=m CONFIG_NLS_ISO8859_1=y CONFIG_NLS_ISO8859_15=m CONFIG_NLS_UTF8=y CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_PRINTK_TIME=y CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4 CONFIG_ENABLE_WARN_DEPRECATED=y CONFIG_FRAME_WARN=1024 CONFIG_MAGIC_SYSRQ=y CONFIG_UNUSED_SYMBOLS=y CONFIG_DEBUG_FS=y CONFIG_DEBUG_KERNEL=y CONFIG_LOCKUP_DETECTOR=y CONFIG_HARDLOCKUP_DETECTOR=y CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0 CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0 CONFIG_DETECT_HUNG_TASK=y CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0 CONFIG_SCHED_DEBUG=y CONFIG_SCHEDSTATS=y CONFIG_TIMER_STATS=y CONFIG_STACKTRACE=y CONFIG_DEBUG_BUGVERBOSE=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_REDUCED=y CONFIG_DEBUG_MEMORY_INIT=y CONFIG_ARCH_WANT_FRAME_POINTERS=y CONFIG_FRAME_POINTER=y CONFIG_LATENCYTOP=y CONFIG_SYSCTL_SYSCALL_CHECK=y CONFIG_USER_STACKTRACE_SUPPORT=y CONFIG_HAVE_FUNCTION_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y CONFIG_HAVE_DYNAMIC_FTRACE=y CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y CONFIG_HAVE_SYSCALL_TRACEPOINTS=y CONFIG_HAVE_C_RECORDMCOUNT=y CONFIG_TRACING_SUPPORT=y 
CONFIG_HAVE_ARCH_KGDB=y CONFIG_HAVE_ARCH_KMEMCHECK=y CONFIG_STRICT_DEVMEM=y CONFIG_X86_VERBOSE_BOOTUP=y CONFIG_EARLY_PRINTK=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_RODATA=y CONFIG_DEBUG_SET_MODULE_RONX=y CONFIG_DOUBLEFAULT=y CONFIG_HAVE_MMIOTRACE_SUPPORT=y CONFIG_IO_DELAY_TYPE_0X80=0 CONFIG_IO_DELAY_TYPE_0XED=1 CONFIG_IO_DELAY_TYPE_UDELAY=2 CONFIG_IO_DELAY_TYPE_NONE=3 CONFIG_IO_DELAY_0X80=y CONFIG_DEFAULT_IO_DELAY_TYPE=0 CONFIG_OPTIMIZE_INLINING=y CONFIG_DEFAULT_SECURITY_DAC=y CONFIG_DEFAULT_SECURITY="" CONFIG_CRYPTO=y CONFIG_CRYPTO_ALGAPI=m CONFIG_CRYPTO_ALGAPI2=m CONFIG_CRYPTO_AEAD2=m CONFIG_CRYPTO_BLKCIPHER=m CONFIG_CRYPTO_BLKCIPHER2=m CONFIG_CRYPTO_HASH=m CONFIG_CRYPTO_HASH2=m CONFIG_CRYPTO_RNG2=m CONFIG_CRYPTO_PCOMP2=m CONFIG_CRYPTO_MANAGER=m CONFIG_CRYPTO_MANAGER2=m CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y CONFIG_CRYPTO_WORKQUEUE=m CONFIG_CRYPTO_ECB=m CONFIG_CRYPTO_HMAC=m CONFIG_CRYPTO_MD4=m CONFIG_CRYPTO_MD5=m CONFIG_CRYPTO_MICHAEL_MIC=m CONFIG_CRYPTO_AES=m CONFIG_CRYPTO_AES_586=m CONFIG_CRYPTO_ARC4=m CONFIG_HAVE_KVM=y CONFIG_BITREVERSE=y CONFIG_GENERIC_FIND_FIRST_BIT=y CONFIG_GENERIC_FIND_NEXT_BIT=y CONFIG_GENERIC_FIND_LAST_BIT=y CONFIG_CRC_CCITT=m CONFIG_CRC_ITU_T=m CONFIG_CRC32=y CONFIG_ZLIB_INFLATE=m CONFIG_HAS_IOMEM=y CONFIG_HAS_IOPORT=y CONFIG_HAS_DMA=y CONFIG_NLATTR=y [-- Attachment #3: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-23 20:44 ` Bruno Prémont @ 2011-04-27 5:08 ` Dave Chinner -1 siblings, 0 replies; 44+ messages in thread From: Dave Chinner @ 2011-04-27 5:08 UTC (permalink / raw) To: Bruno Prémont Cc: xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel On Sat, Apr 23, 2011 at 10:44:03PM +0200, Bruno Prémont wrote: > Hi, > > Running 2.6.39-rc3+ and now again on 2.6.39-rc4+ (I've not tested -rc1 > or -rc2) I've hit a "dying machine" where processes writing to disk end > up in D state. > From occurrence with -rc3+ I don't have logs as those never hit the disk, > for -rc4+ I have the following (sysrq+t was too big, what I have of it > misses a dozen of kernel tasks - if needed, please ask): > > The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84 > (+ 1 patch of mine to stop disk on reboot), > full dmesg available if needed; kernel config attached (only selected > options). In case there is something I should do at next occurrence > please tell. Unfortunately I have no trigger for it and it does not > happen very often. > > Thanks, > Bruno > > [ 0.000000] Linux version 2.6.39-rc4-00120-g73b5b55 (kbuild@neptune) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #12 Thu Apr 21 19:28:45 CEST 2011 > > > [32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds. > [32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [32040.120077] flush-8:0 D 00000000 4908 1665 2 0x00000000 > [32040.120099] f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000 > [32040.120118] f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0 > [32040.120135] f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94 > [32040.120153] Call Trace: > [32040.120220] [<f889d7f8>] ? xfs_bmap_search_multi_extents+0x88/0xe0 [xfs] > [32040.120239] [<c109ce1d>] ? 
kmem_cache_alloc+0x2d/0x110 > [32040.120294] [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs] > [32040.120346] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs] > [32040.120359] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0 > [32040.120411] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs] > [32040.120464] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs] > [32040.120516] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs] > [32040.120571] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs]

Hmmmmm. That may be caused by the conversion of the xfsaild to a work queue. Can you post the output of "xfs_info <mntpt>" and the mount options (/proc/mounts) used on your system?

Cheers, Dave. -- Dave Chinner david@fromorbit.com
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-27 5:08 ` Dave Chinner @ 2011-04-27 16:26 ` Bruno Prémont -1 siblings, 0 replies; 44+ messages in thread From: Bruno Prémont @ 2011-04-27 16:26 UTC (permalink / raw) To: Dave Chinner Cc: xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel On Wed, 27 April 2011 Dave Chinner <david@fromorbit.com> wrote: > On Sat, Apr 23, 2011 at 10:44:03PM +0200, Bruno Prémont wrote: > > Running 2.6.39-rc3+ and now again on 2.6.39-rc4+ (I've not tested -rc1 > > or -rc2) I've hit a "dying machine" where processes writing to disk end > > up in D state. > > From occurrence with -rc3+ I don't have logs as those never hit the disk, > > for -rc4+ I have the following (sysrq+t was too big, what I have of it > > misses a dozen of kernel tasks - if needed, please ask): > > > > The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84 > > (+ 1 patch of mine to stop disk on reboot), > > full dmesg available if needed; kernel config attached (only selected > > options). In case there is something I should do at next occurrence > > please tell. Unfortunately I have no trigger for it and it does not > > happen very often. > > > > [ 0.000000] Linux version 2.6.39-rc4-00120-g73b5b55 (kbuild@neptune) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #12 Thu Apr 21 19:28:45 CEST 2011 > > > > > > [32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds. > > [32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > [32040.120077] flush-8:0 D 00000000 4908 1665 2 0x00000000 > > [32040.120099] f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000 > > [32040.120118] f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0 > > [32040.120135] f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94 > > [32040.120153] Call Trace: > > [32040.120220] [<f889d7f8>] ? 
xfs_bmap_search_multi_extents+0x88/0xe0 [xfs] > > [32040.120239] [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110 > > [32040.120294] [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs] > > [32040.120346] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs] > > [32040.120359] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0 > > [32040.120411] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs] > > [32040.120464] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs] > > [32040.120516] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs] > > [32040.120571] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs] > > Hmmmmm. That may be caused by the conversion of the xfsaild to a > work queue. Can you post the output of "xfs_info <mntpt>" and the > mount options (/proc/mounts) used on your system?

Here it comes (covering all XFS mount points - with the affected kernel, but after a fresh boot):

* /proc/mounts *

/dev/sda6 /mnt/.SRC xfs rw,noatime,attr2,delaylog,noquota 0 0
/dev/sda7 /home xfs rw,noatime,attr2,delaylog,noquota 0 0
/dev/sda6 /var/cache/edb xfs rw,noatime,attr2,delaylog,noquota 0 0
/dev/sda6 /usr/src xfs rw,noatime,attr2,delaylog,noquota 0 0
/dev/sda6 /var/tmp xfs rw,noatime,attr2,delaylog,noquota 0 0
/dev/sda6 /var/log xfs rw,noatime,attr2,delaylog,noquota 0 0
/dev/sda6 /var/lib/portage/packages xfs rw,noatime,attr2,delaylog,noquota 0 0

* xfs_info *

meta-data=/dev/sda7      isize=256    agcount=4, agsize=987996 blks
         =               sectsz=512   attr=2
data     =               bsize=4096   blocks=3951982, imaxpct=25
         =               sunit=0      swidth=0 blks
naming   =version 2      bsize=4096   ascii-ci=0
log      =internal       bsize=4096   blocks=2560, version=2
         =               sectsz=512   sunit=0 blks, lazy-count=0
realtime =none           extsz=4096   blocks=0, rtextents=0

meta-data=/dev/sda6      isize=256    agcount=4, agsize=655149 blks
         =               sectsz=512   attr=2
data     =               bsize=4096   blocks=2620595, imaxpct=25
         =               sunit=0      swidth=0 blks
naming   =version 2      bsize=4096   ascii-ci=0
log      =internal       bsize=4096   blocks=2560, version=2
         =               sectsz=512   sunit=0 blks, lazy-count=0
realtime =none           extsz=4096   blocks=0, rtextents=0

Bruno
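Bruno's mount list follows the standard six-field /proc/mounts format (device, mount point, filesystem type, comma-separated options, dump flag, fsck pass). A minimal sketch of splitting one such entry, not part of the original thread; the sample line is copied from Bruno's output:

```python
def parse_mounts_line(line):
    """Split one /proc/mounts entry into its six whitespace-separated fields."""
    device, mountpoint, fstype, options, dump, passno = line.split()
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        # mount options are a comma-separated list
        "options": options.split(","),
        "dump": int(dump),
        "passno": int(passno),
    }

entry = parse_mounts_line("/dev/sda7 /home xfs rw,noatime,attr2,delaylog,noquota 0 0")
print(entry["fstype"])   # xfs
print(entry["options"])  # ['rw', 'noatime', 'attr2', 'delaylog', 'noquota']
```

Note that delaylog shows up in the options of every affected filesystem, which matters for the regression hunt later in the thread.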
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-27 16:26 ` Bruno Prémont @ 2011-04-28 19:45 ` Markus Trippelsdorf -1 siblings, 0 replies; 44+ messages in thread From: Markus Trippelsdorf @ 2011-04-28 19:45 UTC (permalink / raw) To: Bruno Prémont Cc: Dave Chinner, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel On 2011.04.27 at 18:26 +0200, Bruno Prémont wrote: > On Wed, 27 April 2011 Dave Chinner <david@fromorbit.com> wrote: > > On Sat, Apr 23, 2011 at 10:44:03PM +0200, Bruno Prémont wrote: > > > Running 2.6.39-rc3+ and now again on 2.6.39-rc4+ (I've not tested -rc1 > > > or -rc2) I've hit a "dying machine" where processes writing to disk end > > > up in D state. > > > From occurrence with -rc3+ I don't have logs as those never hit the disk, > > > for -rc4+ I have the following (sysrq+t was too big, what I have of it > > > misses a dozen of kernel tasks - if needed, please ask): > > > > > > The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84 > > > (+ 1 patch of mine to stop disk on reboot), > > > full dmesg available if needed; kernel config attached (only selected > > > options). In case there is something I should do at next occurrence > > > please tell. Unfortunately I have no trigger for it and it does not > > > happen very often. > > > > > > [ 0.000000] Linux version 2.6.39-rc4-00120-g73b5b55 (kbuild@neptune) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #12 Thu Apr 21 19:28:45 CEST 2011 > > > > > > > > > [32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds. > > > [32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > [32040.120077] flush-8:0 D 00000000 4908 1665 2 0x00000000
> > > [32040.120099] f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000
> > > [32040.120118] f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0
> > > [32040.120135] f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94
> > > [32040.120153] Call Trace:
> > > [32040.120220] [<f889d7f8>] ? xfs_bmap_search_multi_extents+0x88/0xe0 [xfs]
> > > [32040.120239] [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110
> > > [32040.120294] [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs]
> > > [32040.120346] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs]
> > > [32040.120359] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0
> > > [32040.120411] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs]
> > > [32040.120464] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs]
> > > [32040.120516] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs]
> > > [32040.120571] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs]
> >
> > Hmmmmm. That may be caused by the conversion of the xfsaild to a
> > work queue. Can you post the output of "xfs_info <mntpt>" and the
> > mount options (/proc/mounts) used on your system?

I may have hit the same problem today and managed to capture some sysrq-l
and sysrq-w output. The system was largely unusable during this incident.
I could still switch between X and the console (and press the sysrq
key-combination), but I couldn't run any commands in the terminal.
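(Editorial aside, not part of the original message: sysrq-l and sysrq-w dumps like the ones quoted below can also be triggered without the keyboard chord by writing the command character to /proc/sysrq-trigger, assuming root privileges and CONFIG_MAGIC_SYSRQ=y; the `sysrq` wrapper name here is hypothetical. The kernel's own "SysRq : HELP" line later in this message lists the full command set.)

```shell
# Hypothetical wrapper: write a single sysrq command character to the trigger
# file. Defaults to /proc/sysrq-trigger, which needs root on a real system;
# an alternative path can be passed as a second argument.
sysrq() {
    echo "$1" > "${2:-/proc/sysrq-trigger}"
}

# Typical use while a machine is wedged (output lands in the kernel log):
#   sysrq l    # show-backtrace-all-active-cpus (sysrq-l)
#   sysrq w    # show-blocked-tasks (sysrq-w)
#   dmesg | tail -n 200
```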
x4 ~ # xfs_info /
meta-data=/dev/root              isize=256    agcount=4, agsize=1949824 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=7799296, imaxpct=25
         =                       sunit=128    swidth=128 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=3808, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

x4 ~ # xfs_info /var
meta-data=/dev/sda1              isize=256    agcount=4, agsize=12800000 blks
         =                       sectsz=4096  attr=2
data     =                       bsize=4096   blocks=51200000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=25000, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

x4 ~ # mount
rootfs on / type rootfs (rw)
/dev/root on / type xfs (rw,noatime,attr2,delaylog,logbsize=256k,sunit=1024,swidth=1024,noquota)
devtmpfs on /dev type devtmpfs (rw,relatime,size=4091272k,nr_inodes=1022818,mode=755)
proc on /proc type proc (rw,relatime)
sys on /sys type sysfs (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,mode=600)
none on /dev/shm type tmpfs (rw,relatime,size=409600k)
/dev/sda1 on /var type xfs (rw,noatime,attr2,delaylog,logbsize=256k,noquota)
/dev/sda2 on /music type xfs (rw,noatime,attr2,delaylog,noquota)
cgroup on /sys/fs/cgroup type cgroup (rw,relatime,cpu,release_agent=/sbin/rm_cgroup)

Apr 27 09:53:01 x4 kernel: rtc_cmos 00:03: setting system clock to 2011-04-27 07:52:58 UTC (1303890778)
Apr 27 09:53:01 x4 kernel: XFS (sdb2): Mounting Filesystem
Apr 27 09:53:01 x4 kernel: XFS (sdb2): Ending clean mount
Apr 27 09:53:01 x4 kernel: VFS: Mounted root (xfs filesystem) readonly on device 8:18.
Apr 27 09:53:01 x4 kernel: devtmpfs: mounted
Apr 27 09:53:01 x4 kernel: Freeing unused kernel memory: 596k freed
Apr 27 09:53:01 x4 kernel: XFS (sda1): Mounting Filesystem
Apr 27 09:53:01 x4 kernel: XFS (sda1): Ending clean mount
Apr 27 09:53:01 x4 kernel: XFS (sda2): Mounting Filesystem
Apr 27 09:53:01 x4 kernel: XFS (sda2): Ending clean mount
Apr 27 09:53:01 x4 kernel: ATL1E 0000:02:00.0: irq 40 for MSI/MSI-X
Apr 27 09:53:01 x4 kernel: ATL1E 0000:02:00.0: eth0: NIC Link is Up <100 Mbps Full Duplex>
Apr 27 09:53:01 x4 kernel: ATL1E 0000:02:00.0: eth0: NIC Link is Up <100 Mbps Full Duplex>
Apr 27 09:53:02 x4 kernel: Adding 2097148k swap on /var/tmp/swap/swapfile. Priority:-1 extents:2 across:2679512k
Apr 27 14:59:02 x4 kernel: usb 1-3: new high speed USB device number 5 using ehci_hcd
Apr 27 15:04:26 x4 kernel: usb 1-3: USB disconnect, device number 5
Apr 27 15:20:44 x4 kernel: usb 1-3: new high speed USB device number 6 using ehci_hcd
Apr 27 15:25:36 x4 kernel: usb 1-3: USB disconnect, device number 6
Apr 27 16:14:49 x4 kernel: usb 1-3: new high speed USB device number 7 using ehci_hcd
Apr 27 16:16:57 x4 kernel: usb 1-3: USB disconnect, device number 7
Apr 27 21:49:16 x4 kernel: chrome_sandbox (8376): /proc/8374/oom_adj is deprecated, please use /proc/8374/oom_score_adj instead.
Apr 27 22:40:49 x4 kernel: CE: hpet increased min_delta_ns to 20113 nsec
Apr 28 07:50:59 x4 kernel: usb 1-3: new high speed USB device number 8 using ehci_hcd
Apr 28 07:55:54 x4 kernel: usb 1-3: USB disconnect, device number 8
Apr 28 17:38:23 x4 kernel: kvm: 16389: cpu0 unhandled rdmsr: 0xc0010001
Apr 28 20:59:27 x4 kernel: SysRq : Show backtrace of all active CPUs
Apr 28 20:59:27 x4 kernel: CPU3:
Apr 28 20:59:27 x4 kernel: CPU 3
Apr 28 20:59:27 x4 kernel: Pid: 0, comm: kworker/0:1 Not tainted 2.6.39-rc5 #4 System manufacturer System Product Name/M4A78T-E
Apr 28 20:59:27 x4 kernel: RIP: 0010:[<ffffffff81044962>] [<ffffffff81044962>] default_idle+0x42/0x160
Apr 28 20:59:27 x4 kernel: RSP: 0018:ffff88021e94dea8 EFLAGS: 00000246
Apr 28 20:59:27 x4 kernel: RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000020
Apr 28 20:59:27 x4 kernel: RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffffffff8180b8ec
Apr 28 20:59:27 x4 kernel: RBP: ffff88021e94dec8 R08: ffffffff816d1480 R09: 0000000000000001
Apr 28 20:59:27 x4 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8150d50e
Apr 28 20:59:27 x4 kernel: R13: ffff88021e952400 R14: ffffffff8109cc05 R15: ffff88021e94de48
Apr 28 20:59:27 x4 kernel: FS: 00007f52065e3700(0000) GS:ffff88021fd80000(0000) knlGS:0000000000000000
Apr 28 20:59:27 x4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 28 20:59:27 x4 kernel: CR2: 00007f520612b7e0 CR3: 0000000110717000 CR4: 00000000000006e0
Apr 28 20:59:27 x4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 28 20:59:27 x4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 28 20:59:27 x4 kernel: Process kworker/0:1 (pid: 0, threadinfo ffff88021e94c000, task ffff88021e8bec00)
Apr 28 20:59:27 x4 kernel: Stack:
Apr 28 20:59:27 x4 kernel: ffff88021e94dee4 ffffffff81725568 ffff88021e94c000 0000000000000000
Apr 28 20:59:27 x4 kernel: ffff88021e94def8 ffffffff81044dd9 ffff88021e94dee8 0000000381094806
Apr 28 20:59:27 x4 kernel: ffff88021e94def8 ffff88021e94dfd8 ffff88021e94df28 ffffffff8103c126
Apr 28 20:59:27 x4 kernel: Call Trace:
Apr 28 20:59:27 x4 kernel: [<ffffffff81044dd9>] c1e_idle+0x59/0x100
Apr 28 20:59:27 x4 kernel: [<ffffffff8103c126>] cpu_idle+0x66/0xb0
Apr 28 20:59:27 x4 kernel: [<ffffffff8176e013>] start_secondary+0x18d/0x191
Apr 28 20:59:27 x4 kernel: Code: 00 00 66 66 66 66 90 65 48 8b 04 25 08 b7 00 00 83 a0 3c e0 ff ff fb 0f ae f0 48 8b 80 38 e0 ff ff a8 08 0f 85 12 01 00 00 fb f4
Apr 28 20:59:27 x4 kernel: Call Trace:
Apr 28 20:59:27 x4 kernel: [<ffffffff81044dd9>] c1e_idle+0x59/0x100
Apr 28 20:59:27 x4 kernel: [<ffffffff8103c126>] cpu_idle+0x66/0xb0
Apr 28 20:59:27 x4 kernel: [<ffffffff8176e013>] start_secondary+0x18d/0x191
Apr 28 20:59:27 x4 kernel: CPU0:
Apr 28 20:59:27 x4 kernel: ffff88021fc03ef0 ffff88021fc03f48 0000000000000046 0000000000000000
Apr 28 20:59:27 x4 kernel: 00000000ffffffff ffffffff81285150 ffff88021cd3be2c ffff88021fc03f38
Apr 28 20:59:27 x4 kernel: ffffffff81040c7c ffff88021fc03f58 ffffffff812851a4 ffffffff8150ea0c
Apr 28 20:59:27 x4 kernel: Call Trace:
Apr 28 20:59:27 x4 kernel: <IRQ> [<ffffffff81285150>] ? __sysrq_swap_key_ops+0xb0/0xb0
Apr 28 20:59:27 x4 kernel: [<ffffffff81040c7c>] ? show_stack+0x1c/0x20
Apr 28 20:59:27 x4 kernel: [<ffffffff812851a4>] ? showacpu+0x54/0x70
Apr 28 20:59:27 x4 kernel: [<ffffffff8150ea0c>] ? call_softirq+0x1c/0x26
Apr 28 20:59:27 x4 kernel: [<ffffffff810a1fdb>] ? generic_smp_call_function_interrupt+0x8b/0x190
Apr 28 20:59:27 x4 kernel: [<ffffffff81052f17>] ? smp_call_function_interrupt+0x27/0x40
Apr 28 20:59:27 x4 kernel: [<ffffffff8150e693>] ? call_function_interrupt+0x13/0x20
Apr 28 20:59:27 x4 kernel: <EOI> [<ffffffff81116f41>] ? fget_light+0x1/0x100
Apr 28 20:59:27 x4 kernel: [<ffffffff811281b1>] ? do_sys_poll+0x201/0x4e0
Apr 28 20:59:27 x4 kernel: [<ffffffff81126e70>] ? poll_select_copy_remaining+0x140/0x140
Apr 28 20:59:27 x4 kernel: [<ffffffff81126f60>] ? __pollwait+0xf0/0xf0
Apr 28 20:59:27 x4 kernel: [<ffffffff81126f60>] ? __pollwait+0xf0/0xf0
Apr 28 20:59:27 x4 kernel: [<ffffffff81126f60>] ? __pollwait+0xf0/0xf0
Apr 28 20:59:27 x4 kernel: [<ffffffff81126f60>] ? __pollwait+0xf0/0xf0
Apr 28 20:59:27 x4 kernel: [<ffffffff81126f60>] ? __pollwait+0xf0/0xf0
Apr 28 20:59:27 x4 kernel: [<ffffffff811e3e11>] ? xfs_aio_write_newsize_update+0x61/0x70
Apr 28 20:59:27 x4 kernel: [<ffffffff811bc5cc>] ? xfs_iunlock+0x12c/0x180
Apr 28 20:59:27 x4 kernel: [<ffffffff811e5389>] ? xfs_file_aio_write+0x269/0x310
Apr 28 20:59:27 x4 kernel: [<ffffffff811e53b1>] ? xfs_file_aio_write+0x291/0x310
Apr 28 20:59:27 x4 kernel: [<ffffffff81070a79>] ? console_unlock+0x199/0x1f0
Apr 28 20:59:27 x4 kernel: [<ffffffff8114c0f2>] ? fsnotify+0x1c2/0x2a0
Apr 28 20:59:27 x4 kernel: [<ffffffff81097901>] ? ktime_get_ts+0xb1/0xf0
Apr 28 20:59:27 x4 kernel: [<ffffffff811272fa>] ? poll_select_set_timeout+0x7a/0x90
Apr 28 20:59:27 x4 kernel: [<ffffffff81128576>] ? sys_poll+0x76/0x110
Apr 28 20:59:27 x4 kernel: [<ffffffff8150dc6b>] ? system_call_fastpath+0x16/0x1b
Apr 28 20:59:50 x4 kernel: SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W) dump-ftrace-buffer(Z)
Apr 28 21:00:01 x4 kernel: SysRq : Show Blocked State
Apr 28 21:00:01 x4 kernel: task PC stack pid father
Apr 28 21:00:01 x4 kernel: khugepaged D 0000000107847b4d 0 557 2 0x00000000
Apr 28 21:00:01 x4 kernel: ffff88021ea73800 0000000000000046 ffff88021ea736f0 ffff880000000000
Apr 28 21:00:01 x4 kernel: ffff88021ea73914 ffff88021ea73910 ffff88021ea73700 ffff88021ea73fd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff88021ea73fd8 ffff88021e8bec00 ffff88021ea5c200
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff81196965>] ? xfs_bmap_search_extents+0x65/0xd0
Apr 28 21:00:01 x4 kernel: [<ffffffff8119f8a7>] ? xfs_bmapi+0x2b7/0x1050
Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380
Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200
Apr 28 21:00:01 x4 kernel: [<ffffffff811bc264>] ? xfs_ilock+0x114/0x150
Apr 28 21:00:01 x4 kernel: [<ffffffff811d9af8>] xfs_free_eofblocks+0x158/0x230
Apr 28 21:00:01 x4 kernel: [<ffffffff811db135>] xfs_inactive+0x105/0x430
Apr 28 21:00:01 x4 kernel: [<ffffffff811e935a>] xfs_fs_evict_inode+0xaa/0x130
Apr 28 21:00:01 x4 kernel: [<ffffffff8112d895>] evict+0x85/0x170
Apr 28 21:00:01 x4 kernel: [<ffffffff8112d9be>] dispose_list+0x3e/0x50
Apr 28 21:00:01 x4 kernel: [<ffffffff8112e764>] shrink_icache_memory+0x184/0x350
Apr 28 21:00:01 x4 kernel: [<ffffffff810e492c>] shrink_slab+0x11c/0x180
Apr 28 21:00:01 x4 kernel: [<ffffffff810e7817>] try_to_free_pages+0x207/0x490
Apr 28 21:00:01 x4 kernel: [<ffffffff810dd662>] __alloc_pages_nodemask+0x4b2/0x890
Apr 28 21:00:01 x4 kernel: [<ffffffff8107da70>] ? ftrace_raw_event_timer_class+0xb0/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811101f5>] khugepaged_alloc_hugepage+0x55/0xf0
Apr 28 21:00:01 x4 kernel: [<ffffffff8108f3d0>] ? abort_exclusive_wait+0xb0/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff81110855>] khugepaged+0x85/0x1140
Apr 28 21:00:01 x4 kernel: [<ffffffff8108f3d0>] ? abort_exclusive_wait+0xb0/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811107d0>] ? enabled_show+0x20/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8108e85c>] kthread+0x8c/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff8150e914>] kernel_thread_helper+0x4/0x10
Apr 28 21:00:01 x4 kernel: [<ffffffff8108e7d0>] ? kthread_flush_work_fn+0x20/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8150e910>] ? gs_change+0xb/0xb
Apr 28 21:00:01 x4 kernel: zsh D 0000000000000000 0 1638 1620 0x00000000
Apr 28 21:00:01 x4 kernel: ffff88020bdb5b78 0000000000000086 0000000000000000 ffff88021e957800
Apr 28 21:00:01 x4 kernel: ffff88021fc11740 0000000000000000 0000000000011740 ffff88020bdb5fd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff88020bdb5fd8 ffff88021e8bb600 ffff88021e02a400
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff81061684>] ? __enqueue_entity+0x84/0x90
Apr 28 21:00:01 x4 kernel: [<ffffffff81052dd6>] ? native_smp_send_reschedule+0x46/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff81060188>] ? resched_task+0x68/0x70
Apr 28 21:00:01 x4 kernel: [<ffffffff81064235>] ? check_preempt_curr+0x75/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380
Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200
Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811dc813>] xfs_symlink+0x263/0x860
Apr 28 21:00:01 x4 kernel: [<ffffffff811bc5b9>] ? xfs_iunlock+0x119/0x180
Apr 28 21:00:01 x4 kernel: [<ffffffff811bc62e>] ? xfs_iunlock_map_shared+0xe/0x10
Apr 28 21:00:01 x4 kernel: [<ffffffff81128f24>] ? __d_instantiate+0x84/0x100
Apr 28 21:00:01 x4 kernel: [<ffffffff811295a9>] ? _d_rehash+0x49/0x50
Apr 28 21:00:01 x4 kernel: [<ffffffff8112b8c4>] ? d_rehash+0x34/0x40
Apr 28 21:00:01 x4 kernel: [<ffffffff811e7b26>] ? xfs_vn_lookup+0x86/0x90
Apr 28 21:00:01 x4 kernel: [<ffffffff811e8399>] xfs_vn_symlink+0x59/0xd0
Apr 28 21:00:01 x4 kernel: [<ffffffff81123d56>] vfs_symlink+0x76/0xc0
Apr 28 21:00:01 x4 kernel: [<ffffffff81123e96>] sys_symlinkat+0xf6/0x110
Apr 28 21:00:01 x4 kernel: [<ffffffff81123ec6>] sys_symlink+0x16/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b
Apr 28 21:00:01 x4 kernel: zsh D 0000000000000000 0 1640 1548 0x00000004
Apr 28 21:00:01 x4 kernel: ffff88020bc63b78 0000000000000086 0000000000000001 ffffffff816efd20
Apr 28 21:00:01 x4 kernel: 0000000000000286 0000000000000000 ffff88021e915330 ffff88020bc63fd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff88020bc63fd8 ffff88021e8bb600 ffff88021b011e00
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff81127186>] ? poll_freewait+0x56/0xe0
Apr 28 21:00:01 x4 kernel: [<ffffffff81211d95>] ? cpumask_next_and+0x35/0x50
Apr 28 21:00:01 x4 kernel: [<ffffffff81060a05>] ? select_idle_sibling+0x95/0x150
Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380
Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200
Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811dc813>] xfs_symlink+0x263/0x860
Apr 28 21:00:01 x4 kernel: [<ffffffff8112b485>] ? __d_lookup+0xb5/0x170
Apr 28 21:00:01 x4 kernel: [<ffffffff811e8399>] xfs_vn_symlink+0x59/0xd0
Apr 28 21:00:01 x4 kernel: [<ffffffff81123d56>] vfs_symlink+0x76/0xc0
Apr 28 21:00:01 x4 kernel: [<ffffffff81123e96>] sys_symlinkat+0xf6/0x110
Apr 28 21:00:01 x4 kernel: [<ffffffff81123ec6>] sys_symlink+0x16/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b
Apr 28 21:00:01 x4 kernel: firefox-bin D 0000000107830cf8 0 22133 1 0x00000000
Apr 28 21:00:01 x4 kernel: ffff8801a5dbfae8 0000000000000082 ffff8801a5dbfa18 ffffffff00000000
Apr 28 21:00:01 x4 kernel: ffff88021e904480 ffff88021d6700d0 ffff8801f9cbc280 ffff8801a5dbffd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801a5dbffd8 ffff88021e8bbc00 ffff88021e957800
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff811a3383>] ? xfs_btree_is_lastrec+0x43/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff811a7d2c>] ? xfs_btree_insrec+0x2cc/0x580
Apr 28 21:00:01 x4 kernel: [<ffffffff811a2d54>] ? xfs_btree_setbuf+0x34/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811dd59f>] ? kmem_alloc+0x5f/0xe0
Apr 28 21:00:01 x4 kernel: [<ffffffff811c76aa>] xlog_grant_log_space+0x36a/0x6c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380
Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200
Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d4937>] xfs_rename+0x147/0x750
Apr 28 21:00:01 x4 kernel: [<ffffffff811e7b96>] xfs_vn_rename+0x66/0x70
Apr 28 21:00:01 x4 kernel: [<ffffffff81124300>] vfs_rename+0x120/0x420
Apr 28 21:00:01 x4 kernel: [<ffffffff8111f2bb>] ? __lookup_hash+0x5b/0x140
Apr 28 21:00:01 x4 kernel: [<ffffffff81124807>] sys_renameat+0x207/0x230
Apr 28 21:00:01 x4 kernel: [<ffffffff8111737e>] ? fput+0x14e/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff81114eb6>] ? filp_close+0x66/0x90
Apr 28 21:00:01 x4 kernel: [<ffffffff8112484b>] sys_rename+0x1b/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b
Apr 28 21:00:01 x4 kernel: firefox-bin D 0000000107825ec6 0 23835 1 0x00000000
Apr 28 21:00:01 x4 kernel: ffff8801b228fc98 0000000000000082 ffff8801b228fcb0 0000000000000000
Apr 28 21:00:01 x4 kernel: 0000000000000000 ffff8801b228fdd8 ffff8801b228fdd8 ffff8801b228ffd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801b228ffd8 ffff88021e8bbc00 ffff88021e953c00
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff8111f138>] ? acl_permission_check+0x38/0xe0
Apr 28 21:00:01 x4 kernel: [<ffffffff8150bfa6>] __mutex_lock_slowpath+0x116/0x1c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8150bd52>] mutex_lock+0x22/0x40
Apr 28 21:00:01 x4 kernel: [<ffffffff811226bd>] do_last.isra.42+0x30d/0x8a0
Apr 28 21:00:01 x4 kernel: [<ffffffff8112088a>] ? path_init+0x30a/0x3e0
Apr 28 21:00:01 x4 kernel: [<ffffffff81122d14>] path_openat+0xc4/0x370
Apr 28 21:00:01 x4 kernel: [<ffffffff8112218f>] ? user_path_at+0x6f/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff81123002>] do_filp_open+0x42/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff811304cc>] ? alloc_fd+0xec/0x130
Apr 28 21:00:01 x4 kernel: [<ffffffff81114d07>] do_sys_open+0xf7/0x1d0
Apr 28 21:00:01 x4 kernel: [<ffffffff81114e00>] sys_open+0x20/0x30
Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b
Apr 28 21:00:01 x4 kernel: okular D 000000010781eb7c 0 27510 1 0x00000000
Apr 28 21:00:01 x4 kernel: ffff8801a5e83a68 0000000000000086 ffff8801a5e839e8 ffff880100000000
Apr 28 21:00:01 x4 kernel: ffffffff81217681 ffff88021fd0dda8 ffff8801a5e83a80 ffff8801a5e83fd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801a5e83fd8 ffff88021e8bd400 ffff88021ea5f800
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff81217681>] ? rb_insert_color+0xa1/0x140
Apr 28 21:00:01 x4 kernel: [<ffffffff810927a8>] ? enqueue_hrtimer+0x38/0xd0
Apr 28 21:00:01 x4 kernel: [<ffffffff81092ce0>] ? lock_hrtimer_base.isra.28+0x30/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff8109332d>] ? hrtimer_try_to_cancel+0x8d/0x120
Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380
Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200
Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811daaba>] xfs_setattr+0x57a/0x770
Apr 28 21:00:01 x4 kernel: [<ffffffff811e817b>] xfs_vn_setattr+0x1b/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8112f595>] notify_change+0x125/0x360
Apr 28 21:00:01 x4 kernel: [<ffffffff81113b29>] do_truncate+0x59/0x90
Apr 28 21:00:01 x4 kernel: [<ffffffff8112297f>] do_last.isra.42+0x5cf/0x8a0
Apr 28 21:00:01 x4 kernel: [<ffffffff81122d14>] path_openat+0xc4/0x370
Apr 28 21:00:01 x4 kernel: [<ffffffff8145f9dd>] ? sock_aio_read+0x2d/0x40
Apr 28 21:00:01 x4 kernel: [<ffffffff81123002>] do_filp_open+0x42/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff811304cc>] ? alloc_fd+0xec/0x130
Apr 28 21:00:01 x4 kernel: [<ffffffff81114d07>] do_sys_open+0xf7/0x1d0
Apr 28 21:00:01 x4 kernel: [<ffffffff81114e00>] sys_open+0x20/0x30
Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b
Apr 28 21:00:01 x4 kernel: kworker/2:0 D 000000010781d02c 0 6304 2 0x00000000
Apr 28 21:00:01 x4 kernel: ffff880165afdc60 0000000000000046 0000000000000000 0000000000000000
Apr 28 21:00:01 x4 kernel: 0000000000000000 0000000000000000 0000000000000001 ffff880165afdfd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff880165afdfd8 ffff88021e8bd400 ffff88021e091800
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380
Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200
Apr 28 21:00:01 x4 kernel: [<ffffffff811eb0f0>] ? xfs_sync_inode_attr+0xf0/0xf0
Apr 28 21:00:01 x4 kernel: [<ffffffff811b9913>] xfs_fs_log_dummy+0x43/0x90
Apr 28 21:00:01 x4 kernel: [<ffffffff811eb16c>] xfs_sync_worker+0x7c/0x80
Apr 28 21:00:01 x4 kernel: [<ffffffff8108947e>] process_one_work+0x11e/0x4c0
Apr 28 21:00:01 x4 kernel: [<ffffffff81089dfe>] worker_thread+0x15e/0x340
Apr 28 21:00:01 x4 kernel: [<ffffffff81089ca0>] ? manage_workers.isra.28+0x220/0x220
Apr 28 21:00:01 x4 kernel: [<ffffffff8108e85c>] kthread+0x8c/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff8150e914>] kernel_thread_helper+0x4/0x10
Apr 28 21:00:01 x4 kernel: [<ffffffff8108e7d0>] ? kthread_flush_work_fn+0x20/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8150e910>] ? gs_change+0xb/0xb
Apr 28 21:00:01 x4 kernel: procmail D 000000010782a122 0 24304 24249 0x00000000
Apr 28 21:00:01 x4 kernel: ffff8801234a3a78 0000000000000086 0000000000000002 ffff880200000000
Apr 28 21:00:01 x4 kernel: 00000000ffffffff 0000000000000001 0000000000000008 ffff8801234a3fd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801234a3fd8 ffff88021e8bd400 ffff88021ea5f200
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380
Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0
Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200
Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0
Apr 28 21:00:01 x4 kernel: [<ffffffff811db77e>] xfs_create+0x1be/0x5d0
Apr 28 21:00:01 x4 kernel: [<ffffffff8112b8c4>] ? d_rehash+0x34/0x40
Apr 28 21:00:01 x4 kernel: [<ffffffff8112b485>] ? __d_lookup+0xb5/0x170
Apr 28 21:00:01 x4 kernel: [<ffffffff811e825d>] xfs_vn_mknod+0x6d/0x110
Apr 28 21:00:01 x4 kernel: [<ffffffff811e8330>] xfs_vn_create+0x10/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff81122365>] vfs_create+0x85/0xd0
Apr 28 21:00:01 x4 kernel: [<ffffffff81122a23>] do_last.isra.42+0x673/0x8a0
Apr 28 21:00:01 x4 kernel: [<ffffffff81120703>] ? path_init+0x183/0x3e0
Apr 28 21:00:01 x4 kernel: [<ffffffff81122d14>] path_openat+0xc4/0x370
Apr 28 21:00:01 x4 kernel: [<ffffffff8112218f>] ? user_path_at+0x6f/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff81123002>] do_filp_open+0x42/0xa0
Apr 28 21:00:01 x4 kernel: [<ffffffff811304cc>] ? alloc_fd+0xec/0x130
Apr 28 21:00:01 x4 kernel: [<ffffffff81114d07>] do_sys_open+0xf7/0x1d0
Apr 28 21:00:01 x4 kernel: [<ffffffff81114e00>] sys_open+0x20/0x30
Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b
Apr 28 21:00:01 x4 kernel: bogofilter D 000000010784819c 0 27492 27491 0x00000000
Apr 28 21:00:01 x4 kernel: ffff8801aeacfdc8 0000000000000082 ffff88021dabc00d ffff880200000000
Apr 28 21:00:01 x4 kernel: ffff88021ebd5a00 000000000000002f ffff8801aeacfd78 ffff8801aeacffd8
Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801aeacffd8 ffffffff816c7020 ffff88021ebd5a00
Apr 28 21:00:01 x4 kernel: Call Trace:
Apr 28 21:00:01 x4 kernel: [<ffffffff8111f0b9>] ? nameidata_drop_rcu_last+0x89/0xd0
Apr 28 21:00:01 x4 kernel: [<ffffffff81121601>] ? path_lookupat+0xd1/0x7c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8150bfa6>] __mutex_lock_slowpath+0x116/0x1c0
Apr 28 21:00:01 x4 kernel: [<ffffffff8150bd52>] mutex_lock+0x22/0x40
Apr 28 21:00:01 x4 kernel: [<ffffffff8112315c>] lookup_create+0x2c/0xc0
Apr 28 21:00:01 x4 kernel: [<ffffffff81123669>] sys_mkdirat+0x59/0x110
Apr 28 21:00:01 x4 kernel: [<ffffffff811152dd>] ? generic_file_llseek+0x4d/0x60
Apr 28 21:00:01 x4 kernel: [<ffffffff81123738>] sys_mkdir+0x18/0x20
Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b

--
Markus
system_call_fastpath+0x16/0x1b Apr 28 20:59:50 x4 kernel: SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W) dump-ftrace-buffer(Z) Apr 28 21:00:01 x4 kernel: SysRq : Show Blocked State Apr 28 21:00:01 x4 kernel: task PC stack pid father Apr 28 21:00:01 x4 kernel: khugepaged D 0000000107847b4d 0 557 2 0x00000000 Apr 28 21:00:01 x4 kernel: ffff88021ea73800 0000000000000046 ffff88021ea736f0 ffff880000000000 Apr 28 21:00:01 x4 kernel: ffff88021ea73914 ffff88021ea73910 ffff88021ea73700 ffff88021ea73fd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff88021ea73fd8 ffff88021e8bec00 ffff88021ea5c200 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff81196965>] ? xfs_bmap_search_extents+0x65/0xd0 Apr 28 21:00:01 x4 kernel: [<ffffffff8119f8a7>] ? xfs_bmapi+0x2b7/0x1050 Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380 Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200 Apr 28 21:00:01 x4 kernel: [<ffffffff811bc264>] ? 
xfs_ilock+0x114/0x150 Apr 28 21:00:01 x4 kernel: [<ffffffff811d9af8>] xfs_free_eofblocks+0x158/0x230 Apr 28 21:00:01 x4 kernel: [<ffffffff811db135>] xfs_inactive+0x105/0x430 Apr 28 21:00:01 x4 kernel: [<ffffffff811e935a>] xfs_fs_evict_inode+0xaa/0x130 Apr 28 21:00:01 x4 kernel: [<ffffffff8112d895>] evict+0x85/0x170 Apr 28 21:00:01 x4 kernel: [<ffffffff8112d9be>] dispose_list+0x3e/0x50 Apr 28 21:00:01 x4 kernel: [<ffffffff8112e764>] shrink_icache_memory+0x184/0x350 Apr 28 21:00:01 x4 kernel: [<ffffffff810e492c>] shrink_slab+0x11c/0x180 Apr 28 21:00:01 x4 kernel: [<ffffffff810e7817>] try_to_free_pages+0x207/0x490 Apr 28 21:00:01 x4 kernel: [<ffffffff810dd662>] __alloc_pages_nodemask+0x4b2/0x890 Apr 28 21:00:01 x4 kernel: [<ffffffff8107da70>] ? ftrace_raw_event_timer_class+0xb0/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811101f5>] khugepaged_alloc_hugepage+0x55/0xf0 Apr 28 21:00:01 x4 kernel: [<ffffffff8108f3d0>] ? abort_exclusive_wait+0xb0/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff81110855>] khugepaged+0x85/0x1140 Apr 28 21:00:01 x4 kernel: [<ffffffff8108f3d0>] ? abort_exclusive_wait+0xb0/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811107d0>] ? enabled_show+0x20/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8108e85c>] kthread+0x8c/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff8150e914>] kernel_thread_helper+0x4/0x10 Apr 28 21:00:01 x4 kernel: [<ffffffff8108e7d0>] ? kthread_flush_work_fn+0x20/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8150e910>] ? gs_change+0xb/0xb Apr 28 21:00:01 x4 kernel: zsh D 0000000000000000 0 1638 1620 0x00000000 Apr 28 21:00:01 x4 kernel: ffff88020bdb5b78 0000000000000086 0000000000000000 ffff88021e957800 Apr 28 21:00:01 x4 kernel: ffff88021fc11740 0000000000000000 0000000000011740 ffff88020bdb5fd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff88020bdb5fd8 ffff88021e8bb600 ffff88021e02a400 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff81061684>] ? __enqueue_entity+0x84/0x90 Apr 28 21:00:01 x4 kernel: [<ffffffff81052dd6>] ? 
native_smp_send_reschedule+0x46/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff81060188>] ? resched_task+0x68/0x70 Apr 28 21:00:01 x4 kernel: [<ffffffff81064235>] ? check_preempt_curr+0x75/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380 Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200 Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811dc813>] xfs_symlink+0x263/0x860 Apr 28 21:00:01 x4 kernel: [<ffffffff811bc5b9>] ? xfs_iunlock+0x119/0x180 Apr 28 21:00:01 x4 kernel: [<ffffffff811bc62e>] ? xfs_iunlock_map_shared+0xe/0x10 Apr 28 21:00:01 x4 kernel: [<ffffffff81128f24>] ? __d_instantiate+0x84/0x100 Apr 28 21:00:01 x4 kernel: [<ffffffff811295a9>] ? _d_rehash+0x49/0x50 Apr 28 21:00:01 x4 kernel: [<ffffffff8112b8c4>] ? d_rehash+0x34/0x40 Apr 28 21:00:01 x4 kernel: [<ffffffff811e7b26>] ? xfs_vn_lookup+0x86/0x90 Apr 28 21:00:01 x4 kernel: [<ffffffff811e8399>] xfs_vn_symlink+0x59/0xd0 Apr 28 21:00:01 x4 kernel: [<ffffffff81123d56>] vfs_symlink+0x76/0xc0 Apr 28 21:00:01 x4 kernel: [<ffffffff81123e96>] sys_symlinkat+0xf6/0x110 Apr 28 21:00:01 x4 kernel: [<ffffffff81123ec6>] sys_symlink+0x16/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b Apr 28 21:00:01 x4 kernel: zsh D 0000000000000000 0 1640 1548 0x00000004 Apr 28 21:00:01 x4 kernel: ffff88020bc63b78 0000000000000086 0000000000000001 ffffffff816efd20 Apr 28 21:00:01 x4 kernel: 0000000000000286 0000000000000000 ffff88021e915330 ffff88020bc63fd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff88020bc63fd8 ffff88021e8bb600 ffff88021b011e00 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff81127186>] ? 
poll_freewait+0x56/0xe0 Apr 28 21:00:01 x4 kernel: [<ffffffff81211d95>] ? cpumask_next_and+0x35/0x50 Apr 28 21:00:01 x4 kernel: [<ffffffff81060a05>] ? select_idle_sibling+0x95/0x150 Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380 Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200 Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811dc813>] xfs_symlink+0x263/0x860 Apr 28 21:00:01 x4 kernel: [<ffffffff8112b485>] ? __d_lookup+0xb5/0x170 Apr 28 21:00:01 x4 kernel: [<ffffffff811e8399>] xfs_vn_symlink+0x59/0xd0 Apr 28 21:00:01 x4 kernel: [<ffffffff81123d56>] vfs_symlink+0x76/0xc0 Apr 28 21:00:01 x4 kernel: [<ffffffff81123e96>] sys_symlinkat+0xf6/0x110 Apr 28 21:00:01 x4 kernel: [<ffffffff81123ec6>] sys_symlink+0x16/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b Apr 28 21:00:01 x4 kernel: firefox-bin D 0000000107830cf8 0 22133 1 0x00000000 Apr 28 21:00:01 x4 kernel: ffff8801a5dbfae8 0000000000000082 ffff8801a5dbfa18 ffffffff00000000 Apr 28 21:00:01 x4 kernel: ffff88021e904480 ffff88021d6700d0 ffff8801f9cbc280 ffff8801a5dbffd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801a5dbffd8 ffff88021e8bbc00 ffff88021e957800 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff811a3383>] ? xfs_btree_is_lastrec+0x43/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff811a7d2c>] ? xfs_btree_insrec+0x2cc/0x580 Apr 28 21:00:01 x4 kernel: [<ffffffff811a2d54>] ? xfs_btree_setbuf+0x34/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811dd59f>] ? kmem_alloc+0x5f/0xe0 Apr 28 21:00:01 x4 kernel: [<ffffffff811c76aa>] xlog_grant_log_space+0x36a/0x6c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? 
try_to_wake_up+0x380/0x380 Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200 Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d4937>] xfs_rename+0x147/0x750 Apr 28 21:00:01 x4 kernel: [<ffffffff811e7b96>] xfs_vn_rename+0x66/0x70 Apr 28 21:00:01 x4 kernel: [<ffffffff81124300>] vfs_rename+0x120/0x420 Apr 28 21:00:01 x4 kernel: [<ffffffff8111f2bb>] ? __lookup_hash+0x5b/0x140 Apr 28 21:00:01 x4 kernel: [<ffffffff81124807>] sys_renameat+0x207/0x230 Apr 28 21:00:01 x4 kernel: [<ffffffff8111737e>] ? fput+0x14e/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff81114eb6>] ? filp_close+0x66/0x90 Apr 28 21:00:01 x4 kernel: [<ffffffff8112484b>] sys_rename+0x1b/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b Apr 28 21:00:01 x4 kernel: firefox-bin D 0000000107825ec6 0 23835 1 0x00000000 Apr 28 21:00:01 x4 kernel: ffff8801b228fc98 0000000000000082 ffff8801b228fcb0 0000000000000000 Apr 28 21:00:01 x4 kernel: 0000000000000000 ffff8801b228fdd8 ffff8801b228fdd8 ffff8801b228ffd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801b228ffd8 ffff88021e8bbc00 ffff88021e953c00 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff8111f138>] ? acl_permission_check+0x38/0xe0 Apr 28 21:00:01 x4 kernel: [<ffffffff8150bfa6>] __mutex_lock_slowpath+0x116/0x1c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8150bd52>] mutex_lock+0x22/0x40 Apr 28 21:00:01 x4 kernel: [<ffffffff811226bd>] do_last.isra.42+0x30d/0x8a0 Apr 28 21:00:01 x4 kernel: [<ffffffff8112088a>] ? path_init+0x30a/0x3e0 Apr 28 21:00:01 x4 kernel: [<ffffffff81122d14>] path_openat+0xc4/0x370 Apr 28 21:00:01 x4 kernel: [<ffffffff8112218f>] ? user_path_at+0x6f/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff81123002>] do_filp_open+0x42/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff811304cc>] ? 
alloc_fd+0xec/0x130 Apr 28 21:00:01 x4 kernel: [<ffffffff81114d07>] do_sys_open+0xf7/0x1d0 Apr 28 21:00:01 x4 kernel: [<ffffffff81114e00>] sys_open+0x20/0x30 Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b Apr 28 21:00:01 x4 kernel: okular D 000000010781eb7c 0 27510 1 0x00000000 Apr 28 21:00:01 x4 kernel: ffff8801a5e83a68 0000000000000086 ffff8801a5e839e8 ffff880100000000 Apr 28 21:00:01 x4 kernel: ffffffff81217681 ffff88021fd0dda8 ffff8801a5e83a80 ffff8801a5e83fd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801a5e83fd8 ffff88021e8bd400 ffff88021ea5f800 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff81217681>] ? rb_insert_color+0xa1/0x140 Apr 28 21:00:01 x4 kernel: [<ffffffff810927a8>] ? enqueue_hrtimer+0x38/0xd0 Apr 28 21:00:01 x4 kernel: [<ffffffff81092ce0>] ? lock_hrtimer_base.isra.28+0x30/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff8109332d>] ? hrtimer_try_to_cancel+0x8d/0x120 Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380 Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200 Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811daaba>] xfs_setattr+0x57a/0x770 Apr 28 21:00:01 x4 kernel: [<ffffffff811e817b>] xfs_vn_setattr+0x1b/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8112f595>] notify_change+0x125/0x360 Apr 28 21:00:01 x4 kernel: [<ffffffff81113b29>] do_truncate+0x59/0x90 Apr 28 21:00:01 x4 kernel: [<ffffffff8112297f>] do_last.isra.42+0x5cf/0x8a0 Apr 28 21:00:01 x4 kernel: [<ffffffff81122d14>] path_openat+0xc4/0x370 Apr 28 21:00:01 x4 kernel: [<ffffffff8145f9dd>] ? 
sock_aio_read+0x2d/0x40 Apr 28 21:00:01 x4 kernel: [<ffffffff81123002>] do_filp_open+0x42/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff811304cc>] ? alloc_fd+0xec/0x130 Apr 28 21:00:01 x4 kernel: [<ffffffff81114d07>] do_sys_open+0xf7/0x1d0 Apr 28 21:00:01 x4 kernel: [<ffffffff81114e00>] sys_open+0x20/0x30 Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b Apr 28 21:00:01 x4 kernel: kworker/2:0 D 000000010781d02c 0 6304 2 0x00000000 Apr 28 21:00:01 x4 kernel: ffff880165afdc60 0000000000000046 0000000000000000 0000000000000000 Apr 28 21:00:01 x4 kernel: 0000000000000000 0000000000000000 0000000000000001 ffff880165afdfd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff880165afdfd8 ffff88021e8bd400 ffff88021e091800 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380 Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200 Apr 28 21:00:01 x4 kernel: [<ffffffff811eb0f0>] ? xfs_sync_inode_attr+0xf0/0xf0 Apr 28 21:00:01 x4 kernel: [<ffffffff811b9913>] xfs_fs_log_dummy+0x43/0x90 Apr 28 21:00:01 x4 kernel: [<ffffffff811eb16c>] xfs_sync_worker+0x7c/0x80 Apr 28 21:00:01 x4 kernel: [<ffffffff8108947e>] process_one_work+0x11e/0x4c0 Apr 28 21:00:01 x4 kernel: [<ffffffff81089dfe>] worker_thread+0x15e/0x340 Apr 28 21:00:01 x4 kernel: [<ffffffff81089ca0>] ? manage_workers.isra.28+0x220/0x220 Apr 28 21:00:01 x4 kernel: [<ffffffff8108e85c>] kthread+0x8c/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff8150e914>] kernel_thread_helper+0x4/0x10 Apr 28 21:00:01 x4 kernel: [<ffffffff8108e7d0>] ? kthread_flush_work_fn+0x20/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8150e910>] ? 
gs_change+0xb/0xb Apr 28 21:00:01 x4 kernel: procmail D 000000010782a122 0 24304 24249 0x00000000 Apr 28 21:00:01 x4 kernel: ffff8801234a3a78 0000000000000086 0000000000000002 ffff880200000000 Apr 28 21:00:01 x4 kernel: 00000000ffffffff 0000000000000001 0000000000000008 ffff8801234a3fd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801234a3fd8 ffff88021e8bd400 ffff88021ea5f200 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff811c7956>] xlog_grant_log_space+0x616/0x6c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8121d129>] ? random32+0x39/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff8106af80>] ? try_to_wake_up+0x380/0x380 Apr 28 21:00:01 x4 kernel: [<ffffffff811ca246>] xfs_log_reserve+0x176/0x1f0 Apr 28 21:00:01 x4 kernel: [<ffffffff811d622c>] xfs_trans_reserve+0x9c/0x200 Apr 28 21:00:01 x4 kernel: [<ffffffff811d60b1>] ? xfs_trans_alloc+0xa1/0xb0 Apr 28 21:00:01 x4 kernel: [<ffffffff811db77e>] xfs_create+0x1be/0x5d0 Apr 28 21:00:01 x4 kernel: [<ffffffff8112b8c4>] ? d_rehash+0x34/0x40 Apr 28 21:00:01 x4 kernel: [<ffffffff8112b485>] ? __d_lookup+0xb5/0x170 Apr 28 21:00:01 x4 kernel: [<ffffffff811e825d>] xfs_vn_mknod+0x6d/0x110 Apr 28 21:00:01 x4 kernel: [<ffffffff811e8330>] xfs_vn_create+0x10/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff81122365>] vfs_create+0x85/0xd0 Apr 28 21:00:01 x4 kernel: [<ffffffff81122a23>] do_last.isra.42+0x673/0x8a0 Apr 28 21:00:01 x4 kernel: [<ffffffff81120703>] ? path_init+0x183/0x3e0 Apr 28 21:00:01 x4 kernel: [<ffffffff81122d14>] path_openat+0xc4/0x370 Apr 28 21:00:01 x4 kernel: [<ffffffff8112218f>] ? user_path_at+0x6f/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff81123002>] do_filp_open+0x42/0xa0 Apr 28 21:00:01 x4 kernel: [<ffffffff811304cc>] ? 
alloc_fd+0xec/0x130 Apr 28 21:00:01 x4 kernel: [<ffffffff81114d07>] do_sys_open+0xf7/0x1d0 Apr 28 21:00:01 x4 kernel: [<ffffffff81114e00>] sys_open+0x20/0x30 Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b Apr 28 21:00:01 x4 kernel: bogofilter D 000000010784819c 0 27492 27491 0x00000000 Apr 28 21:00:01 x4 kernel: ffff8801aeacfdc8 0000000000000082 ffff88021dabc00d ffff880200000000 Apr 28 21:00:01 x4 kernel: ffff88021ebd5a00 000000000000002f ffff8801aeacfd78 ffff8801aeacffd8 Apr 28 21:00:01 x4 kernel: 0000000000004000 ffff8801aeacffd8 ffffffff816c7020 ffff88021ebd5a00 Apr 28 21:00:01 x4 kernel: Call Trace: Apr 28 21:00:01 x4 kernel: [<ffffffff8111f0b9>] ? nameidata_drop_rcu_last+0x89/0xd0 Apr 28 21:00:01 x4 kernel: [<ffffffff81121601>] ? path_lookupat+0xd1/0x7c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8150bfa6>] __mutex_lock_slowpath+0x116/0x1c0 Apr 28 21:00:01 x4 kernel: [<ffffffff8150bd52>] mutex_lock+0x22/0x40 Apr 28 21:00:01 x4 kernel: [<ffffffff8112315c>] lookup_create+0x2c/0xc0 Apr 28 21:00:01 x4 kernel: [<ffffffff81123669>] sys_mkdirat+0x59/0x110 Apr 28 21:00:01 x4 kernel: [<ffffffff811152dd>] ? generic_file_llseek+0x4d/0x60 Apr 28 21:00:01 x4 kernel: [<ffffffff81123738>] sys_mkdir+0x18/0x20 Apr 28 21:00:01 x4 kernel: [<ffffffff8150dc6b>] system_call_fastpath+0x16/0x1b -- Markus _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 44+ messages in thread
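[Editor's note: the chrome_sandbox warning earlier in this log ("/proc/8374/oom_adj is deprecated, please use /proc/8374/oom_score_adj instead") refers to the kernel's transition from the old -17..15 oom_adj scale to the -1000..1000 oom_score_adj scale. A minimal sketch of translating a legacy value, assuming the kernel's conversion factor of 1000/17 (OOM_SCORE_ADJ_MAX over -OOM_DISABLE):]

```shell
# Translate a legacy oom_adj value (scale -17..15) to the oom_score_adj
# scale (-1000..1000), using the same integer mapping the kernel applies
# to writes to the deprecated /proc/<pid>/oom_adj file:
#   oom_score_adj = oom_adj * 1000 / 17
oom_adj=15
oom_score_adj=$(( oom_adj * 1000 / 17 ))
echo "$oom_score_adj"    # 882
```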
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-28 19:45 ` Markus Trippelsdorf @ 2011-04-29 1:19 ` Dave Chinner -1 siblings, 0 replies; 44+ messages in thread From: Dave Chinner @ 2011-04-29 1:19 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel On Thu, Apr 28, 2011 at 09:45:28PM +0200, Markus Trippelsdorf wrote: > On 2011.04.27 at 18:26 +0200, Bruno Prémont wrote: > > On Wed, 27 April 2011 Dave Chinner <david@fromorbit.com> wrote: > > > On Sat, Apr 23, 2011 at 10:44:03PM +0200, Bruno Prémont wrote: > > > > Running 2.6.39-rc3+ and now again on 2.6.39-rc4+ (I've not tested -rc1 > > > > or -rc2) I've hit a "dying machine" where processes writing to disk end > > > > up in D state. > > > > From occurrence with -rc3+ I don't have logs as those never hit the disk, > > > > for -rc4+ I have the following (sysrq+t was too big, what I have of it > > > > misses a dozen of kernel tasks - if needed, please ask): > > > > > > > > The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84 > > > > (+ 1 patch of mine to stop disk on reboot), > > > > full dmesg available if needed; kernel config attached (only selected > > > > options). In case there is something I should do at next occurrence > > > > please tell. Unfortunately I have no trigger for it and it does not > > > > happen very often. > > > > > > > > [ 0.000000] Linux version 2.6.39-rc4-00120-g73b5b55 (kbuild@neptune) (gcc version 4.4.5 (Gentoo 4.4.5 p1.2, pie-0.4.5) ) #12 Thu Apr 21 19:28:45 CEST 2011 > > > > > > > > > > > > [32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds. > > > > [32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > > [32040.120077] flush-8:0 D 00000000 4908 1665 2 0x00000000
> > > > [32040.120099] f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000
> > > > [32040.120118] f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0
> > > > [32040.120135] f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94
> > > > [32040.120153] Call Trace:
> > > > [32040.120220] [<f889d7f8>] ? xfs_bmap_search_multi_extents+0x88/0xe0 [xfs]
> > > > [32040.120239] [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110
> > > > [32040.120294] [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs]
> > > > [32040.120346] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs]
> > > > [32040.120359] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0
> > > > [32040.120411] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs]
> > > > [32040.120464] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs]
> > > > [32040.120516] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs]
> > > > [32040.120571] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs]
> > > Hmmmmm. That may be caused by the conversion of the xfsaild to a
> > > work queue. Can you post the output of "xfs_info <mntpt>" and the
> > > mount options (/proc/mounts) used on your system?
> 
> I may have hit the same problem today and managed to capture some sysrq-l
> and sysrq-w output.
> 
> The system was largely unusable during this incident. I could still
> switch between X and the console (and press the sysrq key-combination),
> but I couldn't run any commands in the terminal.
> 
> 
> x4 ~ # xfs_info /
> meta-data=/dev/root            isize=256    agcount=4, agsize=1949824 blks
>          =                     sectsz=512   attr=2
> data     =                     bsize=4096   blocks=7799296, imaxpct=25
>          =                     sunit=128    swidth=128 blks
> naming   =version 2            bsize=4096   ascii-ci=0
> log      =internal             bsize=4096   blocks=3808, version=2
>          =                     sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                 extsz=4096   blocks=0, rtextents=0

OK, so the common element here appears to be root filesystems with
small log sizes, which means they are tail pushing all the time
metadata operations are in progress. Definitely seems like a race in
the AIL workqueue trigger mechanism. I'll see if I can reproduce this
and cook up a patch to fix it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 44+ messages in thread
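[Editor's note: for scale, the quoted xfs_info output shows just how small the log Dave is pointing at really is. A back-of-the-envelope calculation from the "log =internal" line, using the values copied from the report above:]

```shell
# Internal log size implied by "log =internal bsize=4096 blocks=3808"
# in the xfs_info output quoted above.
log_blocks=3808
log_bsize=4096
log_mib=$(( log_blocks * log_bsize / 1024 / 1024 ))
echo "log size: ${log_mib} MiB"    # prints: log size: 14 MiB
```

A ~14 MiB log on a ~30 GB filesystem fills quickly under metadata-heavy load, which is why log-space waiters (xlog_grant_log_space) pile up whenever tail pushing stalls.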
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-29 1:19 ` Dave Chinner @ 2011-04-29 15:18 ` Markus Trippelsdorf -1 siblings, 0 replies; 44+ messages in thread From: Markus Trippelsdorf @ 2011-04-29 15:18 UTC (permalink / raw) To: Dave Chinner Cc: Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel, James Bottomley On 2011.04.29 at 11:19 +1000, Dave Chinner wrote: > On Thu, Apr 28, 2011 at 09:45:28PM +0200, Markus Trippelsdorf wrote: > > On 2011.04.27 at 18:26 +0200, Bruno Prémont wrote: > > > On Wed, 27 April 2011 Dave Chinner <david@fromorbit.com> wrote: > > > > On Sat, Apr 23, 2011 at 10:44:03PM +0200, Bruno Prémont wrote: > > > > > Running 2.6.39-rc3+ and now again on 2.6.39-rc4+ (I've not tested -rc1 > > > > > or -rc2) I've hit a "dying machine" where processes writing to disk end > > > > > up in D state. > > > > > From occurrence with -rc3+ I don't have logs as those never hit the disk, > > > > > for -rc4+ I have the following (sysrq+t was too big, what I have of it > > > > > misses a dozen of kernel tasks - if needed, please ask): > > > > > > > > > > The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84 > > > > > (+ 1 patch of mine to stop disk on reboot), > > > > > full dmesg available if needed; kernel config attached (only selected > > > > > options). In case there is something I should do at next occurrence > > > > > please tell. Unfortunately I have no trigger for it and it does not > > > > > happen very often. > > > > > > > > > > [32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds. > > > > > [32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > > > [32040.120077] flush-8:0 D 00000000 4908 1665 2 0x00000000
> > > > > [32040.120099] f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000
> > > > > [32040.120118] f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0
> > > > > [32040.120135] f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94
> > > > > [32040.120153] Call Trace:
> > > > > [32040.120220] [<f889d7f8>] ? xfs_bmap_search_multi_extents+0x88/0xe0 [xfs]
> > > > > [32040.120239] [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110
> > > > > [32040.120294] [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs]
> > > > > [32040.120346] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs]
> > > > > [32040.120359] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0
> > > > > [32040.120411] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs]
> > > > > [32040.120464] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs]
> > > > > [32040.120516] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs]
> > > > > [32040.120571] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs]
> > > > 
> > > > Hmmmmm. That may be caused by the conversion of the xfsaild to a
> > > > work queue. Can you post the output of "xfs_info <mntpt>" and the
> > > > mount options (/proc/mounts) used on your system?
> > 
> > I may have hit the same problem today and managed to capture some sysrq-l
> > and sysrq-w output.
> > 
> > The system was largely unusable during this incident. I could still
> > switch between X and the console (and press the sysrq key-combination),
> > but I couldn't run any commands in the terminal.
> 
> OK, so the common element here appears to be root filesystems
> with small log sizes, which means they are tail pushing all the
> time metadata operations are in progress. Definitely seems like a
> race in the AIL workqueue trigger mechanism. I'll see if I can
> reproduce this and cook up a patch to fix it.
Hmm, I'm wondering if this issue is somehow related to the hrtimer bug
that Thomas Gleixner fixed yesterday:
http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4
http://thread.gmane.org/gmane.linux.kernel.mm/61909/

It also looks similar to the issue that James Bottomley reported earlier:
http://thread.gmane.org/gmane.linux.kernel.mm/62185/

-- 
Markus

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-29 15:18 ` Markus Trippelsdorf @ 2011-04-29 19:35 ` Bruno Prémont -1 siblings, 0 replies; 44+ messages in thread From: Bruno Prémont @ 2011-04-29 19:35 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Dave Chinner, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel, James Bottomley On Fri, 29 April 2011 Markus Trippelsdorf wrote: > On 2011.04.29 at 11:19 +1000, Dave Chinner wrote: > > On Thu, Apr 28, 2011 at 09:45:28PM +0200, Markus Trippelsdorf wrote: > > > On 2011.04.27 at 18:26 +0200, Bruno Prémont wrote: > > > > On Wed, 27 April 2011 Dave Chinner wrote: > > > > > On Sat, Apr 23, 2011 at 10:44:03PM +0200, Bruno Prémont wrote: > > > > > > Running 2.6.39-rc3+ and now again on 2.6.39-rc4+ (I've not tested -rc1 > > > > > > or -rc2) I've hit a "dying machine" where processes writing to disk end > > > > > > up in D state. > > > > > > From occurrence with -rc3+ I don't have logs as those never hit the disk, > > > > > > for -rc4+ I have the following (sysrq+t was too big, what I have of it > > > > > > misses a dozen of kernel tasks - if needed, please ask): > > > > > > > > > > > > The -rc4 kernel is at commit 584f79046780e10cb24367a691f8c28398a00e84 > > > > > > (+ 1 patch of mine to stop disk on reboot), > > > > > > full dmesg available if needed; kernel config attached (only selected > > > > > > options). In case there is something I should do at next occurrence > > > > > > please tell. Unfortunately I have no trigger for it and it does not > > > > > > happen very often. > > > > > > > > > > > > [32040.120055] INFO: task flush-8:0:1665 blocked for more than 120 seconds. > > > > > > [32040.120068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > > > > [32040.120077] flush-8:0 D 00000000 4908 1665 2 0x00000000 > > > > > > [32040.120099] f55efb5c 00000046 00000000 00000000 00000000 00000001 e0382924 00000000 > > > > > > [32040.120118] f55efb0c f55efb5c 00000004 f629ba70 572f01a2 00001cfe f629ba70 ffffffc0 > > > > > > [32040.120135] f55efc68 f55efb30 f889d7f8 f55efb20 00000000 f55efc68 e0382900 f55efc94 > > > > > > [32040.120153] Call Trace: > > > > > > [32040.120220] [<f889d7f8>] ? xfs_bmap_search_multi_extents+0x88/0xe0 [xfs] > > > > > > [32040.120239] [<c109ce1d>] ? kmem_cache_alloc+0x2d/0x110 > > > > > > [32040.120294] [<f88c88ca>] ? xlog_space_left+0x2a/0xc0 [xfs] > > > > > > [32040.120346] [<f88c85cb>] xlog_wait+0x4b/0x70 [xfs] > > > > > > [32040.120359] [<c102ca00>] ? try_to_wake_up+0xc0/0xc0 > > > > > > [32040.120411] [<f88c948b>] xlog_grant_log_space+0x8b/0x240 [xfs] > > > > > > [32040.120464] [<f88c936e>] ? xlog_grant_push_ail+0xbe/0xf0 [xfs] > > > > > > [32040.120516] [<f88c99db>] xfs_log_reserve+0xab/0xb0 [xfs] > > > > > > [32040.120571] [<f88d6dc8>] xfs_trans_reserve+0x78/0x1f0 [xfs] > > > > > > > > > > Hmmmmm. That may be caused by the conversion of the xfsaild to a > > > > > work queue. Can you post the output of "xfs_info <mntpt>" and the > > > > > mount options (/proc/mounts) used on you system? > > > > > > I may have hit the same problem today and managed to capture some sysrq-l > > > and sysrq-w output. > > > > > > The system was largely unusable during this incident. I could still > > > switch between X and the console (and press the sysrq key-combination), > > > but I couldn't run any commands in the terminal. > > > > OK, so the common elements here appears to be root filesystems > > with small log sizes, which means they are tail pushing all the > > time metadata operations are in progress. Definitely seems like a > > race in the AIL workqueue trigger mechanism. I'll see if I can > > reproduce this and cook up a patch to fix it. 
> > Hmm, I'm wondering if this issue is somehow related to the hrtimer bug, > that Thomas Gleixner fixed yesterday: > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4 > http://thread.gmane.org/gmane.linux.kernel.mm/61909/ > > It also looks similar to the issue that James Bottomley reported > earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/ I'm going to see, I've applied Thomas' fix on the box seeing XFS freeze (without other changes to kernel). Going to run that kernel for the week-end and beyond if it survives to see what happens. Bruno ^ permalink raw reply [flat|nested] 44+ messages in thread
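Dave's quoted diagnosis above singles out small log sizes as the common element. As a rough, purely illustrative sketch of what "small" means, the internal log size is just `bsize * blocks` from the `log` line of xfs_info output (the sample line below is hardcoded from the xfs_info report posted later in this thread; on a live system you would parse the actual output of `xfs_info <mntpt>` instead):

```python
# Rough sketch: compute the internal log size from an xfs_info "log" line.
# The sample line is hardcoded from the root-fs report later in this thread.
import re

log_line = "log      =internal               bsize=4096   blocks=2560, version=2"
bsize = int(re.search(r"bsize=(\d+)", log_line).group(1))    # log block size in bytes
blocks = int(re.search(r"blocks=(\d+)", log_line).group(1))  # number of log blocks
size_mib = bsize * blocks // (1024 * 1024)
print(f"log size: {size_mib} MiB")
```

A log of roughly 10 MiB on a busy root filesystem matches the "small log, constant tail pushing" scenario Dave describes.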
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-29 19:35 ` Bruno Prémont @ 2011-04-30 14:18 ` Bruno Prémont -1 siblings, 0 replies; 44+ messages in thread From: Bruno Prémont @ 2011-04-30 14:18 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Dave Chinner, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel, James Bottomley On Fri, 29 April 2011 Bruno Prémont wrote: > On Fri, 29 April 2011 Markus Trippelsdorf wrote: > > On 2011.04.29 at 11:19 +1000, Dave Chinner wrote: > > > OK, so the common elements here appears to be root filesystems > > > with small log sizes, which means they are tail pushing all the > > > time metadata operations are in progress. Definitely seems like a > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > reproduce this and cook up a patch to fix it. > > > > Hmm, I'm wondering if this issue is somehow related to the hrtimer bug, > > that Thomas Gleixner fixed yesterday: > > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4 > > http://thread.gmane.org/gmane.linux.kernel.mm/61909/ > > > > It also looks similar to the issue that James Bottomley reported > > earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/ > > I'm going to see, I've applied Thomas' fix on the box seeing XFS freeze (without > other changes to kernel). > Going to run that kernel for the week-end and beyond if it survives to see what > happens. Happened again (after a few hours of uptime), so it definitely is not caused by hrtimer bug that Thomas Gleixner fixed. Bruno ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-30 14:18 ` Bruno Prémont @ 2011-05-02 6:15 ` Markus Trippelsdorf -1 siblings, 0 replies; 44+ messages in thread From: Markus Trippelsdorf @ 2011-05-02 6:15 UTC (permalink / raw) To: Bruno Prémont Cc: Dave Chinner, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel, James Bottomley On 2011.04.30 at 16:18 +0200, Bruno Prémont wrote: > On Fri, 29 April 2011 Bruno Prémont wrote: > > On Fri, 29 April 2011 Markus Trippelsdorf wrote: > > > On 2011.04.29 at 11:19 +1000, Dave Chinner wrote: > > > > OK, so the common elements here appears to be root filesystems > > > > with small log sizes, which means they are tail pushing all the > > > > time metadata operations are in progress. Definitely seems like a > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > reproduce this and cook up a patch to fix it. > > > > > > Hmm, I'm wondering if this issue is somehow related to the hrtimer bug, > > > that Thomas Gleixner fixed yesterday: > > > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4 > > > http://thread.gmane.org/gmane.linux.kernel.mm/61909/ > > > > > > It also looks similar to the issue that James Bottomley reported > > > earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/ > > > > I'm going to see, I've applied Thomas' fix on the box seeing XFS freeze (without > > other changes to kernel). > > Going to run that kernel for the week-end and beyond if it survives to see what > > happens. > > Happened again (after a few hours of uptime), so it definitely is not > caused by hrtimer bug that Thomas Gleixner fixed. I've enabled lock debugging and this is what happened after a few hours uptime. 
(I can't tell if this is a false positive):

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.39-rc5-00130-g3fd9952 #10
-------------------------------------------------------
kio_file/7364 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#5/2){+.+...}, at: [<ffffffff81151e4e>] generic_file_splice_write+0xce/0x180

but task is already holding lock:
 (xfs_iolock_active){++++++}, at: [<ffffffff811d7d05>] xfs_ilock+0x125/0x1f0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (xfs_iolock_active){++++++}:
       [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
       [<ffffffff8109d9bf>] down_write_nested+0x2f/0x60
       [<ffffffff811d7d05>] xfs_ilock+0x125/0x1f0
       [<ffffffff812061d6>] xfs_file_buffered_aio_write+0x66/0x290
       [<ffffffff81206561>] xfs_file_aio_write+0x161/0x300
       [<ffffffff811261f2>] do_sync_write+0xd2/0x110
       [<ffffffff811262df>] vfs_write+0xaf/0x160
       [<ffffffff8112646a>] sys_write+0x4a/0x90
       [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b

-> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}:
       [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
       [<ffffffff81549981>] mutex_lock_nested+0x51/0x370
       [<ffffffff81135afd>] vfs_rename+0xed/0x420
       [<ffffffff81136037>] sys_renameat+0x207/0x230
       [<ffffffff8113607b>] sys_rename+0x1b/0x20
       [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b

-> #0 (&sb->s_type->i_mutex_key#5/2){+.+...}:
       [<ffffffff810ac21f>] __lock_acquire+0x169f/0x1b90
       [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
       [<ffffffff81549981>] mutex_lock_nested+0x51/0x370
       [<ffffffff81151e4e>] generic_file_splice_write+0xce/0x180
       [<ffffffff81205334>] xfs_file_splice_write+0xf4/0x250
       [<ffffffff8115024e>] do_splice_from+0x7e/0xb0
       [<ffffffff811502a0>] direct_splice_actor+0x20/0x30
       [<ffffffff81151fde>] splice_direct_to_actor+0xbe/0x1c0
       [<ffffffff81152158>] do_splice_direct+0x78/0x90
       [<ffffffff81125f72>] do_sendfile+0x182/0x1d0
       [<ffffffff81126f1a>] sys_sendfile64+0x5a/0xb0
       [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

1 lock held by kio_file/7364:
 #0:  (xfs_iolock_active){++++++}, at: [<ffffffff811d7d05>] xfs_ilock+0x125/0x1f0

stack backtrace:
Pid: 7364, comm: kio_file Not tainted 2.6.39-rc5-00130-g3fd9952 #10
Call Trace:
 [<ffffffff81543f4c>] print_circular_bug+0xb8/0xc7
 [<ffffffff810ac21f>] __lock_acquire+0x169f/0x1b90
 [<ffffffff8115153d>] ? __generic_file_splice_read+0x1cd/0x5c0
 [<ffffffff810ae8a2>] lock_acquire+0x92/0x1f0
 [<ffffffff81151e4e>] ? generic_file_splice_write+0xce/0x180
 [<ffffffff8148de90>] ? sock_def_write_space+0x140/0x140
 [<ffffffff81549981>] mutex_lock_nested+0x51/0x370
 [<ffffffff81151e4e>] ? generic_file_splice_write+0xce/0x180
 [<ffffffff81151e4e>] generic_file_splice_write+0xce/0x180
 [<ffffffff81205334>] xfs_file_splice_write+0xf4/0x250
 [<ffffffff8120510f>] ? xfs_file_splice_read+0xef/0x220
 [<ffffffff8115024e>] do_splice_from+0x7e/0xb0
 [<ffffffff811502a0>] direct_splice_actor+0x20/0x30
 [<ffffffff81151fde>] splice_direct_to_actor+0xbe/0x1c0
 [<ffffffff81150280>] ? do_splice_from+0xb0/0xb0
 [<ffffffff81152158>] do_splice_direct+0x78/0x90
 [<ffffffff81125f72>] do_sendfile+0x182/0x1d0
 [<ffffffff81126f1a>] sys_sendfile64+0x5a/0xb0
 [<ffffffff8154cc6b>] system_call_fastpath+0x16/0x1b
-- 
Markus

^ permalink raw reply	[flat|nested] 44+ messages in thread
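The dependency chain in the lockdep report above is a lock-order (ABBA) inversion: the splice path acquires i_mutex inside generic_file_splice_write while already holding the XFS iolock, whereas a previously recorded chain acquired the two locks in the opposite order. A toy, purely illustrative sketch (not kernel code; the lock names and the two acquisition orders are an assumed reconstruction borrowed from the report) of how lockdep-style edge tracking flags such a cycle:

```python
# Toy illustration only -- not kernel code. A minimal lockdep-style checker
# that records "A was held when B was taken" edges and reports a two-lock
# cycle, mirroring the i_mutex <-> xfs_iolock inversion flagged above.
held = []        # locks the current context holds, in acquisition order
edges = set()    # observed ordering pairs (held_lock, acquired_lock)
reports = []     # detected inversions

def acquire(lock):
    for h in held:
        edges.add((h, lock))
        if (lock, h) in edges:   # the reverse order was seen earlier
            reports.append((h, lock))
    held.append(lock)

def release(lock):
    held.remove(lock)

# one path (assumed for illustration): i_mutex first, then the XFS iolock
acquire("i_mutex"); acquire("xfs_iolock")
release("xfs_iolock"); release("i_mutex")

# splice path: iolock first, then i_mutex inside generic_file_splice_write
acquire("xfs_iolock"); acquire("i_mutex")
release("i_mutex"); release("xfs_iolock")

for h, lk in reports:
    print(f"possible circular locking dependency: {lk} taken while holding {h}")
```

Like real lockdep, the checker never needs both paths to run concurrently; seeing the two orders at any time is enough to report a possible deadlock.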
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-05-02 6:15 ` Markus Trippelsdorf @ 2011-05-02 12:40 ` Dave Chinner -1 siblings, 0 replies; 44+ messages in thread From: Dave Chinner @ 2011-05-02 12:40 UTC (permalink / raw) To: Markus Trippelsdorf Cc: Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel, James Bottomley On Mon, May 02, 2011 at 08:15:28AM +0200, Markus Trippelsdorf wrote: > On 2011.04.30 at 16:18 +0200, Bruno Prémont wrote: > > On Fri, 29 April 2011 Bruno Prémont wrote: > > > On Fri, 29 April 2011 Markus Trippelsdorf wrote: > > > > On 2011.04.29 at 11:19 +1000, Dave Chinner wrote: > > > > > OK, so the common elements here appears to be root filesystems > > > > > with small log sizes, which means they are tail pushing all the > > > > > time metadata operations are in progress. Definitely seems like a > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > > reproduce this and cook up a patch to fix it. > > > > > > > > Hmm, I'm wondering if this issue is somehow related to the hrtimer bug, > > > > that Thomas Gleixner fixed yesterday: > > > > http://git.us.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=ce31332d3c77532d6ea97ddcb475a2b02dd358b4 > > > > http://thread.gmane.org/gmane.linux.kernel.mm/61909/ > > > > > > > > It also looks similar to the issue that James Bottomley reported > > > > earlier: http://thread.gmane.org/gmane.linux.kernel.mm/62185/ > > > > > > I'm going to see, I've applied Thomas' fix on the box seeing XFS freeze (without > > > other changes to kernel). > > > Going to run that kernel for the week-end and beyond if it survives to see what > > > happens. > > > > Happened again (after a few hours of uptime), so it definitely is not > > caused by hrtimer bug that Thomas Gleixner fixed. > > I've enabled lock debugging and this is what happened after a few hours > uptime. 
> (I can't tell if this is a false positive):
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.39-rc5-00130-g3fd9952 #10
> -------------------------------------------------------
> kio_file/7364 is trying to acquire lock:
>  (&sb->s_type->i_mutex_key#5/2){+.+...}, at: [<ffffffff81151e4e>] generic_file_splice_write+0xce/0x180
>
> but task is already holding lock:
>  (xfs_iolock_active){++++++}, at: [<ffffffff811d7d05>] xfs_ilock+0x125/0x1f0
>
> which lock already depends on the new lock.

Known problem. Been broken for ages, yet I only first saw a lockdep
report for this about a week ago on a 2.6.32 kernel....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-04-29 1:19 ` Dave Chinner @ 2011-05-04 0:57 ` Jamie Heilman -1 siblings, 0 replies; 44+ messages in thread From: Jamie Heilman @ 2011-05-04 0:57 UTC (permalink / raw) To: linux-kernel Cc: Dave Chinner, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner Dave Chinner wrote: > OK, so the common elements here appears to be root filesystems > with small log sizes, which means they are tail pushing all the > time metadata operations are in progress. Definitely seems like a > race in the AIL workqueue trigger mechanism. I'll see if I can > reproduce this and cook up a patch to fix it. Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and other assorted feedback wrt this issue? I've had it happen twice now myself in the past week or so, though I have no reliable reproduction technique. Just wondering if more data points will help isolate the cause, and if so, how to be prepared to get them. For whatever its worth, my last lockup was while running 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. 
root@cucamonga:~# grep xfs /proc/mounts
/dev/mapper/S-root / xfs rw,relatime,attr2,delaylog,noquota 0 0
/dev/mapper/S-var /var xfs rw,noatime,attr2,delaylog,inode64,noquota 0 0
root@cucamonga:~# xfs_info /var
meta-data=/dev/mapper/S-var      isize=256    agcount=4, agsize=6553600 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=12800, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@cucamonga:~# xfs_info /
meta-data=/dev/mapper/S-root     isize=256    agcount=4, agsize=524288 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=2097152, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

those are lvm volumes on top of a md raid1 partition, though from the
looks of everybody else's reports, that's not likely relevant.
sysrq-w and l follow:

May 3 07:12:28 cucamonga kernel: SysRq : Show Blocked State
May 3 07:12:28 cucamonga kernel:   task                        PC stack   pid father
May 3 07:12:28 cucamonga kernel: mutt            D ffff88007bf89f40     0  2938   2921 0x00000000
May 3 07:12:28 cucamonga kernel:  ffff88007ac0fb28 0000000000000046 ffff88007ac0fae8 ffffffff81095066
May 3 07:12:28 cucamonga kernel:  ffff88007a8c4570 ffff88007ac0ffd8 00000000000112c0 ffff88007ac0ffd8
May 3 07:12:28 cucamonga kernel:  ffff88007f0dcbc0 ffff88007a8c4570 ffff88007ac0fbb8 ffffffff81098363
May 3 07:12:28 cucamonga kernel: Call Trace:
May 3 07:12:28 cucamonga kernel:  [<ffffffff81095066>] ? __pagevec_free+0x70/0x82
May 3 07:12:28 cucamonga kernel:  [<ffffffff81098363>] ? release_pages+0x181/0x193
May 3 07:12:28 cucamonga kernel:  [<ffffffffa007ef3c>] xlog_wait+0x5b/0x72 [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffff8102e454>] ? try_to_wake_up+0x1bd/0x1bd
May 3 07:12:28 cucamonga kernel:  [<ffffffffa007ff60>] xlog_grant_log_space+0x129/0x3d6 [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa008cc74>] ? xfs_ail_push+0x3c/0x6b [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa0081e2c>] xfs_log_reserve+0xe5/0xee [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa008b8ab>] xfs_trans_reserve+0xcf/0x19b [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa008f7c4>] xfs_inactive+0x16a/0x39a [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa009ab94>] xfs_fs_evict_inode+0xc7/0xcf [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffff810de311>] evict+0x81/0x125
May 3 07:12:28 cucamonga kernel:  [<ffffffff810de53a>] iput+0x14b/0x153
May 3 07:12:28 cucamonga kernel:  [<ffffffff810db5b5>] dentry_kill+0x127/0x149
May 3 07:12:28 cucamonga kernel:  [<ffffffff810dbb1b>] dput+0xde/0xee
May 3 07:12:28 cucamonga kernel:  [<ffffffff810cc0e1>] fput+0x192/0x1aa
May 3 07:12:28 cucamonga kernel:  [<ffffffff810acb4b>] remove_vma+0x3c/0x64
May 3 07:12:28 cucamonga kernel:  [<ffffffff810adfa2>] exit_mmap+0xbe/0xd9
May 3 07:12:28 cucamonga kernel:  [<ffffffff8102fcfd>] mmput+0x5b/0x104
May 3 07:12:28 cucamonga kernel:  [<ffffffff81033976>] exit_mm+0x125/0x132
May 3 07:12:28 cucamonga kernel:  [<ffffffff810635a3>] ? acct_collect+0x176/0x182
May 3 07:12:28 cucamonga kernel:  [<ffffffff81034ebd>] do_exit+0x21d/0x70a
May 3 07:12:28 cucamonga kernel:  [<ffffffff810ca4b2>] ? fsnotify_modify+0x5f/0x67
May 3 07:12:28 cucamonga kernel:  [<ffffffffa03e7257>] ? kvm_on_user_return+0x4d/0x4f [kvm]
May 3 07:12:28 cucamonga kernel:  [<ffffffff8108e03f>] ? fire_user_return_notifiers+0x3c/0x65
May 3 07:12:28 cucamonga kernel:  [<ffffffff8103564d>] do_group_exit+0x76/0x9e
May 3 07:12:28 cucamonga kernel:  [<ffffffff8103568c>] sys_exit_group+0x17/0x17
May 3 07:12:28 cucamonga kernel:  [<ffffffff81334deb>] system_call_fastpath+0x16/0x1b
May 3 07:12:28 cucamonga kernel: kworker/1:0     D ffffffff81341180     0 24351      2 0x00000000
May 3 07:12:28 cucamonga kernel:  ffff8800046c1ca0 0000000000000046 ffff880000000000 0000000100000000
May 3 07:12:28 cucamonga kernel:  ffff88007c230ca0 ffff8800046c1fd8 00000000000112c0 ffff8800046c1fd8
May 3 07:12:28 cucamonga kernel:  ffff88007f0a3f20 ffff88007c230ca0 0000000000000000 0000000100000000
May 3 07:12:28 cucamonga kernel: Call Trace:
May 3 07:12:28 cucamonga kernel:  [<ffffffffa007ef3c>] xlog_wait+0x5b/0x72 [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffff8102e454>] ? try_to_wake_up+0x1bd/0x1bd
May 3 07:12:28 cucamonga kernel:  [<ffffffffa007ff60>] xlog_grant_log_space+0x129/0x3d6 [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa008cc74>] ? xfs_ail_push+0x3c/0x6b [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa0081e2c>] xfs_log_reserve+0xe5/0xee [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa008b8ab>] xfs_trans_reserve+0xcf/0x19b [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa009c6aa>] ? xfs_reclaim_inode+0x23b/0x23b [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa009c6aa>] ? xfs_reclaim_inode+0x23b/0x23b [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa0075123>] xfs_fs_log_dummy+0x43/0x7f [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffffa009c6ed>] xfs_sync_worker+0x43/0x69 [xfs]
May 3 07:12:28 cucamonga kernel:  [<ffffffff81047035>] process_one_work+0x179/0x295
May 3 07:12:28 cucamonga kernel:  [<ffffffff81048456>] worker_thread+0xd4/0x158
May 3 07:12:28 cucamonga kernel:  [<ffffffff81048382>] ? manage_workers.isra.23+0x170/0x170
May 3 07:12:28 cucamonga kernel:  [<ffffffff81048382>] ?
manage_workers.isra.23+0x170/0x170 May 3 07:12:28 cucamonga kernel: [<ffffffff8104b60c>] kthread+0x84/0x8c May 3 07:12:28 cucamonga kernel: [<ffffffff81335a54>] kernel_thread_helper+0x4/0x10 May 3 07:12:28 cucamonga kernel: [<ffffffff8104b588>] ? kthread_worker_fn+0x116/0x116 May 3 07:12:28 cucamonga kernel: [<ffffffff81335a50>] ? gs_change+0xb/0xb May 3 07:12:28 cucamonga kernel: dpkg D ffffffff81341180 0 28235 24677 0x00000000 May 3 07:12:28 cucamonga kernel: ffff88001b70dc98 0000000000000082 0000000000000001 ffff880000000000 May 3 07:12:28 cucamonga kernel: ffff88007f0a5eb0 ffff88001b70dfd8 00000000000112c0 ffff88001b70dfd8 May 3 07:12:28 cucamonga kernel: ffffffff81499020 ffff88007f0a5eb0 ffff88001b70dc88 000000011b70dcc8 May 3 07:12:28 cucamonga kernel: Call Trace: May 3 07:12:28 cucamonga kernel: [<ffffffffa007ef3c>] xlog_wait+0x5b/0x72 [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffff8102e454>] ? try_to_wake_up+0x1bd/0x1bd May 3 07:12:28 cucamonga kernel: [<ffffffffa008007e>] xlog_grant_log_space+0x247/0x3d6 [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffffa008cc74>] ? 
xfs_ail_push+0x3c/0x6b [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffffa0081e2c>] xfs_log_reserve+0xe5/0xee [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffffa008b8ab>] xfs_trans_reserve+0xcf/0x19b [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffffa008e390>] xfs_free_eofblocks+0x153/0x1e2 [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffffa008f622>] xfs_release+0x178/0x1b0 [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffffa00969e1>] xfs_file_release+0x15/0x19 [xfs] May 3 07:12:28 cucamonga kernel: [<ffffffff810cc04c>] fput+0xfd/0x1aa May 3 07:12:28 cucamonga kernel: [<ffffffff810c9120>] filp_close+0x6e/0x7a May 3 07:12:28 cucamonga kernel: [<ffffffff810c91d9>] sys_close+0xad/0xef May 3 07:12:28 cucamonga kernel: [<ffffffff81334deb>] system_call_fastpath+0x16/0x1b May 3 07:22:26 cucamonga kernel: SysRq : Show backtrace of all active CPUs May 3 07:22:26 cucamonga kernel: CPU1: May 3 07:22:26 cucamonga kernel: CPU 1 May 3 07:22:26 cucamonga kernel: Modules linked in: pci_slot fan cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 cpufreq_conservative k May 3 07:22:26 cucamonga kernel: May 3 07:22:26 cucamonga kernel: Pid: 0, comm: kworker/0:0 Not tainted 2.6.39-rc5-00127-g1be6a1f #1 Dell Inc. 
Precision WorkStation T3400 /0TP4 May 3 07:22:26 cucamonga kernel: RIP: 0010:[<ffffffff810081b2>] [<ffffffff810081b2>] mwait_idle+0x7c/0x94 May 3 07:22:26 cucamonga kernel: RSP: 0018:ffff88007f0d1ee8 EFLAGS: 00000246 May 3 07:22:26 cucamonga kernel: RAX: 0000000000000000 RBX: ffffffff81592100 RCX: 0000000000000000 May 3 07:22:26 cucamonga kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007f0d0000 May 3 07:22:26 cucamonga kernel: RBP: ffff88007f0d1ee8 R08: 0000000000000000 R09: 0000000000000000 May 3 07:22:26 cucamonga kernel: R10: 0000000000000000 R11: ffff88007fb0dc50 R12: ffffffff8133468e May 3 07:22:26 cucamonga kernel: R13: ffff88007f0d1e78 R14: 0000000000000086 R15: ffff88007fb11c00 May 3 07:22:26 cucamonga kernel: FS: 0000000000000000(0000) GS:ffff88007fb00000(0000) knlGS:0000000000000000 May 3 07:22:26 cucamonga kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b May 3 07:22:26 cucamonga kernel: CR2: 00007ffec6e368f0 CR3: 0000000004820000 CR4: 00000000000406f0 May 3 07:22:26 cucamonga kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 May 3 07:22:26 cucamonga kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 May 3 07:22:26 cucamonga kernel: Process kworker/0:0 (pid: 0, threadinfo ffff88007f0d0000, task ffff88007f0a3f20) May 3 07:22:26 cucamonga kernel: Stack: May 3 07:22:26 cucamonga kernel: ffff88007f0d1f18 ffffffff810008ad ffff88007f0d1f08 57bb49c37cf88ea3 May 3 07:22:26 cucamonga kernel: 0000000000000001 0000000000000000 ffff88007f0d1f48 ffffffff81539e79 May 3 07:22:26 cucamonga kernel: 0000000000000000 3128fe8622c57963 0000000000000000 0000000000000000 May 3 07:22:26 cucamonga kernel: Call Trace: May 3 07:22:26 cucamonga kernel: [<ffffffff810008ad>] cpu_idle+0xa3/0xe9 May 3 07:22:26 cucamonga kernel: [<ffffffff81539e79>] start_secondary+0x1bd/0x1c4 May 3 07:22:26 cucamonga kernel: Code: d2 65 48 8b 04 25 c8 b5 00 00 48 89 d1 48 2d c8 1f 00 00 0f 01 c8 0f ae f0 e8 52 fe ff ff 85 c0 75 0b 
31 May 3 07:22:26 cucamonga kernel: Call Trace: May 3 07:22:26 cucamonga kernel: [<ffffffff810008ad>] cpu_idle+0xa3/0xe9 May 3 07:22:26 cucamonga kernel: [<ffffffff81539e79>] start_secondary+0x1bd/0x1c4 May 3 07:22:26 cucamonga kernel: CPU0: May 3 07:22:26 cucamonga kernel: ffff88007fa03ef0 ffff88007fa03f48 0000000000000046 ffff88007fa03f68 May 3 07:22:26 cucamonga kernel: 0000000000000001 ffff88007aabdc48 0000000000000001 ffff88007fa03f38 May 3 07:22:26 cucamonga kernel: ffffffff810049a6 ffff88007fa03f58 ffffffff811acfb3 dead000000200200 May 3 07:22:26 cucamonga kernel: Call Trace: May 3 07:22:26 cucamonga kernel: <IRQ> [<ffffffff810049a6>] ? show_stack+0x1c/0x1e May 3 07:22:26 cucamonga kernel: [<ffffffff811acfb3>] ? showacpu+0x4a/0x5d May 3 07:22:26 cucamonga kernel: [<ffffffff8105cb50>] ? generic_smp_call_function_single_interrupt+0xd3/0xf6 May 3 07:22:26 cucamonga kernel: [<ffffffff81017a31>] ? smp_call_function_single_interrupt+0x18/0x27 May 3 07:22:26 cucamonga kernel: [<ffffffff813357b3>] ? call_function_single_interrupt+0x13/0x20 May 3 07:22:26 cucamonga kernel: <EOI> [<ffffffff8133428c>] ? _raw_spin_unlock_irqrestore+0xd/0x30 May 3 07:22:26 cucamonga kernel: [<ffffffff8104bc8c>] ? remove_wait_queue+0x51/0x56 May 3 07:22:26 cucamonga kernel: [<ffffffff810d8c7d>] ? poll_freewait+0x3e/0xaa May 3 07:22:26 cucamonga kernel: [<ffffffff813338bd>] ? schedule_hrtimeout_range+0x13/0x15 May 3 07:22:26 cucamonga kernel: [<ffffffff810d9457>] ? do_select+0x4df/0x4f4 May 3 07:22:26 cucamonga kernel: [<ffffffff810d8ce9>] ? poll_freewait+0xaa/0xaa May 3 07:22:26 cucamonga kernel: [<ffffffff810d8db5>] ? __pollwait+0xcc/0xcc May 3 07:22:26 cucamonga last message repeated 7 times May 3 07:22:26 cucamonga kernel: [<ffffffff810d95e3>] ? core_sys_select+0x177/0x216 May 3 07:22:26 cucamonga kernel: [<ffffffff810c1f3c>] ? kmem_cache_free+0x79/0xbc May 3 07:22:26 cucamonga kernel: [<ffffffff812be2f9>] ? 
sock_destroy_inode+0x32/0x36 May 3 07:22:26 cucamonga kernel: [<ffffffff81334274>] ? _raw_spin_unlock+0x23/0x2e May 3 07:22:26 cucamonga kernel: [<ffffffff810dbb0c>] ? dput+0xcf/0xee May 3 07:22:26 cucamonga kernel: [<ffffffff810d9713>] ? sys_select+0x91/0xb9 May 3 07:22:26 cucamonga kernel: [<ffffffff810cada7>] ? sys_read+0x61/0x6e May 3 07:22:26 cucamonga kernel: [<ffffffff81334deb>] ? system_call_fastpath+0x16/0x1b I have captures of my process list and slabinfo at the time of the lockup too should they be useful. -- Jamie Heilman http://audible.transient.net/~jamie/ "Most people wouldn't know music if it came up and bit them on the ass." -Frank Zappa ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
  2011-05-04  0:57 ` Jamie Heilman
@ 2011-05-04 13:25 ` Dave Chinner
  -1 siblings, 0 replies; 44+ messages in thread
From: Dave Chinner @ 2011-05-04 13:25 UTC (permalink / raw)
To: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner

On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote:
> Dave Chinner wrote:
> > OK, so the common elements here appears to be root filesystems
> > with small log sizes, which means they are tail pushing all the
> > time metadata operations are in progress. Definitely seems like a
> > race in the AIL workqueue trigger mechanism. I'll see if I can
> > reproduce this and cook up a patch to fix it.
>
> Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and
> other assorted feedback wrt this issue? I've had it happen twice now
> myself in the past week or so, though I have no reliable reproduction
> technique. Just wondering if more data points will help isolate the
> cause, and if so, how to be prepared to get them.

Not really. I think I know where the problem lies, but I currently
lack a reproducer. There's also been another regression I've only
just got to the bottom of, so I haven't really had a chance to focus
properly on this one yet. Log space hangs like this have
historically been difficult to reproduce reliably....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
  2011-05-04  0:57 ` Jamie Heilman
@ 2011-05-05  0:21 ` Dave Chinner
  -1 siblings, 0 replies; 44+ messages in thread
From: Dave Chinner @ 2011-05-05 0:21 UTC (permalink / raw)
To: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner

On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote:
> Dave Chinner wrote:
> > OK, so the common elements here appears to be root filesystems
> > with small log sizes, which means they are tail pushing all the
> > time metadata operations are in progress. Definitely seems like a
> > race in the AIL workqueue trigger mechanism. I'll see if I can
> > reproduce this and cook up a patch to fix it.
>
> Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and
> other assorted feedback wrt this issue? I've had it happen twice now
> myself in the past week or so, though I have no reliable reproduction
> technique. Just wondering if more data points will help isolate the
> cause, and if so, how to be prepared to get them.
>
> For whatever its worth, my last lockup was while running
> 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups.

Can you all try the patch below? I've managed to trigger a couple of
xlog_wait() lockups in some controlled load tests. The lockups don't
appear to occur with the following patch to the race condition in the
AIL workqueue trigger.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

xfs: fix race condition queuing AIL pushes

From: Dave Chinner <dchinner@redhat.com>

The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. The problem is that
the use of the XFS_AIL_PUSHING_BIT to determine whether a push is
currently in progress is racy. When the AIL push work completes, it
checked whether the target changed and cleared the PUSHING bit to
allow a new push to be requeued.
The race condition is as follows:

	Thread 1			push work

	smp_wmb()
					smp_rmb()
					check ailp->xa_target unchanged
	update ailp->xa_target
	test/set PUSHING bit
	does not queue
					clear PUSHING bit
					does not requeue

Now that the push target is updated, new attempts to push the AIL
will not trigger as the push target will be the same, and hence
despite trying to push the AIL we won't ever wake it again.

The fix is to ensure that the AIL push work clears the PUSHING bit
before it checks if the target is unchanged. As a result, both push
triggers operate on the same test/set bit criteria, so even if we
race in the push work and miss the target update, the thread
requesting the push will still set the PUSHING bit and queue the
push work to occur. For safety's sake, the same queue check is done
if the push work detects the target change, though only one of the
two will queue new work due to the use of test_and_set_bit() checks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans_ail.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index acdb92f..b7606d9 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -486,15 +486,19 @@ xfs_ail_worker(
 		ailp->xa_last_pushed_lsn = 0;

 		/*
-		 * Check for an updated push target before clearing the
-		 * XFS_AIL_PUSHING_BIT. If the target changed, we've got more
-		 * work to do. Wait a bit longer before starting that work.
+		 * We clear the XFS_AIL_PUSHING_BIT first before checking
+		 * whether the target has changed. If the target has changed,
+		 * this pushes the requeue race directly onto the result of the
+		 * atomic test/set bit, so we are guaranteed that either the
+		 * the pusher that changed the target or ourselves will requeue
+		 * the work (but not both).
 		 */
+		clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
 		smp_rmb();
-		if (ailp->xa_target == target) {
-			clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
+		if (ailp->xa_target == target ||
+		    (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)))
 			return;
-		}
+		tout = 50;
 	} else if (XFS_LSN_CMP(lsn, target) >= 0) {
 		/*

^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 @ 2011-05-05 0:21 ` Dave Chinner 0 siblings, 0 replies; 44+ messages in thread From: Dave Chinner @ 2011-05-05 0:21 UTC (permalink / raw) To: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > Dave Chinner wrote: > > OK, so the common elements here appears to be root filesystems > > with small log sizes, which means they are tail pushing all the > > time metadata operations are in progress. Definitely seems like a > > race in the AIL workqueue trigger mechanism. I'll see if I can > > reproduce this and cook up a patch to fix it. > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > other assorted feedback wrt this issue? I've had it happen twice now > myself in the past week or so, though I have no reliable reproduction > technique. Just wondering if more data points will help isolate the > cause, and if so, how to be prepared to get them. > > For whatever its worth, my last lockup was while running > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. Can you all try the patch below? I've managed to trigger a couple of xlog_wait() lockups in some controlled load tests. The lockups don't appear to occur with the following patch to he race condition in the AIL workqueue trigger. Cheers, Dave. -- Dave Chinner david@fromorbit.com xfs: fix race condition queuing AIL pushes From: Dave Chinner <dchinner@redhat.com> The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The problem is that the use of the XFS_AIL_PUSHING_BIT to determine whether a push is currently in progress is racy. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. 
The race condition is as follows:

    Thread 1                    push work
    smp_wmb()
                                smp_rmb()
                                check ailp->xa_target unchanged
    update ailp->xa_target
    test/set PUSHING bit
    does not queue
                                clear PUSHING bit
                                does not requeue

Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety's sake, the same queue check is done if the push work detects the target change, though only one of the two will queue new work due to the use of test_and_set_bit() checks. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_trans_ail.c | 16 ++++++++++------ 1 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index acdb92f..b7606d9 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -486,15 +486,19 @@ xfs_ail_worker( ailp->xa_last_pushed_lsn = 0; /* - * Check for an updated push target before clearing the - * XFS_AIL_PUSHING_BIT. If the target changed, we've got more - * work to do. Wait a bit longer before starting that work. + * We clear the XFS_AIL_PUSHING_BIT first before checking + * whether the target has changed. If the target has changed, + * this pushes the requeue race directly onto the result of the + * atomic test/set bit, so we are guaranteed that either the + * the pusher that changed the target or ourselves will requeue + * the work (but not both).
*/ + clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); smp_rmb(); - if (ailp->xa_target == target) { - clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); + if (ailp->xa_target == target || + (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))) return; - } + tout = 50; } else if (XFS_LSN_CMP(lsn, target) >= 0) { /* _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-05-05 0:21 ` Dave Chinner @ 2011-05-05 2:26 ` Dave Chinner -1 siblings, 0 replies; 44+ messages in thread From: Dave Chinner @ 2011-05-05 2:26 UTC (permalink / raw) To: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote: > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > Dave Chinner wrote: > > > OK, so the common elements here appears to be root filesystems > > > with small log sizes, which means they are tail pushing all the > > > time metadata operations are in progress. Definitely seems like a > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > reproduce this and cook up a patch to fix it. > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > other assorted feedback wrt this issue? I've had it happen twice now > > myself in the past week or so, though I have no reliable reproduction > > technique. Just wondering if more data points will help isolate the > > cause, and if so, how to be prepared to get them. > > > > For whatever its worth, my last lockup was while running > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. > > Can you all try the patch below? I've managed to trigger a couple of > xlog_wait() lockups in some controlled load tests. The lockups don't > appear to occur with the following patch to he race condition in > the AIL workqueue trigger. They are still there, just harder to hit. FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches" gets the system moving again because that changes the push target. I've found two more bugs, and now my test case reliably reproduces a 5-10s pause at ~1M created 1-byte files and then hangs at about 1.25M files. So there's yet another problem lurking that I need to get to the bottom of. Cheers, Dave.
-- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-05-05 2:26 ` Dave Chinner @ 2011-05-05 12:21 ` Dave Chinner -1 siblings, 0 replies; 44+ messages in thread From: Dave Chinner @ 2011-05-05 12:21 UTC (permalink / raw) To: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote: > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote: > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > > Dave Chinner wrote: > > > > OK, so the common elements here appears to be root filesystems > > > > with small log sizes, which means they are tail pushing all the > > > > time metadata operations are in progress. Definitely seems like a > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > reproduce this and cook up a patch to fix it. > > > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > > other assorted feedback wrt this issue? I've had it happen twice now > > > myself in the past week or so, though I have no reliable reproduction > > > technique. Just wondering if more data points will help isolate the > > > cause, and if so, how to be prepared to get them. > > > > > > For whatever its worth, my last lockup was while running > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. > > > > Can you all try the patch below? I've managed to trigger a couple of > > xlog_wait() lockups in some controlled load tests. The lockups don't > > appear to occur with the following patch to he race condition in > > the AIL workqueue trigger. > > They are still there, just harder to hit. > > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches" > gets the system moving again because that changes the push target. 
> > I've found two more bugs, and now my test case is now reliably > reproducably a 5-10s pause at ~1M created 1byte files and then > hanging at about 1.25M files. So there's yet another problem lurking > that I need to get to the bottom of. Which, of course, was the real regression. The patch below has survived a couple of hours of testing, which fixes all 4 of the problems I found. Please test. Cheers, Dave. -- Dave Chinner david@fromorbit.com xfs: fix race conditions and regressions in AIL pushing From: Dave Chinner <dchinner@redhat.com> The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. There are four problems being hit. The first problem is that the use of the XFS_AIL_PUSHING_BIT to determine whether a push is currently in progress is racy. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. The race condition is as follows:

    Thread 1                    push work
    smp_wmb()
                                smp_rmb()
                                check ailp->xa_target unchanged
    update ailp->xa_target
    test/set PUSHING bit
    does not queue
                                clear PUSHING bit
                                does not requeue

Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety's sake, the same queue check is done if the push work detects the target change, though only one of the two will queue new work due to the use of test_and_set_bit() checks. The second problem is a target mismatch between the item pushing loop and the target itself.
The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have an LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1. If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. The third problem is that updating the push target is not safe on 32 bit machines. We cannot copy a 64 bit LSN without the possibility of corrupting the result when racing with another updating thread. We have a function to do this update safely without needing to care about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when updating the AIL push target. The final problem, and the ultimate cause of the regression, is that there are two exit paths from the AIL push work. One does the right thing with clearing the PUSH bit and rechecking the target, while the other simply returns if the AIL is empty when the push work starts. This exit path needs to do the same PUSHING bit clearing and target checking as the other, normal "no more work to be done" path. Note: this needs to be split into 4 patches to push into mainline. This is an aggregated patch just for testing.
Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_trans_ail.c | 24 ++++++++++++++---------- 1 files changed, 14 insertions(+), 10 deletions(-) diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index acdb92f..ab0d045 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -368,8 +368,7 @@ xfs_ail_worker( */ xfs_trans_ail_cursor_done(ailp, cur); spin_unlock(&ailp->xa_lock); - ailp->xa_last_pushed_lsn = 0; - return; + goto out_done; } XFS_STATS_INC(xs_push_ail); @@ -387,7 +386,7 @@ xfs_ail_worker( */ lsn = lip->li_lsn; flush_log = stuck = count = 0; - while ((XFS_LSN_CMP(lip->li_lsn, target) < 0)) { + while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) { int lock_result; /* * If we can lock the item without sleeping, unlock the AIL @@ -482,19 +481,24 @@ xfs_ail_worker( /* assume we have more work to do in a short while */ tout = 10; if (!count) { +out_done: /* We're past our target or empty, so idle */ ailp->xa_last_pushed_lsn = 0; /* - * Check for an updated push target before clearing the - * XFS_AIL_PUSHING_BIT. If the target changed, we've got more - * work to do. Wait a bit longer before starting that work. + * We clear the XFS_AIL_PUSHING_BIT first before checking + * whether the target has changed. If the target has changed, + * this pushes the requeue race directly onto the result of the + * atomic test/set bit, so we are guaranteed that either the + * the pusher that changed the target or ourselves will requeue + * the work (but not both). */ + clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); smp_rmb(); - if (ailp->xa_target == target) { - clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags); + if (ailp->xa_target == target || + (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))) return; - } + tout = 50; } else if (XFS_LSN_CMP(lsn, target) >= 0) { /* @@ -553,7 +557,7 @@ xfs_ail_push( * the XFS_AIL_PUSHING_BIT. 
*/ smp_wmb(); - ailp->xa_target = threshold_lsn; + xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn); if (!test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags)) queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0); } ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-05-05 12:21 ` Dave Chinner @ 2011-05-05 12:39 ` Christoph Hellwig -1 siblings, 0 replies; 44+ messages in thread From: Christoph Hellwig @ 2011-05-05 12:39 UTC (permalink / raw) To: Dave Chinner Cc: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner > The third problem is that updating the push target is not safe on 32 > bit machines. We cannot copy a 64 bit LSN without the possibility of > corrupting the result when racing with another updating thread. We > have function to do this update safely without needing to care about > 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when > updating the AIL push target. But reading xa_target without xa_lock isn't safe on 32-bit either, is it? For the first read it can trivially be moved into the critical section a few lines below, and the second one should probably use XFS_LSN_CMP. > @@ -482,19 +481,24 @@ xfs_ail_worker( > /* assume we have more work to do in a short while */ > tout = 10; > if (!count) { > +out_done: Jumping into conditionals is really ugly. By initializing count a bit earlier you can just jump in front of the if/else clauses. And while you're there maybe moving the tout = 10; into an else clause would also make the code more readable and avoid an uninitialized use of tout. > + if (ailp->xa_target == target || > + (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))) no need for braces around the test_and_set_bit call. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-05-05 12:39 ` Christoph Hellwig @ 2011-05-06 1:49 ` Dave Chinner -1 siblings, 0 replies; 44+ messages in thread From: Dave Chinner @ 2011-05-06 1:49 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs, Alex Elder, Dave Chinner On Thu, May 05, 2011 at 08:39:59AM -0400, Christoph Hellwig wrote: > > The third problem is that updating the push target is not safe on 32 > > bit machines. We cannot copy a 64 bit LSN without the possibility of > > corrupting the result when racing with another updating thread. We > > have function to do this update safely without needing to care about > > 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when > > updating the AIL push target. > > But reading xa_target without xa_lock isn't safe on 32-bit either, is it? Not sure - I think it depends on the platform. I don't think we protect LSN reads in any specific way on 32 bit platforms. In this case, I don't think it matters so much on read, because if we get a race with a write that mixes upper/lower words of the target we will eventually hit the stop condition and we won't get a match. That will trigger the requeue code and we'll start the push again. The problem with getting such a race on the target write is that we could get a cycle/block pair that is beyond the current head of the log and we'd never be able to push the AIL again as all push thresholds are truncated to the current head LSN on disk... > For the first read it can trivially be moved into the critical > section a few lines below, and the second one should probably use > XFS_LSN_CMP. > > > @@ -482,19 +481,24 @@ xfs_ail_worker( > > /* assume we have more work to do in a short while */ > > tout = 10; > > if (!count) { > > +out_done: > > Jumping into conditionals is really ugly. By initializing count a bit > earlier you can just jump in front of the if/else clauses.
And while > you're there maybe moving the tout = 10; into an else clause would > also make the code more readable. > an uninitialied used of tout. Ok, I'll rework that. > > + if (ailp->xa_target == target || > > + (test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))) > > no need for braces around the test_and_set_bit call. *nod*. Left over from developing the fix... I'll split all these and post them to the xfs-list for review... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 2011-05-05 12:21 ` Dave Chinner @ 2011-05-05 20:35 ` Bruno Prémont -1 siblings, 0 replies; 44+ messages in thread From: Bruno Prémont @ 2011-05-05 20:35 UTC (permalink / raw) To: Dave Chinner Cc: linux-kernel, Markus Trippelsdorf, xfs-masters, xfs, Christoph Hellwig, Alex Elder, Dave Chinner On Thu, 05 May 2011 Dave Chinner wrote: > On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote: > > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote: > > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote: > > > > Dave Chinner wrote: > > > > > OK, so the common elements here appears to be root filesystems > > > > > with small log sizes, which means they are tail pushing all the > > > > > time metadata operations are in progress. Definitely seems like a > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can > > > > > reproduce this and cook up a patch to fix it. > > > > > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and > > > > other assorted feedback wrt this issue? I've had it happen twice now > > > > myself in the past week or so, though I have no reliable reproduction > > > > technique. Just wondering if more data points will help isolate the > > > > cause, and if so, how to be prepared to get them. > > > > > > > > For whatever its worth, my last lockup was while running > > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups. > > > > > > Can you all try the patch below? I've managed to trigger a couple of > > > xlog_wait() lockups in some controlled load tests. The lockups don't > > > appear to occur with the following patch to he race condition in > > > the AIL workqueue trigger. > > > > They are still there, just harder to hit. > > > > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches" > > gets the system moving again because that changes the push target. 
> > > > I've found two more bugs, and now my test case is now reliably > > reproducably a 5-10s pause at ~1M created 1byte files and then > > hanging at about 1.25M files. So there's yet another problem lurking > > that I need to get to the bottom of. > > Which, of course, was the real regression. The patch below has > survived a couple of hours of testing, which fixes all 4 of the > problems I found. Please test. Successfully survived my 2-hour session today. Will continue testing during the week-end and see if it also survives the longer whole-day sessions. Will report results at the end of the week-end (or earlier in case of trouble). Thanks, Bruno > Cheers, > > Dave. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
  2011-05-05 20:35 ` Bruno Prémont
@ 2011-05-09  5:57 ` Bruno Prémont
  -1 siblings, 0 replies; 44+ messages in thread
From: Bruno Prémont @ 2011-05-09 5:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Markus Trippelsdorf, xfs-masters, xfs,
	Christoph Hellwig, Alex Elder, Dave Chinner

On Thu, 5 May 2011 22:35:13 Bruno Prémont wrote:
> On Thu, 05 May 2011 Dave Chinner wrote:
> > On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote:
> > > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote:
> > > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote:
> > > > > Dave Chinner wrote:
> > > > > > OK, so the common elements here appear to be root filesystems
> > > > > > with small log sizes, which means they are tail pushing all the
> > > > > > time metadata operations are in progress. Definitely seems like a
> > > > > > race in the AIL workqueue trigger mechanism. I'll see if I can
> > > > > > reproduce this and cook up a patch to fix it.
> > > > >
> > > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and
> > > > > other assorted feedback wrt this issue? I've had it happen twice now
> > > > > myself in the past week or so, though I have no reliable reproduction
> > > > > technique. Just wondering if more data points will help isolate the
> > > > > cause, and if so, how to be prepared to get them.
> > > > >
> > > > > For whatever it's worth, my last lockup was while running
> > > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups.
> > > >
> > > > Can you all try the patch below? I've managed to trigger a couple of
> > > > xlog_wait() lockups in some controlled load tests. The lockups don't
> > > > appear to occur with the following patch to the race condition in
> > > > the AIL workqueue trigger.
> > >
> > > They are still there, just harder to hit.
> > >
> > > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches"
> > > gets the system moving again because that changes the push target.
> > >
> > > I've found two more bugs, and now my test case reliably reproduces
> > > a 5-10s pause at ~1M created 1-byte files and then hangs at about
> > > 1.25M files. So there's yet another problem lurking that I need to
> > > get to the bottom of.
> >
> > Which, of course, was the real regression. The patch below has
> > survived a couple of hours of testing, which fixes all 4 of the
> > problems I found. Please test.
>
> Successfully survives my 2-hour session of today. Will continue testing
> during the week-end and see if it also survives the longer whole-day
> sessions.
>
> Will report results at end of week-end (or earlier in case of trouble).

Also survived the whole week-end (at least twice 10 hours) with normal
desktop work as well as a few hours of software compilation.
(Without the patch it would probably have frozen at least twice a day.)

So looks really good!

Thanks,
Bruno

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
  2011-05-05 12:21 ` Dave Chinner
@ 2011-05-08 5:11 ` Jamie Heilman
  -1 siblings, 0 replies; 44+ messages in thread
From: Jamie Heilman @ 2011-05-08 5:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, Markus Trippelsdorf, Bruno Prémont, xfs-masters,
	xfs, Christoph Hellwig, Alex Elder, Dave Chinner

Dave Chinner wrote:
> On Thu, May 05, 2011 at 12:26:13PM +1000, Dave Chinner wrote:
> > On Thu, May 05, 2011 at 10:21:26AM +1000, Dave Chinner wrote:
> > > On Wed, May 04, 2011 at 12:57:36AM +0000, Jamie Heilman wrote:
> > > > Dave Chinner wrote:
> > > > > OK, so the common elements here appear to be root filesystems
> > > > > with small log sizes, which means they are tail pushing all the
> > > > > time metadata operations are in progress. Definitely seems like a
> > > > > race in the AIL workqueue trigger mechanism. I'll see if I can
> > > > > reproduce this and cook up a patch to fix it.
> > > >
> > > > Is there value in continuing to post sysrq-w, sysrq-l, xfs_info, and
> > > > other assorted feedback wrt this issue? I've had it happen twice now
> > > > myself in the past week or so, though I have no reliable reproduction
> > > > technique. Just wondering if more data points will help isolate the
> > > > cause, and if so, how to be prepared to get them.
> > > >
> > > > For whatever it's worth, my last lockup was while running
> > > > 2.6.39-rc5-00127-g1be6a1f with a preempt config without cgroups.
> > >
> > > Can you all try the patch below? I've managed to trigger a couple of
> > > xlog_wait() lockups in some controlled load tests. The lockups don't
> > > appear to occur with the following patch to the race condition in
> > > the AIL workqueue trigger.
> >
> > They are still there, just harder to hit.
> >
> > FWIW, I've also discovered that "echo 2 > /proc/sys/vm/drop_caches"
> > gets the system moving again because that changes the push target.
> >
> > I've found two more bugs, and now my test case reliably reproduces
> > a 5-10s pause at ~1M created 1-byte files and then hangs at about
> > 1.25M files. So there's yet another problem lurking that I need to
> > get to the bottom of.
>
> Which, of course, was the real regression. The patch below has
> survived a couple of hours of testing, which fixes all 4 of the
> problems I found. Please test.

Well, 61 hours in now, and no lockups. I've written ~204GiB to my xfs
volumes in that time, much of which was audacity temp files which are
1037kB each, so not as metadata intensive as your test case, but it's
more or less what I'd been doing in the past when the lockups happened.
Looks pretty promising at this point.

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
  2011-04-29 1:19 ` Dave Chinner
@ 2011-05-20 11:20 ` Andrey Rahmatullin
  -1 siblings, 0 replies; 44+ messages in thread
From: Andrey Rahmatullin @ 2011-05-20 11:20 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs,
	Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 987 bytes --]

On Fri, Apr 29, 2011 at 11:19:29AM +1000, Dave Chinner wrote:
> > x4 ~ # xfs_info /
> > meta-data=/dev/root            isize=256    agcount=4, agsize=1949824 blks
> >          =                     sectsz=512   attr=2
> > data     =                     bsize=4096   blocks=7799296, imaxpct=25
> >          =                     sunit=128    swidth=128 blks
> > naming   =version 2            bsize=4096   ascii-ci=0
> > log      =internal             bsize=4096   blocks=3808, version=2
> >          =                     sectsz=512   sunit=8 blks, lazy-count=1
> > realtime =none                 extsz=4096   blocks=0, rtextents=0
> OK, so the common elements here appear to be root filesystems
> with small log sizes, which means they are tail pushing all the
> time metadata operations are in progress.

Does that mean that such filesystems are not optimal in terms of
performance and/or reliability and should have larger log sizes?

-- 
WBR, wRAR

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread
* Re: 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38
  2011-05-20 11:20 ` Andrey Rahmatullin
@ 2011-05-21 0:14 ` Dave Chinner
  -1 siblings, 0 replies; 44+ messages in thread
From: Dave Chinner @ 2011-05-21 0:14 UTC (permalink / raw)
  To: Andrey Rahmatullin
  Cc: Markus Trippelsdorf, Bruno Prémont, xfs-masters, xfs,
	Christoph Hellwig, Alex Elder, Dave Chinner, linux-kernel

On Fri, May 20, 2011 at 05:20:18PM +0600, Andrey Rahmatullin wrote:
> On Fri, Apr 29, 2011 at 11:19:29AM +1000, Dave Chinner wrote:
> > > x4 ~ # xfs_info /
> > > meta-data=/dev/root            isize=256    agcount=4, agsize=1949824 blks
> > >          =                     sectsz=512   attr=2
> > > data     =                     bsize=4096   blocks=7799296, imaxpct=25
> > >          =                     sunit=128    swidth=128 blks
> > > naming   =version 2            bsize=4096   ascii-ci=0
> > > log      =internal             bsize=4096   blocks=3808, version=2
> > >          =                     sectsz=512   sunit=8 blks, lazy-count=1
> > > realtime =none                 extsz=4096   blocks=0, rtextents=0
> > OK, so the common elements here appear to be root filesystems
> > with small log sizes, which means they are tail pushing all the
> > time metadata operations are in progress.
> Does that mean that such filesystems are not optimal in terms of
> performance and/or reliability and should have larger log sizes?

Performance. Larger logs generally result in faster metadata
performance, but it's really dependent on your workload and storage
as to whether it makes any difference.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 44+ messages in thread
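To put a number on "small log": the xfs_info output quoted in this thread reports an internal log of blocks=3808 at bsize=4096. A quick arithmetic check (shell only; no filesystem access involved) shows that this is just under 15 MiB:

```shell
#!/bin/sh
# Log size implied by the quoted xfs_info output: 3808 blocks * 4096 bytes.
log_bytes=$((3808 * 4096))
echo "$log_bytes"                  # 15597568 bytes
echo $((log_bytes / 1024 / 1024))  # 14 (integer MiB, i.e. just under 15 MiB)
```

If a larger log is wanted, it can be requested at mkfs time with the `-l size=` option of `mkfs.xfs` (the exact size that helps is workload-dependent, as noted above, and reformatting is of course only a sketch of an option, not a recommendation from this thread).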
end of thread, other threads:[~2011-05-21 0:14 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-23 20:44 2.6.39-rc3, 2.6.39-rc4: XFS lockup - regression since 2.6.38 Bruno Prémont
2011-04-27  5:08 ` Dave Chinner
2011-04-27 16:26 ` Bruno Prémont
2011-04-28 19:45 ` Markus Trippelsdorf
2011-04-29  1:19 ` Dave Chinner
2011-04-29 15:18 ` Markus Trippelsdorf
2011-04-29 19:35 ` Bruno Prémont
2011-04-30 14:18 ` Bruno Prémont
2011-05-02  6:15 ` Markus Trippelsdorf
2011-05-02 12:40 ` Dave Chinner
2011-05-04  0:57 ` Jamie Heilman
2011-05-04 13:25 ` Dave Chinner
2011-05-05  0:21 ` Dave Chinner
2011-05-05  2:26 ` Dave Chinner
2011-05-05 12:21 ` Dave Chinner
2011-05-05 12:39 ` Christoph Hellwig
2011-05-06  1:49 ` Dave Chinner
2011-05-05 20:35 ` Bruno Prémont
2011-05-09  5:57 ` Bruno Prémont
2011-05-08  5:11 ` Jamie Heilman
2011-05-20 11:20 ` Andrey Rahmatullin
2011-05-21  0:14 ` Dave Chinner