* 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux
@ 2015-04-04 18:03 Nix
  2015-04-04 21:05 ` Francois Romieu
  0 siblings, 1 reply; 7+ messages in thread
From: Nix @ 2015-04-04 18:03 UTC (permalink / raw)
  To: rl, Francois Romieu, Bjarke Istrup Pedersen, David S. Miller, Linux-Netdev

So I started to encounter hangs with high CPU load when trafficking on
multiple NICs when I added a bunch of extra VIA Rhine NICs to my Soekris
net5501 board via a lan1741 board. This was in 3.18, but of course the
bug could well have predated that.

The hangs appear to need simultaneous rx and tx load (as commonly seen
in routing situations): an iperf -t 3600 -l 100 routed through the box
is sufficient to cause a crash in half an hour or so. (UDP flows don't
trigger it, not even bidirectional ones, for reasons that remain unclear
but are probably related to iperf's use of a ~five-times-higher packet
flow rate when using TCP, or perhaps related to my iptables rules,
though those are minimal for anything TCP or UDP on either of the
interfaces I tested with. It is just barely possible that two flows are
required for this, and three interfaces: this box is my firewall, and it
seems statistically unlikely how often the crash happens just at the
instant something is coming off the internet, in through an interface
uninvolved in this testing and then out of one of the interfaces that is
involved.)

In 3.18, everything seemed to work for the built-in four NICs (and even
worked for the non-built-in ones if you kept the traffic low enough,
probably because no interrupts came in while it was handling an existing
packet), but in 3.19, even the built-in NICs are affected. A really
crude bisection suggests that things went wrong around 87545899b, but
there's no way I trust that that commit was actually at fault. (I have
the complete bisection log if anyone trusts it at all).

The absence of crashes associated with the built-in NICs in 3.18 and
below is more likely to be due to differences in the timing of IRQs
coming across the single-slot PCI expansion bus than anything else, I
suspect.

I ran across <http://lkml.iu.edu/hypermail/linux/kernel/1112.3/00556.html>
which culminated in
<http://lkml.iu.edu/hypermail/linux/kernel/1112.3/00807.html>, moving
all the rx work into the tx handler, which seemed to culminate in
7ab87ff4c. Unfortunately, this appears not to fix things, or this fix
appears to have regressed -- but it's hard to say, since the only
symptom I see is a hang so hard that nothing can get written out
anywhere at all (if the serial console is in the middle of emitting
something, it halts mid-message). The only thing that can get any
response out of the machine after a hang is a powerdown or the
ever-handy hardware watchdog.

We sometimes see extremely high CPU loads in bursts before the point of
the hang, but this appears to be unrelated, since extremely high CPU
load is also observed doing perfectly hang-free bidirectional iperf
flows over a single NIC (up to 75% of time consumed in the interrupt
handler in both cases). I suspect that everything goes fine until an
interrupt comes in at the wrong moment, and then everything hangs, so
instantaneous CPU load from a machine that hasn't hung yet won't tell us
anything.

This is, like all Soekris net5501s, an AMD Geode LX, a UP x86-32 system,
so a hang or loop in an interrupt handler means you are as dead as a
very dead thing with no chance of recovery.

.config follows: CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" CONFIG_LOCALVERSION="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y CONFIG_DEFAULT_HOSTNAME="fold" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_FHANDLE=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ=y CONFIG_IRQ_DOMAIN=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y 
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y CONFIG_NO_HZ_IDLE=y CONFIG_HIGH_RES_TIMERS=y CONFIG_TICK_CPU_ACCOUNTING=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_TINY_RCU=y CONFIG_LOG_BUF_SHIFT=14 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_CGROUPS=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_SCHED_AUTOGROUP=y CONFIG_RELAY=y CONFIG_INIT_FALLBACK=y CONFIG_CC_OPTIMIZE_FOR_SIZE=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_HAVE_UID16=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_BPF=y CONFIG_EXPERT=y CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_ADVISE_SYSCALLS=y CONFIG_PCI_QUIRKS=y CONFIG_EMBEDDED=y CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_EVENTS=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y CONFIG_SLUB=y CONFIG_HAVE_OPROFILE=y CONFIG_OPROFILE_NMI_TIMER=y CONFIG_JUMP_LABEL=y CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y CONFIG_ARCH_USE_BUILTIN_BSWAP=y CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_HAVE_OPTPROBES=y CONFIG_HAVE_KPROBES_ON_FTRACE=y CONFIG_HAVE_ARCH_TRACEHOOK=y CONFIG_HAVE_DMA_ATTRS=y CONFIG_HAVE_DMA_CONTIGUOUS=y CONFIG_GENERIC_SMP_IDLE_THREAD=y CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y CONFIG_HAVE_DMA_API_DEBUG=y CONFIG_HAVE_HW_BREAKPOINT=y CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y CONFIG_HAVE_USER_RETURN_NOTIFIER=y CONFIG_HAVE_PERF_EVENTS_NMI=y CONFIG_HAVE_PERF_REGS=y CONFIG_HAVE_PERF_USER_STACK_DUMP=y CONFIG_HAVE_ARCH_JUMP_LABEL=y CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y CONFIG_HAVE_CMPXCHG_LOCAL=y CONFIG_HAVE_CMPXCHG_DOUBLE=y CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y CONFIG_HAVE_ARCH_SECCOMP_FILTER=y CONFIG_SECCOMP_FILTER=y CONFIG_HAVE_CC_STACKPROTECTOR=y 
CONFIG_CC_STACKPROTECTOR_NONE=y CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y CONFIG_MODULES_USE_ELF_REL=y CONFIG_CLONE_BACKWARDS=y CONFIG_OLD_SIGSUSPEND3=y CONFIG_OLD_SIGACTION=y CONFIG_ARCH_HAS_GCOV_PROFILE_ALL=y CONFIG_HAVE_GENERIC_DMA_COHERENT=y CONFIG_SLABINFO=y CONFIG_RT_MUTEXES=y CONFIG_BASE_SMALL=0 CONFIG_BLOCK=y CONFIG_BLK_DEV_BSG=y CONFIG_PARTITION_ADVANCED=y CONFIG_MSDOS_PARTITION=y CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_DEADLINE=y CONFIG_DEFAULT_DEADLINE=y CONFIG_DEFAULT_IOSCHED="deadline" CONFIG_INLINE_SPIN_UNLOCK_IRQ=y CONFIG_INLINE_READ_UNLOCK=y CONFIG_INLINE_READ_UNLOCK_IRQ=y CONFIG_INLINE_WRITE_UNLOCK=y CONFIG_INLINE_WRITE_UNLOCK_IRQ=y CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y CONFIG_ARCH_USE_QUEUE_RWLOCK=y CONFIG_ZONE_DMA=y CONFIG_X86_FEATURE_NAMES=y CONFIG_X86_MPPARSE=y CONFIG_IOSF_MBI=y CONFIG_SCHED_OMIT_FRAME_POINTER=y CONFIG_NO_BOOTMEM=y CONFIG_MGEODE_LX=y CONFIG_X86_INTERNODE_CACHE_SHIFT=5 CONFIG_X86_L1_CACHE_SHIFT=5 CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_USE_3DNOW=y CONFIG_X86_TSC=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=4 CONFIG_X86_DEBUGCTLMSR=y CONFIG_CPU_SUP_INTEL=y CONFIG_CPU_SUP_CYRIX_32=y CONFIG_CPU_SUP_AMD=y CONFIG_CPU_SUP_CENTAUR=y CONFIG_CPU_SUP_TRANSMETA_32=y CONFIG_CPU_SUP_UMC_32=y CONFIG_NR_CPUS=1 CONFIG_PREEMPT_NONE=y CONFIG_X86_UP_APIC=y CONFIG_X86_UP_APIC_MSI=y CONFIG_X86_LOCAL_APIC=y CONFIG_VM86=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y CONFIG_NOHIGHMEM=y CONFIG_VMSPLIT_3G=y CONFIG_PAGE_OFFSET=0xC0000000 CONFIG_ARCH_FLATMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ILLEGAL_POINTER_VALUE=0 CONFIG_SELECT_MEMORY_MODEL=y CONFIG_FLATMEM_MANUAL=y CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y CONFIG_SPARSEMEM_STATIC=y CONFIG_HAVE_MEMBLOCK=y CONFIG_HAVE_MEMBLOCK_NODE_MAP=y CONFIG_ARCH_DISCARD_MEMBLOCK=y CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y CONFIG_DEFAULT_MMAP_MIN_ADDR=4096 
CONFIG_NEED_PER_CPU_KM=y CONFIG_ZSMALLOC=y CONFIG_GENERIC_EARLY_IOREMAP=y CONFIG_X86_RESERVE_LOW=4 CONFIG_SECCOMP=y CONFIG_HZ_100=y CONFIG_HZ=100 CONFIG_SCHED_HRTICK=y CONFIG_PHYSICAL_START=0x100000 CONFIG_PHYSICAL_ALIGN=0x100000 CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_DOMAINS=y CONFIG_PCI_MSI=y CONFIG_ISA_DMA_API=y CONFIG_SCx200=y CONFIG_SCx200HR_TIMER=y CONFIG_NET5501=y CONFIG_AMD_NB=y CONFIG_BINFMT_ELF=y CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE=y CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y CONFIG_BINFMT_SCRIPT=y CONFIG_HAVE_AOUT=y CONFIG_BINFMT_MISC=y CONFIG_COREDUMP=y CONFIG_HAVE_ATOMIC_IOMAP=y CONFIG_PMC_ATOM=y CONFIG_NET=y CONFIG_PACKET=y CONFIG_PACKET_DIAG=y CONFIG_UNIX=y CONFIG_UNIX_DIAG=y CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y CONFIG_IP_ROUTE_MULTIPATH=y CONFIG_IP_ROUTE_CLASSID=y CONFIG_NET_IP_TUNNEL=y CONFIG_SYN_COOKIES=y CONFIG_INET_TUNNEL=y CONFIG_INET_DIAG=y CONFIG_INET_TCP_DIAG=y CONFIG_INET_UDP_DIAG=y CONFIG_TCP_CONG_ADVANCED=y CONFIG_TCP_CONG_CUBIC=y CONFIG_DEFAULT_CUBIC=y CONFIG_DEFAULT_TCP_CONG="cubic" CONFIG_IPV6=y CONFIG_IPV6_SIT=y CONFIG_IPV6_NDISC_NODETYPE=y CONFIG_NETFILTER=y CONFIG_NETFILTER_ADVANCED=y CONFIG_NETFILTER_NETLINK=y CONFIG_NETFILTER_NETLINK_ACCT=y CONFIG_NF_CONNTRACK=y CONFIG_NF_LOG_COMMON=y CONFIG_NF_CONNTRACK_MARK=y CONFIG_NF_CONNTRACK_TIMEOUT=y CONFIG_NF_CT_PROTO_DCCP=y CONFIG_NF_CT_PROTO_UDPLITE=y CONFIG_NF_CONNTRACK_FTP=y CONFIG_NF_CONNTRACK_IRC=y CONFIG_NF_CONNTRACK_BROADCAST=y CONFIG_NF_CONNTRACK_SNMP=y CONFIG_NF_CONNTRACK_SIP=y CONFIG_NF_CT_NETLINK=y CONFIG_NF_CT_NETLINK_TIMEOUT=y CONFIG_NF_NAT=y CONFIG_NF_NAT_NEEDED=y CONFIG_NF_NAT_PROTO_DCCP=y CONFIG_NF_NAT_PROTO_UDPLITE=y CONFIG_NF_NAT_FTP=y CONFIG_NF_NAT_IRC=y CONFIG_NF_NAT_SIP=y CONFIG_NF_NAT_REDIRECT=y CONFIG_NETFILTER_XTABLES=y CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y CONFIG_NETFILTER_XT_TARGET_DSCP=y CONFIG_NETFILTER_XT_TARGET_LOG=y CONFIG_NETFILTER_XT_NAT=y 
CONFIG_NETFILTER_XT_TARGET_REDIRECT=y CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y CONFIG_NETFILTER_XT_MATCH_DSCP=y CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y CONFIG_NETFILTER_XT_MATCH_HELPER=y CONFIG_NETFILTER_XT_MATCH_IPRANGE=y CONFIG_NETFILTER_XT_MATCH_LIMIT=y CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y CONFIG_NETFILTER_XT_MATCH_NFACCT=y CONFIG_NETFILTER_XT_MATCH_OWNER=y CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y CONFIG_NETFILTER_XT_MATCH_QUOTA=y CONFIG_NETFILTER_XT_MATCH_STATE=y CONFIG_NF_DEFRAG_IPV4=y CONFIG_NF_CONNTRACK_IPV4=y CONFIG_NF_LOG_IPV4=y CONFIG_NF_REJECT_IPV4=y CONFIG_NF_NAT_IPV4=y CONFIG_IP_NF_IPTABLES=y CONFIG_IP_NF_FILTER=y CONFIG_IP_NF_TARGET_REJECT=y CONFIG_IP_NF_NAT=y CONFIG_IP_NF_TARGET_REDIRECT=y CONFIG_IP_NF_MANGLE=y CONFIG_NF_DEFRAG_IPV6=y CONFIG_NF_CONNTRACK_IPV6=y CONFIG_NF_REJECT_IPV6=y CONFIG_NF_LOG_IPV6=y CONFIG_IP6_NF_IPTABLES=y CONFIG_IP6_NF_FILTER=y CONFIG_IP6_NF_TARGET_REJECT=y CONFIG_IP6_NF_MANGLE=y CONFIG_HAVE_NET_DSA=y CONFIG_NET_SCHED=y CONFIG_NET_SCH_CBQ=y CONFIG_NET_SCH_HTB=y CONFIG_NET_SCH_HFSC=y CONFIG_NET_SCH_PRIO=y CONFIG_NET_SCH_RED=y CONFIG_NET_SCH_SFQ=y CONFIG_NET_SCH_TEQL=y CONFIG_NET_SCH_TBF=y CONFIG_NET_SCH_GRED=y CONFIG_NET_SCH_DSMARK=y CONFIG_NET_SCH_INGRESS=y CONFIG_NET_CLS=y CONFIG_NET_CLS_BASIC=y CONFIG_NET_CLS_TCINDEX=y CONFIG_NET_CLS_ROUTE4=y CONFIG_NET_CLS_FW=y CONFIG_NET_CLS_U32=y CONFIG_CLS_U32_PERF=y CONFIG_CLS_U32_MARK=y CONFIG_NET_CLS_RSVP=y CONFIG_NET_EMATCH=y CONFIG_NET_EMATCH_STACK=32 CONFIG_NET_EMATCH_CMP=y CONFIG_NET_EMATCH_NBYTE=y CONFIG_NET_EMATCH_U32=y CONFIG_NET_EMATCH_META=y CONFIG_NET_CLS_ACT=y CONFIG_NET_ACT_POLICE=y CONFIG_NET_ACT_GACT=y CONFIG_NET_ACT_PEDIT=y CONFIG_NET_SCH_FIFO=y CONFIG_NETLINK_DIAG=y CONFIG_NET_RX_BUSY_POLL=y CONFIG_BQL=y CONFIG_FIB_RULES=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_FIRMWARE_IN_KERNEL=y CONFIG_EXTRA_FIRMWARE="" CONFIG_GENERIC_CPU_AUTOPROBE=y CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y CONFIG_BLK_DEV=y CONFIG_ZRAM=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_LOOP_MIN_COUNT=8 
CONFIG_BLK_DEV_CRYPTOLOOP=y CONFIG_CS5535_MFGPT=y CONFIG_CS5535_MFGPT_DEFAULT_IRQ=7 CONFIG_CS5535_CLOCK_EVENT_SRC=y CONFIG_HAVE_IDE=y CONFIG_SCSI_MOD=y CONFIG_SCSI=y CONFIG_SCSI_DMA=y CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_SG=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_SCAN_ASYNC=y CONFIG_SCSI_SPI_ATTRS=y CONFIG_ATA=y CONFIG_ATA_VERBOSE_ERROR=y CONFIG_ATA_SFF=y CONFIG_ATA_BMDMA=y CONFIG_PATA_CS5536=y CONFIG_NETDEVICES=y CONFIG_MII=y CONFIG_NET_CORE=y CONFIG_DUMMY=y CONFIG_NETCONSOLE=y CONFIG_NETPOLL=y CONFIG_NET_POLL_CONTROLLER=y CONFIG_ETHERNET=y CONFIG_NET_VENDOR_VIA=y CONFIG_VIA_RHINE=y CONFIG_VIA_RHINE_MMIO=y CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y CONFIG_TTY=y CONFIG_UNIX98_PTYS=y CONFIG_SERIAL_EARLYCON=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_PCI=y CONFIG_SERIAL_8250_NR_UARTS=1 CONFIG_SERIAL_8250_RUNTIME_UARTS=1 CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_HW_RANDOM=y CONFIG_HW_RANDOM_GEODE=y CONFIG_HANGCHECK_TIMER=y CONFIG_DEVPORT=y CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y CONFIG_I2C_CHARDEV=y CONFIG_I2C_HELPER_AUTO=y CONFIG_SCx200_ACB=y CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y CONFIG_GPIOLIB=y CONFIG_GPIO_DEVRES=y CONFIG_GPIO_SYSFS=y CONFIG_GPIO_CS5535=y CONFIG_HWMON=y CONFIG_HWMON_VID=y CONFIG_SENSORS_PC87360=y CONFIG_WATCHDOG=y CONFIG_WATCHDOG_CORE=y CONFIG_GEODE_WDT=y CONFIG_SSB_POSSIBLE=y CONFIG_BCMA_POSSIBLE=y CONFIG_MFD_CORE=y CONFIG_MFD_CS5535=y CONFIG_USB_OHCI_LITTLE_ENDIAN=y CONFIG_USB_SUPPORT=y CONFIG_USB_COMMON=y CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB=y CONFIG_USB_DEFAULT_PERSIST=y CONFIG_USB_DYNAMIC_MINORS=y CONFIG_USB_EHCI_HCD=y CONFIG_USB_EHCI_PCI=y CONFIG_USB_OHCI_HCD=y CONFIG_USB_OHCI_HCD_PCI=y CONFIG_USB_ACM=y CONFIG_USB_STORAGE=y CONFIG_RTC_LIB=y CONFIG_STAGING=y CONFIG_CLKSRC_I8253=y CONFIG_CLKEVT_I8253=y CONFIG_CLKBLD_I8253=y CONFIG_DCACHE_WORD_ACCESS=y CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y CONFIG_EXT2_FS_SECURITY=y CONFIG_FS_MBCACHE=y CONFIG_FS_POSIX_ACL=y CONFIG_EXPORTFS=y 
CONFIG_FILE_LOCKING=y CONFIG_FSNOTIFY=y CONFIG_DNOTIFY=y CONFIG_INOTIFY_USER=y CONFIG_QUOTA=y CONFIG_PRINT_QUOTA_WARNING=y CONFIG_QUOTA_TREE=y CONFIG_QFMT_V2=y CONFIG_QUOTACTL=y CONFIG_PROC_FS=y CONFIG_PROC_SYSCTL=y CONFIG_PROC_PAGE_MONITOR=y CONFIG_KERNFS=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_TMPFS_XATTR=y CONFIG_NETWORK_FILESYSTEMS=y CONFIG_NFS_FS=y CONFIG_NFS_V3=y CONFIG_NFS_V3_ACL=y CONFIG_NFS_SWAP=y CONFIG_GRACE_PERIOD=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y CONFIG_NFS_ACL_SUPPORT=y CONFIG_NFS_COMMON=y CONFIG_SUNRPC=y CONFIG_SUNRPC_SWAP=y CONFIG_NLS=y CONFIG_NLS_DEFAULT="utf8" CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_CODEPAGE_850=y CONFIG_NLS_CODEPAGE_852=y CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y CONFIG_NLS_ISO8859_2=y CONFIG_NLS_ISO8859_15=y CONFIG_NLS_UTF8=y CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_PRINTK_TIME=y CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4 CONFIG_ENABLE_WARN_DEPRECATED=y CONFIG_ENABLE_MUST_CHECK=y CONFIG_FRAME_WARN=1024 CONFIG_STRIP_ASM_SYMS=y CONFIG_ARCH_WANT_FRAME_POINTERS=y CONFIG_DEBUG_KERNEL=y CONFIG_HAVE_DEBUG_KMEMLEAK=y CONFIG_HAVE_DEBUG_STACKOVERFLOW=y CONFIG_HAVE_ARCH_KMEMCHECK=y CONFIG_DETECT_HUNG_TASK=y CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120 CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0 CONFIG_PANIC_ON_OOPS=y CONFIG_PANIC_ON_OOPS_VALUE=1 CONFIG_PANIC_TIMEOUT=5 CONFIG_SCHED_STACK_END_CHECK=y CONFIG_DEBUG_BUGVERBOSE=y CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y CONFIG_USER_STACKTRACE_SUPPORT=y CONFIG_HAVE_FUNCTION_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y CONFIG_HAVE_DYNAMIC_FTRACE=y CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y CONFIG_HAVE_SYSCALL_TRACEPOINTS=y CONFIG_HAVE_C_RECORDMCOUNT=y CONFIG_TRACING_SUPPORT=y CONFIG_HAVE_ARCH_KGDB=y CONFIG_STRICT_DEVMEM=y CONFIG_DEBUG_RODATA=y CONFIG_DOUBLEFAULT=y CONFIG_HAVE_MMIOTRACE_SUPPORT=y CONFIG_IO_DELAY_TYPE_0X80=0 CONFIG_IO_DELAY_TYPE_0XED=1 CONFIG_IO_DELAY_TYPE_UDELAY=2 CONFIG_IO_DELAY_TYPE_NONE=3 
CONFIG_IO_DELAY_0X80=y CONFIG_DEFAULT_IO_DELAY_TYPE=0 CONFIG_SECURITY_DMESG_RESTRICT=y CONFIG_SECURITY=y CONFIG_SECURITYFS=y CONFIG_SECURITY_PATH=y CONFIG_SECURITY_YAMA=y CONFIG_SECURITY_YAMA_STACKED=y CONFIG_DEFAULT_SECURITY_YAMA=y CONFIG_DEFAULT_SECURITY="yama" CONFIG_CRYPTO=y CONFIG_CRYPTO_ALGAPI=y CONFIG_CRYPTO_ALGAPI2=y CONFIG_CRYPTO_AEAD2=y CONFIG_CRYPTO_BLKCIPHER=y CONFIG_CRYPTO_BLKCIPHER2=y CONFIG_CRYPTO_HASH2=y CONFIG_CRYPTO_RNG2=y CONFIG_CRYPTO_PCOMP2=y CONFIG_CRYPTO_MANAGER=y CONFIG_CRYPTO_MANAGER2=y CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y CONFIG_CRYPTO_WORKQUEUE=y CONFIG_CRYPTO_CBC=y CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_HW=y CONFIG_CRYPTO_DEV_GEODE=y CONFIG_HAVE_KVM=y CONFIG_BITREVERSE=y CONFIG_GENERIC_STRNCPY_FROM_USER=y CONFIG_GENERIC_STRNLEN_USER=y CONFIG_GENERIC_NET_UTILS=y CONFIG_GENERIC_FIND_FIRST_BIT=y CONFIG_GENERIC_PCI_IOMAP=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_IO=y CONFIG_ARCH_HAS_FAST_MULTIPLIER=y CONFIG_CRC32=y CONFIG_CRC32_SLICEBY8=y CONFIG_LZO_COMPRESS=y CONFIG_LZO_DECOMPRESS=y CONFIG_HAS_IOMEM=y CONFIG_HAS_IOPORT_MAP=y CONFIG_HAS_DMA=y CONFIG_DQL=y CONFIG_GLOB=y CONFIG_NLATTR=y CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y CONFIG_ARCH_HAS_SG_CHAIN=y ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux
  2015-04-04 18:03 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux Nix
@ 2015-04-04 21:05 ` Francois Romieu
  2015-04-04 21:26   ` Nix
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Francois Romieu @ 2015-04-04 21:05 UTC (permalink / raw)
  To: Nix; +Cc: rl, Bjarke Istrup Pedersen, David S. Miller, Linux-Netdev

Nix <nix@esperi.org.uk> :
[...]

This driver leaves holes in its receive ring under memory pressure.
It may not help.

You can try the gross patch below against v3.19. It compiles. I have to go.

diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index a191afc..f58d1d5 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -471,7 +471,7 @@ struct rhine_private {
 	/* Frequently used values: keep some adjacent for cache effect. */
 	u32 quirks;
 	struct rx_desc *rx_head_desc;
-	unsigned int cur_rx, dirty_rx;	/* Producer/consumer ring indices */
+	unsigned int cur_rx;
 	unsigned int cur_tx, dirty_tx;
 	unsigned int rx_buf_sz;		/* Based on MTU+slack. */
 	struct rhine_stats rx_stats;
@@ -1218,7 +1218,7 @@ static void alloc_rbufs(struct net_device *dev)
 	dma_addr_t next;
 	int i;
 
-	rp->dirty_rx = rp->cur_rx = 0;
+	rp->cur_rx = 0;
 
 	rp->rx_buf_sz = (dev->mtu <= 1500 ? PKT_BUF_SZ : dev->mtu + 32);
 	rp->rx_head_desc = &rp->rx_ring[0];
@@ -1239,8 +1239,8 @@ static void alloc_rbufs(struct net_device *dev)
 	for (i = 0; i < RX_RING_SIZE; i++) {
 		struct sk_buff *skb = netdev_alloc_skb(dev, rp->rx_buf_sz);
 		rp->rx_skbuff[i] = skb;
-		if (skb == NULL)
-			break;
+
+		BUG_ON(skb == NULL);
 
 		rp->rx_skbuff_dma[i] =
 			dma_map_single(hwdev, skb->data, rp->rx_buf_sz,
@@ -1253,7 +1253,6 @@ static void alloc_rbufs(struct net_device *dev)
 		rp->rx_ring[i].addr = cpu_to_le32(rp->rx_skbuff_dma[i]);
 		rp->rx_ring[i].rx_status = cpu_to_le32(DescOwn);
 	}
-	rp->dirty_rx = (unsigned int)(i - RX_RING_SIZE);
 }
 
 static void free_rbufs(struct net_device* dev)
@@ -1932,13 +1931,68 @@ static inline u16 rhine_get_vlan_tci(struct sk_buff *skb, int data_size)
 	return be16_to_cpup((__be16 *)trailer);
 }
 
+static struct sk_buff *rhine_rx_copybreak(struct net_device *dev, int entry, u16 len)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	dma_addr_t mapping = rp->rx_skbuff_dma[entry];
+	struct device *hwdev = dev->dev.parent;
+	const int size = rp->rx_buf_sz;
+	struct sk_buff *new_skb;
+
+	new_skb = netdev_alloc_skb_ip_align(dev, len);
+	if (unlikely(!new_skb)) {
+		dev->stats.rx_dropped++;
+		goto sync;
+	}
+
+	dma_sync_single_for_cpu(hwdev, mapping, size, DMA_FROM_DEVICE);
+
+	skb_copy_to_linear_data(new_skb, rp->rx_skbuff[entry]->data, len);
+sync:
+	dma_sync_single_for_device(hwdev, mapping, size, DMA_FROM_DEVICE);
+
+	return new_skb;
+}
+
+static struct sk_buff *rhine_rx_swap_one(struct net_device *dev, int entry)
+{
+	struct rhine_private *rp = netdev_priv(dev);
+	struct sk_buff *skb = rp->rx_skbuff[entry];
+	struct device *hwdev = dev->dev.parent;
+	const int size = rp->rx_buf_sz;
+	struct sk_buff *new_skb;
+	dma_addr_t mapping;
+
+	new_skb = netdev_alloc_skb(dev, size);
+	if (!new_skb)
+		goto out_drop;
+
+	mapping = dma_map_single(hwdev, new_skb->data, size, DMA_FROM_DEVICE);
+	if (unlikely(dma_mapping_error(hwdev, mapping))) {
+		netdev_err(dev, "Rx DMA mapping failure\n");
+		goto out_kfree;
+	}
+
+	dma_unmap_single(hwdev, rp->rx_skbuff_dma[entry], size, DMA_FROM_DEVICE);
+	rp->rx_skbuff[entry] = new_skb;
+	rp->rx_skbuff_dma[entry] = mapping;
+	rp->rx_ring[entry].addr = cpu_to_le32(mapping);
+
+	return skb;
+
+out_kfree:
+	dev_kfree_skb(new_skb);
+out_drop:
+	dev->stats.rx_dropped++;
+	return NULL;
+}
+
 /* Process up to limit frames from receive ring */
 static int rhine_rx(struct net_device *dev, int limit)
 {
 	struct rhine_private *rp = netdev_priv(dev);
-	struct device *hwdev = dev->dev.parent;
-	int count;
 	int entry = rp->cur_rx % RX_RING_SIZE;
+	int count;
 
 	netif_dbg(rp, rx_status, dev, "%s(), entry %d status %08x\n", __func__,
 		  entry, le32_to_cpu(rp->rx_head_desc->rx_status));
@@ -1988,42 +2042,21 @@ static int rhine_rx(struct net_device *dev, int limit)
 				}
 			}
 		} else {
-			struct sk_buff *skb = NULL;
 			/* Length should omit the CRC */
 			int pkt_len = data_size - 4;
+			struct sk_buff *skb;
 			u16 vlan_tci = 0;
 
-			/* Check if the packet is long enough to accept without
-			   copying to a minimally-sized skbuff. */
-			if (pkt_len < rx_copybreak)
-				skb = netdev_alloc_skb_ip_align(dev, pkt_len);
-			if (skb) {
-				dma_sync_single_for_cpu(hwdev,
-							rp->rx_skbuff_dma[entry],
-							rp->rx_buf_sz,
-							DMA_FROM_DEVICE);
-
-				skb_copy_to_linear_data(skb,
-						 rp->rx_skbuff[entry]->data,
-						 pkt_len);
-				skb_put(skb, pkt_len);
-				dma_sync_single_for_device(hwdev,
-							   rp->rx_skbuff_dma[entry],
-							   rp->rx_buf_sz,
-							   DMA_FROM_DEVICE);
-			} else {
-				skb = rp->rx_skbuff[entry];
-				if (skb == NULL) {
-					netdev_err(dev, "Inconsistent Rx descriptor chain\n");
-					break;
-				}
-				rp->rx_skbuff[entry] = NULL;
-				skb_put(skb, pkt_len);
-				dma_unmap_single(hwdev,
-						 rp->rx_skbuff_dma[entry],
-						 rp->rx_buf_sz,
-						 DMA_FROM_DEVICE);
-			}
+			BUG_ON(pkt_len < 0);
+
+			skb = (pkt_len < rx_copybreak) ?
+				rhine_rx_copybreak(dev, entry, pkt_len) :
+				rhine_rx_swap_one(dev, entry);
+
+			if (!skb)
+				break;
+
+			skb_put(skb, pkt_len);
 
 			if (unlikely(desc_length & DescTag))
 				vlan_tci = rhine_get_vlan_tci(skb, data_size);
@@ -2039,34 +2072,12 @@ static int rhine_rx(struct net_device *dev, int limit)
 			rp->rx_stats.packets++;
 			u64_stats_update_end(&rp->rx_stats.syncp);
 		}
+		rp->rx_ring[entry].rx_status = cpu_to_le32(DescOwn);
+
 		entry = (++rp->cur_rx) % RX_RING_SIZE;
 		rp->rx_head_desc = &rp->rx_ring[entry];
 	}
 
-	/* Refill the Rx ring buffers. */
-	for (; rp->cur_rx - rp->dirty_rx > 0; rp->dirty_rx++) {
-		struct sk_buff *skb;
-		entry = rp->dirty_rx % RX_RING_SIZE;
-		if (rp->rx_skbuff[entry] == NULL) {
-			skb = netdev_alloc_skb(dev, rp->rx_buf_sz);
-			rp->rx_skbuff[entry] = skb;
-			if (skb == NULL)
-				break;	/* Better luck next round. */
-			rp->rx_skbuff_dma[entry] =
-				dma_map_single(hwdev, skb->data,
-					       rp->rx_buf_sz,
-					       DMA_FROM_DEVICE);
-			if (dma_mapping_error(hwdev,
-					      rp->rx_skbuff_dma[entry])) {
-				dev_kfree_skb(skb);
-				rp->rx_skbuff_dma[entry] = 0;
-				break;
-			}
-			rp->rx_ring[entry].addr = cpu_to_le32(rp->rx_skbuff_dma[entry]);
-		}
-		rp->rx_ring[entry].rx_status = cpu_to_le32(DescOwn);
-	}
-
 	return count;
 }
 

^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux
  2015-04-04 21:05 ` Francois Romieu
@ 2015-04-04 21:26   ` Nix
  2015-04-05 4:01   ` David Miller
  2015-04-05 20:59   ` Nix
  2 siblings, 0 replies; 7+ messages in thread
From: Nix @ 2015-04-04 21:26 UTC (permalink / raw)
  To: Francois Romieu; +Cc: rl, Bjarke Istrup Pedersen, David S. Miller, Linux-Netdev

On 4 Apr 2015, Francois Romieu told this:

> Nix <nix@esperi.org.uk> :
> [...]
>
> This driver leaves holes in its receive ring under memory pressure.
> It may not help.

... OK so that is something I really should have spotted. I wasn't
looking at the driver's response to memory-shortage conditions because
this machine, though swapless, has wads of memory free under normal
conditions:

nix@fold 2 /home/nix% cat /proc/meminfo
MemTotal:         515720 kB
MemFree:          376552 kB
MemAvailable:     417120 kB

But of course this is networking, so we need skb_allocable memory, which
might be in much shorter supply than merely reclaimable/available memory
(though, again, 376552kB should be enough, you'd think).

> You can try the gross patch below against v3.19. It compiles. I have to go.

All I can say is "good grief, a response at Eastertime?! Valour beyond
the call of duty!"

I'll give it a try tomorrow :) but whether it works or not, I owe you a
drink next time you're in the UK south-east for spending any time at all
on this at Easter!

-- 
NULL && (void)

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux
  2015-04-04 21:05 ` Francois Romieu
  2015-04-04 21:26   ` Nix
@ 2015-04-05 4:01   ` David Miller
  2015-04-05 20:59   ` Nix
  2 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2015-04-05 4:01 UTC (permalink / raw)
  To: romieu; +Cc: nix, rl, gurligebis, netdev

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Sat, 4 Apr 2015 23:05:18 +0200

> Nix <nix@esperi.org.uk> :
> [...]
>
> This driver leaves holes in its receive ring under memory pressure.
> It may not help.
>
> You can try the gross patch below against v3.19. It compiles. I have to go.

Indeed, converting this driver to just drop packets when skb allocation
fails on receive, and just reusing the original skb and placing it back
into the RX ring, is the way to go.

^ permalink raw reply [flat|nested] 7+ messages in thread
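
The drop-and-reuse policy described in the message above can be illustrated outside the kernel. What follows is only a userspace sketch, not driver code: rx_ring_model, alloc_fn, ring_init, ring_swap_one, RING_SIZE and BUF_SIZE are all hypothetical names invented for the illustration, and plain heap buffers stand in for skbs and DMA mappings. The idea it tries to show is that a slot gives up its buffer only after a replacement has been obtained, so a failed allocation costs one frame rather than leaving a hole in the ring.

```c
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4     /* illustrative only, not the driver's RX_RING_SIZE */
#define BUF_SIZE  1536  /* illustrative buffer size */

/* Hypothetical allocator hook, so a caller can simulate allocation failure. */
typedef void *(*alloc_fn)(size_t size);

struct rx_ring_model {
	void *buf[RING_SIZE];   /* every slot holds a buffer once init succeeds */
	unsigned long dropped;  /* frames dropped for lack of a replacement */
};

/* Fill every slot; returns 0 on success, -1 (freeing what was taken) on failure. */
static int ring_init(struct rx_ring_model *r, alloc_fn alloc)
{
	int i;

	r->dropped = 0;
	for (i = 0; i < RING_SIZE; i++) {
		r->buf[i] = alloc(BUF_SIZE);
		if (!r->buf[i]) {
			while (i-- > 0)
				free(r->buf[i]);
			return -1;
		}
	}
	return 0;
}

/*
 * Hand the filled buffer in slot `entry` to the caller, but only if a
 * replacement can be allocated first. On failure the frame is dropped and
 * the slot keeps its original buffer, so the ring never develops a hole.
 */
static void *ring_swap_one(struct rx_ring_model *r, unsigned int entry,
			   alloc_fn alloc)
{
	void *fresh = alloc(BUF_SIZE);
	void *full;

	if (!fresh) {
		r->dropped++;
		return NULL;
	}
	full = r->buf[entry];
	r->buf[entry] = fresh;
	return full;
}
```

A caller would treat a NULL return as "frame dropped, slot still armed" and simply move on to the next slot; ownership of a non-NULL return passes to the caller, who frees it when done.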
* Re: 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux
  2015-04-04 21:05 ` Francois Romieu
  2015-04-04 21:26   ` Nix
  2015-04-05 4:01   ` David Miller
@ 2015-04-05 20:59   ` Nix
  2015-04-05 23:15     ` Francois Romieu
  2 siblings, 1 reply; 7+ messages in thread
From: Nix @ 2015-04-05 20:59 UTC (permalink / raw)
  To: Francois Romieu; +Cc: rl, Bjarke Istrup Pedersen, David S. Miller, Linux-Netdev

On 4 Apr 2015, Francois Romieu said:

> Nix <nix@esperi.org.uk> :
> [...]
>
> This driver leaves holes in its receive ring under memory pressure.
> It may not help.
>
> You can try the gross patch below against v3.19. It compiles. I have to go.

Gross or not, it seems to work: I've loaded it enough to crash it half a
dozen times, and not a crash. However, the rx_dropped stats on the link
aren't going up, so maybe I've just been lucky.

I'll put it under more load tomorrow :)

-- 
NULL && (void)

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux
  2015-04-05 20:59   ` Nix
@ 2015-04-05 23:15     ` Francois Romieu
  2015-04-05 23:29       ` Nix
  0 siblings, 1 reply; 7+ messages in thread
From: Francois Romieu @ 2015-04-05 23:15 UTC (permalink / raw)
  To: Nix; +Cc: rl, Bjarke Istrup Pedersen, David S. Miller, Linux-Netdev

Nix <nix@esperi.org.uk> :
[...]
> Gross or not, it seems to work: I've loaded it enough to crash it half a
> dozen times, and not a crash. However, the rx_dropped stats on the link
> aren't going up, so maybe I've just been lucky.

Rx descriptors are now recycled as soon as they are processed whereas the
driver used to perform a complete processing batch before recycling any
descriptor. It could make a huge difference.

The pre-patch rx batch recycling did not include any barrier between
rp->rx_ring[entry].addr and rp->rx_ring[entry].rx_status updates to
enforce the ordering. The patch could help here as well.

You don't need to focus on rx_dropped: it's just a distraction :o)

Whatever the outcome I'll have to clean my mess though.

-- 
Ueimor

^ permalink raw reply [flat|nested] 7+ messages in thread
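
The ordering concern raised in the message above (the buffer address must be visible before the ownership bit is handed over) can be sketched with C11 atomics. This is a userspace analogy under stated assumptions, not the driver's code: rx_desc_model, give_to_device, device_peek and DESC_OWN are hypothetical names, a second thread stands in for the NIC, and a release store stands in for whatever write barrier (for instance wmb()) the real driver would need between the two descriptor updates.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_OWN 0x80000000u  /* assumed ownership bit for this model */

/* Simplified stand-in for a receive descriptor. */
struct rx_desc_model {
	_Atomic uint32_t rx_status;  /* DESC_OWN set: consumer owns the slot */
	uint32_t addr;               /* buffer address the consumer will use */
};

/*
 * Producer side: publish the new buffer address first, then hand over
 * ownership. The release store keeps the addr write from being reordered
 * after the rx_status write.
 */
static void give_to_device(struct rx_desc_model *d, uint32_t new_addr)
{
	d->addr = new_addr;
	atomic_store_explicit(&d->rx_status, DESC_OWN, memory_order_release);
}

/*
 * Consumer side (the "device" in this model): read addr only after seeing
 * the ownership bit. Returns 1 and fills *addr_out if the slot is owned.
 */
static int device_peek(struct rx_desc_model *d, uint32_t *addr_out)
{
	uint32_t status = atomic_load_explicit(&d->rx_status,
					       memory_order_acquire);

	if (!(status & DESC_OWN))
		return 0;
	*addr_out = d->addr;
	return 1;
}
```

Without the release/acquire pairing, the consumer could observe the ownership bit while still seeing a stale addr, which is the same class of race the message describes for the pre-patch refill loop.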
* Re: 3.19+: (and quite probably earlier) VIA Rhine hanging under high network load, yet again: redux
  2015-04-05 23:15     ` Francois Romieu
@ 2015-04-05 23:29       ` Nix
  0 siblings, 0 replies; 7+ messages in thread
From: Nix @ 2015-04-05 23:29 UTC (permalink / raw)
  To: Francois Romieu; +Cc: rl, Bjarke Istrup Pedersen, David S. Miller, Linux-Netdev

On 6 Apr 2015, Francois Romieu outgrape:

> Nix <nix@esperi.org.uk> :
> [...]
>> Gross or not, it seems to work: I've loaded it enough to crash it half a
>> dozen times, and not a crash. However, the rx_dropped stats on the link
>> aren't going up, so maybe I've just been lucky.
>
> Rx descriptors are now recycled as soon as they are processed whereas the
> driver used to perform a complete processing batch before recycling any
> descriptor. It could make a huge difference.

Ah, of course, nothing bounds rx rates :( I was stupidly thinking the
TX_RING_SIZE / TX_QUEUE_LEN gap would help us, but of course that's on
the other side. I just shouldn't read code when thick with cold, I make
really stupid thinkos... tx != rx dammit, it's not like they even share
much code in this driver, with rx being run out of napipoll and tx still
being direct...

(I'm still surprised a 64-entry RX ring can run us out of memory,
though: 64 * 1500 isn't that big, even for atomic allocations...)

> The pre-patch rx batch recycling did not include any barrier between
> rp->rx_ring[entry].addr and rp->rx_ring[entry].rx_status updates to
> enforce the ordering.

I bet that's the crucial part. At high rx rates in the pre-patch driver,
you fill up the ring and then lose that race, and disaster ensues.

> Whatever the outcome I'll have to clean my mess though.

Your mess has a) fixed the problem and b) fixed the problem *during the
easter break*. Major kudos.

^ permalink raw reply [flat|nested] 7+ messages in thread
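
The "64 * 1500" estimate in the message above is easy to check. The helper below, rx_ring_payload_bytes, is a hypothetical name used only for this back-of-the-envelope sketch; it counts payload bytes alone and ignores per-buffer overhead (the patch shows rx_buf_sz being derived from PKT_BUF_SZ or the MTU plus slack, so the real footprint is somewhat larger).

```c
#include <stddef.h>

/* Payload bytes needed to populate a ring of `entries` buffers. */
static size_t rx_ring_payload_bytes(size_t entries, size_t buf_bytes)
{
	return entries * buf_bytes;
}
```

For 64 entries of 1500 bytes this gives 96000 bytes, a little under 94 KiB, which is tiny next to the 376552 kB reported as MemFree earlier in the thread.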