From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 18 Nov 2014 19:54:17 +0100 (CET)
From: Thomas Gleixner
To: Dave Jones
Cc: Linus Torvalds, Linux Kernel, the arch/x86 maintainers
Subject: Re: frequent lockups in 3.18rc4
In-Reply-To: <20141118145234.GA7487@redhat.com>
References: <20141115213405.GA31971@redhat.com> <20141116014006.GA5016@redhat.com> <20141117170359.GA1382@redhat.com> <20141118020959.GA2091@redhat.com> <20141118023930.GA2871@redhat.com> <20141118145234.GA7487@redhat.com>
User-Agent: Alpine 2.11 (DEB 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 18 Nov 2014, Dave Jones wrote:

> Here's the first hit. Curiously, one cpu is missing.

I don't think so.

> NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
> irq event stamp: 2258092
> hardirqs last enabled at (2258091): [] get_page_from_freelist+0x555/0xaa0
> hardirqs last disabled at (2258092): [] apic_timer_interrupt+0x6a/0x80

So that means we are in the timer interrupt and handling
watchdog_timer_fn.

> CPU: 1 PID: 17837 Comm: trinity-c180 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
> task: ffff8801575e4680 ti: ffff880202434000 task.ti: ffff880202434000
> RIP: 0010:[] [] bad_range+0x0/0x90

So the softlockup tells us that the high priority watchdog thread was
not able to touch the watchdog timestamp.
That means this task was hogging the CPU for 20+ seconds. I have no
idea how that happens in that call chain.

> Call Trace:
> [] ? get_page_from_freelist+0x564/0xaa0
> [] ? get_parent_ip+0x11/0x50
> [] __alloc_pages_nodemask+0x230/0xd20
> [] ? get_parent_ip+0x11/0x50
> [] ? mark_held_locks+0x75/0xa0
> [] alloc_pages_vma+0xee/0x1b0
> [] ? shmem_alloc_page+0x6e/0xc0
> [] shmem_alloc_page+0x6e/0xc0
> [] ? get_parent_ip+0x11/0x50
> [] ? preempt_count_sub+0x7b/0x100
> [] ? __percpu_counter_add+0x86/0xb0
> [] ? __vm_enough_memory+0x66/0x1c0
> [] ? find_get_entry+0x5/0x230
> [] ? cap_vm_enough_memory+0x4c/0x60
> [] shmem_getpage_gfp+0x630/0xa40
> [] ? match_held_lock+0x111/0x160
> [] shmem_write_begin+0x42/0x70
> [] generic_perform_write+0xd4/0x1f0
> [] __generic_file_write_iter+0x162/0x350
> [] ? new_sync_read+0xd0/0xd0
> [] generic_file_write_iter+0x3f/0xb0
> [] ? __generic_file_write_iter+0x350/0x350
> [] do_iter_readv_writev+0x78/0xc0
> [] do_readv_writev+0xd8/0x2a0
> [] ? __generic_file_write_iter+0x350/0x350
> [] ? lock_release_holdtime.part.28+0xe6/0x160
> [] ? __generic_file_write_iter+0x350/0x350
> [] ? get_parent_ip+0x11/0x50
> [] ? preempt_count_sub+0x7b/0x100
> [] ? rcu_read_lock_held+0x6e/0x80
> [] vfs_writev+0x3c/0x50
> [] SyS_writev+0x5c/0x100
> [] tracesys_phase2+0xd4/0xd9

But this gets pages for a write into shmem and the other one below
does a madvise on a shmem map. Coincidence?

> sending NMI to other CPUs:

So here we kick the other cpus.

> NMI backtrace for cpu 2
> CPU: 2 PID: 15913 Comm: trinity-c141 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 35/402 20526]
> task: ffff880223229780 ti: ffff8801afca0000 task.ti: ffff8801afca0000
> RIP: 0010:[] [] generic_exec_single+0xee/0x1a0
> [] ? do_flush_tlb_all+0x60/0x60
> [] ? do_flush_tlb_all+0x60/0x60
> [] smp_call_function_single+0x6a/0xe0
> [] ? cpumask_next_and+0x4f/0xb0
> [] ? do_flush_tlb_all+0x60/0x60
> [] smp_call_function_many+0x2b9/0x320
> [] flush_tlb_mm_range+0xe0/0x370
> [] tlb_flush_mmu_tlbonly+0x42/0x50
> [] unmap_single_vma+0x6b8/0x900
> [] zap_page_range_single+0xfc/0x160
> [] unmap_mapping_range+0x134/0x190
> [] shmem_fallocate+0x4fd/0x520
> [] ? prepare_to_wait+0x27/0x90
> [] do_fallocate+0x132/0x1d0
> [] SyS_madvise+0x398/0x870
> [] ? rcu_read_lock_sched_held+0x4e/0x6a
> [] ? syscall_trace_enter_phase2+0xa7/0x2b0
> [] tracesys_phase2+0xd4/0xd9

We've seen that before.

> NMI backtrace for cpu 0
> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 37.091 msecs

So it complains that the backtrace handler took 37 msec, which is
indeed long for just dumping a stack trace.

> CPU: 0 PID: 15851 Comm: trinity-c80 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 36/402 20526]
> task: ffff8801874e8000 ti: ffff88022baec000 task.ti: ffff88022baec000
> RIP: 0010:[] [] preempt_count_add+0x0/0xc0
> RSP: 0000:ffff880244003c30 EFLAGS: 00000092
> RAX: 0000000000000001 RBX: ffffffffa9edb560 RCX: 0000000000000001
> RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
> RBP: ffff880244003c48 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: ffff8801874e88c8
> [23543.271956] NMI backtrace for cpu 3

So here we mangle CPU3 in and lose the backtrace for cpu0, which might
be the really interesting one ....

> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 100.612 msecs

This one takes 100ms.

> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc5+ #90 [loadavg: 199.00 178.81 173.92 37/402 20526]
> task: ffff880242b5c680 ti: ffff880242b78000 task.ti: ffff880242b78000
> RIP: 0010:[] [] intel_idle+0xd5/0x180

So that one is simply idle.

> INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 125.739 msecs

And we get another backtrace handler taking too long.
Of course we cannot tell which of the 3 complaints comes from which
cpu, because the printk lacks a cpuid.

Thanks,

	tglx