From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1752145AbbBSPmn (ORCPT ); Thu, 19 Feb 2015 10:42:43 -0500
Received: from mail-qc0-f172.google.com ([209.85.216.172]:63971
    "EHLO mail-qc0-f172.google.com" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1751795AbbBSPmk (ORCPT );
    Thu, 19 Feb 2015 10:42:40 -0500
MIME-Version: 1.0
X-Originating-IP: [191.180.238.226]
In-Reply-To: <20150218222544.GA17717@twins.programming.kicks-ass.net>
References: <20150218222544.GA17717@twins.programming.kicks-ass.net>
Date: Thu, 19 Feb 2015 13:42:39 -0200
Message-ID:
Subject: Re: smp_call_function_single lockups
From: Rafael David Tinoco
To: Peter Zijlstra
Cc: Linus Torvalds, LKML, Thomas Gleixner, Jens Axboe,
    Frederic Weisbecker, Gema Gomez, chris.j.arges@canonical.com
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Linus, Peter, Thomas,

Just some quick feedback:

We were able to reproduce the lockup with the proposed patch applied
(3.19 + patch). Unfortunately we had problems with the core file, so I
have only the stack trace for now, but I think we will be able to
reproduce it again and provide more details (sorry for the delay: after
a reboot it took us some days to reproduce this again).

It looks like RIP is still in smp_call_function_single.

Same environment as before: nested KVM (2 vcpus) on top of a Proliant
DL380G8 with acpi_idle and no x2apic optout.

[47708.068013] CPU: 0 PID: 29869 Comm: qemu-system-x86 Tainted: G E 3.19.0-c7671cf-lp1413540v2 #31
[47708.068013] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[47708.068013] task: ffff88081b9beca0 ti: ffff88081a7a0000 task.ti: ffff88081a7a0000
[47708.068013] RIP: 0010:[] [] smp_call_function_single+0xca/0x120
[47708.068013] RSP: 0018:ffff88081a7a3b38 EFLAGS: 00000202
[47708.068013] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
[47708.068013] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000296
[47708.068013] RBP: ffff88081a7a3b78 R08: ffffffff81815168 R09: ffff880818192000
[47708.068013] R10: 000000000000bdf6 R11: 000000000001bf90 R12: 00080000810b66f8
[47708.068013] R13: 00000000000000fb R14: 0000000000000296 R15: 0000000000000000
[47708.068013] FS: 00007fa143fff700(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000
[47708.068013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[47708.068013] CR2: 00007f5d76f5d050 CR3: 00000008190cc000 CR4: 00000000000426f0
[47708.068013] Stack:
[47708.068013] ffff88083fd151b8 0000000000000001 0000000000000000 ffffffffc0589320
[47708.068013] ffff88081a547a80 0000000000000003 ffff88081a543f80 0000000000000000
[47708.068013] ffff88081a7a3b88 ffffffffc0586097 ffff88081a7a3bc8 ffffffffc058aefe
[47708.068013] Call Trace:
[47708.068013] [] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
[47708.068013] [] loaded_vmcs_clear+0x27/0x30 [kvm_intel]
[47708.068013] [] vmx_vcpu_load+0x17e/0x1a0 [kvm_intel]
[47708.068013] [] ? set_next_entity+0x9d/0xb0
[47708.068013] [] kvm_arch_vcpu_load+0x33/0x1f0 [kvm]
[47708.068013] [] kvm_sched_in+0x39/0x40 [kvm]
[47708.068013] [] finish_task_switch+0x98/0x1a0
[47708.068013] [] __schedule+0x33b/0x900
[47708.068013] [] schedule+0x37/0x90
[47708.068013] [] kvm_vcpu_block+0x6d/0xb0 [kvm]
[47708.068013] [] ? prepare_to_wait_event+0x110/0x110
[47708.068013] [] kvm_arch_vcpu_ioctl_run+0x10c/0x1290 [kvm]
[47708.068013] [] kvm_vcpu_ioctl+0x2ce/0x670 [kvm]
[47708.068013] [] ? new_sync_write+0x81/0xb0
[47708.068013] [] do_vfs_ioctl+0x2f8/0x510
[47708.068013] [] ? __sb_end_write+0x35/0x70
[47708.068013] [] ? kvm_on_user_return+0x74/0x80 [kvm]
[47708.068013] [] SyS_ioctl+0x81/0xa0
[47708.068013] [] system_call_fastpath+0x16/0x1b
[47708.068013] Code: 30 5b 41 5c 5d c3 0f 1f 00 48 8d 75 d0 48 89 d1 89 df 4c 89 e2 e8 57 fe ff ff 0f b7 55 e8 83 e2 01 74 da 66 0f 1f 44 00 00 f3 90 <0f> b7 55 e8 83 e2 01 75 f5 eb c7 0f 1f 00 8b 05 ca e6 dd 00 85
[47708.068013] Kernel panic - not syncing: softlockup: hung tasks
[47708.068013] CPU: 0 PID: 29869 Comm: qemu-system-x86 Tainted: G EL 3.19.0-c7671cf-lp1413540v2 #31
[47708.068013] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[47708.068013] ffff88081b9beca0 ffff88083fc03de8 ffffffff817a6bf6 0000000000000000
[47708.068013] ffffffff81ab30d4 ffff88083fc03e68 ffffffff817a1aec 0000000000000e92
[47708.068013] 0000000000000008 ffff88083fc03e78 ffff88083fc03e18 ffff88083fc03e68
[47708.068013] Call Trace:
[47708.068013] [] dump_stack+0x45/0x57
[47708.068013] [] panic+0xc1/0x1f5
[47708.068013] [] watchdog_timer_fn+0x1db/0x1f0
[47708.068013] [] __run_hrtimer+0x77/0x1d0
[47708.068013] [] ? watchdog+0x30/0x30
[47708.068013] [] hrtimer_interrupt+0xf3/0x220
[47708.068013] [] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
[47708.068013] [] local_apic_timer_interrupt+0x39/0x60
[47708.068013] [] smp_apic_timer_interrupt+0x45/0x60
[47708.068013] [] apic_timer_interrupt+0x6d/0x80
[47708.068013] [] ? smp_call_function_single+0xca/0x120
[47708.068013] [] ? smp_call_function_single+0xb9/0x120
[47708.068013] [] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
[47708.068013] [] loaded_vmcs_clear+0x27/0x30 [kvm_intel]
[47708.068013] [] vmx_vcpu_load+0x17e/0x1a0 [kvm_intel]
[47708.068013] [] ? set_next_entity+0x9d/0xb0
[47708.068013] [] kvm_arch_vcpu_load+0x33/0x1f0 [kvm]
[47708.068013] [] kvm_sched_in+0x39/0x40 [kvm]
[47708.068013] [] finish_task_switch+0x98/0x1a0
[47708.068013] [] __schedule+0x33b/0x900
[47708.068013] [] schedule+0x37/0x90
[47708.068013] [] kvm_vcpu_block+0x6d/0xb0 [kvm]
[47708.068013] [] ? prepare_to_wait_event+0x110/0x110
[47708.068013] [] kvm_arch_vcpu_ioctl_run+0x10c/0x1290 [kvm]
[47708.068013] [] kvm_vcpu_ioctl+0x2ce/0x670 [kvm]
[47708.068013] [] ? new_sync_write+0x81/0xb0
[47708.068013] [] do_vfs_ioctl+0x2f8/0x510
[47708.068013] [] ? __sb_end_write+0x35/0x70
[47708.068013] [] ? kvm_on_user_return+0x74/0x80 [kvm]
[47708.068013] [] SyS_ioctl+0x81/0xa0
[47708.068013] [] system_call_fastpath+0x16/0x1b

Tks

Rafael Tinoco

On Wed, Feb 18, 2015 at 8:25 PM, Peter Zijlstra wrote:
> On Wed, Feb 11, 2015 at 12:42:10PM -0800, Linus Torvalds wrote:
>> Ok, this is a more involved patch than I'd like, but making the
>> *caller* do all the CSD maintenance actually cleans things up.
>>
>> And this is still completely untested, and may be entirely buggy. What
>> do you guys think?
>
> I think it makes perfect sense.
>
> Acked-by: Peter Zijlstra (Intel)
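
A note on the trace: the Code: bytes around the reported RIP (f3 90, then a 16-bit load, a test of bit 0 and a short backward jump, 75 f5) appear to be a pause-based spin on a flag bit. That would be consistent with the caller waiting for the target CPU to clear a lock bit after running the requested function, though this reading is an interpretation and is not confirmed anywhere in the thread. The following is a minimal user-space sketch in Python of that wait pattern, not kernel code. ToyCsd, toy_call_single, deliver_ipi and the timeout are hypothetical names invented for illustration; the timeout exists only so the sketch terminates, and a thread stands in for the remote CPU.

```python
import threading
import time

CSD_FLAG_LOCK = 0x01  # lock bit; everything else here is a toy model


class ToyCsd:
    """Hypothetical stand-in for a call-single-data entry (not kernel code)."""

    def __init__(self, func):
        self.func = func
        self.flags = 0


def toy_call_single(csd, deliver_ipi, timeout=1.0):
    """Hand csd to a simulated remote CPU and spin until the lock bit clears.

    Returns True if the target released the lock, False if the wait gave up.
    The timeout is an addition for this sketch so that it always terminates.
    """
    csd.flags |= CSD_FLAG_LOCK  # caller takes the lock before queueing

    if deliver_ipi:
        def target():
            csd.func()                    # run the requested function
            csd.flags &= ~CSD_FLAG_LOCK   # then release the lock
        threading.Thread(target=target).start()
    # If deliver_ipi is False, nothing ever clears the bit.

    deadline = time.monotonic() + timeout
    while csd.flags & CSD_FLAG_LOCK:
        if time.monotonic() > deadline:
            return False
        time.sleep(0.001)  # stands in for a pause/cpu_relax() style hint
    return True
```

With deliver_ipi=True the call returns once the simulated target has run the callback. With deliver_ipi=False the caller keeps spinning until the artificial timeout expires, which is the toy equivalent of a CPU that never leaves the wait loop and eventually trips the soft-lockup watchdog. The sketch says nothing about why the target would fail to respond in the reported environment.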