Date: Wed, 11 Feb 2015 11:19:11 -0200
Subject: smp_call_function_single lockups
From: Rafael David Tinoco
To: LKML
Cc: torvalds@linux-foundation.org, Thomas Gleixner, Jens Axboe
Content-Type: text/plain; charset=UTF-8

Linus, Thomas, Jens,

During the 3.18 - 3.19 "frequent lockups" discussion, at some point you
observed possible synchronization problems between csd_lock() and
csd_unlock(). I think we have managed to reproduce that issue on a
consistent basis with 3.13 (Ubuntu) and 3.19 (latest vanilla).

- When running OpenStack Tempest in a nested-KVM environment we are able
  to cause a lockup within a matter of hours (usually 2 to 20 hours).

Trace from the nested hypervisor (Ubuntu 3.13):

crash> bt
PID: 29130  TASK: ffff8804288ac800  CPU: 1  COMMAND: "qemu-system-x86"
 #0 [ffff88043fd03d18] machine_kexec at ffffffff8104ac02
 #1 [ffff88043fd03d68] crash_kexec at ffffffff810e7203
 #2 [ffff88043fd03e30] panic at ffffffff81719ff4
 #3 [ffff88043fd03ea8] watchdog_timer_fn at ffffffff8110d7c5
 #4 [ffff88043fd03ed8] __run_hrtimer at ffffffff8108e787
 #5 [ffff88043fd03f18] hrtimer_interrupt at ffffffff8108ef4f
 #6 [ffff88043fd03f80] local_apic_timer_interrupt at ffffffff81043537
 #7 [ffff88043fd03f98] smp_apic_timer_interrupt at ffffffff81733d4f
 #8 [ffff88043fd03fb0] apic_timer_interrupt at ffffffff817326dd
--- ---
 #9 [ffff8804284a3bc8] apic_timer_interrupt at ffffffff817326dd
    [exception RIP: generic_exec_single+130]
    RIP: ffffffff810dbe62  RSP: ffff8804284a3c70  RFLAGS: 00000202
    RAX: 0000000000000002  RBX: ffff8804284a3c40  RCX: 0000000000000001
    RDX: ffffffff8180ad60  RSI: 0000000000000000  RDI: 0000000000000286
    RBP: ffff8804284a3ca0   R8: ffffffff8180ad48   R9: 0000000000000001
    R10: ffffffff81185cac  R11: ffffea00109b4a00  R12: ffff88042829f400
    R13: 0000000000000000  R14: ffffea001017d640  R15: 0000000000000005
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#10 [ffff8804284a3ca8] smp_call_function_single at ffffffff810dbf75
#11 [ffff8804284a3d20] smp_call_function_many at ffffffff810dc3a6
#12 [ffff8804284a3d80] native_flush_tlb_others at ffffffff8105c8f7
#13 [ffff8804284a3da8] flush_tlb_mm_range at ffffffff8105c9cb
#14 [ffff8804284a3dd8] tlb_flush_mmu at ffffffff811755b3
#15 [ffff8804284a3e00] tlb_finish_mmu at ffffffff81176145
#16 [ffff8804284a3e20] unmap_region at ffffffff8117e013
#17 [ffff8804284a3ee0] do_munmap at ffffffff81180356
#18 [ffff8804284a3f30] vm_munmap at ffffffff81180521
#19 [ffff8804284a3f60] sys_munmap at ffffffff81181482
#20 [ffff8804284a3f80] system_call_fastpath at ffffffff8173196d
    RIP: 00007fa3ed16c587  RSP: 00007fa3536f5c10  RFLAGS: 00000246
    RAX: 000000000000000b  RBX: ffffffff8173196d  RCX: 0000001d00000007
    RDX: 0000000000000000  RSI: 0000000000801000  RDI: 00007fa315ff4000
    RBP: 00007fa3167f49c0   R8: 0000000000000000   R9: 00007fa3f5396738
    R10: 00007fa3536f5a60  R11: 0000000000000202  R12: 00007fa3ed6562a0
    R13: 00007fa350ef19c0  R14: ffffffff81181482  R15: ffff8804284a3f78
    ORIG_RAX: 000000000000000b  CS: 0033  SS: 002b
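
For reference, generic_exec_single (the exception RIP above) ends up
spinning in the csd wait loop for synchronous calls. A minimal sketch of
the lock/unlock handshake involved, modeled on the kernel/smp.c of that
era (simplified, not the exact source):

/* Simplified sketch of the csd handshake, modeled on kernel/smp.c. */
static void csd_lock_wait(struct call_single_data *csd)
{
	/*
	 * A synchronous caller spins here until the target CPU clears
	 * CSD_FLAG_LOCK; this is presumably where the CPU in the trace
	 * above is stuck.
	 */
	while (csd->flags & CSD_FLAG_LOCK)
		cpu_relax();
}

static void csd_lock(struct call_single_data *csd)
{
	csd_lock_wait(csd);
	csd->flags |= CSD_FLAG_LOCK;

	/* Order the flag update against later stores to the csd. */
	smp_mb();
}

/*
 * generic_exec_single() queues the csd on the target CPU's llist, sends
 * the IPI, and for a synchronous call ends up in csd_lock_wait() until
 * the target CPU has run the function and unlocked the csd.
 */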
- After applying the patch provided by Thomas we were able to cause the
  lockup only after 6 days (also locked inside smp_call_function_single).
  Test performance (even for a nested KVM) was reduced substantially with
  3.19 + this patch.

Trace from the nested hypervisor (3.19 + patch):

crash> bt
PID: 10467  TASK: ffff880817b3b1c0  CPU: 1  COMMAND: "qemu-system-x86"
 #0 [ffff88083fd03cc0] machine_kexec at ffffffff81052052
 #1 [ffff88083fd03d10] crash_kexec at ffffffff810f91c3
 #2 [ffff88083fd03de0] panic at ffffffff8176f713
 #3 [ffff88083fd03e60] watchdog_timer_fn at ffffffff8112316b
 #4 [ffff88083fd03ea0] __run_hrtimer at ffffffff810da087
 #5 [ffff88083fd03ef0] hrtimer_interrupt at ffffffff810da467
 #6 [ffff88083fd03f70] local_apic_timer_interrupt at ffffffff81049769
 #7 [ffff88083fd03f90] smp_apic_timer_interrupt at ffffffff8177fc25
 #8 [ffff88083fd03fb0] apic_timer_interrupt at ffffffff8177dcbd
--- ---
 #9 [ffff880817973a68] apic_timer_interrupt at ffffffff8177dcbd
    [exception RIP: generic_exec_single+218]
    RIP: ffffffff810ee0ca  RSP: ffff880817973b18  RFLAGS: 00000202
    RAX: 0000000000000002  RBX: 0000000000000292  RCX: 0000000000000001
    RDX: ffffffff8180e6e0  RSI: 0000000000000000  RDI: 0000000000000292
    RBP: ffff880817973b58   R8: ffffffff8180e6c8   R9: 0000000000000001
    R10: 000000000000b6e0  R11: 0000000000000001  R12: ffffffff811f6626
    R13: ffff880817973ab8  R14: ffffffff8109cfd2  R15: ffff880817973a78
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
#10 [ffff880817973b60] smp_call_function_single at ffffffff810ee1c7
#11 [ffff880817973b90] loaded_vmcs_clear at ffffffffa0309097 [kvm_intel]
#12 [ffff880817973ba0] vmx_vcpu_load at ffffffffa030defe [kvm_intel]
#13 [ffff880817973be0] kvm_arch_vcpu_load at ffffffffa01eba53 [kvm]
#14 [ffff880817973c00] kvm_sched_in at ffffffffa01d94a9 [kvm]
#15 [ffff880817973c20] finish_task_switch at ffffffff81099148
#16 [ffff880817973c60] __schedule at ffffffff817781ec
#17 [ffff880817973cd0] schedule at ffffffff81778699
#18 [ffff880817973ce0] kvm_vcpu_block at ffffffffa01d8dfd [kvm]
#19 [ffff880817973d40] kvm_arch_vcpu_ioctl_run at ffffffffa01ef64c [kvm]
#20 [ffff880817973e10] kvm_vcpu_ioctl at ffffffffa01dbc19 [kvm]
#21 [ffff880817973eb0] do_vfs_ioctl at ffffffff811f5948
#22 [ffff880817973f30] sys_ioctl at ffffffff811f5be1
#23 [ffff880817973f80] system_call_fastpath at ffffffff8177cc2d
    RIP: 00007f42f987fec7  RSP: 00007f42ef1bebd8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8177cc2d  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 000000000000ae80  RDI: 000000000000000e
    RBP: 00007f430047b040   R8: 0000000000000000   R9: 00000000000000ff
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007f42ff920240
    R13: 0000000000000000  R14: 0000000000000001  R15: 0000000000000001
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
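
If I read the kvm_intel code of that era correctly, frames #11/#12 above
correspond to roughly the path below (a simplified sketch, not the exact
source); it issues a synchronous single-CPU call whenever a vCPU is
loaded on a different physical CPU, and that is where this trace ends up
waiting:

/* Sketch of the loaded_vmcs_clear() path, modeled on arch/x86/kvm/vmx.c. */
static void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
{
	int cpu = loaded_vmcs->cpu;

	if (cpu != -1)
		/* wait == 1: the vCPU thread spins until the remote CPU
		 * has run __loaded_vmcs_clear(). */
		smp_call_function_single(cpu, __loaded_vmcs_clear,
					 loaded_vmcs, 1);
}

/*
 * vmx_vcpu_load() calls loaded_vmcs_clear() when the vCPU last ran on a
 * different physical CPU, so every vCPU migration can trigger one of
 * these synchronous cross-CPU calls.
 */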
Not sure if you are still pursuing this; anyway, let me know if you think
of any other change to try, and I'll keep the environment around.

-Tinoco

On Mon, 17 Nov 2014, Thomas Gleixner wrote:
> On Mon, 17 Nov 2014, Linus Torvalds wrote:
> >  	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
> > -		csd->func(csd->info);
> > +		smp_call_func_t func = csd->func;
> > +		void *info = csd->info;
> >  		csd_unlock(csd);
> > +
> > +		func(info);
>
> No, that won't work for synchronous calls:
>
>  CPU 0                          CPU 1
>
>  csd_lock(csd);
>  queue_csd();
>  ipi();
>                                 func = csd->func;
>                                 info = csd->info;
>                                 csd_unlock(csd);
>  csd_lock_wait();
>                                 func(info);
>
> The csd_lock_wait() side will succeed and therefor assume that the
> call has been completed while the function has not been called at
> all. Interesting explosions to follow.
>
> The proper solution is to revert that commit and properly analyze the
> problem which Jens was trying to solve and work from there.

So a combo of both (Jens and yours) might do the trick. Patch below.

I think what Jens was trying to solve is:

 CPU 0                          CPU 1

 csd_lock(csd);
 queue_csd();
 ipi();
                                csd->func(csd->info);
 wait_for_completion(csd);
                                complete(csd);
 reuse_csd(csd);
                                csd_unlock(csd);

Thanks,

	tglx

Index: linux/kernel/smp.c
===================================================================
--- linux.orig/kernel/smp.c
+++ linux/kernel/smp.c
@@ -126,7 +126,7 @@ static void csd_lock(struct call_single_
 
 static void csd_unlock(struct call_single_data *csd)
 {
-	WARN_ON((csd->flags & CSD_FLAG_WAIT) && !(csd->flags & CSD_FLAG_LOCK));
+	WARN_ON(!(csd->flags & CSD_FLAG_LOCK));
 
 	/*
 	 * ensure we're all done before releasing data:
@@ -250,8 +250,23 @@ static void flush_smp_call_function_queu
 	}
 
 	llist_for_each_entry_safe(csd, csd_next, entry, llist) {
-		csd->func(csd->info);
-		csd_unlock(csd);
+
+		/*
+		 * For synchronous calls we are not allowed to unlock
+		 * before the callback returned. For the async case
+		 * its the responsibility of the caller to keep
+		 * csd->info consistent while the callback runs.
+		 */
+		if (csd->flags & CSD_FLAG_WAIT) {
+			csd->func(csd->info);
+			csd_unlock(csd);
+		} else {
+			smp_call_func_t func = csd->func;
+			void *info = csd->info;
+
+			csd_unlock(csd);
+			func(info);
+		}
 	}
 
 	/*
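
To make the async case in the second diagram concrete, the reuse pattern
being described looks roughly like the sketch below; the names (my_csd,
my_done, my_remote_func, fire_and_reuse) are purely illustrative and not
taken from any real caller:

#include <linux/smp.h>
#include <linux/completion.h>

/* Illustrative only: a caller that fires an async csd and reuses it as
 * soon as the callback signals completion. */
static struct call_single_data my_csd;
static DECLARE_COMPLETION(my_done);

static void my_remote_func(void *info)
{
	complete(&my_done);
	/* In the pre-patch ordering, csd_unlock() on this CPU runs only
	 * after this function returns. */
}

static void fire_and_reuse(int cpu)
{
	my_csd.func = my_remote_func;
	my_csd.info = NULL;
	smp_call_function_single_async(cpu, &my_csd);

	wait_for_completion(&my_done);
	/* Reusing my_csd here can overlap with the csd_unlock() still
	 * pending on the remote CPU; the CSD_FLAG_WAIT split in the
	 * patch above unlocks before the callback for exactly this
	 * async case. */
	smp_call_function_single_async(cpu, &my_csd);
}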