From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754508Ab3AZUG1 (ORCPT ); Sat, 26 Jan 2013 15:06:27 -0500
Received: from mail-vc0-f174.google.com ([209.85.220.174]:48719
	"EHLO mail-vc0-f174.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1754474Ab3AZUGW (ORCPT );
	Sat, 26 Jan 2013 15:06:22 -0500
MIME-Version: 1.0
In-Reply-To: <20130126075357.GA3205@udknight>
References: <20130126075357.GA3205@udknight>
From: Linus Torvalds
Date: Sat, 26 Jan 2013 12:06:01 -0800
X-Google-Sender-Auth: mVhNcA63MNbUTqJGa6g8nbsMUVU
Message-ID:
Subject: Re: [PATCH]smp: Fix send func call IPI to empty cpu mask
To: Wang YanQing, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	mina86@mina86.org, "Srivatsa S. Bhat",
	Linux Kernel Mailing List, stable, Ingo Molnar, Mike Galbraith,
	Jan Beulich, Milton Miller
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing wrote:
> I get below warning every day with 3.7,
> one or two times per day.
>
> [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
> [ 2235.186030] Hardware name: Aspire 4741
> [ 2235.186032] empty IPI mask
> [ 2235.186079] [] native_send_call_func_ipi+0x4f/0x57
> [ 2235.186087] [] smp_call_function_many+0x191/0x1a9
> [ 2235.186097] [] native_flush_tlb_others+0x21/0x24
> [ 2235.186101] [] flush_tlb_page+0x63/0x89
> [ 2235.186105] [] ptep_set_access_flags+0x20/0x26
> [ 2235.186111] [] do_wp_page+0x234/0x502
> [ 2235.186121] [] handle_pte_fault+0x50d/0x54c
> [ 2235.186148] [] handle_mm_fault+0xd0/0xe2
> [ 2235.186153] [] __do_page_fault+0x411/0x42d
> [ 2235.186166] [] do_page_fault+0x8/0xa
> [ 2235.186170] [] error_code+0x5a/0x60
>
> This patch fix it.
>
> This patch also fix some system hang problem:
> If the data->cpumask been cleared after pass
>
>         if (WARN_ONCE(!mask, "empty IPI mask"))
>                 return;
> then the problem 83d349f3 fix will happen again.

Hmm. We have very consciously tried to avoid the extra copy, although
I'm not entirely sure why (it might possibly hurt on the MAXSMP
configuration). See for example commit 723aae25d5cd
("smp_call_function_many: handle concurrent clearing of mask") which
fixed another version of this problem.

But I do agree that it looks like the copy is required, simply because
- as you say - once we've done the "list_add_rcu()" to add it to the
queue, we can have (another) IPI to the target CPU that can now see it
and clear the mask. So by the time we get to actually send the IPI,
the mask might have been cleared by another IPI.

So I do agree that your patch seems correct, but I really really want
to run it by other people. Guys? Original patch on lkml.

The other possible fix might be to take the &call_function.lock
earlier in generic_smp_call_function_interrupt(), so that we can never
clear the bit while somebody is adding entries to the list... But I
think it very much tries to avoid that on purpose right now, with only
the last CPU responding to that IPI taking the lock. So copying the
IPI mask seems to be the reasonable approach.

Comments?

                  Linus
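
For anyone following the thread, the window described above can be modelled in plain user-space C. This is only a sketch under stated assumptions, not the kernel code and not the patch under review: `struct call_data`, `target_cpu_clears()`, `send_ipi_mask()`, `send_racy()` and `send_with_copy()` are hypothetical names, the mask is a single `unsigned long` rather than a `struct cpumask`, and the concurrent IPI is simulated by an ordinary function call at the point where the race would occur.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the shared per-call data: one bit per CPU. */
struct call_data {
    _Atomic unsigned long cpumask;   /* target CPUs clear their own bit */
};

/* Models a target CPU that handles an earlier IPI and clears its bit. */
static void target_cpu_clears(struct call_data *d, int cpu)
{
    atomic_fetch_and(&d->cpumask, ~(1UL << cpu));
}

/*
 * Models the arch send routine. Returns 0 for an empty mask (the case
 * that triggers the "empty IPI mask" warning), 1 otherwise.
 */
static int send_ipi_mask(unsigned long mask)
{
    return mask != 0;
}

/* Racy variant: the shared mask is re-read after the entry is visible. */
static int send_racy(struct call_data *d, unsigned long targets, int racing_cpu)
{
    atomic_store(&d->cpumask, targets);
    /* The entry becomes visible here (list_add_rcu() in the kernel). */
    target_cpu_clears(d, racing_cpu);        /* another IPI lands in the window */
    return send_ipi_mask(atomic_load(&d->cpumask));
}

/* Copying variant: a private snapshot is taken before publishing. */
static int send_with_copy(struct call_data *d, unsigned long targets, int racing_cpu)
{
    atomic_store(&d->cpumask, targets);
    unsigned long snapshot = atomic_load(&d->cpumask);  /* private copy */
    /* The entry becomes visible here. */
    target_cpu_clears(d, racing_cpu);        /* same window, same clearing */
    return send_ipi_mask(snapshot);          /* unaffected by the clearing */
}
```

With a single target CPU that clears its bit inside the window, `send_racy()` ends up passing an empty mask to the send routine, while `send_with_copy()` still sends to the CPU that was originally targeted. The cost of the second variant is the extra copy of the mask, which is the overhead the thread is weighing.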