From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754586AbbBTQtl (ORCPT ); Fri, 20 Feb 2015 11:49:41 -0500 Received: from mail-ig0-f179.google.com ([209.85.213.179]:49724 "EHLO mail-ig0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754526AbbBTQth (ORCPT ); Fri, 20 Feb 2015 11:49:37 -0500 MIME-Version: 1.0 In-Reply-To: <20150220093000.GA22661@gmail.com> References: <20150218222544.GA17717@twins.programming.kicks-ass.net> <20150220093000.GA22661@gmail.com> Date: Fri, 20 Feb 2015 08:49:36 -0800 X-Google-Sender-Auth: Orw6m44tMwizwPDqcqvcfmK_E8M Message-ID: Subject: Re: smp_call_function_single lockups From: Linus Torvalds To: Ingo Molnar Cc: Rafael David Tinoco , Peter Anvin , Jiang Liu , Peter Zijlstra , LKML , Jens Axboe , Frederic Weisbecker , Gema Gomez , Christopher Arges , "the arch/x86 maintainers" Content-Type: multipart/mixed; boundary=001a11409acc74f4d7050f87d692 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --001a11409acc74f4d7050f87d692 Content-Type: text/plain; charset=UTF-8 On Fri, Feb 20, 2015 at 1:30 AM, Ingo Molnar wrote: > > So if my memory serves me right, I think it was for local > APICs, and even there mostly it was a performance issue: if > an IO-APIC sent more than 2 IRQs per 'level' to a local > APIC then the IO-APIC might be forced to resend those IRQs, > leading to excessive message traffic on the relevant > hardware bus. Hmm. I have a distinct memory of interrupts actually being lost, but I really can't find anything to support that memory, so it's probably some drug-induced confusion of mine. I don't find *anything* about interrupt "levels" any more in modern Intel documentation on the APIC, but maybe I missed something. But it might all have been an IO-APIC thing. > I think you got it right. > > So the principle of EOI acknowledgement from the OS to the > local APIC is specific to the IRQ that raised the interrupt > and caused the vector to be executed, so it's not possible > to ack the 'wrong' IRQ. I agree - but only for cases where we actualyl should ACK in the first place. The LAPIC knows what its own priorities were, so it always just clears the highest-priority ISR bit. > So I _think_ it's not possible to accidentally acknowledge > a pending IRQ that has not been issued to the CPU yet > (unless we have hardirqs enabled), just by writing stray > EOIs to the local APIC. So in that sense the ExtInt irq0 > case should be mostly harmless. It must clearly be mostly harmless even if I'm right, since we've always done the ACK for it, as far as I can tell. So you're probably right, and it doesn't matter. In particular, it would depend on exactly when the ISR bit actually gets set in the APIC. If it gets set only after the CPU has actually *accepted* the interrupt, then it should not be possible for a spurious EOI write to screw things up, because it's still happening in the context of the previously executing interrupt, so a new ISR bit hasn't been set yet, and any old ISR bits must be lower-priority than the currently executing interrupt routine (that were interrupted by a higher-priority one). That sounds like the likely scenario. It just worries me. And my test patch did actually trigger, even if it only seemed to trigger during device probing (when the probing itself probably caused and then cleared an interrupt). > I also fully share your frustration about the level of > obfuscation the various APIC drivers display today. > > The lack of a simple single-IPI implementation is annoying > as well - when that injury was first inflicted with > clustered APICs I tried to resist, but AFAICR there were > some good hardware arguments why it cannot be kept and I > gave up. > > If you agree then I can declare a feature stop for new > hardware support (that isn't a stop-ship issue for users) > until it's all cleaned up for real, and Thomas started some > of that work already. Well, the attached patch for that seems pretty trivial. And seems to work for me (my machine also defaults to x2apic clustered mode), and allows the APIC code to start doing a "send to specific cpu" thing one by one, since it falls back to the send_IPI_mask() function if no individual CPU IPI function exists. NOTE! There's a few cases in arch/x86/kernel/apic/vector.c that also do that "apic->send_IPI_mask(cpumask_of(i), .." thing, but they aren't that important, so I didn't bother with them. NOTE2! I've tested this, and it seems to work, but maybe there is something seriously wrong. I skipped the "disable interrupts" part when doing the "send_IPI", for example, because I think it's entirely unnecessary for that case. But this has certainly *not* gotten any real stress-testing. But /proc/interrupts shows thousands of rescheduling IPI's, so this should all have triggered. Linus --001a11409acc74f4d7050f87d692 Content-Type: text/plain; charset=US-ASCII; name="patch.diff" Content-Disposition: attachment; filename="patch.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i6dtbv5x0 IGFyY2gveDg2L2luY2x1ZGUvYXNtL2FwaWMuaCAgICAgICAgICAgfCAgMSArCiBhcmNoL3g4Ni9r ZXJuZWwvYXBpYy94MmFwaWNfY2x1c3Rlci5jIHwgIDkgKysrKysrKysrCiBhcmNoL3g4Ni9rZXJu ZWwvc21wLmMgICAgICAgICAgICAgICAgIHwgMTYgKysrKysrKysrKysrKystLQogMyBmaWxlcyBj aGFuZ2VkLCAyNCBpbnNlcnRpb25zKCspLCAyIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2Fy Y2gveDg2L2luY2x1ZGUvYXNtL2FwaWMuaCBiL2FyY2gveDg2L2luY2x1ZGUvYXNtL2FwaWMuaApp bmRleCA5MjAwM2YzYzhhNDIuLjVkNDdmZmNmYTY1ZCAxMDA2NDQKLS0tIGEvYXJjaC94ODYvaW5j bHVkZS9hc20vYXBpYy5oCisrKyBiL2FyY2gveDg2L2luY2x1ZGUvYXNtL2FwaWMuaApAQCAtMjk2 LDYgKzI5Niw3IEBAIHN0cnVjdCBhcGljIHsKIAkJCQkgICAgICB1bnNpZ25lZCBpbnQgKmFwaWNp ZCk7CiAKIAkvKiBpcGkgKi8KKwl2b2lkICgqc2VuZF9JUEkpKGludCBjcHUsIGludCB2ZWN0b3Ip OwogCXZvaWQgKCpzZW5kX0lQSV9tYXNrKShjb25zdCBzdHJ1Y3QgY3B1bWFzayAqbWFzaywgaW50 IHZlY3Rvcik7CiAJdm9pZCAoKnNlbmRfSVBJX21hc2tfYWxsYnV0c2VsZikoY29uc3Qgc3RydWN0 IGNwdW1hc2sgKm1hc2ssCiAJCQkJCSBpbnQgdmVjdG9yKTsKZGlmZiAtLWdpdCBhL2FyY2gveDg2 L2tlcm5lbC9hcGljL3gyYXBpY19jbHVzdGVyLmMgYi9hcmNoL3g4Ni9rZXJuZWwvYXBpYy94MmFw aWNfY2x1c3Rlci5jCmluZGV4IGU2NThmMjE2ODFjOC4uMTc3YzRmYjAyN2ExIDEwMDY0NAotLS0g YS9hcmNoL3g4Ni9rZXJuZWwvYXBpYy94MmFwaWNfY2x1c3Rlci5jCisrKyBiL2FyY2gveDg2L2tl cm5lbC9hcGljL3gyYXBpY19jbHVzdGVyLmMKQEAgLTIzLDYgKzIzLDE0IEBAIHN0YXRpYyBpbmxp bmUgdTMyIHgyYXBpY19jbHVzdGVyKGludCBjcHUpCiAJcmV0dXJuIHBlcl9jcHUoeDg2X2NwdV90 b19sb2dpY2FsX2FwaWNpZCwgY3B1KSA+PiAxNjsKIH0KIAorc3RhdGljIHZvaWQgeDJhcGljX3Nl bmRfSVBJKGludCBjcHUsIGludCB2ZWN0b3IpCit7CisJdTMyIGRlc3QgPSBwZXJfY3B1KHg4Nl9j cHVfdG9fbG9naWNhbF9hcGljaWQsIGNwdSk7CisKKwl4MmFwaWNfd3Jtc3JfZmVuY2UoKTsKKwlf X3gyYXBpY19zZW5kX0lQSV9kZXN0KGRlc3QsIHZlY3RvciwgQVBJQ19ERVNUX0xPR0lDQUwpOwor fQorCiBzdGF0aWMgdm9pZAogX194MmFwaWNfc2VuZF9JUElfbWFzayhjb25zdCBzdHJ1Y3QgY3B1 bWFzayAqbWFzaywgaW50IHZlY3RvciwgaW50IGFwaWNfZGVzdCkKIHsKQEAgLTI2Niw2ICsyNzQs NyBAQCBzdGF0aWMgc3RydWN0IGFwaWMgYXBpY194MmFwaWNfY2x1c3RlciA9IHsKIAogCS5jcHVf bWFza190b19hcGljaWRfYW5kCQk9IHgyYXBpY19jcHVfbWFza190b19hcGljaWRfYW5kLAogCisJ LnNlbmRfSVBJCQkJPSB4MmFwaWNfc2VuZF9JUEksCiAJLnNlbmRfSVBJX21hc2sJCQk9IHgyYXBp Y19zZW5kX0lQSV9tYXNrLAogCS5zZW5kX0lQSV9tYXNrX2FsbGJ1dHNlbGYJPSB4MmFwaWNfc2Vu ZF9JUElfbWFza19hbGxidXRzZWxmLAogCS5zZW5kX0lQSV9hbGxidXRzZWxmCQk9IHgyYXBpY19z ZW5kX0lQSV9hbGxidXRzZWxmLApkaWZmIC0tZ2l0IGEvYXJjaC94ODYva2VybmVsL3NtcC5jIGIv YXJjaC94ODYva2VybmVsL3NtcC5jCmluZGV4IGJlOGUxYmRlMDdhYS4uNzhjOWIxMmQ4OTJhIDEw MDY0NAotLS0gYS9hcmNoL3g4Ni9rZXJuZWwvc21wLmMKKysrIGIvYXJjaC94ODYva2VybmVsL3Nt cC5jCkBAIC0xMTQsNiArMTE0LDE4IEBAIHN0YXRpYyBhdG9taWNfdCBzdG9wcGluZ19jcHUgPSBB VE9NSUNfSU5JVCgtMSk7CiBzdGF0aWMgYm9vbCBzbXBfbm9fbm1pX2lwaSA9IGZhbHNlOwogCiAv KgorICogSGVscGVyIHdyYXBwZXI6IG5vdCBhbGwgYXBpYyBkZWZpbml0aW9ucyBzdXBwb3J0IHNl bmRpbmcgdG8KKyAqIGEgc2luZ2xlIENQVSwgc28gd2UgZmFsbCBiYWNrIHRvIHNlbmRpbmcgdG8g YSBtYXNrLgorICovCitzdGF0aWMgdm9pZCBzZW5kX0lQSV9jcHUoaW50IGNwdSwgaW50IHZlY3Rv cikKK3sKKwlpZiAoYXBpYy0+c2VuZF9JUEkpCisJCWFwaWMtPnNlbmRfSVBJKGNwdSwgdmVjdG9y KTsKKwllbHNlCisJCWFwaWMtPnNlbmRfSVBJX21hc2soY3B1bWFza19vZihjcHUpLCB2ZWN0b3Ip OworfQorCisvKgogICogdGhpcyBmdW5jdGlvbiBzZW5kcyBhICdyZXNjaGVkdWxlJyBJUEkgdG8g YW5vdGhlciBDUFUuCiAgKiBpdCBnb2VzIHN0cmFpZ2h0IHRocm91Z2ggYW5kIHdhc3RlcyBubyB0 aW1lIHNlcmlhbGl6aW5nCiAgKiBhbnl0aGluZy4gV29yc3QgY2FzZSBpcyB0aGF0IHdlIGxvc2Ug YSByZXNjaGVkdWxlIC4uLgpAQCAtMTI0LDEyICsxMzYsMTIgQEAgc3RhdGljIHZvaWQgbmF0aXZl X3NtcF9zZW5kX3Jlc2NoZWR1bGUoaW50IGNwdSkKIAkJV0FSTl9PTigxKTsKIAkJcmV0dXJuOwog CX0KLQlhcGljLT5zZW5kX0lQSV9tYXNrKGNwdW1hc2tfb2YoY3B1KSwgUkVTQ0hFRFVMRV9WRUNU T1IpOworCXNlbmRfSVBJX2NwdShjcHUsIFJFU0NIRURVTEVfVkVDVE9SKTsKIH0KIAogdm9pZCBu YXRpdmVfc2VuZF9jYWxsX2Z1bmNfc2luZ2xlX2lwaShpbnQgY3B1KQogewotCWFwaWMtPnNlbmRf SVBJX21hc2soY3B1bWFza19vZihjcHUpLCBDQUxMX0ZVTkNUSU9OX1NJTkdMRV9WRUNUT1IpOwor CXNlbmRfSVBJX2NwdShjcHUsIENBTExfRlVOQ1RJT05fU0lOR0xFX1ZFQ1RPUik7CiB9CiAKIHZv aWQgbmF0aXZlX3NlbmRfY2FsbF9mdW5jX2lwaShjb25zdCBzdHJ1Y3QgY3B1bWFzayAqbWFzaykK --001a11409acc74f4d7050f87d692--