Date: Thu, 17 Oct 2019 10:23:12 -0700
From: Sean Christopherson
To: Thomas Gleixner
Cc: Paolo Bonzini, Xiaoyao Li, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86,
	kvm@vger.kernel.org
Subject: Re: [RFD] x86/split_lock: Request to Intel
Message-ID: <20191017172312.GC20903@linux.intel.com>
References: <20190925180931.GG31852@linux.intel.com>
	<3ec328dc-2763-9da5-28d6-e28970262c58@redhat.com>
	<57f40083-9063-5d41-f06d-fa1ae4c78ec6@redhat.com>
	<8808c9ac-0906-5eec-a31f-27cbec778f9c@intel.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Mailing-List: kvm@vger.kernel.org

On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> The more I look at this trainwreck, the less interested I am in merging any
> of this at all.
>
> The fact that it took Intel more than a year to figure out that the MSR is
> per core and not per thread is yet another proof that this industry just
> works by pure chance.
>
> There is a simple way out of this misery:
>
> Intel issues a microcode update which does:
>
>   1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
>      AND logic, i.e. when one thread disables AC it's automatically
>      disabled on the core.
>
>      Alternatively it suppresses the #AC when the current thread has it
>      disabled.
>
>   2) Provide a separate bit which indicates that the AC enable logic is
>      actually AND based, or that #AC is suppressed when the current thread
>      has it disabled.
>
> Which way I don't really care as long as it makes sense.

The #AC bit doesn't use OR-logic; it's straight up shared, i.e. writes on
one CPU are immediately visible on its sibling CPU.  That doesn't magically
solve the problem, but I don't think we need IPIs to coordinate between
siblings, e.g.
wouldn't something like this work?  The per-cpu variables would be
pointers, with sibling CPUs pointing at the same shared lock and count.

	/* Per-CPU pointers; both siblings point at the same core-wide data. */
	static DEFINE_PER_CPU(spinlock_t *, split_lock_ac_lock);
	static DEFINE_PER_CPU(int *, split_lock_ac_disabled);

	void split_lock_disable(void)
	{
		spinlock_t *ac_lock = this_cpu_read(split_lock_ac_lock);

		spin_lock(ac_lock);
		/* Clear the enable bit only for the first disabler on the core. */
		if (++(*this_cpu_read(split_lock_ac_disabled)) == 1)
			msr_clear_bit(MSR_TEST_CTRL,
				      MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT);
		spin_unlock(ac_lock);
	}

	void split_lock_enable(void)
	{
		spinlock_t *ac_lock = this_cpu_read(split_lock_ac_lock);

		spin_lock(ac_lock);
		/* Restore the enable bit only when the last disabler is done. */
		if (--(*this_cpu_read(split_lock_ac_disabled)) == 0)
			msr_set_bit(MSR_TEST_CTRL,
				    MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT);
		spin_unlock(ac_lock);
	}

To avoid the spin_lock and WRMSR latency on every VM-Enter and VM-Exit,
actions (3a) and (4a) from your matrix (copied below) could be changed to
only do split_lock_disable() if the guest actually generates an #AC, and
then do split_lock_enable() on the next VM-Exit (rough sketch of that at
the bottom of this mail).  Assuming even legacy guests are somewhat sane
and rarely do split-locks, lazily disabling the control would eliminate
most of the overhead and would also reduce the time that the sibling CPU
runs in the host without #AC protection.

 N | #AC       | #AC enabled | SMT | Ctrl    | Guest | Action
 R | available | on host     |     | exposed | #AC   |
 --|-----------|-------------|-----|---------|-------|---------------------
   |           |             |     |         |       |
 0 | N         |      x      |  x  |    N    |   x   | None
   |           |             |     |         |       |
 1 | Y         |      N      |  x  |    N    |   x   | None
   |           |             |     |         |       |
 2 | Y         |      Y      |  x  |    Y    |   Y   | Forward to guest
   |           |             |     |         |       |
 3 | Y         |      Y      |  N  |    Y    |   N   | A) Store in vCPU and
   |           |             |     |         |       |    toggle on
   |           |             |     |         |       |    VMENTER/EXIT
   |           |             |     |         |       |
   |           |             |     |         |       | B) SIGBUS or KVM
   |           |             |     |         |       |    exit code
   |           |             |     |         |       |
 4 | Y         |      Y      |  Y  |    Y    |   N   | A) Disable globally
   |           |             |     |         |       |    on host. Store in
   |           |             |     |         |       |    vCPU/guest state
   |           |             |     |         |       |    and evtl. reenable
   |           |             |     |         |       |    when guest goes
   |           |             |     |         |       |    away.
   |           |             |     |         |       |
   |           |             |     |         |       | B) SIGBUS or KVM
   |           |             |     |         |       |    exit code

> If that's not going to happen, then we just bury the whole thing and put
> it on hold until a sane implementation of that functionality surfaces in
> silicon some day in the not so foreseeable future.
>
> Seriously, this only makes sense when it's enabled by default and not
> rendered useless by VIRT.  Otherwise we never get any reports and none of
> the issues are going to be fixed.
>
> Thanks,
>
>         tglx
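
As promised above, a rough sketch of the lazy (3a)/(4a) flow.  This builds
on the split_lock_disable()/split_lock_enable() helpers earlier in this
mail; the struct, field, and function names below are made up for
illustration and don't exist in KVM today:

	/*
	 * Hypothetical per-vCPU state, NOT an existing KVM field: tracks
	 * whether #AC was disabled on the guest's behalf for this run.
	 */
	struct vcpu_ac_state {
		bool split_lock_ac_disabled;
	};

	/*
	 * #AC intercept: guest did a split-lock but the control isn't
	 * exposed to it (rows 3/4).  Drop the control and resume the
	 * guest so the faulting instruction can complete.
	 */
	static void handle_guest_split_lock_ac(struct vcpu_ac_state *vcpu)
	{
		if (!vcpu->split_lock_ac_disabled) {
			split_lock_disable();
			vcpu->split_lock_ac_disabled = true;
		}
	}

	/*
	 * VM-Exit path: restore host #AC protection as soon as possible
	 * so the window where the sibling runs unprotected in the host
	 * stays small.
	 */
	static void post_vmexit_split_lock(struct vcpu_ac_state *vcpu)
	{
		if (vcpu->split_lock_ac_disabled) {
			split_lock_enable();
			vcpu->split_lock_ac_disabled = false;
		}
	}

The guest that never does a split-lock never pays for the WRMSRs, and a
misbehaving legacy guest only loses host-side #AC protection from its
first #AC until the next VM-Exit.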