Date: Thu, 17 Oct 2019 23:31:15 +0200 (CEST)
From: Thomas Gleixner
To: Sean Christopherson
cc: Paolo Bonzini, Xiaoyao Li, Fenghua Yu, Ingo Molnar, Borislav Petkov,
    H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen, Radim Krcmar,
    Ashok Raj, Tony Luck, Dan Williams, Sai Praneeth Prakhya,
    Ravi V Shankar, linux-kernel, x86, kvm@vger.kernel.org
Subject: Re: [RFD] x86/split_lock: Request to Intel
In-Reply-To: <20191017172312.GC20903@linux.intel.com>
References: <20190925180931.GG31852@linux.intel.com>
    <3ec328dc-2763-9da5-28d6-e28970262c58@redhat.com>
    <57f40083-9063-5d41-f06d-fa1ae4c78ec6@redhat.com>
    <8808c9ac-0906-5eec-a31f-27cbec778f9c@intel.com>
    <20191017172312.GC20903@linux.intel.com>

On Thu, 17 Oct 2019, Sean Christopherson wrote:
> On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> > The more I look at this trainwreck, the less interested I am in merging any
> > of this at all.
> >
> > The fact that it took Intel more than a year to figure out that the MSR is
> > per core and not per thread is yet another proof that this industry just
> > works by pure chance.
> >
> > There is a simple way out of this misery:
> >
> > Intel issues a microcode update which does:
> >
> >     1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
> >        AND logic, i.e. when one thread disables AC it's automatically
> >        disabled on the core.
> >
> >        Alternatively it suppresses the #AC when the current thread has it
> >        disabled.
> >
> >     2) Provide a separate bit which indicates that the AC enable logic is
> >        actually AND based or that #AC is suppressed when the current thread
> >        has it disabled.
> >
> > Which way I don't really care as long as it makes sense.
>
> The #AC bit doesn't use OR-logic, it's straight up shared, i.e. writes on
> one CPU are immediately visible on its sibling CPU.

That's less horrible than I read out of your initial explanation.

Thankfully all of this is meticulously documented in the SDM ...

Though it changes the picture radically. The truly shared MSR allows
regular software synchronization without IPIs and without an insane amount
of corner case handling.
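To illustrate why a truly shared MSR makes this simple, here is a rough userspace sketch, not kernel code. It models the core's MSR as a single variable that both siblings see, guarded by a per-core lock, so whichever sibling wants to change the bit just does a locked read-modify-write and the other sibling observes the result without any IPI. All names (sim_core, sim_core_set_ac() and friends) and the bit position are assumptions made up for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit position for the #AC enable bit (an assumption). */
#define SIM_AC_ENABLE_BIT (UINT64_C(1) << 29)

/* Hypothetical model of one physical core with two SMT siblings. */
struct sim_core {
	atomic_flag lock;	/* per-core lock, taken by either sibling */
	uint64_t test_ctrl;	/* single copy: stands in for the shared MSR */
};

static void sim_core_init(struct sim_core *core)
{
	assert(core);
	atomic_flag_clear(&core->lock);
	core->test_ctrl = SIM_AC_ENABLE_BIT;	/* start with #AC enabled */
}

static void sim_core_lock(struct sim_core *core)
{
	/* Simple spinlock: loop until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&core->lock,
						 memory_order_acquire))
		;
}

static void sim_core_unlock(struct sim_core *core)
{
	atomic_flag_clear_explicit(&core->lock, memory_order_release);
}

/* Called by either sibling: locked read-modify-write of the shared value. */
static void sim_core_set_ac(struct sim_core *core, bool enable)
{
	sim_core_lock(core);
	if (enable)
		core->test_ctrl |= SIM_AC_ENABLE_BIT;
	else
		core->test_ctrl &= ~SIM_AC_ENABLE_BIT;
	sim_core_unlock(core);
}

/* What the other sibling sees: the same storage, so no IPI is needed. */
static bool sim_core_ac_enabled(struct sim_core *core)
{
	bool enabled;

	sim_core_lock(core);
	enabled = (core->test_ctrl & SIM_AC_ENABLE_BIT) != 0;
	sim_core_unlock(core);
	return enabled;
}
```

With per-thread MSRs or OR logic, the writer would have to poke the sibling to get the same effect; here the lock is the only coordination required.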

So as you pointed out we need a per core state, which is influenced by:

 1) The global enablement switch

 2) Host induced #AC

 3) Guest induced #AC

    A) Guest has #AC handling

    B) Guest has no #AC handling

#1:

   - OFF:   #AC is globally disabled

   - ON:    #AC is globally enabled

   - FORCE: same as ON but #AC is enforced on guests

#2:

   If the host triggers an #AC then the #AC has to be force disabled on the
   affected core independent of the state of #1. Nothing we can do about
   that and once the initial wave of #AC issues is fixed this should not
   happen on production systems. That disables #3 even for the #3.A case
   for simplicity's sake.

#3:

   A) Guest has #AC handling

      #AC is forwarded to the guest. No further action required aside from
      accounting.

   B) Guest has no #AC handling

      If #AC triggers the resulting action depends on the state of #1:

      - FORCE: Guest is killed with SIGBUS or whatever the virt crowd
               thinks is the appropriate solution

      - ON:    #AC triggered state is recorded per vCPU and the MSR is
               toggled on VMENTER/VMEXIT in software from that point on.

So the only interesting case is #3.B and #1.state == ON. There you need
serialization of the state and the MSR write between the cores, but only
when the vCPU triggered an #AC. Until then, nothing to do.

vmenter()
{
	if (vcpu->ac_disable)
		this_core_disable_ac();
}

vmexit()
{
	if (vcpu->ac_disable)
		this_core_enable_ac();
}

this_core_dis/enable_ac() takes the global state into account and has the
necessary serialization in place.

Thanks,

	tglx
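
For reference, the case analysis above can be written down as two small pure functions. This is only a sketch under stated assumptions, not a proposed implementation: the enum, struct and function names are hypothetical, the per-core counter nr_disable (vCPUs with ac_disable currently in guest mode on the core) is one possible way to handle both siblings running such vCPUs, and the sketch assumes no guest #AC is delivered while the global switch is OFF.

```c
#include <assert.h>
#include <stdbool.h>

/* #1: the global enablement switch. */
enum sld_mode { SLD_OFF, SLD_ON, SLD_FORCE };

/* Possible reactions to a guest induced #AC (#3). Names are hypothetical. */
enum guest_ac_action {
	GUEST_AC_FORWARD,	/* #3.A: inject the #AC into the guest */
	GUEST_AC_KILL,		/* #3.B with FORCE: kill the guest */
	GUEST_AC_TOGGLE_MSR,	/* #3.B with ON: toggle MSR around guest runs */
};

struct vcpu_state {
	bool guest_handles_ac;	/* guest has its own #AC handling */
	bool ac_disable;	/* set once this vCPU has triggered an #AC */
};

struct core_state {
	bool host_forced_off;	 /* #2: host triggered an #AC on this core */
	unsigned int nr_disable; /* assumption: ac_disable vCPUs in guest mode */
};

/* Decide what to do when a vCPU triggers an #AC. */
static enum guest_ac_action decide_guest_ac(enum sld_mode mode,
					    struct vcpu_state *vcpu)
{
	/* Assumption: with the switch OFF no #AC is raised at all. */
	assert(mode != SLD_OFF);

	if (vcpu->guest_handles_ac)
		return GUEST_AC_FORWARD;

	if (mode == SLD_FORCE)
		return GUEST_AC_KILL;

	/* mode == SLD_ON: remember it, toggle from the next entry onwards. */
	vcpu->ac_disable = true;
	return GUEST_AC_TOGGLE_MSR;
}

/* Should the #AC enable bit be set on this core right now? */
static bool core_ac_wanted(enum sld_mode mode, const struct core_state *core)
{
	return mode != SLD_OFF &&
	       !core->host_forced_off &&
	       core->nr_disable == 0;
}
```

In this reading, this_core_disable_ac() and this_core_enable_ac() would adjust nr_disable under the per-core serialization and then write the MSR according to core_ac_wanted().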