From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFEB0CA9EC8 for ; Thu, 17 Oct 2019 23:38:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C979021D7C for ; Thu, 17 Oct 2019 23:38:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2441776AbfJQXiZ (ORCPT ); Thu, 17 Oct 2019 19:38:25 -0400 Received: from mga14.intel.com ([192.55.52.115]:32452 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2438560AbfJQXiZ (ORCPT ); Thu, 17 Oct 2019 19:38:25 -0400 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Oct 2019 16:38:24 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.67,309,1566889200"; d="scan'208";a="221564530" Received: from sjchrist-coffee.jf.intel.com (HELO linux.intel.com) ([10.54.74.41]) by fmsmga004.fm.intel.com with ESMTP; 17 Oct 2019 16:38:24 -0700 Date: Thu, 17 Oct 2019 16:38:24 -0700 From: Sean Christopherson To: Thomas Gleixner Cc: Paolo Bonzini , Xiaoyao Li , Fenghua Yu , Ingo Molnar , Borislav Petkov , H Peter Anvin , Peter Zijlstra , Andrew Morton , Dave Hansen , Radim Krcmar , Ashok Raj , Tony Luck , Dan Williams , Sai Praneeth Prakhya , Ravi V Shankar , linux-kernel , x86 , kvm@vger.kernel.org Subject: Re: [RFD] x86/split_lock: Request to Intel Message-ID: <20191017233824.GA23654@linux.intel.com> References: <57f40083-9063-5d41-f06d-fa1ae4c78ec6@redhat.com> <8808c9ac-0906-5eec-a31f-27cbec778f9c@intel.com> <20191017172312.GC20903@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Oct 17, 2019 at 11:31:15PM +0200, Thomas Gleixner wrote: > On Thu, 17 Oct 2019, Sean Christopherson wrote: > > On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote: > > > The more I look at this trainwreck, the less interested I am in merging any > > > of this at all. > > > > > > The fact that it took Intel more than a year to figure out that the MSR is > > > per core and not per thread is yet another proof that this industry just > > > works by pure chance. > > > > > > There is a simple way out of this misery: > > > > > > Intel issues a microcode update which does: > > > > > > 1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to > > > AND logic, i.e. when one thread disables AC it's automatically > > > disabled on the core. > > > > > > Alternatively it supresses the #AC when the current thread has it > > > disabled. > > > > > > 2) Provide a separate bit which indicates that the AC enable logic is > > > actually AND based or that #AC is supressed when the current thread > > > has it disabled. > > > > > > Which way I don't really care as long as it makes sense. > > > > The #AC bit doesn't use OR-logic, it's straight up shared, i.e. writes on > > one CPU are immediately visible on its sibling CPU. > > That's less horrible than I read out of your initial explanation. > > Thankfully all of this is meticulously documented in the SDM ... Preaching to the choir on this one... > Though it changes the picture radically. The truly shared MSR allows > regular software synchronization without IPIs and without an insane amount > of corner case handling. > > So as you pointed out we need a per core state, which is influenced by: > > 1) The global enablement switch > > 2) Host induced #AC > > 3) Guest induced #AC > > A) Guest has #AC handling > > B) Guest has no #AC handling > > #1: > > - OFF: #AC is globally disabled > > - ON: #AC is globally enabled > > - FORCE: same as ON but #AC is enforced on guests > > #2: > > If the host triggers an #AC then the #AC has to be force disabled on the > affected core independent of the state of #1. Nothing we can do about > that and once the initial wave of #AC issues is fixed this should not > happen on production systems. That disables #3 even for the #3.A case > for simplicity sake. > > #3: > > A) Guest has #AC handling > > #AC is forwarded to the guest. No further action required aside of > accounting > > B) Guest has no #AC handling > > If #AC triggers the resulting action depends on the state of #1: > > - FORCE: Guest is killed with SIGBUS or whatever the virt crowd > thinks is the appropriate solution > - ON: #AC triggered state is recorded per vCPU and the MSR is > toggled on VMENTER/VMEXIT in software from that point on. > > So the only interesting case is #3.B and #1.state == ON. There you need > serialization of the state and the MSR write between the cores, but only > when the vCPU triggered an #AC. Until then, nothing to do. And "vCPU triggered an #AC" should include an explicit check in KVM's emulator. > vmenter() > { > if (vcpu->ac_disable) > this_core_disable_ac(); > } > > vmexit() > { > if (vcpu->ac_disable) { > this_core_enable_ac(); > } > > this_core_dis/enable_ac() takes the global state into account and has the > necessary serialization in place. Overall, looks good to me. Although Tony's mail makes it obvious we need to sync internally...