From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <borntraeger@de.ibm.com>
Subject: Re: [RFC,05/10] x86/speculation: Add basic IBRS support
 infrastructure
To: Christophe de Dinechin <christophe.de.dinechin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        David Woodhouse <dwmw2@infradead.org>,
        Arjan van de Ven <arjan@linux.intel.com>,
        Eduardo Habkost <ehabkost@redhat.com>,
        KarimAllah Ahmed
 <karahmed@amazon.de>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Andi Kleen <ak@linux.intel.com>,
        Andrea Arcangeli <aarcange@redhat.com>,
        Andy Lutomirski <luto@kernel.org>, Ashok Raj <ashok.raj@intel.com>,
        Asit Mallick <asit.k.mallick@intel.com>, Borislav Petkov <bp@suse.de>,
        Dan Williams <dan.j.williams@intel.com>,
        Dave Hansen
 <dave.hansen@intel.com>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        "H . Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
        Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>,
        Joerg Roedel <joro@8bytes.org>, Jun Nakajima <jun.nakajima@intel.com>,
        Laura Abbott <labbott@redhat.com>,
        Masami Hiramatsu <mhiramat@kernel.org>,
        Paolo Bonzini <pbonzini@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>,
        =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= <rkrcmar@redhat.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Tim Chen <tim.c.chen@linux.intel.com>,
        Tom Lendacky <thomas.lendacky@amd.com>, KVM list <kvm@vger.kernel.org>,
        the arch/x86 maintainers <x86@kernel.org>,
        "Dr. David Alan Gilbert" <dgilbert@redhat.com>
References: <1516476182-5153-6-git-send-email-karahmed@amazon.de>
 <20180129201404.GA1588@localhost.localdomain>
 <1517257022.18619.30.camel@infradead.org>
 <20180129204256.GV25150@localhost.localdomain>
 <31415b7f-9c76-c102-86cd-6bf4e23e3aee@linux.intel.com>
 <1517259759.18619.38.camel@infradead.org>
 <CA+55aFxBh3LvsQq9wy313NQCn3iu+yuAmsi0zNVxWmGDUUds-A@mail.gmail.com>
 <56a33b36-5568-5d6e-a858-3b22ea335bcb@de.ibm.com>
 <F3EF5343-C86F-4242-B254-20EF26BAB96C@dinechin.org>
From: Christian Borntraeger <borntraeger@de.ibm.com>
Date: Tue, 30 Jan 2018 15:52:09 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.5.2
MIME-Version: 1.0
In-Reply-To: <F3EF5343-C86F-4242-B254-20EF26BAB96C@dinechin.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Message-Id: <6a2713b1-74e7-53db-527d-d77cc4394f61@de.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>


On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
> 
> 
>> On 30 Jan 2018, at 13:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>
>>
>>
>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>> [...]
>>>
>>> So I actually have a _different_ question to the virtualization
>>> people. This includes the vmware people, but it also obviously
>>> includes the Amazon AWS kind of usage.
>>>
>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>> end up caring about these things so much? You're protected from
>>> meltdown thanks to the virtual environment already having separate
>>> page tables.  And the "big hammer" approach to spectre would seem to
>>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>>> even then you might decide that you really want to just move it to
>>> vmenter time, and only do it if the VM has changed since last time
>>> (per CPU).
>>>
>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>> What you should care about is not so much the guests (which do their
>>> own thing) but protect guests from each other, no?
>>>
>>> So I'm a bit mystified by some of this discussion within the context
>>> of virtual machines. I think that is separate from any measures that
>>> the guest machine may then decide to partake in.
>>>
>>> If you are ever going to migrate to Skylake, I think you should just
>>> always tell the guests that you're running on Skylake. That way the
>>> guests will always assume the worst case situation wrt Spectre.
>>>
>>> Maybe that mystification comes from me missing something.
>>
>> I can only speak for KVM, but I think the hypervisor issues come from
>> the fact that for migration purposes the hypervisor "lies" to the guest
>> in regard to what kind of CPU is running.  (it has to lie, see below).
>>
>> This is to avoid random guest crashes by not announcing features. For
>> example, if you want to migrate back and forth between a system that
>> has AVX512 and another one that does not, you must tell the guest that
>> AVX512 is not available - even if it runs on the capable system.
>>
>> To protect against new features the hypervisor only announces features
>> that it understands.
>> So you essentially start a VM in QEMU of a given CPU type that is
>> constructed of a base CPU type plus extra features. Before migration,
>> it is checked whether the target system can run a guest of the given
>> type - otherwise migration is rejected.
>>
>> The management stack also knows things like baselining - basically
>> creating the best possible guest CPU given a set of hosts.
>>
>> The problem now is: say you have both Broadwell and Skylake hosts.
>> What CPU type do you tell your guest? If you claim
>> Broadwell but run on Skylake, you prevent the guest from
>> protecting itself, because the guest does not know that it should do
>> something special. If you say Skylake, the guest might start using
>> features that Broadwell does not understand.
> 
> I believe that Linus’ question was whether it makes sense to defer
> the entirety of the protection to the host kernel, although I was a bit
> confused by his suggestion to always assume Skylake.
> 
> In other words, is it safe enough to rely on the host kernel countermeasure
> to protect guest kernels and their applications? In which case having
> the guest believe it runs on Broadwell would not be that problematic.
> 
> Aren’t there enough vmexits on the guest kernel context switch
> to enforce protection on its behalf? Even if it’s
> 
> a) some old kernel without mitigation code
> 
> or
> 
> b) some new kernel that thinks it runs on an old CPU and has disabled mitigation
> 
I think it is not safe to protect only the host. A CPU-bound workload in the
guest will switch frequently between guest user space and guest kernel without
triggering a vmexit, so anything the host does at vmexit time does not cover
those transitions.
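
To make the baselining and migration check from the quoted text above a bit
more concrete, here is a rough sketch. It is an illustration only: the host
names, the tiny feature sets, and the helpers baseline() and can_migrate()
are made up for this mail and are not the QEMU or libvirt API. Real CPU
models carry far more flags than this.

```python
# Hypothetical host pool; the feature sets are deliberately tiny.
HOSTS = {
    "broadwell-host": frozenset({"avx2", "rdseed"}),
    "skylake-host": frozenset({"avx2", "rdseed", "avx512f"}),
}


def baseline(hosts):
    """Return the features that every host in the pool provides.

    This is the "best possible guest CPU given a set of hosts": the
    intersection of all host feature sets. Assumes a non-empty pool.
    """
    return frozenset.intersection(*hosts.values())


def can_migrate(guest_features, target_features):
    """A guest may move only to a host offering every feature it was told about."""
    return guest_features <= target_features


# A guest started with the baseline can run on either host.
guest = baseline(HOSTS)
print(sorted(guest))
print(all(can_migrate(guest, features) for features in HOSTS.values()))

# A guest that was told about everything on the Skylake host cannot go back.
print(can_migrate(HOSTS["skylake-host"], HOSTS["broadwell-host"]))
```

The sketch also shows the dilemma: the baseline guest is migratable
everywhere, but by construction it cannot tell that it is currently running
on the Skylake host.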