Date: Wed, 24 Jan 2018 12:50:31 +0100
From: Radim Krčmář
To: Martin Schwidefsky
Cc: Christian Borntraeger, kvm@vger.kernel.org, Paolo Bonzini,
    linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
    Heiko Carstens, Cornelia Huck, David Hildenbrand,
    Greg Kroah-Hartman, Jon Masters, Marcus Meissner, Jiri Kosina
Subject: Re: [PATCH 4/5] s390: define ISOLATE_BP to run tasks with modified branch prediction
Message-ID: <20180124115030.GB655@flask>
References: <1516712825-2917-1-git-send-email-schwidefsky@de.ibm.com>
    <1516712825-2917-5-git-send-email-schwidefsky@de.ibm.com>
    <20180123203223.GA648@flask>
    <20180124073605.494aceb8@mschwideX1>
In-Reply-To: <20180124073605.494aceb8@mschwideX1>
X-Mailing-List: linux-kernel@vger.kernel.org

2018-01-24 07:36+0100, Martin Schwidefsky:
> On Tue, 23 Jan 2018 21:32:24 +0100
> Radim Krčmář wrote:
>
> > 2018-01-23 15:21+0100, Christian Borntraeger:
> > > Paolo, Radim,
> > >
> > > this patch not only allows to isolate a userspace process, it also allows us
> > > to add a new interface for KVM that would allow us to isolate a KVM guest CPU
> > > to no longer being able to inject branches in any host or other guests. (while
> > > at the same time QEMU and host kernel can run with full power).
> > > We just have to set the TIF bit TIF_ISOLATE_BP_GUEST for the thread that runs a
> > > given CPU. This would certainly be an addon patch on top of this patch at a later
> > > point in time.
> >
> > I think that the default should be secure, so userspace will be
> > breaking the isolation instead of setting it up and having just one
> > place to screw up would be better -- the prctl could decide which
> > isolation mode to pick.
>
> The prctl is one direction only. Once a task is "secured" there is no way back.

Good point, I was thinking of reversing the direction and having
TIF_NOT_ISOLATE_BP_GUEST prctl, but allowing tasks to subvert security
would be even worse.
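
A minimal sketch of the one-way property discussed above, not the code
from the patch: the flag names echo the thread, but the bit values,
struct task and both helpers are hypothetical. The request path can only
add isolation bits, so there is no call that un-secures a task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; names follow the thread, values are made up. */
#define ISOLATE_BP        (1u << 0)
#define ISOLATE_BP_GUEST  (1u << 1)

/* Hypothetical stand-in for the per-thread flag word. */
struct task {
	unsigned int flags;
};

/* Set-only request: bits can be added, never cleared. */
static void isolate_request(struct task *t, unsigned int bits)
{
	t->flags |= bits & (ISOLATE_BP | ISOLATE_BP_GUEST);
}

/* True once the task itself runs with isolated branch prediction. */
static bool task_isolated(const struct task *t)
{
	return (t->flags & ISOLATE_BP) != 0;
}

/* Small self-check: a later request with no bits cannot undo isolation. */
static void one_way_demo(void)
{
	struct task t = { 0 };

	assert(!task_isolated(&t));
	isolate_request(&t, ISOLATE_BP);
	assert(task_isolated(&t));
	isolate_request(&t, 0);
	assert(task_isolated(&t));
}
```

A reversed TIF_NOT_ISOLATE_* scheme would need a helper that sets a
"not isolated" bit instead, which is exactly the operation an untrusted
task should not be able to perform.
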
> If we start with a default of secure then *all* tasks will run with limited
> branch prediction.

Right, because all of them are untrusted.  What is the performance
impact of BP isolation?
This design seems very fragile to me -- we're forcing userspace to care
about some arcane hardware implementation and isolation in the system is
broken if a task running malicious code doesn't do that for any reason.

> > Maybe we can change the conditions and break logical connection between
> > TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST, to make a separate KVM
> > interface useful.
>
> The thinking here is that you use TIF_ISOLATE_BP to make user space secure,
> but you need to close the loophole that you can use a KVM guest to get out of
> the secured mode. That is why you need to run the guest with isolated BP if
> TIF_ISOLATE_BP is set. But if you want to run qemu as always and only the
> KVM guest with isolated BP you need a second bit, thus TIF_ISOLATE_GUEST_BP.

I understand, I was following the misguided idea where we have reversed
logic and then use just TIF_NOT_ISOLATE_GUEST_BP for sie switches.

> > > Do you think something similar would be useful for other architectures as well?
> >
> > It goes against my idea of virtualization, but there probably are users
> > that don't care about isolation and still use virtual machines ...
> > I expect most architectures to have a fairly similar resolution of
> > branch prediction leaks, so the idea should be easily abstractable on
> > all levels.  (At least x86 is.)
>
> Yes.
>
> > > In that case we should try to come up with a cross-architecture interface to enable
> > > that.
> >
> > Makes me think of a generic VM control "prefer performance over
> > security", which would also take care of future problems and let arches
> > decide what is worth the code.
>
> VM as in virtual machine or VM as in virtual memory?

Virtual machine.
(But it could be anywhere really; the kernel/user split in particular
has slowed applications down for too long already. :])

> > A main drawback is that this will introduce dynamic branches to the
> > code, which are going to slow down the common case to speed up a niche.
>
> Where would you place these additional branches? I don't quite get the idea.

The BP* macros contain a branch in them -- avoidable if we only had
isolated virtual machines.

Thanks.
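
A rough sketch of where such a dynamic branch comes from, assuming guest
entry has to test per-thread flags. The real BP* macros from the patch
are not reproduced here, and every name and value below is hypothetical.

```c
#include <assert.h>

/* Hypothetical flag bits; names follow the thread, values are made up. */
#define ISOLATE_BP        (1u << 0)
#define ISOLATE_BP_GUEST  (1u << 1)

enum bp_mode { BP_FULL, BP_ISOLATED };

/*
 * Opt-in variant: the guest runs isolated if either bit is set, so the
 * entry path must test the flags every time. That test is the dynamic
 * branch.
 */
static enum bp_mode guest_entry_mode(unsigned int task_flags)
{
	if (task_flags & (ISOLATE_BP | ISOLATE_BP_GUEST))
		return BP_ISOLATED;
	return BP_FULL;
}

/* Always-isolated variant: nothing to test, hence no branch on entry. */
static enum bp_mode guest_entry_mode_always_isolated(void)
{
	return BP_ISOLATED;
}

/* Small self-check of both variants. */
static void guest_entry_demo(void)
{
	assert(guest_entry_mode(0) == BP_FULL);
	assert(guest_entry_mode(ISOLATE_BP) == BP_ISOLATED);
	assert(guest_entry_mode(ISOLATE_BP_GUEST) == BP_ISOLATED);
	assert(guest_entry_mode_always_isolated() == BP_ISOLATED);
}
```

The first function also shows why the second bit exists: ISOLATE_BP
alone already forces an isolated guest, while ISOLATE_BP_GUEST isolates
only the guest and leaves the rest of the thread at full speed.
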