From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ak@linux.intel.com>
Received: from mail.linutronix.de (146.0.238.70:993) by
  crypto-ml.lab.linutronix.de with IMAP4-SSL for <speck@linutronix.de>; 11 Dec
  2018 00:03:07 -0000
Received: from mga12.intel.com ([192.55.52.136])
	by Galois.linutronix.de with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256)
	(Exim 4.80)
	(envelope-from <ak@linux.intel.com>)
	id 1gWVVy-0007nb-4t
	for speck@linutronix.de; Tue, 11 Dec 2018 01:03:06 +0100
Date: Mon, 10 Dec 2018 16:03:03 -0800
From: Andi Kleen <ak@linux.intel.com>
Subject: [MODERATED] Re: [PATCH v2 3/8] MDSv2 5
Message-ID: <20181211000303.GB16024@tassilo.jf.intel.com>
References: <cover.1544464266.git.ak@linux.intel.com>
 <0d6a3fbe4c511152a0f5350e62e9e09ec545f709.1544464266.git.ak@linux.intel.com>
 <CAHk-=wg8+Dkp0abqFa6TZAWJq-KT9sgSr=XZuw7w96fmufuoyA@mail.gmail.com>
MIME-Version: 1.0
In-Reply-To: 
 <CAHk-=wg8+Dkp0abqFa6TZAWJq-KT9sgSr=XZuw7w96fmufuoyA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID: <speck.linutronix.de>


Hi Linus,

> So you have a non-inline function that
> 
>  - has a test to return early
> 
>  - does a *single* instruction
> 
>  - the instruction gets nop'ed out if it's not valid
> 
> all of which just *screams* "that's wrong" to me. Why isn't it inline
> and a static branch?

I tried inline originally, but it caused bad include loops with alternative.h
and mwait.h.

It could be done, but would need some restructuring in the includes.

I don't think it's worth it because going into idle 
isn't really that performance sensitive. Also if the
side effect is there its overhead really swamps any 
call overhead.

alternative already handles the noping, so I don't think we would
need a static branch, unless you want to toggle it at runtime?

> 
> The actual asm sequence also seems bad to me.
> 
> Why isn't that just using
> 
>         int val = __KERNEL_DS;
> 
> with the asm just doing
> 
>         "verw %[kernelds]" ..  [kernelds] "m" (val)
> 
> instead, letting gcc do the stack setup part.

It has to be the memory form, only that has the necessary side effect
in the microcode.

> 
> Maybe none of this matters simply because it's in the idle case, but
> the exact same issues happen for the kernel exit case.

kernel exit doesn't use the C code, it's custom assembler.

The only case where it may matter is KVM entry, but that is
only used with the software sequence, and that is quite
expensive anyways.

> 
> And no, we're not doing that insane kernel exit software case by
> default. Not without a whole lot of examples of horrid badness.
> 
> The data that system calls touch is basically already user data. Sure,
> there may be kernel pointers etc there, but we've already accepted
> those leaking locally for all the usual reasons that we have no
> control over.

It's encryption keys etc. too. But yes.

Another case is interrupt data -- for example the TCP stack
sometimes copies packet data.  Ok you could argue they 
should encrypt on the wire, but it still seems bogus 
to leak it.

Then if there are drivers which copy data in interrupts
they may also leak, but presumably that's rare these
days.

> 
> So the whole "let's add crazy long sequences to every kernel exit" is
> not going to happen. Not without a lot more explanations.

So you would prefer to only flush on context switch, and perhaps
for kernel crypto code and perhaps TCP stack if it copied anything?

One issue here is that to be really sure there no such other
cases it would need some way to audit all the code. I'm not 
sure there's a practical way to do such an audit.

FWIW from our tests so far the performance loss from the kernel exit overhead
doesn't seem to be that bad.

-Andi