From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail.linutronix.de (146.0.238.70:993) by
	crypto-ml.lab.linutronix.de with IMAP4-SSL for ;
	22 Jan 2019 17:56:18 -0000
Received: from mga11.intel.com ([192.55.52.93]) by Galois.linutronix.de
	with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80)
	(envelope-from ) id 1gm0HZ-0003s5-KJ
	for speck@linutronix.de; Tue, 22 Jan 2019 18:56:18 +0100
Date: Tue, 22 Jan 2019 09:56:14 -0800
From: Andi Kleen 
Subject: [MODERATED] Re: [PATCH v5 22/27] MDSv5 24
Message-ID: <20190122175614.GV6118@tassilo.jf.intel.com>
References: <5fc3209d2880402d332ec93cf076467b3706a401.1547858934.git.ak@linux.intel.com>
 <20190122012233.GS6118@tassilo.jf.intel.com>
MIME-Version: 1.0
In-Reply-To: 
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID: 

On Tue, Jan 22, 2019 at 05:09:05PM +0100, speck for Thomas Gleixner wrote:
> On Mon, 21 Jan 2019, speck for Andi Kleen wrote:
> > On Tue, Jan 22, 2019 at 10:24:46AM +1300, speck for Linus Torvalds wrote:
> > > I think this is crazy.
> > >
> > > We're marking things as "clear cpu state" for when we touch data that
> > > WAS VISIBLE ON THE NETWORK!
> >
> > Well there's loopback too and it should be encrypted, but yes.
> >
> > There could be still a reasonable expectation that different users
> > of the network are isolated.
> >
> > We could drop it, but I fear it would encourage more use of mds=full.
>
> Well, looking at where you slap the conditionals into the code (timers,
> hrtimers, interrupts, tasklets ...) and all of the things are by default
> marked unsafe then I don't see how that's different from mds=full.

At least in my limited testing the patch doesn't actually cause that, even
though it may be counterintuitive.

See the numbers for Chrome, for example, in the last EBPF patch. That's a
complex workload with many context switches, and it gets a clear roughly
every third syscall.

We also see similar results in the benchmarks. For example, loopback apache
has practically no overhead because everything interesting happens in
process context.

I think the reason is that most timers/tasklets/etc. are actually fairly
rare and don't really matter.

I suspect the same is true for most interrupt handlers too. Every driver
that really cares about bandwidth performance already has some form of
interrupt mitigation or polling to limit interrupt overhead, and just
adding a few clears doesn't really matter.

So this only leaves some latency-sensitive workloads, which cannot mitigate
interrupts, but the interrupt handlers for those can be fixed over time
based on profiling results. Overall I suspect it will be only a small
subset of the total number of drivers.

Of course that really needs to be validated with more testing.

> The only sane way IMO is to have mds=cond just handle context switching and
> then have proper

It's only sane if you have a good way to find and maintain all these
places. So far nobody has proposed a scalable way to do that. I personally
don't know how to do it.

-Andi
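
P.S. For anyone trying to picture the lazy-clear bookkeeping being argued
about here, below is a rough user-space sketch of the idea, assuming only
that "unsafe" handlers set a per-CPU flag and the flush happens once on the
exit-to-user path. This is not the MDSv5 code; all names are made up, and
the printf stands in for the VERW-based flush a real kernel would issue.

/*
 * Hypothetical sketch of conditional (mds=cond style) buffer clearing:
 * handlers that may have touched data belonging to another security
 * domain mark the CPU dirty, and the clear is paid at most once per
 * return to user space.
 */
#include <stdbool.h>
#include <stdio.h>

static bool cpu_buffers_dirty;          /* per-CPU in a real kernel */

/* Called by e.g. an interrupt handler or tasklet marked "unsafe". */
static void mark_cpu_buffers_dirty(void)
{
	cpu_buffers_dirty = true;
}

/* Stand-in for the VERW-based flush; just a trace in this sketch. */
static void clear_cpu_buffers(void)
{
	printf("buffer clear issued\n");
}

/* Run on the exit-to-user path: clear only if something was marked. */
static void maybe_clear_on_return_to_user(void)
{
	if (cpu_buffers_dirty) {
		clear_cpu_buffers();
		cpu_buffers_dirty = false;
	}
}

int main(void)
{
	/* A "safe" syscall: nothing marked, so no clear on exit. */
	maybe_clear_on_return_to_user();

	/* Two "unsafe" events land during the next syscall... */
	mark_cpu_buffers_dirty();
	mark_cpu_buffers_dirty();
	/* ...but the exit still pays for exactly one clear. */
	maybe_clear_on_return_to_user();
	return 0;
}

The point of the scheme is that any number of marked handlers hitting during
one syscall still collapse into a single clear on the way back to user
space, which is why rare timers and tasklets barely show up in the numbers
above.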