On Tue, 2020-04-14 at 16:21 +0200, Peter Zijlstra wrote:
> On Wed, Mar 04, 2020 at 04:59:50PM +0000, vpillai wrote:
> >
> > - Investigate the source of the overhead even when no tasks are
> >   tagged:
> >   https://lkml.org/lkml/2019/10/29/242
>
> - explain why we're all still doing this ....
>
> Seriously, what actual problems does it solve? The patch-set still
> isn't L1TF complete and afaict it does exactly nothing for MDS.
>
Hey Peter! Late to the party, I know...

But I'm replying anyway. At least, you'll have the chance to yell at me
for this during OSPM. ;-P

> Like I've written many times now, back when the world was simpler and
> all we had to worry about was L1TF, core-scheduling made some sense,
> but how does it make sense today?
>
Indeed core-scheduling alone doesn't even completely solve L1TF. There
are the interrupt and VMEXIT issues. Both are being discussed in this
thread and, FWIW, my personal opinion is that the way to go is what
Alex says here:

<79529592-5d60-2a41-fbb6-4a5f8279f998@amazon.com>

(E.g., when he mentions solution 4 "Create a "safe" page table which
runs with HT enabled", etc).

But let's stick to your point: if it were only for L1TF, then fine, but
it's all pointless because of MDS. My answer to this is very much
focused on my usecase, which is virtualization. I know you hate us, and
you surely have your good reasons, but you know... :-)

Correct me if I'm wrong, but I think that the "nice" thing about L1TF
is that it allows a VM to spy on another VM or on the host, but it does
not allow a regular task to spy on another task or on the kernel (well,
it would, but it's easily mitigated).

The bad thing about MDS is that it instead allows *all* of that.

Now, one thing that we absolutely want to avoid in virt is a VM being
able to spy on other VMs or on the host. Sure, we also care about tasks
running in our VMs being safe, but, really, inter-VM and VM-to-host
isolation is the primary concern of a hypervisor.

And how can a VM (or stuff running inside a VM) spy on another VM or on
the host, via L1TF or MDS? Well, if the attacker VM and the victim VM
--or if the attacker VM and the host-- are running on the same core. If
they're not, it can't... which is basically an L1TF-only looking
scenario.

So, in virt, core-scheduling:
1) is the *only* way (aside from no-EPT) to prevent an attacker VM from
   spying on a victim VM, if they're running concurrently, both in
   guest mode, on the same core (and that's, of course, because with
   core-scheduling they just won't be doing that :-) )
2) interrupts and VMEXITs need to be taken care of --which was the case
   already when, as you said, "we had only L1TF". Once that is done we
   will effectively prevent all VM to VM and VM to host attack
   scenarios.

Sure, it will still be possible, for instance, for task_A in VM1 to spy
on task_B, also in VM1. This seems to be, AFAIUI, Joel's usecase, so
I'm happy to leave it to him to defend that, as he's doing already (but
indeed I'm very happy to see that it is also getting attention).

Now, of course saying anything like "works for my own usecase so let's
go for it" does not fly. But since you were asking whether and how this
feature could make sense today, suppose that:
1) we get core-scheduling,
2) we find a solution for irqs and VMEXITs, as we would have to if
   there was only L1TF,
3) we manage to make the overhead of core-scheduling close to zero
   when it's there (I mean, enabled at compile time) but not used (I
   mean, no tagging of tasks, or whatever).

That would mean that virt people can enable core-scheduling, and
achieve good inter-VM and VM-to-host isolation, without imposing
overhead on other use cases, which would leave core-scheduling
disabled.

And this is something that I think makes sense.

Of course, we're not there... because even when this series gives us
point 1, we will also need 2 and we need to make sure we also satisfy 3
(and we weren't, last time I checked ;-P).

But I think it's worth keeping trying.

I'd also add a couple more ideas, still about core-scheduling in virt,
but from a different standpoint than security:
- if I tag vcpu0 and vcpu1 together[*], then vcpu2 and vcpu3 together,
  then vcpu4 and vcpu5 together, then I'm sure that each pair will
  always be scheduled on the same core. At which point I can define
  an SMT virtual topology, for the VM, that will make sense, even
  without pinning the vcpus;
- if I run VMs from different customers, when vcpu2 of VM1 and vcpu1
  of VM2 run on the same core, they influence each other's
  performance. If, e.g., I bill based on time spent on CPUs, it means
  customer A's workload, running in VM1, may influence the billing of
  customer B, who owns VM2. With core scheduling, if I tag all the
  vcpus of each VM together, I won't have this any longer.

[*] with "tag together" I mean let them have the same tag which, ATM,
    would be "put them in the same cgroup and enable cpu.tag".

Whether or not these make sense, e.g., performance wise, is a bit hard
to tell, with the feature not yet finalized... But I've started doing
some preliminary measurements already. Hopefully, they'll be ready by
Monday.

So that's it. I hope this gives you enough material to complain about
during OSPM. At least, given the event is virtual, I won't get any
microphone box (or, worse, frozen sharks!) thrown at me in anger! :-D

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<> (Raistlin Majere)
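
To illustrate the "tag together" idea discussed above, here is a
minimal toy sketch in Python of the constraint core-scheduling is meant
to enforce: SMT siblings of one core only run vcpus carrying the same
tag, and a sibling may sit idle otherwise. This is only a model for
reasoning, not the kernel's implementation; the function name, the
placement layout and the tag values are all hypothetical.

```python
from collections import defaultdict


def tag_violations(placement, tags):
    """Return the sorted list of cores running vcpus with mixed tags.

    placement: dict mapping (core, thread) -> vcpu name, or None if idle
    tags:      dict mapping vcpu name -> tag (hypothetical stand-in for
               "same cgroup with cpu.tag enabled")
    """
    tags_on_core = defaultdict(set)
    for (core, _thread), vcpu in placement.items():
        if vcpu is not None:  # an idle sibling never violates anything
            tags_on_core[core].add(tags[vcpu])
    return sorted(core for core, seen in tags_on_core.items() if len(seen) > 1)


# Hypothetical setup: all vcpus of a VM share one tag.
tags = {"vm1-vcpu0": "vm1", "vm1-vcpu1": "vm1", "vm2-vcpu0": "vm2"}

# Allowed: core 0 runs two vm1 vcpus, core 1 runs vm2 next to an idle thread.
allowed = {(0, 0): "vm1-vcpu0", (0, 1): "vm1-vcpu1",
           (1, 0): "vm2-vcpu0", (1, 1): None}

# Not allowed: core 0 runs vm1 and vm2 side by side.
mixed = {(0, 0): "vm1-vcpu0", (0, 1): "vm2-vcpu0",
         (1, 0): "vm1-vcpu1", (1, 1): None}

print(tag_violations(allowed, tags))
print(tag_violations(mixed, tags))
```

In this model, tagging all the vcpus of each VM together corresponds to
the inter-VM isolation (and billing) case, while tagging vcpus in pairs
corresponds to the virtual SMT topology case.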