Hi Julien,

-----Original Message-----
From: Julien Grall
Sent: Tuesday, December 13, 2022 3:55 PM
To: Smith, Jackson

> On 13/12/2022 19:48, Smith, Jackson wrote:
> > Hi Xen Developers,
>
> Hi Jackson,
>
> Thanks for sharing the prototype with the community. Some
> questions/remarks below.
>
> > My team at Riverside Research is currently spending IRAD funding to
> > prototype next-generation secure hypervisor design ideas on Xen. In
> > particular, we are prototyping the idea of Virtual Memory Fuses for
> > Software Enclaves, as described in this paper:
> > https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that
> > that paper talks about OS/Process while we have implemented the
> > idea for Hypervisor/VM.
> >
> > Our goal is to emulate something akin to Intel SGX or AMD SEV, but
> > using only existing virtual memory features common in all
> > processors. The basic idea is not to map guest memory into the
> > hypervisor so that a compromised hypervisor cannot compromise
> > (e.g. read/write) the guest. This idea has been proposed before;
> > however, Virtual Memory Fuses go one step further: they delete the
> > hypervisor's mappings to its own page tables, essentially locking
> > the virtual memory configuration for the lifetime of the system.
> > This creates what we call "Software Enclaves", ensuring that an
> > adversary with arbitrary code execution in the hypervisor STILL
> > cannot read/write guest memory.
>
> I am confused. If the attacker is able to execute arbitrary code,
> then what prevents them from writing code to map/unmap the page?
>
> Skimming through the paper (pages 5-6), it looks like you would need
> to implement extra defenses in Xen to be able to prevent
> mapping/unmapping a page.

The key piece is deleting all virtual mappings to Xen's page table
structures. From the paper (4.4.1, last paragraph): "Because all memory
accesses operate through the MMU, even page table memory needs
corresponding page table entries in order to be written to." Without a
virtual mapping to a page table, no code can modify that table, because
nothing can read or write it. Therefore the mappings to the guest
cannot be restored, even with arbitrary code execution.
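To make that concrete, here is a minimal illustration (not the
prototype's actual code; table_mfn, idx, and pte are placeholders) of
why every page table update depends on a live virtual alias of the
page that holds the table:

    /*
     * To change a page table entry, Xen first needs a virtual alias
     * of the page holding the table, normally obtained on arm64
     * through the directmap-backed map_domain_page().
     */
    lpae_t *table = map_domain_page(table_mfn); /* needs a live mapping */

    table[idx] = pte;                           /* the actual write */
    unmap_domain_page(table);

    /*
     * Once the directmap entry covering table_mfn (and any recursive
     * or temporary alias) has itself been removed, no virtual address
     * remains through which table[idx] could be written, so injected
     * code cannot rebuild the guest mappings.
     */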
> > With this technique, we protect the integrity and confidentiality
> > of guest memory. However, a compromised hypervisor can still
> > read/write register state during traps, or refuse to schedule a
> > guest, denying service. We also recognize that because this
> > technique precludes modifying Xen's page tables after startup, it
> > may not be compatible with all of Xen's potential use cases. On the
> > other hand, there are some use cases (in particular statically
> > defined embedded systems) where our technique could be adopted with
> > minimal friction.
>
> From what you wrote, this sounds very much like the project Citrix
> and Amazon worked on called "Secret-free hypervisor", with a twist.
> In your case, you want to prevent the hypervisor from mapping or
> unmapping guest memory.
>
> You can find some details in [1]. The code is x86 only, but I don't
> see any major blocker to porting it to arm64.

Yes, we are familiar with the "secret-free hypervisor" work. As you
point out, both our work and the secret-free hypervisor remove the
directmap region to mitigate the risk of leaking sensitive guest
secrets. However, our work is slightly different because it
additionally prevents attackers from tricking Xen into remapping a
guest. We see our goals and the secret-free hypervisor's goals as
orthogonal. While the secret-free hypervisor views guests as untrusted
and wants to keep compromised guests from leaking secrets, our work
comes from the perspective of an individual guest trying to protect
its secrets from the rest of the stack. So it wouldn't be unreasonable
to say "I want a hypervisor that is 'secret-free' and implements VMF".
We see them as different techniques with overlapping implementations.

> > With this in mind, our goal is to work with the Xen community to
> > upstream this work as an optional feature. At this point, we have a
> > prototype implementation of VMF on Xen (the contents of this RFC
> > patch series) that supports dom0less guests on arm64. By sharing
> > our prototype, we hope to socialize our idea, gauge interest, and
> > hopefully gain useful feedback as we work toward upstreaming.
> >
> > ** IMPLEMENTATION **
> > In our current setup we have a static configuration with dom0 and
> > one or two domUs. Soon after boot, dom0 issues a hypercall through
> > the xenctrl interface to blow the fuse for the domU. In the future,
> > we could also add code to support blowing the fuse automatically on
> > startup, before any domains are unpaused.
> >
> > Our Xen/arm64 prototype creates Software Enclaves in two steps,
> > represented by these two functions defined in xen/vmf.h:
> >   void vmf_unmap_guest(struct domain *d);
> >   void vmf_lock_xen_pgtables(void);
> >
> > In the first, Xen removes mappings to the guest(s). On arm64, Xen
> > keeps a reference to all of guest memory in the directmap. Right
> > now, we simply walk all of the guest's second-stage tables and
> > remove the pages they map from the directmap, although there is
> > probably a more elegant method for this.
>
> IIUC, you first map all the RAM and then remove the pages. What you
> could do instead is to map only the memory required for Xen's use.
> The rest would be left unmapped.
>
> This would be similar to what we are doing on arm32. We have a split
> heap. Only the xenheap is mapped. The pages from the domheap will be
> mapped on demand.

Yes, I think that would work. Xen can temporarily map guest memory in
the domheap when loading guests. When the system finishes booting, we
can prevent the hypervisor from mapping pages by unmapping the domheap
root tables. We could start by adding an option to enable a split
xenheap on arm64.

> Another approach would be to have a single heap where pages used by
> Xen are mapped in the page-tables when allocated (this is what the
> secret-free hypervisor is doing).
>
> If you don't want to keep the page-tables mapped, then it sounds like
> you want the first approach.
>
> > Second, Xen removes mappings to its own page tables.
> > On arm64, this also involves manipulating the directmap. One
> > challenge here is that as we start to unmap our tables from the
> > directmap, we can't use the directmap to walk them. Our solution
> > here is also a bit less elegant: we temporarily insert a recursive
> > mapping and use that to remove page table entries.
>
> See above.

Using the split xenheap approach means we don't have to worry about
unmapping guest page tables or Xen's dynamically allocated tables. We
still need to unmap the handful of static page tables that are
declared at the top of xen/arch/arm/mm.c. Remember, our goal is to
prevent Xen from reading or writing its own page tables. We can't just
unmap these static tables without shattering, because they end up as
part of the superpages that map the Xen binary. We're probably only
shattering a single superpage for this right now.

Maybe we can move the static tables to a superpage-aligned region of
the binary and pad that region so we can unmap an entire superpage
without shattering? In the future we might adjust the boot code to
avoid the dependency on static page table locations.
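As a very rough sketch of that padding idea (the section name, the
attribute macro, and the matching xen.lds.S change are our own
assumptions, not existing code):

    /*
     * Hypothetical sketch: tag each statically allocated table in
     * xen/arch/arm/mm.c so they all land in one dedicated section,
     * which the linker script would then align and pad out to a 2MB
     * superpage. That whole superpage could be unmapped without
     * shattering the mappings covering the rest of the Xen image.
     */
    #define __vmf_pagetable \
        __attribute__((__section__(".data.vmf.pagetables"), \
                       __aligned__(PAGE_SIZE)))

    static lpae_t vmf_example_table[512] __vmf_pagetable; /* placeholder name */

That way the locking step would only have to tear down the mappings of
that one aligned region instead of shattering a superpage in the
middle of the image.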
> > ** LIMITATIONS and other closing thoughts **
> > The current Xen code has obviously been implemented under the
> > assumption that new pages can be mapped, and that guest virtual
> > addresses can be read, so this technique will break some Xen
> > features. However, in the general case
>
> Can you clarify your definition of "general case"? From my PoV, it is
> a lot more common to have a guest with PV or emulated devices rather
> than with devices attached. So it will be mandatory to access part of
> the guest memory (e.g. the grant table).

Yes, "general case" may have been poor wording on my part. I wanted to
say that configurations exist that do not require reading guest
memory, not that this was the most common (or even a common) case.

> > (in particular for static workloads where the number of guests is
> > not changed after boot)
>
> That very much depends on how you configure your guests. If they have
> devices assigned then possibly yes. Otherwise, see above.

Yes, right now we are assuming only assigned devices, no PV or
emulated ones.

> > Finally, our initial testing suggests that Xen never reads guest
> > memory (in a static, non-dom0-enhanced configuration), but we have
> > not really explored this thoroughly.
> > We know at least these things work:
> >   Dom0less virtual serial terminal
> >   Domain scheduling
> > We are aware that these things currently depend on accessible guest
> > memory:
> >   Some hypercalls take guest pointers as arguments
>
> There are not many hypercalls that don't take guest pointers.
>
> >   Virtualized MMIO on arm needs to decode certain load/store
> >   instructions
>
> On Arm, this can be avoided if the guest OS is not using such
> instructions. In fact, they were only added to cater to "broken"
> guest OSes.

What do you mean by "broken" guests? I see where the Arm ARM discusses
interpreting the syndrome register, but I'm not understanding which
instructions populate the syndrome register and which do not. Why are
guests that use instructions that don't populate the syndrome register
considered "broken"? Is there somewhere I can look to learn more?

> Also, this will probably be a lot more difficult on x86 as, AFAIK,
> there is no instruction syndrome. So you will need to decode the
> instruction in order to emulate the access.
>
> > It's likely that other Xen features require guest memory access.
>
> For Arm, guest memory access is also needed when using the GICv3 ITS
> and/or the second-level SMMU (still in RFC).

Thanks for pointing this out. We will be sure to make note of these
limitations going forward.

> For x86, if you don't want to access the guest memory, then you may
> need to restrict to PVH, as for HVM we need to emulate some devices
> in QEMU.
>
> That said, I am not sure PVH is even feasible.

Is that mostly in reference to the need to decode instructions on x86,
or are there other reasons why you feel it might not be feasible to
apply this to Xen on x86?

Thanks for taking the time to consider our work. I think our next step
is to rethink the implementation in terms of the split xenheap design
and try to avoid the need for superpage shattering, so I'll work on
that before pushing the idea further.

Thanks,
Jackson