Hi Julien,

-----Original Message-----
From: Julien Grall
Sent: Tuesday, December 13, 2022 3:55 PM
To: Smith, Jackson

> On 13/12/2022 19:48, Smith, Jackson wrote:
> > Hi Xen Developers,
>
> Hi Jackson,
>
> Thanks for sharing the prototype with the community. Some
> questions/remarks below.
>
> > My team at Riverside Research is currently spending IRAD funding to
> > prototype next-generation secure hypervisor design ideas on Xen. In
> > particular, we are prototyping the idea of Virtual Memory Fuses for
> > Software Enclaves, as described in this paper:
> > https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that
> > that paper talks about OS/Process while we have implemented the
> > idea for Hypervisor/VM.
> >
> > Our goal is to emulate something akin to Intel SGX or AMD SEV, but
> > using only existing virtual memory features common in all
> > processors. The basic idea is not to map guest memory into the
> > hypervisor so that a compromised hypervisor cannot compromise
> > (e.g. read/write) the guest. This idea has been proposed before;
> > however, Virtual Memory Fuses go one step further: they delete the
> > hypervisor's mappings to its own page tables, essentially locking
> > the virtual memory configuration for the lifetime of the system.
> > This creates what we call "Software Enclaves", ensuring that an
> > adversary with arbitrary code execution in the hypervisor STILL
> > cannot read/write guest memory.
>
> I am confused. If the attacker is able to execute arbitrary code,
> then what prevents them from writing code to map/unmap the page?
>
> Skimming through the paper (pages 5-6), it looks like you would need
> to implement extra defenses in Xen to be able to prevent
> mapping/unmapping a page.

The key piece is deleting all virtual mappings to Xen's page table
structures. From the paper (4.4.1, last paragraph): "Because all memory
accesses operate through the MMU, even page table memory needs
corresponding page table entries in order to be written to." Without a
virtual mapping to a page table, no code can modify that table, because
nothing can read or write it. Therefore the mappings to the guest
cannot be restored, even with arbitrary code execution.
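To make that concrete, here is a minimal illustration (not the
prototype's actual code; table_mfn, idx, and pte are placeholders) of
why every page table update depends on a live virtual alias of the
page that holds the table:

    /*
     * To change a page table entry, Xen first needs a virtual alias
     * of the page holding the table, normally obtained on arm64
     * through the directmap-backed map_domain_page().
     */
    lpae_t *table = map_domain_page(table_mfn); /* needs a live mapping */

    table[idx] = pte;                           /* the actual write */
    unmap_domain_page(table);

    /*
     * Once the directmap entry covering table_mfn (and any recursive
     * or temporary alias) has itself been removed, no virtual address
     * remains through which table[idx] could be written, so injected
     * code cannot rebuild the guest mappings.
     */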
> > With this technique, we protect the integrity and confidentiality
> > of guest memory. However, a compromised hypervisor can still
> > read/write register state during traps, or refuse to schedule a
> > guest, denying service. We also recognize that because this
> > technique precludes modifying Xen's page tables after startup, it
> > may not be compatible with all of Xen's potential use cases. On the
> > other hand, there are some use cases (in particular statically
> > defined embedded systems) where our technique could be adopted with
> > minimal friction.
>
> From what you wrote, this sounds very much like the project Citrix
> and Amazon worked on called "Secret-free hypervisor", with a twist.
> In your case, you want to prevent the hypervisor from mapping or
> unmapping guest memory.
>
> You can find some details in [1]. The code is x86 only, but I don't
> see any major blocker to porting it to arm64.

Yes, we are familiar with the "secret-free hypervisor" work. As you
point out, both our work and the secret-free hypervisor remove the
directmap region to mitigate the risk of leaking sensitive guest
secrets. However, our work is slightly different because it
additionally prevents attackers from tricking Xen into remapping a
guest. We see our goals and the secret-free hypervisor's goals as
orthogonal. While the secret-free hypervisor views guests as untrusted
and wants to keep compromised guests from leaking secrets, our work
comes from the perspective of an individual guest trying to protect
its secrets from the rest of the stack. So it wouldn't be unreasonable
to say "I want a hypervisor that is 'secret-free' and implements VMF".
We see them as different techniques with overlapping implementations.

> > With this in mind, our goal is to work with the Xen community to
> > upstream this work as an optional feature. At this point, we have a
> > prototype implementation of VMF on Xen (the contents of this RFC
> > patch series) that supports dom0less guests on arm64. By sharing
> > our prototype, we hope to socialize our idea, gauge interest, and
> > hopefully gain useful feedback as we work toward upstreaming.
> >
> > ** IMPLEMENTATION **
> > In our current setup we have a static configuration with dom0 and
> > one or two domUs. Soon after boot, dom0 issues a hypercall through
> > the xenctrl interface to blow the fuse for the domU. In the future,
> > we could also add code to support blowing the fuse automatically on
> > startup, before any domains are unpaused.
> >
> > Our Xen/arm64 prototype creates Software Enclaves in two steps,
> > represented by these two functions defined in xen/vmf.h:
> >   void vmf_unmap_guest(struct domain *d);
> >   void vmf_lock_xen_pgtables(void);
> >
> > In the first, Xen removes mappings to the guest(s). On arm64, Xen
> > keeps a reference to all of guest memory in the directmap. Right
> > now, we simply walk all of the guest's second-stage tables and
> > remove the pages they map from the directmap, although there is
> > probably a more elegant method for this.
>
> IIUC, you first map all the RAM and then remove the pages. What you
> could do instead is to map only the memory required for Xen's use.
> The rest would be left unmapped.
>
> This would be similar to what we are doing on arm32. We have a split
> heap. Only the xenheap is mapped. The pages from the domheap will be
> mapped on demand.

Yes, I think that would work. Xen can temporarily map guest memory in
the domheap when loading guests. When the system finishes booting, we
can prevent the hypervisor from mapping pages by unmapping the domheap
root tables. We could start by adding an option to enable a split
xenheap on arm64.

> Another approach would be to have a single heap where pages used by
> Xen are mapped in the page-tables when allocated (this is what the
> secret-free hypervisor is doing).
>
> If you don't want to keep the page-tables mapped, then it sounds like
> you want the first approach.
>
> > Second, Xen removes mappings to its own page tables.
> > On arm64, this also involves manipulating the directmap. One
> > challenge here is that as we start to unmap our tables from the
> > directmap, we can't use the directmap to walk them. Our solution
> > here is also a bit less elegant: we temporarily insert a recursive
> > mapping and use that to remove page table entries.
>
> See above.

Using the split xenheap approach means we don't have to worry about
unmapping guest page tables or Xen's dynamically allocated tables. We
still need to unmap the handful of static page tables that are
declared at the top of xen/arch/arm/mm.c. Remember, our goal is to
prevent Xen from reading or writing its own page tables. We can't just
unmap these static tables without shattering, because they end up as
part of the superpages that map the Xen binary. We're probably only
shattering a single superpage for this right now.

Maybe we can move the static tables to a superpage-aligned region of
the binary and pad that region so we can unmap an entire superpage
without shattering? In the future we might adjust the boot code to
avoid the dependency on static page table locations.
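As a very rough sketch of that padding idea (the section name, the
attribute macro, and the matching xen.lds.S change are our own
assumptions, not existing code):

    /*
     * Hypothetical sketch: tag each statically allocated table in
     * xen/arch/arm/mm.c so they all land in one dedicated section,
     * which the linker script would then align and pad out to a 2MB
     * superpage. That whole superpage could be unmapped without
     * shattering the mappings covering the rest of the Xen image.
     */
    #define __vmf_pagetable \
        __attribute__((__section__(".data.vmf.pagetables"), \
                       __aligned__(PAGE_SIZE)))

    static lpae_t vmf_example_table[512] __vmf_pagetable; /* placeholder name */

That way the locking step would only have to tear down the mappings of
that one aligned region instead of shattering a superpage in the
middle of the image.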
> > ** LIMITATIONS and other closing thoughts **
> > The current Xen code has obviously been implemented under the
> > assumption that new pages can be mapped, and that guest virtual
> > addresses can be read, so this technique will break some Xen
> > features. However, in the general case
>
> Can you clarify your definition of "general case"? From my PoV, it is
> a lot more common to have a guest with PV or emulated devices rather
> than with devices attached. So it will be mandatory to access part of
> the guest memory (e.g. the grant table).

Yes, "general case" may have been poor wording on my part. I wanted to
say that configurations exist that do not require reading guest
memory, not that this was the most common (or even a common) case.

> > (in particular for static workloads where the number of guests is
> > not changed after boot)
>
> That very much depends on how you configure your guests. If they have
> devices assigned then possibly yes. Otherwise, see above.

Yes, right now we are assuming only assigned devices, no PV or
emulated ones.

> > Finally, our initial testing suggests that Xen never reads guest
> > memory (in a static, non-dom0-enhanced configuration), but we have
> > not really explored this thoroughly.
> > We know at least these things work:
> >   Dom0less virtual serial terminal
> >   Domain scheduling
> > We are aware that these things currently depend on accessible guest
> > memory:
> >   Some hypercalls take guest pointers as arguments
>
> There are not many hypercalls that don't take guest pointers.
>
> >   Virtualized MMIO on arm needs to decode certain load/store
> >   instructions
>
> On Arm, this can be avoided if the guest OS is not using such
> instructions. In fact, they were only added to cater to "broken"
> guest OSes.

What do you mean by "broken" guests? I see where the Arm ARM discusses
interpreting the syndrome register, but I'm not understanding which
instructions populate the syndrome register and which do not. Why are
guests that use instructions that don't populate the syndrome register
considered "broken"? Is there somewhere I can look to learn more?

> Also, this will probably be a lot more difficult on x86 as, AFAIK,
> there is no instruction syndrome. So you will need to decode the
> instruction in order to emulate the access.
>
> > It's likely that other Xen features require guest memory access.
>
> For Arm, guest memory access is also needed when using the GICv3 ITS
> and/or the second-level SMMU (still in RFC).

Thanks for pointing this out. We will be sure to make note of these
limitations going forward.

> For x86, if you don't want to access the guest memory, then you may
> need to restrict to PVH, as for HVM we need to emulate some devices
> in QEMU.
>
> That said, I am not sure PVH is even feasible.

Is that mostly in reference to the need to decode instructions on x86,
or are there other reasons why you feel it might not be feasible to
apply this to Xen on x86?

Thanks for taking the time to consider our work. I think our next step
is to rethink the implementation in terms of the split xenheap design
and try to avoid the need for superpage shattering, so I'll work on
that before pushing the idea further.

Thanks,
Jackson