* [RFC 0/4] Adding Virtual Memory Fuses to Xen
@ 2022-12-13 19:48 Smith, Jackson
  2022-12-13 19:50 ` [RFC 1/4] Add VMF Hypercall Smith, Jackson
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Smith, Jackson @ 2022-12-13 19:48 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, Julien Grall,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

Hi Xen Developers,

My team at Riverside Research is currently spending IRAD funding
to prototype next-generation secure hypervisor design ideas
on Xen. In particular, we are prototyping the idea of Virtual
Memory Fuses for Software Enclaves, as described in this paper:
https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that
the paper describes the idea at the OS/process level, while we have
implemented it at the hypervisor/VM level.

Our goal is to emulate something akin to Intel SGX or AMD SEV,
but using only existing virtual memory features common in all
processors. The basic idea is not to map guest memory into the
hypervisor so that a compromised hypervisor cannot compromise
(e.g. read/write) the guest. This idea has been proposed before,
however, Virtual Memory Fuses go one step further; they delete the
hypervisor's mappings to its own page tables, essentially locking
the virtual memory configuration for the lifetime of the system. This
creates what we call "Software Enclaves", ensuring that an adversary
with arbitrary code execution in the hypervisor STILL cannot read/write
guest memory.

With this technique, we protect the integrity and confidentiality of
guest memory. However, a compromised hypervisor can still read/write
register state during traps, or refuse to schedule a guest, denying
service. We also recognize that because this technique precludes
modifying Xen's page tables after startup, it may not be compatible
with all of Xen's potential use cases. On the other hand, there are
some use cases (in particular statically defined embedded systems)
where our technique could be adopted with minimal friction.

With this in mind our goal is to work with the Xen community to
upstream this work as an optional feature. At this point, we have
a prototype implementation of VMF on Xen (the contents of this RFC
patch series) that supports dom0less guests on arm64. By sharing
our prototype, we hope to socialize our idea, gauge interest, and
hopefully gain useful feedback as we work toward upstreaming.

** IMPLEMENTATION **
In our current setup we have a static configuration with dom0 and
one or two domUs. Soon after boot, Dom0 issues a hypercall through
the xenctrl interface to blow the fuse for the domU. In the future,
we could also add code to support blowing the fuse automatically on
startup, before any domains are un-paused.

Our Xen/arm64 prototype creates Software Enclaves in two steps,
represented by these two functions defined in xen/vmf.h:
void vmf_unmap_guest(struct domain *d);
void vmf_lock_xen_pgtables(void);

In the first step, Xen removes its mappings to the guest(s). On arm64,
Xen keeps a reference to all of guest memory in the directmap. Right
now, we simply walk all of the guest's second-stage tables and remove
the corresponding entries from the directmap, although there is
probably a more elegant method for this.
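To make this step concrete, here is a deliberately tiny model (not Xen code: the 4-entry geometry, the names, and the flat `directmap` array are all invented for illustration) of what vmf_unmap_guest does: walk every valid leaf of the guest's tables and drop the frame it maps from the directmap.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of vmf_unmap_guest(): walk a 2-level "stage-2" table and
 * drop every frame it maps from a model of Xen's directmap.
 */
#define ENTRIES 4
#define NFRAMES (ENTRIES * ENTRIES)

struct pte {
    bool valid;
    bool table;          /* true: points at a next-level table     */
    unsigned long next;  /* table index (if table) or frame number */
};

static bool directmap[NFRAMES];          /* frame -> mapped in Xen? */
static struct pte l1[ENTRIES];
static struct pte l2[ENTRIES][ENTRIES];

static void unmap_guest_frames(void)
{
    for (int i = 0; i < ENTRIES; i++) {
        if (!l1[i].valid)
            continue;
        for (int j = 0; j < ENTRIES; j++) {
            struct pte *p = &l2[l1[i].next][j];
            if (p->valid && !p->table)
                directmap[p->next] = false; /* punch hole in directmap */
        }
    }
}
```

The real walk in the series operates on LPAE entries and 512-entry tables, but the shape is the same: only leaves cause directmap removals, and table entries are recursed into.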

Second, Xen removes the mappings to its own page tables. On arm64,
this also involves manipulating the directmap. One challenge here is
that as we start to unmap our tables from the directmap, we can no
longer use the directmap to walk them. Our solution here is also a
bit less elegant: we temporarily insert a recursive mapping and use
that to remove page table entries.
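The recursive mapping relies on standard self-referencing page-table arithmetic. The sketch below is not Xen code; it assumes a 4-level, 4K-granule, 48-bit layout (9 index bits per level), `RECURSIVE_SLOT` mirrors the series' RECURSIVE_IDX, and the helper name is invented. It shows how one pass through the recursive slot makes the level-3 table entry for any VA readable at an ordinary virtual address.

```c
#include <assert.h>
#include <stdint.h>

#define RECURSIVE_SLOT 511UL
/* Index of `va` at translation level `lvl` (0..3). */
#define IDX(va, lvl) (((uint64_t)(va) >> (39 - 9 * (lvl))) & 0x1FFUL)

/*
 * With root[RECURSIVE_SLOT] pointing back at the root table, looping
 * through that slot once shifts every index down a level, so the
 * level-3 table entry mapping `va` appears at this virtual address.
 */
static uint64_t l3_entry_va(uint64_t va)
{
    return (RECURSIVE_SLOT << 39)          /* one pass through the root   */
         | ((va >> 9) & 0x7FFFFFF000UL)    /* indices move down one level */
         | (IDX(va, 3) << 3);              /* byte offset of the 8-byte PTE */
}
```

Going through the slot twice in the same way exposes level-2 tables, and so on up the trie, which is what lets the prototype edit entries whose directmap aliases are already gone.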

** LIMITATIONS and other closing thoughts **
The current Xen code has obviously been implemented under the
assumption that new pages can be mapped, and that guest virtual
addresses can be read, so this technique will break some Xen
features. However, in the general case (in particular for static
workloads where the number of guests is not changed after boot)
we've seen that Xen rarely needs to access guest memory or adjust
its page tables.

We see a lot of potential synergy with other Xen initiatives like
Hyperlaunch for static domain allocation, or SEV support driving new
hypercall interfaces that don't require reading guest memory. These
features would allow VMF (Virtual Memory Fuses) to work with more
configurations and architectures than our current prototype, which
only supports static configurations on arm64.

We have not yet studied how the prototype VMF implementation impacts
performance. On the surface, there should be no significant changes.
However, cache effects from splitting the directmap superpages could
introduce a performance cost.
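For a rough sense of the scale involved, this back-of-envelope arithmetic (illustrative only, not measured data) counts the leaf entries, and hence worst-case TLB entries, needed to cover 1GiB of directmap at each granule, plus the table pages a full split allocates.

```c
#include <assert.h>

#define GIB (1UL << 30)

/* Leaf mappings needed to cover 1GiB at a given page size (2^shift). */
static unsigned long leaves_per_gib(unsigned int page_shift)
{
    return GIB >> page_shift;
}

/*
 * Table pages allocated when one 1GiB block is split all the way to
 * 4K: one level-2 table, plus one level-3 table per 2M region.
 */
static unsigned long tables_per_gib_split(void)
{
    return 1 + (GIB >> 21);
}
```

A single 1G block entry that used to cost one TLB entry can thus become up to 262144 4K entries after a full split, which is where the anticipated cache/TLB pressure comes from.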

There is also additional latency introduced by walking all the
tables to retroactively remove guest memory. This could be optimized
by reworking the Xen code to remove the directmap. We've toyed with
the idea, but haven't attempted it yet.

Finally, our initial testing suggests that Xen never reads guest
memory (in a static, non-dom0-enhanced configuration), but we have
not explored this thoroughly.
We know at least these things work:
	Dom0less virtual serial terminal
	Domain scheduling
We are aware that these things currently depend on accessible guest
memory:
	Some hypercalls take guest pointers as arguments
	Virtualized MMIO on arm needs to decode certain load/store
	instructions

It's likely that other Xen features require guest memory access.

Also, there is currently a lot of debug code that isn't needed for
normal operation, but assumes the ability to read guest memory or
walk page tables in an exceptional case. The Xen codebase will need
to be audited for these cases, and proper guards inserted so this
code doesn't pagefault.

Thanks for allowing us to share our work with you. We are really
excited about it, and we look forward to hearing your feedback. We
figure those working with Xen on a day-to-day basis will likely
uncover details we have overlooked.

Jackson


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC 1/4] Add VMF Hypercall
  2022-12-13 19:48 [RFC 0/4] Adding Virtual Memory Fuses to Xen Smith, Jackson
@ 2022-12-13 19:50 ` Smith, Jackson
  2022-12-14  9:29   ` Jan Beulich
  2022-12-13 19:53 ` [RFC 2/4] Add VMF tool Smith, Jackson
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Smith, Jackson @ 2022-12-13 19:50 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, Julien Grall,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

This commit introduces a new vmf_op hypercall. If desired, it could be
merged into an existing hypercall.

Also, introduce a VMF Kconfig option and xen/vmf.h, which defines the
arch-specific functions that must be implemented to support VMF.
---
 tools/include/xenctrl.h                 |   2 +
 tools/libs/ctrl/xc_private.c            |   5 ++
 tools/libs/ctrl/xc_private.h            |   5 ++
 xen/arch/x86/guest/xen/hypercall_page.S |   2 +
 xen/common/Kconfig                      |   3 +
 xen/common/Makefile                     |   1 +
 xen/common/vmf.c                        | 111 ++++++++++++++++++++++++++++++++
 xen/include/hypercall-defs.c            |   6 ++
 xen/include/public/vmf.h                |  24 +++++++
 xen/include/public/xen.h                |   3 +
 xen/include/xen/vmf.h                   |  20 ++++++
 11 files changed, 182 insertions(+)
 create mode 100644 xen/common/vmf.c
 create mode 100644 xen/include/public/vmf.h
 create mode 100644 xen/include/xen/vmf.h

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2303787..804ddba 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1604,6 +1604,8 @@ long xc_memory_op(xc_interface *xch, unsigned int cmd, void *arg, size_t len);
 
 int xc_version(xc_interface *xch, int cmd, void *arg);
 
+int xc_vmf_op(xc_interface *xch, unsigned int cmd, uint32_t domid);
+
 int xc_flask_op(xc_interface *xch, xen_flask_op_t *op);
 
 /*
diff --git a/tools/libs/ctrl/xc_private.c b/tools/libs/ctrl/xc_private.c
index 2f99a7d..44fe9ba 100644
--- a/tools/libs/ctrl/xc_private.c
+++ b/tools/libs/ctrl/xc_private.c
@@ -555,6 +555,11 @@ int xc_version(xc_interface *xch, int cmd, void *arg)
     return rc;
 }
 
+int xc_vmf_op(xc_interface *xch, unsigned int cmd, uint32_t domid)
+{
+    return do_vmf_op(xch, cmd, domid);
+}
+
 unsigned long xc_make_page_below_4G(
     xc_interface *xch, uint32_t domid, unsigned long mfn)
 {
diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index ed960c6..fb72cb4 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -222,6 +222,11 @@ static inline int do_xen_version(xc_interface *xch, int cmd, xc_hypercall_buffer
                     cmd, HYPERCALL_BUFFER_AS_ARG(dest));
 }
 
+static inline int do_vmf_op(xc_interface *xch, unsigned int cmd, uint32_t domid)
+{
+    return xencall2(xch->xcall, __HYPERVISOR_vmf_op, cmd, domid);
+}
+
 static inline int do_physdev_op(xc_interface *xch, int cmd, void *op, size_t len)
 {
     int ret = -1;
diff --git a/xen/arch/x86/guest/xen/hypercall_page.S b/xen/arch/x86/guest/xen/hypercall_page.S
index 9958d02..2efdd58 100644
--- a/xen/arch/x86/guest/xen/hypercall_page.S
+++ b/xen/arch/x86/guest/xen/hypercall_page.S
@@ -70,6 +70,8 @@ DECLARE_HYPERCALL(arch_5)
 DECLARE_HYPERCALL(arch_6)
 DECLARE_HYPERCALL(arch_7)
 
+DECLARE_HYPERCALL(vmf_op)
+
 /*
  * Local variables:
  * tab-width: 8
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index f1ea319..3bf92b8 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -92,6 +92,9 @@ config STATIC_MEMORY
 
 	  If unsure, say N.
 
+config VMF
+	bool "Virtual Memory Fuse Support"
+
 menu "Speculative hardening"
 
 config INDIRECT_THUNK
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 3baf83d..fb9118d 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -48,6 +48,7 @@ obj-y += timer.o
 obj-$(CONFIG_TRACEBUFFER) += trace.o
 obj-y += version.o
 obj-y += virtual_region.o
+obj-$(CONFIG_VMF) += vmf.o
 obj-y += vm_event.o
 obj-y += vmap.o
 obj-y += vsprintf.o
diff --git a/xen/common/vmf.c b/xen/common/vmf.c
new file mode 100644
index 0000000..20c61d1
--- /dev/null
+++ b/xen/common/vmf.c
@@ -0,0 +1,111 @@
+/******************************************************************************
+ * vmf.c
+ * 
+ * Common implementation of the VMF hypercall
+ */
+
+#include <xen/lib.h>
+#include <xen/sched.h>
+
+#include <public/vmf.h>
+#include <xen/vmf.h>
+
+static void dump_domain_vcpus(struct domain *d)
+{
+    struct vcpu *v;
+    int i;
+
+    if (d == NULL) {
+        printk("NULL\n");
+        return;
+    }
+
+    printk("Domain: %d (%d vcpus)\n", d->domain_id, d->max_vcpus);
+#if defined(CONFIG_ARM_64)
+    printk("  vttbr: 0x%lx\n", d->arch.p2m.vttbr);
+#endif
+
+    i = 0;
+    for_each_vcpu(d, v)
+    {
+        printk("  vcpu [%d: id=%d, proc=%d]: \n", i++, v->vcpu_id, v->processor);
+        /* archvcpu for arm has: */
+#if defined(CONFIG_ARM_64)
+        printk("    .ttbr0     is 0x%lx\n", v->arch.ttbr0);
+        printk("    .ttbr1     is 0x%lx\n", v->arch.ttbr1);
+#endif
+    }
+}
+
+static void dump_domains(void)
+{
+    struct domain *d;
+
+    for_each_domain(d)
+        dump_domain_vcpus(d);
+
+    /* Dump system domains */
+    printk("IDLE DOMAIN:\n");
+    dump_domain_vcpus(idle_vcpu[0]->domain);
+    printk("HARDWARE DOMAIN:\n");
+    dump_domain_vcpus(hardware_domain);
+    printk("XEN DOMAIN:\n");
+    dump_domain_vcpus(dom_xen);
+    printk("IO DOMAIN:\n");
+    dump_domain_vcpus(dom_io);
+}
+
+long do_vmf_op(unsigned int cmd, domid_t domid)
+{
+    int ret = 0;
+    struct domain *d = NULL;
+
+    printk("VMF hypercall: ");
+
+    if (domid == DOMID_IDLE) {
+        printk("Xen\n");
+    } else if ((domid < DOMID_FIRST_RESERVED) && (d = get_domain_by_id(domid))) {
+        printk("Domain(%d)\n", domid);
+    } else {
+        printk("Invalid domain id (%d)\n", domid);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    switch (cmd) {
+    case XENVMF_dump_info:
+        if (d) {
+            vmf_dump_domain_info(d);
+        } else {
+            dump_domains();
+            vmf_dump_xen_info();
+        }
+        break;
+
+    case XENVMF_dump_tables:
+        if (d)
+            vmf_dump_domain_tables(d);
+        else
+            vmf_dump_xen_tables();
+        break;
+
+    case XENVMF_unmap:
+        printk("BLOW VIRTUAL MEMORY FUSE:\n");
+        if (d) {
+            printk("Unmapping Domain(%d)\n", d->domain_id);
+            vmf_unmap_guest(d);
+        } else {
+            printk("Locking Virtual Memory Configuration\n");
+            vmf_lock_xen_pgtables();
+        }
+        break;
+
+    default:
+        printk("Not Implemented\n");
+        break;
+    }
+
+out:
+    printk("Done!\n");
+    return ret;
+}
diff --git a/xen/include/hypercall-defs.c b/xen/include/hypercall-defs.c
index 1896121..fb61bc6 100644
--- a/xen/include/hypercall-defs.c
+++ b/xen/include/hypercall-defs.c
@@ -166,6 +166,9 @@ vm_assist(unsigned int cmd, unsigned int type)
 event_channel_op(int cmd, void *arg)
 mmuext_op(mmuext_op_t *uops, unsigned int count, unsigned int *pdone, unsigned int foreigndom)
 multicall(multicall_entry_t *call_list, unsigned int nr_calls)
+#if defined(CONFIG_VMF)
+vmf_op(unsigned int cmd, domid_t domid)
+#endif
 #ifdef CONFIG_PV
 mmu_update(mmu_update_t *ureqs, unsigned int count, unsigned int *pdone, unsigned int foreigndom)
 stack_switch(unsigned long ss, unsigned long esp)
@@ -239,6 +242,9 @@ update_va_mapping                  compat   do       -        -        -
 set_timer_op                       compat   do       compat   do       -
 event_channel_op_compat            do       do       -        -        dep
 xen_version                        compat   do       compat   do       do
+#if defined(CONFIG_VMF)
+vmf_op                             do       do       do       do       do
+#endif
 console_io                         do       do       do       do       do
 physdev_op_compat                  compat   do       -        -        dep
 #if defined(CONFIG_GRANT_TABLE)
diff --git a/xen/include/public/vmf.h b/xen/include/public/vmf.h
new file mode 100644
index 0000000..a5ec004
--- /dev/null
+++ b/xen/include/public/vmf.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: MIT */
+/******************************************************************************
+ * vmf.h
+ *
+ */
+
+#ifndef __XEN_PUBLIC_VMF_H__
+#define __XEN_PUBLIC_VMF_H__
+
+#define XENVMF_dump_info 1
+#define XENVMF_dump_tables 2
+#define XENVMF_unmap 11
+
+#endif /* __XEN_PUBLIC_VMF_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 920567e..077000c 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -125,6 +125,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_arch_6               54
 #define __HYPERVISOR_arch_7               55
 
+/* custom vmf hypercall */
+#define __HYPERVISOR_vmf_op               56
+
 /* ` } */
 
 /*
diff --git a/xen/include/xen/vmf.h b/xen/include/xen/vmf.h
new file mode 100644
index 0000000..f4b350c
--- /dev/null
+++ b/xen/include/xen/vmf.h
@@ -0,0 +1,20 @@
+/******************************************************************************
+ * vmf.h
+ * 
+ * Public VMF interface to be implemented in arch specific code
+ */
+
+#ifndef __XEN_VMF_H__
+#define __XEN_VMF_H__
+
+struct domain;
+
+void vmf_dump_xen_info(void);
+void vmf_dump_domain_info(struct domain *d);
+void vmf_dump_xen_tables(void);
+void vmf_dump_domain_tables(struct domain *d);
+
+void vmf_unmap_guest(struct domain *d);
+void vmf_lock_xen_pgtables(void);
+
+#endif /* __XEN_VMF_H__ */
-- 
2.7.4




* [RFC 2/4] Add VMF tool
  2022-12-13 19:48 [RFC 0/4] Adding Virtual Memory Fuses to Xen Smith, Jackson
  2022-12-13 19:50 ` [RFC 1/4] Add VMF Hypercall Smith, Jackson
@ 2022-12-13 19:53 ` Smith, Jackson
  2022-12-13 19:54 ` [RFC 3/4] Add xen superpage splitting support to arm Smith, Jackson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Smith, Jackson @ 2022-12-13 19:53 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, Julien Grall,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

Add a VMF tool for calling the vmf_op hypercall. Eventually it should be
merged into xl and related libraries.
---
 tools/Makefile     |  1 +
 tools/vmf/Makefile | 32 +++++++++++++++++++++++++++
 tools/vmf/vmf.c    | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)
 create mode 100644 tools/vmf/Makefile
 create mode 100644 tools/vmf/vmf.c

diff --git a/tools/Makefile b/tools/Makefile
index 7997535..ccf36a1 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -9,6 +9,7 @@ SUBDIRS-y += libs
 SUBDIRS-y += flask
 SUBDIRS-y += fuzz
 SUBDIRS-y += xenstore
+SUBDIRS-y += vmf
 SUBDIRS-y += misc
 SUBDIRS-y += examples
 SUBDIRS-y += hotplug
diff --git a/tools/vmf/Makefile b/tools/vmf/Makefile
new file mode 100644
index 0000000..ac5073b
--- /dev/null
+++ b/tools/vmf/Makefile
@@ -0,0 +1,32 @@
+XEN_ROOT=$(CURDIR)/../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+CFLAGS  += $(CFLAGS_libxenctrl)
+LDLIBS  += $(LDLIBS_libxenctrl)
+
+.PHONY: all
+all: build
+
+.PHONY: build
+build: vmf
+
+.PHONY: install
+install: build
+	$(INSTALL_DIR) $(DESTDIR)$(bindir)
+	$(INSTALL_PROG) vmf $(DESTDIR)$(bindir)/vmf
+
+.PHONY: uninstall
+uninstall:
+	rm -f $(DESTDIR)$(bindir)/vmf
+
+.PHONY: clean
+clean:
+	$(RM) -f $(DEPS_RM) vmf vmf.o
+
+.PHONY: distclean
+distclean: clean
+
+vmf: vmf.o Makefile
+	$(CC) $(LDFLAGS) $< -o $@ $(LDLIBS) $(APPEND_LDFLAGS)
+
+-include $(DEPS_INCLUDE)
diff --git a/tools/vmf/vmf.c b/tools/vmf/vmf.c
new file mode 100644
index 0000000..8b7b293
--- /dev/null
+++ b/tools/vmf/vmf.c
@@ -0,0 +1,65 @@
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <xenctrl.h>
+
+#include <xen/xen.h>
+#include <xen/vmf.h>
+
+int call(unsigned int cmd, unsigned int domid)
+{
+  int ret;
+
+  xc_interface *xch = xc_interface_open(NULL, NULL, 0);
+  ret = xc_vmf_op(xch, cmd, domid);
+  xc_interface_close(xch);
+
+  return ret;
+}
+
+void help(const char *arg0)
+{
+  printf("Usage:\n");
+  printf("  %s dump\n", arg0);
+  printf("  %s info <domid>\n", arg0);
+  printf("  %s tables <domid>\n", arg0);
+  printf("  %s unmap <domid>\n", arg0);
+  printf("  %s lock\n", arg0);
+}
+
+int get_domid(const char *str) {
+  char *endptr;
+  long domid = strtol(str, &endptr, 10);
+  if (domid >= 0)
+    return (int)domid;
+
+  printf("Invalid domid (%ld)\n", domid);
+  exit(1);
+}
+
+int main(int argc, const char* argv[])
+{
+  int domid;
+  if (argc == 2) {
+    domid = DOMID_IDLE;
+  } else if (argc == 3) {
+    domid = get_domid(argv[2]);
+  } else {
+    help(argv[0]);
+    return 0;
+  }
+
+#define ARG(cmd) ((strcmp(cmd, argv[1]) == 0))
+
+  if (ARG("info"))
+    return call(XENVMF_dump_info, domid);
+  else if (ARG("tables"))
+    return call(XENVMF_dump_tables, domid);
+  else if (ARG("unmap"))
+    return call(XENVMF_unmap, domid);
+  else if (ARG("lock") && (argc == 2))
+    return call(XENVMF_unmap, DOMID_IDLE);
+
+  help(argv[0]);
+  return 0;
+}
-- 
2.7.4




* [RFC 3/4] Add xen superpage splitting support to arm
  2022-12-13 19:48 [RFC 0/4] Adding Virtual Memory Fuses to Xen Smith, Jackson
  2022-12-13 19:50 ` [RFC 1/4] Add VMF Hypercall Smith, Jackson
  2022-12-13 19:53 ` [RFC 2/4] Add VMF tool Smith, Jackson
@ 2022-12-13 19:54 ` Smith, Jackson
  2022-12-13 21:15   ` Julien Grall
  2022-12-13 19:55 ` [RFC 4/4] Implement VMF for arm64 Smith, Jackson
  2022-12-13 20:55 ` [RFC 0/4] Adding Virtual Memory Fuses to Xen Julien Grall
  4 siblings, 1 reply; 33+ messages in thread
From: Smith, Jackson @ 2022-12-13 19:54 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, Julien Grall,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

Update the xen_pt_update_entry function in xen/arch/arm/mm.c to
split superpages automatically as needed.
---
 xen/arch/arm/mm.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 78 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 6301752..91b9c2b 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -753,8 +753,78 @@ static int create_xen_table(lpae_t *entry)
 }
 
 #define XEN_TABLE_MAP_FAILED 0
-#define XEN_TABLE_SUPER_PAGE 1
-#define XEN_TABLE_NORMAL_PAGE 2
+#define XEN_TABLE_NORMAL_PAGE 1
+
+/* More or less taken from p2m_split_superpage, without the p2m stuff */
+static bool xen_split_superpage(lpae_t *entry, unsigned int level,
+                                unsigned int target, const unsigned int *offsets)
+{
+    struct page_info *page;
+    lpae_t pte, *table;
+    unsigned int i;
+    bool rv = true;
+
+    mfn_t mfn = lpae_get_mfn(*entry);
+    unsigned int next_level = level + 1;
+    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
+
+    ASSERT(level < target);
+    ASSERT(lpae_is_superpage(*entry, level));
+
+    page = alloc_domheap_page(NULL, 0);
+    if ( !page )
+        return false;
+
+    table = __map_domain_page(page);
+
+    /*
+     * We are either splitting a first level 1G page into 512 second level
+     * 2M pages, or a second level 2M page into 512 third level 4K pages.
+     */
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+    {
+        lpae_t *new_entry = table + i;
+
+        /*
+         * Use the content of the superpage entry and override
+         * the necessary fields. So the correct permission are kept.
+         */
+        pte = *entry;
+        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
+
+        /*
+         * First and second level pages set walk.table = 0, but third
+         * level entries set walk.table = 1.
+         */
+        pte.walk.table = (next_level == 3);
+
+        write_pte(new_entry, pte);
+    }
+
+    /*
+     * Shatter superpage in the page to the level we want to make the
+     * changes.
+     * This is done outside the loop to avoid checking the offset to
+     * know whether the entry should be shattered for every entry.
+     */
+    if ( next_level != target )
+        rv = xen_split_superpage(table + offsets[next_level],
+                                 level + 1, target, offsets);
+
+    clean_dcache_va_range(table, PAGE_SIZE);
+    unmap_domain_page(table);
+
+    /*
+     * Generate the entry for this new table we created,
+     * and write it back in place of the superpage entry.
+     */
+    pte = mfn_to_xen_entry(page_to_mfn(page), MT_NORMAL);
+    pte.pt.table = 1;
+    write_pte(entry, pte);
+    clean_dcache(*entry);
+
+    return rv;
+}
 
 /*
  * Take the currently mapped table, find the corresponding entry,
@@ -767,16 +837,15 @@ static int create_xen_table(lpae_t *entry)
  *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
  *  was empty, or allocating a new page failed.
  *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
- *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
  */
 static int xen_pt_next_level(bool read_only, unsigned int level,
-                             lpae_t **table, unsigned int offset)
+                             lpae_t **table, const unsigned int *offsets)
 {
     lpae_t *entry;
     int ret;
     mfn_t mfn;
 
-    entry = *table + offset;
+    entry = *table + offsets[level];
 
     if ( !lpae_is_valid(*entry) )
     {
@@ -790,7 +859,8 @@ static int xen_pt_next_level(bool read_only, unsigned int level,
 
     /* The function xen_pt_next_level is never called at the 3rd level */
     if ( lpae_is_mapping(*entry, level) )
-        return XEN_TABLE_SUPER_PAGE;
+        /* Shatter the superpage before continuing */
+        xen_split_superpage(entry, level, level + 1, offsets);
 
     mfn = lpae_get_mfn(*entry);
 
@@ -915,7 +985,7 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
     table = xen_map_table(root);
     for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
     {
-        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
+        rc = xen_pt_next_level(read_only, level, &table, offsets);
         if ( rc == XEN_TABLE_MAP_FAILED )
         {
             /*
@@ -941,12 +1011,7 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt,
             break;
     }
 
-    if ( level != target )
-    {
-        mm_printk("%s: Shattering superpage is not supported\n", __func__);
-        rc = -EOPNOTSUPP;
-        goto out;
-    }
+    BUG_ON( level != target );
 
     entry = table + offsets[level];
 
-- 
2.7.4




* [RFC 4/4] Implement VMF for arm64
  2022-12-13 19:48 [RFC 0/4] Adding Virtual Memory Fuses to Xen Smith, Jackson
                   ` (2 preceding siblings ...)
  2022-12-13 19:54 ` [RFC 3/4] Add xen superpage splitting support to arm Smith, Jackson
@ 2022-12-13 19:55 ` Smith, Jackson
  2022-12-13 20:55 ` [RFC 0/4] Adding Virtual Memory Fuses to Xen Julien Grall
  4 siblings, 0 replies; 33+ messages in thread
From: Smith, Jackson @ 2022-12-13 19:55 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, Julien Grall,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

Implement the functions from xen/vmf.h for arm64.
Introduce a xen/arch/arm/mm-walk.c helper file for
walking an entire page table structure.
---
 xen/arch/arm/Makefile              |   1 +
 xen/arch/arm/include/asm/mm-walk.h |  53 ++++++++++
 xen/arch/arm/include/asm/mm.h      |  11 +++
 xen/arch/arm/mm-walk.c             | 181 +++++++++++++++++++++++++++++++++
 xen/arch/arm/mm.c                  | 198 ++++++++++++++++++++++++++++++++++++-
 xen/common/Kconfig                 |   2 +
 6 files changed, 445 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/include/asm/mm-walk.h
 create mode 100644 xen/arch/arm/mm-walk.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 4d076b2..e358452 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -37,6 +37,7 @@ obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += mem_access.o
 obj-y += mm.o
+obj-y += mm-walk.o
 obj-y += monitor.o
 obj-y += p2m.o
 obj-y += percpu.o
diff --git a/xen/arch/arm/include/asm/mm-walk.h b/xen/arch/arm/include/asm/mm-walk.h
new file mode 100644
index 0000000..770cc89
--- /dev/null
+++ b/xen/arch/arm/include/asm/mm-walk.h
@@ -0,0 +1,53 @@
+#ifndef __ARM_MM_WALK_H__
+#define __ARM_MM_WALK_H__
+
+#include <asm/lpae.h>
+
+#define RECURSIVE_IDX ((unsigned long)(XEN_PT_LPAE_ENTRIES-1))
+#define RECURSIVE_VA (RECURSIVE_IDX << ZEROETH_SHIFT)
+
+/*
+ * Remove all mappings in these tables from Xen's address space
+ * Only makes sense if walking a guest's tables
+ */
+#define WALK_HIDE_GUEST_MAPPING (1U << 0)
+/*
+ * Remove all mappings to these tables from Xen's address space
+ * Makes sense if walking a guest's table (hide guest tables from Xen)
+ * Or if walking Xen's tables (lock Xen's virtual memory configuration)
+ */
+#define WALK_HIDE_GUEST_TABLE (1U << 1)
+
+/*
+ * Before we can hide individual table entries,
+ * we need to split the directmap superpages
+ */
+#define WALK_SPLIT_DIRECTMAP_TABLE (1U << 2)
+/*
+ * Like walk table hide, but using recursive mapping
+ * to bypass walking directmap when table is in the directmap
+ */
+#define WALK_HIDE_DIRECTMAP_TABLE (1U << 3)
+
+/* These are useful for development/debug */
+/* Show all pte's for a given address space */
+#define WALK_DUMP_ENTRIES (1U << 4)
+/* Show all mappings for a given address space */
+#define WALK_DUMP_MAPPINGS (1U << 5)
+
+/*
+ * Given the value of a ttbr register, this function walks every valid entry in the trie
+ * (As opposed to dump_pt_walk, which follows a single address from root to leaf)
+ */
+void do_walk_tables(paddr_t ttbr, int root_level, int nr_root_tables, int flags);
+
+#endif /*  __ARM_MM_WALK_H__ */
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 68adcac..2e85885 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -209,6 +209,17 @@ extern void mmu_init_secondary_cpu(void);
  * For Arm64, map the region in the directmap area.
  */
 extern void setup_directmap_mappings(unsigned long base_mfn, unsigned long nr_mfns);
+/* Shatter superpages for these mfns if needed */
+extern int split_directmap_mapping(unsigned long mfn, unsigned long nr_mfns);
+/* Remove these mfns from the directmap */
+extern int destroy_directmap_mapping(unsigned long mfn, unsigned long nr_mfns);
+/*
+ * Remove this mfn from the directmap (bypassing normal update code)
+ * This is a workaround for current pgtable update code, which cannot be used
+ * to remove directmap table entries from the directmap (because they are
+ * needed to walk the directmap)
+ */
+extern void destroy_directmap_table(unsigned long mfn);
 /* Map a frame table to cover physical addresses ps through pe */
 extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
 /* map a physical range in virtual memory */
diff --git a/xen/arch/arm/mm-walk.c b/xen/arch/arm/mm-walk.c
new file mode 100644
index 0000000..48f9b2d
--- /dev/null
+++ b/xen/arch/arm/mm-walk.c
@@ -0,0 +1,181 @@
+/*
+ * xen/arch/arm/mm-walk.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <xen/lib.h>
+#include <xen/domain_page.h>
+
+#include <asm/page.h>
+#include <asm/mm-walk.h>
+
+typedef struct {
+    /* Keeps track of all the table offsets so we can reconstruct the VA if we need to */
+    int off[4];
+
+    /* Keeps track of root level so we can make sense of the table offsets */
+    int root_level;
+    int root_table_idx; /* only meaningful when nr_root_tables > 1 */
+} walk_info_t;
+
+/*
+ * Turn a walk_info_t into a virtual address
+ *
+ * XXX: This only applies to the lower VA range
+ * Ie. if you are looking at a table in ttbr1, this is different
+ * XXX: doesn't work for concat tables right now either
+ */
+static unsigned long walk_to_va(int level, walk_info_t *walk)
+{
+/* #define off_valid(x) (((x) <= level) && ((x) >= walk->root_level)) */
+#define off_valid(x) ((x) <= level)
+#define off_val(x) ((u64)(off_valid(x) ? walk->off[x] : 0))
+
+    return (off_val(0) << ZEROETH_SHIFT)  \
+           | (off_val(1) << FIRST_SHIFT)  \
+           | (off_val(2) << SECOND_SHIFT) \
+           | (off_val(3) << THIRD_SHIFT);
+}
+
+/* Prints each entry in the form "\t @XTH TABLE:0.0.0.0 = 0xENTRY" */
+static void dump_entry(int level, lpae_t pte, walk_info_t *walk)
+{
+    int i;
+    static const char *level_strs[4] = { "0TH", "1ST", "2ND", "3RD" };
+    ASSERT(level <= 3);
+
+    for (i = 0; i < level; i++)
+        printk("  ");
+
+    printk("@%s %i:", level_strs[level], walk->root_table_idx);
+
+    for (i = walk->root_level; i < level; i++)
+        printk("%d.", walk->off[i]);
+
+    printk("%d = 0x%lx\n", walk->off[level], pte.bits);
+}
+
+/* Prints each mapping in the form IA:0xIA -> OFN:0xOFN XG,M,K */
+static void dump_mapping(int level, lpae_t pte, walk_info_t *walk)
+{
+    unsigned long va;
+    unsigned long ofn = pte.walk.base;
+    const char *size[4] = {"??", "1G", "2M", "4K"};
+
+    ASSERT(level >= 1);
+    ASSERT(level <= 3);
+
+    va = walk_to_va(level, walk);
+
+    /* ofn stands for output frame number.. I just made it up. */
+    printk("0x%lx -> 0x%lx %s\n", va, ofn, size[level]);
+}
+
+/* Recursive walk function */
+static void walk_table(mfn_t mfn, int level, walk_info_t *walk, int flags)
+{
+    lpae_t *table;
+
+    #define i (walk->off[level])
+
+    BUG_ON(level > 3);
+
+    table = map_domain_page(mfn);
+    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+    {
+        lpae_t pte = table[i];
+        if ( !lpae_is_valid(pte) )
+            continue;
+
+        /* Skip recursive mapping */
+        if ( level == 0 && i == RECURSIVE_IDX )
+            continue;
+
+        if ( flags & WALK_DUMP_ENTRIES )
+            dump_entry(level, pte, walk);
+
+        if ( lpae_is_mapping(pte, level) )
+        {
+            /* Do mapping related things */
+            if ( flags & WALK_DUMP_MAPPINGS )
+                dump_mapping(level, pte, walk);
+            if ( flags & WALK_HIDE_GUEST_MAPPING )
+                /* Destroy all of Xen's mappings to the physical frames covered by this entry */
+                destroy_directmap_mapping(pte.walk.base, 1 << XEN_PT_LEVEL_ORDER(level));
+        }
+        else if ( lpae_is_table(pte, level) )
+        {
+            /* else, pte is a table: recurse! */
+            walk_table(lpae_get_mfn(pte), level + 1, walk, flags);
+
+            /* Note that the entry is a normal entry in xen's page tables */
+            if ( flags & WALK_HIDE_GUEST_TABLE )
+                /*
+                 * This call will look up the table pointed to by this entry in the directmap
+                 * and remove it in the typical way
+                 * This leaves the table intact, but removes the directmap mapping to it, hiding it from xen
+                 */
+                destroy_directmap_mapping(pte.walk.base, 1);
+            if ( flags & WALK_SPLIT_DIRECTMAP_TABLE )
+                /*
+                 * This call will look up the table pointed to by this entry in the directmap
+                 * and make sure that it has its own l3 entry, splitting superpages if needed
+                 */
+                split_directmap_mapping(pte.walk.base, 1);
+            if ( flags & WALK_HIDE_DIRECTMAP_TABLE )
+                /*
+                 * This call will look up the table pointed to by this entry in the directmap
+                 * and (now that it has its own l3 entry) overwrite that entry with 0's
+                 * This leaves the table intact, but removes the directmap mapping to it, hiding it from xen
+                 */
+                destroy_directmap_table(pte.walk.base);
+        }
+        /* else, invalid pte: level == 3, valid == true, table == false */
+    }
+    unmap_domain_page(table);
+
+    #undef i
+}
+
+void do_walk_tables(paddr_t ttbr, int root_level, int nr_root_tables, int flags)
+{
+    int i;
+    mfn_t root = maddr_to_mfn(ttbr & PADDR_MASK);
+    walk_info_t walk = {
+        .off = {0},
+        .root_level = root_level,
+    };
+
+    BUG_ON( !mfn_x(root) || !mfn_valid(root) );
+
+    for ( i = 0; i < nr_root_tables; i++, root = mfn_add(root, 1) ) {
+        walk.root_table_idx = i;
+        walk_table(root, root_level, &walk, flags);
+
+        /* Our walk doesn't consider the root table, so do that here */
+        if ( flags & WALK_SPLIT_DIRECTMAP_TABLE )
+            split_directmap_mapping(mfn_x(root), 1);
+        if ( flags & WALK_HIDE_GUEST_TABLE )
+            destroy_directmap_mapping(mfn_x(root), 1);
+        if ( flags & WALK_HIDE_DIRECTMAP_TABLE )
+            destroy_directmap_table(mfn_x(root));
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 91b9c2b..64e9efd 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -21,11 +21,13 @@
 #include <xen/sizes.h>
 #include <xen/types.h>
 #include <xen/vmap.h>
+#include <xen/vmf.h>
 
 #include <xsm/xsm.h>
 
 #include <asm/fixmap.h>
 #include <asm/setup.h>
+#include <asm/mm-walk.h>
 
 #include <public/memory.h>
 
@@ -1164,7 +1166,8 @@ static int xen_pt_update(unsigned long virt,
      *
      * XXX: Add a check.
      */
-    const mfn_t root = virt_to_mfn(THIS_CPU_PGTABLE);
+    /* TODO: does this change have a negative performance impact? */
+    const mfn_t root = maddr_to_mfn(READ_SYSREG64(TTBR0_EL2));
 
     /*
      * The hardware was configured to forbid mapping both writeable and
@@ -1273,6 +1276,199 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
     return xen_pt_update(s, INVALID_MFN, (e - s) >> PAGE_SHIFT, flags);
 }
 
+static void insert_recursive_mapping(void)
+{
+    uint64_t ttbr = READ_SYSREG64(TTBR0_EL2);
+    const mfn_t root_mfn = maddr_to_mfn(ttbr & PADDR_MASK);
+    lpae_t *pgtable = map_domain_page(root_mfn);
+
+    lpae_t pte = mfn_to_xen_entry(root_mfn, MT_NORMAL);
+    pte.pt.table = 1;
+
+    spin_lock(&xen_pt_lock);
+
+    write_pte(&pgtable[RECURSIVE_IDX], pte);
+    clean_dcache(pgtable[RECURSIVE_IDX]);
+
+    unmap_domain_page(pgtable);
+    spin_unlock(&xen_pt_lock);
+}
+
+/*
+ * Converts va to a table pointer through the recursive mapping
+ * Only valid for the current address space obviously
+ */
+static lpae_t *va_to_table(int level, unsigned long va)
+{
+    /* Shift everything by 9 for each walk we skip */
+    /* The last offset shifted out becomes the offset into the page */
+    for ( ;level <= 3; level++ ) {
+        va >>= XEN_PT_LPAE_SHIFT;
+        va |= RECURSIVE_VA;
+    }
+
+    /* Mask out any offset, in case the caller is asking about a misaligned va */
+    va &= ~0x7;
+    return (lpae_t *)va;
+}
+
+/*
+ * Zero out the table at level when walking to virt
+ * Do this through the recursive mapping, in case we have already
+ * removed part of the directmap and can't walk to that entry
+ */
+static void clear_pte_directly(int level, void *virt)
+{
+    unsigned long va = (unsigned long)virt;
+    lpae_t empty = {.pt = {0x0}};
+    lpae_t *table;
+
+    spin_lock(&xen_pt_lock);
+
+    /* We're assuming we can safely remove an entry at `level` */
+    /* This depends on va not living in a superpage */
+    BUG_ON(level > 1 && !va_to_table(1, va)->pt.table);
+    BUG_ON(level > 2 && !va_to_table(2, va)->pt.table);
+
+    table = va_to_table(level, va);
+    write_pte(table, empty);
+    clean_dcache(*table);
+    flush_xen_tlb_range_va((vaddr_t)table, sizeof(*table));
+
+    spin_unlock(&xen_pt_lock);
+}
+
+static void remove_recursive_mapping(void)
+{
+    clear_pte_directly(0, (void *)RECURSIVE_VA);
+}
+
+static int modify_virt_mapping(void *virt, int nr_pages, int flags)
+{
+    unsigned long va = (unsigned long)virt;
+    return modify_xen_mappings(va, va + (PAGE_SIZE * nr_pages), flags);
+}
+
+static int destroy_virt_mapping(void *virt, int nr_pages)
+{
+    return modify_virt_mapping(virt, nr_pages, 0);
+}
+
+static int modify_directmap_mapping(unsigned long mfn, unsigned long nr_mfns, int flags)
+{
+    if ( mfn & pfn_hole_mask )
+    {
+        printk("** Skipping mfn 0x%lx because it lives in the pfn hole **\n", mfn);
+        return 0;
+    }
+
+    return modify_virt_mapping(__mfn_to_virt(mfn), nr_mfns, flags);
+}
+
+int split_directmap_mapping(unsigned long mfn, unsigned long nr_mfns)
+{
+    return modify_directmap_mapping(mfn, nr_mfns, PAGE_HYPERVISOR);
+}
+
+int destroy_directmap_mapping(unsigned long mfn, unsigned long nr_mfns)
+{
+    return modify_directmap_mapping(mfn, nr_mfns, 0);
+}
+
+void destroy_directmap_table(unsigned long mfn)
+{
+    BUG_ON(mfn & pfn_hole_mask);
+    clear_pte_directly(3, __mfn_to_virt(mfn));
+}
+
+static void unmap_xen_root_tables(void)
+{
+    destroy_virt_mapping(xen_xenmap, 1);
+    destroy_virt_mapping(xen_fixmap, 1);
+    destroy_virt_mapping(xen_second, 1);
+#if defined(CONFIG_ARM_64)
+    destroy_virt_mapping(xen_first, 1);
+    destroy_virt_mapping(xen_pgtable, 1);
+#endif
+}
+
+static void walk_hyp_tables(int flags)
+{
+    uint64_t httbr = READ_SYSREG64(TTBR0_EL2);
+    do_walk_tables(httbr, HYP_PT_ROOT_LEVEL, 1, flags);
+}
+
+static void walk_guest_tables(struct domain *d, int flags)
+{
+    uint64_t vttbr = d->arch.p2m.vttbr;
+    do_walk_tables(vttbr, P2M_ROOT_LEVEL, 1<<P2M_ROOT_ORDER, flags);
+}
+
+
+void vmf_unmap_guest(struct domain *d)
+{
+    /* Remove all of directmap mappings to guest */
+    walk_guest_tables(d, WALK_HIDE_GUEST_MAPPING);
+
+    /* Remove all mappings to guest second stage tables */
+    walk_guest_tables(d, WALK_HIDE_GUEST_TABLE);
+}
+
+void vmf_lock_xen_pgtables(void)
+{
+    /* Remove all of the static allocated root tables */
+    unmap_xen_root_tables();
+
+    /*
+     * Remove all tables from directmap
+     * Because we can't use the directmap to walk tables while we are removing
+     * the directmap, add a recursive pointer and use that to erase pte's
+     */
+    insert_recursive_mapping();
+    walk_hyp_tables(WALK_SPLIT_DIRECTMAP_TABLE);
+    walk_hyp_tables(WALK_HIDE_DIRECTMAP_TABLE);
+    remove_recursive_mapping();
+}
+
+void vmf_dump_xen_info(void)
+{
+    printk("Dump reg info...\n");
+    printk("current httbr0 is 0x%lx\n", READ_SYSREG64(TTBR0_EL2));
+    printk("current vttbr is 0x%lx\n", READ_SYSREG64(VTTBR_EL2));
+    printk("current ttbr0 is 0x%lx\n", READ_SYSREG64(TTBR0_EL1));
+    printk("current ttbr1 is 0x%lx\n", READ_SYSREG64(TTBR1_EL1));
+    printk("\n");
+
+    printk("Dump xen table info...\n");
+#if defined(CONFIG_ARM_64)
+    printk("xen_pgtable: 0x%"PRIvaddr"\n", (vaddr_t)xen_pgtable);
+    printk("xen_first: 0x%"PRIvaddr"\n", (vaddr_t)xen_first);
+#endif
+    printk("xen_second: 0x%"PRIvaddr"\n", (vaddr_t)xen_second);
+    printk("xen_xenmap: 0x%"PRIvaddr"\n", (vaddr_t)xen_xenmap);
+    printk("xen_fixmap: 0x%"PRIvaddr"\n", (vaddr_t)xen_fixmap);
+}
+
+void vmf_dump_domain_info(struct domain *d)
+{
+    uint64_t vttbr = d->arch.p2m.vttbr;
+    uint64_t httbr = READ_SYSREG64(TTBR0_EL2);
+
+    printk("Dump domain info...\n");
+    printk("guest mfn = 0x%lx\n", paddr_to_pfn(vttbr & PADDR_MASK));
+    printk("xen mfn = 0x%lx\n", paddr_to_pfn(httbr & PADDR_MASK));
+}
+
+void vmf_dump_xen_tables(void)
+{
+    walk_hyp_tables(WALK_DUMP_MAPPINGS | WALK_DUMP_ENTRIES);
+}
+
+void vmf_dump_domain_tables(struct domain *d)
+{
+    walk_guest_tables(d, WALK_DUMP_MAPPINGS | WALK_DUMP_ENTRIES);
+}
+
 /* Release all __init and __initdata ranges to be reused */
 void free_init_memory(void)
 {
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 3bf92b8..c087371 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -94,6 +94,8 @@ config STATIC_MEMORY
 
 config VMF
 	bool "Virtual Memory Fuse Support"
+	depends on ARM_64
+	default y
 
 menu "Speculative hardening"
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-13 19:48 [RFC 0/4] Adding Virtual Memory Fuses to Xen Smith, Jackson
                   ` (3 preceding siblings ...)
  2022-12-13 19:55 ` [RFC 4/4] Implement VMF for arm64 Smith, Jackson
@ 2022-12-13 20:55 ` Julien Grall
  2022-12-13 22:22   ` Demi Marie Obenour
  2022-12-15 19:27   ` Smith, Jackson
  4 siblings, 2 replies; 33+ messages in thread
From: Julien Grall @ 2022-12-13 20:55 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark



On 13/12/2022 19:48, Smith, Jackson wrote:
> Hi Xen Developers,

Hi Jackson,

Thanks for sharing the prototype with the community. Some 
questions/remarks below.

> My team at Riverside Research is currently spending IRAD funding
> to prototype next-generation secure hypervisor design ideas
> on Xen. In particular, we are prototyping the idea of Virtual
> Memory Fuses for Software Enclaves, as described in this paper:
> https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that
> that paper talks about OS/Process while we have implemented the idea
> for Hypervisor/VM.
> 
> Our goal is to emulate something akin to Intel SGX or AMD SEV,
> but using only existing virtual memory features common in all
> processors. The basic idea is not to map guest memory into the
> hypervisor so that a compromised hypervisor cannot compromise
> (e.g. read/write) the guest. This idea has been proposed before,
> however, Virtual Memory Fuses go one step further; they delete the
> hypervisor's mappings to its own page tables, essentially locking
> the virtual memory configuration for the lifetime of the system. This
> creates what we call "Software Enclaves", ensuring that an adversary
> with arbitrary code execution in the hypervisor STILL cannot read/write
> guest memory.

I am confused: if the attacker is able to execute arbitrary code, then 
what prevents them from writing code to map/unmap the page?

Skimming through the paper (pages 5-6), it looks like you would need to 
implement extra defenses in Xen to be able to prevent mapping/unmapping a page.

> 
> With this technique, we protect the integrity and confidentiality of
> guest memory. However, a compromised hypervisor can still read/write
> register state during traps, or refuse to schedule a guest, denying
> service. We also recognize that because this technique precludes
> modifying Xen's page tables after startup, it may not be compatible
> with all of Xen's potential use cases. On the other hand, there are
> some uses cases (in particular statically defined embedded systems)
> where our technique could be adopted with minimal friction.

From what you wrote, this sounds very much like the project Citrix and 
Amazon worked on called "Secret-free hypervisor", with a twist: in your 
case, you want to prevent the hypervisor from mapping/unmapping the guest memory.

You can find some details in [1]. The code is x86 only, but I don't see 
any major blocker to porting it to arm64.

> 
> With this in mind our goal is to work with the Xen community to
> upstream this work as an optional feature. At this point, we have
> a prototype implementation of VMF on Xen (the contents of this RFC
> patch series) that supports dom0less guests on arm 64. By sharing
> our prototype, we hope to socialize our idea, gauge interest, and
> hopefully gain useful feedback as we work toward upstreaming.
> 
> ** IMPLEMENTATION **
> In our current setup we have a static configuration with dom0 and
> one or two domUs. Soon after boot, Dom0 issues a hypercall through
> the xenctrl interface to blow the fuse for the domU. In the future,
> we could also add code to support blowing the fuse automatically on
> startup, before any domains are un-paused.
> 
> Our Xen/arm64 prototype creates Software Enclaves in two steps,
> represented by these two functions defined in xen/vmf.h:
> void vmf_unmap_guest(struct domain *d);
> void vmf_lock_xen_pgtables(void);
> 
> In the first, the Xen removes mappings to the guest(s) On arm64, Xen
> keeps a reference to all of guest memory in the directmap. Right now,
> we simply walk all of the guest second stage tables and remove them
> from the directmap, although there is probably a more elegant method
> for this.

IIUC, you first map all the RAM and then remove the pages. What you 
could do instead is to map only the memory required for Xen use. The 
rest would be left unmapped.

This would be similar to what we are doing on arm32. We have a split 
heap. Only the xenheap is mapped; the pages from the domheap are 
mapped on demand.

Another approach would be to have a single heap where pages used by Xen 
are mapped in the page-tables when allocated (this is what the 
secret-free hypervisor is doing).

If you don't want to keep the page-tables mapped, then it sounds like 
you want the first approach.

> 
> Second, the Xen removes mappings to its own page tables.
> On arm64, this also involves manipulating the directmap. One challenge
> here is that as we start to unmap our tables from the directmap,
> we can't use the directmap to walk them. Our solution here is also
> bit less elegant, we temporarily insert a recursive mapping and use
> that to remove page table entries.

See above.

> 
> ** LIMITATIONS and other closing thoughts **
> The current Xen code has obviously been implemented under the
> assumption that new pages can be mapped, and that guest virtual
> addresses can be read, so this technique will break some Xen
> features. However, in the general case

Can you clarify your definition of "general case"? From my PoV, it is a 
lot more common to have guests with PV emulated devices rather than with 
devices attached. So it will be mandatory to access part of the memory 
(e.g. the grant table).

> (in particular for static
> workloads where the number of guest's is not changed after boot)

That very much depends on how you configure your guests. If they have 
devices assigned then possibly yes. Otherwise see above.

> Finally, our initial testing suggests that Xen never reads guest memory
> (in a static, non-dom0-enhanced configuration), but have not really
> explored this thoroughly.
> We know at least these things work:
> 	Dom0less virtual serial terminal
> 	Domain scheduling
> We are aware that these things currently depend on accessible guest
> memory:
> 	Some hypercalls take guest pointers as arguments

There are not many hypercalls that don't take guest pointers.

> 	Virtualized MMIO on arm needs to decode certain load/store
> 	instructions

On Arm, this can be avoided if the guest OS is not using such 
instructions. In fact, they were only added to cater to "broken" guest OSes.

Also, this will probably be a lot more difficult on x86 as, AFAIK, there 
is no instruction syndrome. So you will need to decode the instruction 
in order to emulate the access.

> 
> It's likely that other Xen features require guest memory access.

For Arm, guest memory access is also needed when using the GICv3 ITS 
and/or second-level SMMU (still in RFC).

For x86, if you don't want to access the guest memory, then you may need 
to restrict to PVH as for HVM we need to emulate some devices in QEMU. 
That said, I am not sure PVH is even feasible.

Cheers,

[1] 
https://www.youtube.com/watch?v=RKJOwIkCnB4&list=PLYyw7IQjL-zFYmEoZEYswoVuXrHvXAWxj&index=5

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 3/4] Add xen superpage splitting support to arm
  2022-12-13 19:54 ` [RFC 3/4] Add xen superpage splitting support to arm Smith, Jackson
@ 2022-12-13 21:15   ` Julien Grall
  2022-12-13 22:17     ` Demi Marie Obenour
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-12-13 21:15 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

Hi,

On 13/12/2022 19:54, Smith, Jackson wrote:
> Updates xen_pt_update_entry function from xen/arch/arm/mm.c to
> automatically split superpages as needed.
Your signed-off-by is missing.

> ---
>   xen/arch/arm/mm.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 78 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 6301752..91b9c2b 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -753,8 +753,78 @@ static int create_xen_table(lpae_t *entry)
>   }
>   
>   #define XEN_TABLE_MAP_FAILED 0
> -#define XEN_TABLE_SUPER_PAGE 1
> -#define XEN_TABLE_NORMAL_PAGE 2
> +#define XEN_TABLE_NORMAL_PAGE 1
> +
> +/* More or less taken from p2m_split_superpage, without the p2m stuff */
> +static bool xen_split_superpage(lpae_t *entry, unsigned int level,
> +                                unsigned int target, const unsigned int *offsets)
> +{
> +    struct page_info *page;
> +    lpae_t pte, *table;
> +    unsigned int i;
> +    bool rv = true;
> +
> +    mfn_t mfn = lpae_get_mfn(*entry);
> +    unsigned int next_level = level + 1;
> +    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
> +
> +    ASSERT(level < target);
> +    ASSERT(lpae_is_superpage(*entry, level));
> +
> +    page = alloc_domheap_page(NULL, 0);
Page-tables may be allocated from the boot allocator, so you want to use 
create_xen_table().

> +    if ( !page )
> +        return false;
> +
> +    table = __map_domain_page(page);

You want to use xen_map_table().

> +
> +    /*
> +     * We are either splitting a first level 1G page into 512 second level
> +     * 2M pages, or a second level 2M page into 512 third level 4K pages.
> +     */
> +    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
> +    {
> +        lpae_t *new_entry = table + i;
> +
> +        /*
> +         * Use the content of the superpage entry and override
> +         * the necessary fields. So the correct permission are kept.
> +         */
> +        pte = *entry;
> +        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
> +
> +        /*
> +         * First and second level pages set walk.table = 0, but third
> +         * level entries set walk.table = 1.
> +         */
> +        pte.walk.table = (next_level == 3);
> +
> +        write_pte(new_entry, pte);
> +    }
> +
> +    /*
> +     * Shatter superpage in the page to the level we want to make the
> +     * changes.
> +     * This is done outside the loop to avoid checking the offset to
> +     * know whether the entry should be shattered for every entry.
> +     */
> +    if ( next_level != target )
> +        rv = xen_split_superpage(table + offsets[next_level],
> +                                 level + 1, target, offsets);
> +
> +    clean_dcache_va_range(table, PAGE_SIZE);

Cleaning the cache is not necessary. This is done in the P2M case 
because it is shared with the IOMMU which may not support coherent access.

> +    unmap_domain_page(table);

This would be xen_unmap_table().

> +
> +    /*
> +     * Generate the entry for this new table we created,
> +     * and write it back in place of the superpage entry.
> +     */

I am afraid this is not compliant with the Arm Arm. If you want to 
update a valid entry (e.g. when shattering a superpage), then you need to 
follow the break-before-make sequence. This means that:
   1. Replace the valid entry with an invalid one
   2. Flush the TLBs
   3. Write the new entry

Those steps will make your code compliant but it also means that a 
virtual address will be temporarily invalid so you could take a fault in 
the middle of your split if your stack or the table was part of the 
region. The same could happen for the other running CPUs but this is 
less problematic as they could spin on the page-table lock.

This is the main reason why we never implemented super-page shattering 
for the hypervisor.

So I would rather prefer if we can avoid shattering (I have made some 
suggestions in the cover letter). If we really need to shatter, then we 
should make sure this is only used in very limited use cases by 
introducing a flag. The caller will then be responsible for acknowledging 
that it doesn't modify a region that may be used by itself or another CPU.

> +    pte = mfn_to_xen_entry(page_to_mfn(page), MT_NORMAL);
> +    pte.pt.table = 1;
> +    write_pte(entry, pte);
> +    clean_dcache(*entry);

Ditto about the cache cleaning.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 3/4] Add xen superpage splitting support to arm
  2022-12-13 21:15   ` Julien Grall
@ 2022-12-13 22:17     ` Demi Marie Obenour
  2022-12-13 23:07       ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-13 22:17 UTC (permalink / raw)
  To: Julien Grall, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

[-- Attachment #1: Type: text/plain, Size: 4724 bytes --]

On Tue, Dec 13, 2022 at 09:15:49PM +0000, Julien Grall wrote:
> Hi,
> 
> On 13/12/2022 19:54, Smith, Jackson wrote:
> > Updates xen_pt_update_entry function from xen/arch/arm/mm.c to
> > automatically split superpages as needed.
> Your signed-off-by is missing.
> 
> > ---
> >   xen/arch/arm/mm.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++--------
> >   1 file changed, 78 insertions(+), 13 deletions(-)
> > 
> > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > index 6301752..91b9c2b 100644
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -753,8 +753,78 @@ static int create_xen_table(lpae_t *entry)
> >   }
> >   #define XEN_TABLE_MAP_FAILED 0
> > -#define XEN_TABLE_SUPER_PAGE 1
> > -#define XEN_TABLE_NORMAL_PAGE 2
> > +#define XEN_TABLE_NORMAL_PAGE 1
> > +
> > +/* More or less taken from p2m_split_superpage, without the p2m stuff */
> > +static bool xen_split_superpage(lpae_t *entry, unsigned int level,
> > +                                unsigned int target, const unsigned int *offsets)
> > +{
> > +    struct page_info *page;
> > +    lpae_t pte, *table;
> > +    unsigned int i;
> > +    bool rv = true;
> > +
> > +    mfn_t mfn = lpae_get_mfn(*entry);
> > +    unsigned int next_level = level + 1;
> > +    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
> > +
> > +    ASSERT(level < target);
> > +    ASSERT(lpae_is_superpage(*entry, level));
> > +
> > +    page = alloc_domheap_page(NULL, 0);
> Page-table may be allocated from the boot allocator. So you want to use
> create_xen_table().
> 
> > +    if ( !page )
> > +        return false;
> > +
> > +    table = __map_domain_page(page);
> 
> You want to use xen_map_table().
> 
> > +
> > +    /*
> > +     * We are either splitting a first level 1G page into 512 second level
> > +     * 2M pages, or a second level 2M page into 512 third level 4K pages.
> > +     */
> > +    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
> > +    {
> > +        lpae_t *new_entry = table + i;
> > +
> > +        /*
> > +         * Use the content of the superpage entry and override
> > +         * the necessary fields. So the correct permission are kept.
> > +         */
> > +        pte = *entry;
> > +        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
> > +
> > +        /*
> > +         * First and second level pages set walk.table = 0, but third
> > +         * level entries set walk.table = 1.
> > +         */
> > +        pte.walk.table = (next_level == 3);
> > +
> > +        write_pte(new_entry, pte);
> > +    }
> > +
> > +    /*
> > +     * Shatter superpage in the page to the level we want to make the
> > +     * changes.
> > +     * This is done outside the loop to avoid checking the offset to
> > +     * know whether the entry should be shattered for every entry.
> > +     */
> > +    if ( next_level != target )
> > +        rv = xen_split_superpage(table + offsets[next_level],
> > +                                 level + 1, target, offsets);
> > +
> > +    clean_dcache_va_range(table, PAGE_SIZE);
> 
> Cleaning the cache is not necessary. This is done in the P2M case because it
> is shared with the IOMMU which may not support coherent access.
> 
> > +    unmap_domain_page(table);
> 
> This would be xen_map
> 
> > +
> > +    /*
> > +     * Generate the entry for this new table we created,
> > +     * and write it back in place of the superpage entry.
> > +     */
> 
> I am afraid this is not compliant with the Arm Arm. If you want to update
> valid entry (e.g. shattering a superpage), then you need to follow the
> break-before-make sequence. This means that:
>   1. Replace the valid entry with an entry with an invalid one
>   2. Flush the TLBs
>   3. Write the new entry
> 
> Those steps will make your code compliant but it also means that a virtual
> address will be temporarily invalid so you could take a fault in the middle
> of your split if your stack or the table was part of the region. The same
> could happen for the other running CPUs but this is less problematic as they
> could spin on the page-table lock.

Could this be worked around by writing the critical section in
assembler?  The assembler code would never access the stack and would
run with interrupts disabled.  There could also be BUG() checks for
attempting to shatter a PTE that was needed to access the PTE in
question, though I suspect one can work around this with a temporary
PTE.  That said, shattering large pages requires allocating memory,
which might fail.  What happens if the allocation does fail?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-13 20:55 ` [RFC 0/4] Adding Virtual Memory Fuses to Xen Julien Grall
@ 2022-12-13 22:22   ` Demi Marie Obenour
  2022-12-13 23:05     ` Julien Grall
  2022-12-16 11:58     ` Julien Grall
  2022-12-15 19:27   ` Smith, Jackson
  1 sibling, 2 replies; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-13 22:22 UTC (permalink / raw)
  To: Julien Grall, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

[-- Attachment #1: Type: text/plain, Size: 1966 bytes --]

On Tue, Dec 13, 2022 at 08:55:28PM +0000, Julien Grall wrote:
> On 13/12/2022 19:48, Smith, Jackson wrote:
> > Hi Xen Developers,
> 
> Hi Jackson,
> 
> Thanks for sharing the prototype with the community. Some questions/remarks
> below.

[snip]

> > With this technique, we protect the integrity and confidentiality of
> > guest memory. However, a compromised hypervisor can still read/write
> > register state during traps, or refuse to schedule a guest, denying
> > service. We also recognize that because this technique precludes
> > modifying Xen's page tables after startup, it may not be compatible
> > with all of Xen's potential use cases. On the other hand, there are
> > some uses cases (in particular statically defined embedded systems)
> > where our technique could be adopted with minimal friction.
> 
> From what you wrote, this sounds very much like the project Citrix and
> Amazon worked on called "Secret-free hypervisor" with a twist. In your case,
> you want to prevent the hypervisor to map/unmap the guest memory.
> 
> You can find some details in [1]. The code is x86 only, but I don't see any
> major blocker to port it on arm64.

Is there any way the secret-free hypervisor code could be upstreamed?
My understanding is that it would enable guests to use SMT without
risking the host, which would be amazing.

> > 	Virtualized MMIO on arm needs to decode certain load/store
> > 	instructions
> 
> On Arm, this can be avoided of the guest OS is not using such instruction.
> In fact they were only added to cater "broken" guest OS.
> 
> Also, this will probably be a lot more difficult on x86 as, AFAIK, there is
> no instruction syndrome. So you will need to decode the instruction in order
> to emulate the access.

Is requiring the guest to emulate such instructions itself an option?
μXen, SEV-SNP, and TDX all do this.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-13 22:22   ` Demi Marie Obenour
@ 2022-12-13 23:05     ` Julien Grall
  2022-12-14  1:28       ` Demi Marie Obenour
  2022-12-14 14:06       ` Julien Grall
  2022-12-16 11:58     ` Julien Grall
  1 sibling, 2 replies; 33+ messages in thread
From: Julien Grall @ 2022-12-13 23:05 UTC (permalink / raw)
  To: Demi Marie Obenour, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

Hi Demi,

On 13/12/2022 22:22, Demi Marie Obenour wrote:
> On Tue, Dec 13, 2022 at 08:55:28PM +0000, Julien Grall wrote:
>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>> Hi Xen Developers,
>>
>> Hi Jackson,
>>
>> Thanks for sharing the prototype with the community. Some questions/remarks
>> below.
> 
> [snip]
> 
>>> With this technique, we protect the integrity and confidentiality of
>>> guest memory. However, a compromised hypervisor can still read/write
>>> register state during traps, or refuse to schedule a guest, denying
>>> service. We also recognize that because this technique precludes
>>> modifying Xen's page tables after startup, it may not be compatible
>>> with all of Xen's potential use cases. On the other hand, there are
>>> some use cases (in particular statically defined embedded systems)
>>> where our technique could be adopted with minimal friction.
>>
>>  From what you wrote, this sounds very much like the project Citrix and
>> Amazon worked on called "Secret-free hypervisor" with a twist. In your case,
>> you want to prevent the hypervisor from mapping/unmapping the guest memory.
>>
>> You can find some details in [1]. The code is x86 only, but I don't see any
>> major blocker to port it on arm64.
> 
> Is there any way the secret-free hypervisor code could be upstreamed?
This has been on my todo list for more than a year, but I haven't yet 
found anyone to finish the work.

I need to have a look at how much of the original work is left to do. 
Would you be interested in contributing?

> My understanding is that it would enable guests to use SMT without
> risking the host, which would be amazing.
> 
>>> 	Virtualized MMIO on arm needs to decode certain load/store
>>> 	instructions
>>
>> On Arm, this can be avoided if the guest OS is not using such instructions.
>> In fact they were only added to cater to "broken" guest OSes.
>>
>> Also, this will probably be a lot more difficult on x86 as, AFAIK, there is
>> no instruction syndrome. So you will need to decode the instruction in order
>> to emulate the access.
> 
> Is requiring the guest to emulate such instructions itself an option?
> μXen, SEV-SNP, and TDX all do this.


I am not very familiar with this. So a few questions:
  * Does this mean the OS needs to be modified?
  * What happens for emulated devices?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 3/4] Add xen superpage splitting support to arm
  2022-12-13 22:17     ` Demi Marie Obenour
@ 2022-12-13 23:07       ` Julien Grall
  2022-12-14  1:38         ` Demi Marie Obenour
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-12-13 23:07 UTC (permalink / raw)
  To: Demi Marie Obenour, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

Hi Demi,

On 13/12/2022 22:17, Demi Marie Obenour wrote:
> On Tue, Dec 13, 2022 at 09:15:49PM +0000, Julien Grall wrote:
>> Hi,
>>
>> On 13/12/2022 19:54, Smith, Jackson wrote:
>>> Updates xen_pt_update_entry function from xen/arch/arm/mm.c to
>>> automatically split superpages as needed.
>> Your signed-off-by is missing.
>>
>>> ---
>>>    xen/arch/arm/mm.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++--------
>>>    1 file changed, 78 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>>> index 6301752..91b9c2b 100644
>>> --- a/xen/arch/arm/mm.c
>>> +++ b/xen/arch/arm/mm.c
>>> @@ -753,8 +753,78 @@ static int create_xen_table(lpae_t *entry)
>>>    }
>>>    #define XEN_TABLE_MAP_FAILED 0
>>> -#define XEN_TABLE_SUPER_PAGE 1
>>> -#define XEN_TABLE_NORMAL_PAGE 2
>>> +#define XEN_TABLE_NORMAL_PAGE 1
>>> +
>>> +/* More or less taken from p2m_split_superpage, without the p2m stuff */
>>> +static bool xen_split_superpage(lpae_t *entry, unsigned int level,
>>> +                                unsigned int target, const unsigned int *offsets)
>>> +{
>>> +    struct page_info *page;
>>> +    lpae_t pte, *table;
>>> +    unsigned int i;
>>> +    bool rv = true;
>>> +
>>> +    mfn_t mfn = lpae_get_mfn(*entry);
>>> +    unsigned int next_level = level + 1;
>>> +    unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level);
>>> +
>>> +    ASSERT(level < target);
>>> +    ASSERT(lpae_is_superpage(*entry, level));
>>> +
>>> +    page = alloc_domheap_page(NULL, 0);
>> Page-table may be allocated from the boot allocator. So you want to use
>> create_xen_table().
>>
>>> +    if ( !page )
>>> +        return false;
>>> +
>>> +    table = __map_domain_page(page);
>>
>> You want to use xen_map_table().
>>
>>> +
>>> +    /*
>>> +     * We are either splitting a first level 1G page into 512 second level
>>> +     * 2M pages, or a second level 2M page into 512 third level 4K pages.
>>> +     */
>>> +    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
>>> +    {
>>> +        lpae_t *new_entry = table + i;
>>> +
>>> +        /*
>>> +         * Use the content of the superpage entry and override
>>> +         * the necessary fields. So the correct permission are kept.
>>> +         */
>>> +        pte = *entry;
>>> +        lpae_set_mfn(pte, mfn_add(mfn, i << level_order));
>>> +
>>> +        /*
>>> +         * First and second level pages set walk.table = 0, but third
>>> +         * level entries set walk.table = 1.
>>> +         */
>>> +        pte.walk.table = (next_level == 3);
>>> +
>>> +        write_pte(new_entry, pte);
>>> +    }
>>> +
>>> +    /*
>>> +     * Shatter superpage in the page to the level we want to make the
>>> +     * changes.
>>> +     * This is done outside the loop to avoid checking the offset to
>>> +     * know whether the entry should be shattered for every entry.
>>> +     */
>>> +    if ( next_level != target )
>>> +        rv = xen_split_superpage(table + offsets[next_level],
>>> +                                 level + 1, target, offsets);
>>> +
>>> +    clean_dcache_va_range(table, PAGE_SIZE);
>>
>> Cleaning the cache is not necessary. This is done in the P2M case because it
>> is shared with the IOMMU which may not support coherent access.
>>
>>> +    unmap_domain_page(table);
>>
>> This would be xen_map
>>
>>> +
>>> +    /*
>>> +     * Generate the entry for this new table we created,
>>> +     * and write it back in place of the superpage entry.
>>> +     */
>>
>> I am afraid this is not compliant with the Arm Arm. If you want to update
>> valid entry (e.g. shattering a superpage), then you need to follow the
>> break-before-make sequence. This means that:
>>    1. Replace the valid entry with an entry with an invalid one
>>    2. Flush the TLBs
>>    3. Write the new entry
>>
>> Those steps will make your code compliant but it also means that a virtual
>> address will be temporarily invalid so you could take a fault in the middle
>> of your split if your stack or the table was part of the region. The same
>> could happen for the other running CPUs but this is less problematic as they
>> could spin on the page-table lock.
> 
> Could this be worked around by writing the critical section in
> assembler? 

Everything is feasible. Is this worth it? I don't think so. There are 
ways we can avoid the shattering in the first place by simply not mapping 
all the RAM.

> The assembler code would never access the stack and would
> run with interrupts disabled.  There could also be BUG() checks for
> attempting to shatter a PTE that was needed to access the PTE in
> question, though I suspect one can work around this with a temporary
> PTE.  That said, shattering large pages requires allocating memory,
> which might fail.  What happens if the allocation does fail?

If this is only done during boot, then I would argue you will want to 
crash Xen.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-13 23:05     ` Julien Grall
@ 2022-12-14  1:28       ` Demi Marie Obenour
  2022-12-14 14:06       ` Julien Grall
  1 sibling, 0 replies; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-14  1:28 UTC (permalink / raw)
  To: Julien Grall, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark,
	Marek Marczykowski-Górecki

[-- Attachment #1: Type: text/plain, Size: 4697 bytes --]

On Tue, Dec 13, 2022 at 11:05:49PM +0000, Julien Grall wrote:
> Hi Demi,
> 
> On 13/12/2022 22:22, Demi Marie Obenour wrote:
> > On Tue, Dec 13, 2022 at 08:55:28PM +0000, Julien Grall wrote:
> > > On 13/12/2022 19:48, Smith, Jackson wrote:
> > > > Hi Xen Developers,
> > > 
> > > Hi Jackson,
> > > 
> > > Thanks for sharing the prototype with the community. Some questions/remarks
> > > below.
> > 
> > [snip]
> > 
> > > > With this technique, we protect the integrity and confidentiality of
> > > > guest memory. However, a compromised hypervisor can still read/write
> > > > register state during traps, or refuse to schedule a guest, denying
> > > > service. We also recognize that because this technique precludes
> > > > modifying Xen's page tables after startup, it may not be compatible
> > > > with all of Xen's potential use cases. On the other hand, there are
> > > > some use cases (in particular statically defined embedded systems)
> > > > where our technique could be adopted with minimal friction.
> > > 
> > >  From what you wrote, this sounds very much like the project Citrix and
> > > Amazon worked on called "Secret-free hypervisor" with a twist. In your case,
> > > you want to prevent the hypervisor from mapping/unmapping the guest memory.
> > > 
> > > You can find some details in [1]. The code is x86 only, but I don't see any
> > > major blocker to port it on arm64.
> > 
> > Is there any way the secret-free hypervisor code could be upstreamed?
> This has been on my todo list for more than a year, but I haven't yet found
> anyone to finish the work.
> 
> I need to have a look at how much of the original work is left to do.
> Would you be interested in contributing?

That’s up to Marek.  My understanding is that it would allow guests to
use SMT if (and only if) they do not rely on any form of in-guest
sandboxing (at least as far as confidentiality is concerned).  In Qubes
OS, most guests should satisfy this criterion.  The main exceptions are
guests that run a web browser or that use the sandboxed indexing
functionality of tracker3.  In particular, Marek’s builders and other
qubes that do CPU-intensive workloads could benefit significantly.

> > My understanding is that it would enable guests to use SMT without
> > risking the host, which would be amazing.
> > 
> > > > 	Virtualized MMIO on arm needs to decode certain load/store
> > > > 	instructions
> > > 
> > > On Arm, this can be avoided if the guest OS is not using such instructions.
> > > In fact they were only added to cater to "broken" guest OSes.
> > > 
> > > Also, this will probably be a lot more difficult on x86 as, AFAIK, there is
> > > no instruction syndrome. So you will need to decode the instruction in order
> > > to emulate the access.
> > 
> > Is requiring the guest to emulate such instructions itself an option?
> > μXen, SEV-SNP, and TDX all do this.
> 
> 
> I am not very familiar with this. So a few questions:
>  * Does this mean the OS needs to be modified?

Any form of confidential computing requires that the OS be modified to
treat the devices (such as disk and network interfaces) that it receives
from the host as untrusted, so such modification will be needed anyway.
Therefore, this is not an obstacle.  Conversely, cases where modifying
the guest is not possible invariably consider the host to be trusted,
unless I am missing something.

In contexts where the host is trusted, and the goal is to e.g. get rid
of the hypervisor’s instruction emulator, one approach would be to inject
some emulation code into the guest that runs with guest kernel
privileges and has full R/W over all guest memory.  The emulation code
would normally be hidden by second-level page tables, but when the
hypervisor needs to emulate an instruction, the hypervisor switches to a
second-level page table in which this code and its stack are visible.
The emulation logic then performs the needed emulation and returns to the
hypervisor, without the guest ever being aware that anything unusual has
happened.  Although the emulation logic runs inside the guest, it is
hidden by the second-level page tables whenever it is not executing, so
even the guest kernel cannot observe or tamper with it.

>  * What happens for emulated devices?

Using emulated devices in a setup where the emulator is not trusted
makes no sense anyway, so I don’t think this question is relevant.  The
only reason to use emulated devices is legacy compatibility, and the
legacy OSs that require them will consider them to be trusted.
Therefore, relying on emulated devices would defeat the purpose.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 3/4] Add xen superpage splitting support to arm
  2022-12-13 23:07       ` Julien Grall
@ 2022-12-14  1:38         ` Demi Marie Obenour
  2022-12-14  9:09           ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-14  1:38 UTC (permalink / raw)
  To: Julien Grall, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

[-- Attachment #1: Type: text/plain, Size: 2371 bytes --]

On Tue, Dec 13, 2022 at 11:07:55PM +0000, Julien Grall wrote:
> Hi Demi,
> 
> On 13/12/2022 22:17, Demi Marie Obenour wrote:
> > On Tue, Dec 13, 2022 at 09:15:49PM +0000, Julien Grall wrote:

[snip]

> > > > +
> > > > +    /*
> > > > +     * Generate the entry for this new table we created,
> > > > +     * and write it back in place of the superpage entry.
> > > > +     */
> > > 
> > > I am afraid this is not compliant with the Arm Arm. If you want to update
> > > valid entry (e.g. shattering a superpage), then you need to follow the
> > > break-before-make sequence. This means that:
> > >    1. Replace the valid entry with an entry with an invalid one
> > >    2. Flush the TLBs
> > >    3. Write the new entry
> > > 
> > > Those steps will make your code compliant but it also means that a virtual
> > > address will be temporarily invalid so you could take a fault in the middle
> > > of your split if your stack or the table was part of the region. The same
> > > could happen for the other running CPUs but this is less problematic as they
> > > could spin on the page-table lock.
> > 
> > Could this be worked around by writing the critical section in
> > assembler?
> 
> Everything is feasible. Is this worth it? I don't think so. There are ways we
> can avoid the shattering in the first place by simply not mapping all the RAM.

Good point.  I do wonder what would go wrong if one replaced one live
PTE with another that pointed to the same physical address.  Is this
merely a case of “spec doesn’t allow it”, or does it actually break on
real hardware?

> > The assembler code would never access the stack and would
> > run with interrupts disabled.  There could also be BUG() checks for
> > attempting to shatter a PTE that was needed to access the PTE in
> > question, though I suspect one can work around this with a temporary
> > PTE.  That said, shattering large pages requires allocating memory,
> > which might fail.  What happens if the allocation does fail?
> 
> If this is only done during boot, then I would argue you will want to crash
> Xen.

Fair!  Even NASA’s coding standards for spacecraft allow for dynamic
allocation during initialization.  After all, initialization is
typically deterministic and easy to test.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 3/4] Add xen superpage splitting support to arm
  2022-12-14  1:38         ` Demi Marie Obenour
@ 2022-12-14  9:09           ` Julien Grall
  0 siblings, 0 replies; 33+ messages in thread
From: Julien Grall @ 2022-12-14  9:09 UTC (permalink / raw)
  To: Demi Marie Obenour, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

Hi Demi,

On 14/12/2022 01:38, Demi Marie Obenour wrote:
> On Tue, Dec 13, 2022 at 11:07:55PM +0000, Julien Grall wrote:
>> Hi Demi,
>>
>> On 13/12/2022 22:17, Demi Marie Obenour wrote:
>>> On Tue, Dec 13, 2022 at 09:15:49PM +0000, Julien Grall wrote:
> 
> [snip]
> 
>>>>> +
>>>>> +    /*
>>>>> +     * Generate the entry for this new table we created,
>>>>> +     * and write it back in place of the superpage entry.
>>>>> +     */
>>>>
>>>> I am afraid this is not compliant with the Arm Arm. If you want to update
>>>> valid entry (e.g. shattering a superpage), then you need to follow the
>>>> break-before-make sequence. This means that:
>>>>     1. Replace the valid entry with an entry with an invalid one
>>>>     2. Flush the TLBs
>>>>     3. Write the new entry
>>>>
>>>> Those steps will make your code compliant but it also means that a virtual
>>>> address will be temporarily invalid so you could take a fault in the middle
>>>> of your split if your stack or the table was part of the region. The same
>>>> could happen for the other running CPUs but this is less problematic as they
>>>> could spin on the page-table lock.
>>>
>>> Could this be worked around by writing the critical section in
>>> assembler?
>>
>> Everything is feasible. Is this worth it? I don't think so. There are ways we
>> can avoid the shattering in the first place by simply not mapping all the RAM.
> 
> Good point.  I do wonder what would go wrong if one replaced one live
> PTE with another that pointed to the same physical address. 

It depends on what you are modifying in the PTE. If you only modify the 
permissions, then that's fine. But changing anything else could result in 
a TLB conflict, loss of coherency...

> Is this
> merely a case of “spec doesn’t allow it”, or does it actually break on
> real hardware?
I have seen issues on real HW when the ordering is not respected. Recent 
versions of the Arm Arm introduced the possibility of skipping the 
sequence under certain conditions, provided the HW supports it (reported 
via the ID registers).

I haven't yet seen such a processor.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 1/4] Add VMF Hypercall
  2022-12-13 19:50 ` [RFC 1/4] Add VMF Hypercall Smith, Jackson
@ 2022-12-14  9:29   ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2022-12-14  9:29 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, Julien Grall,
	bertrand.marquis, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

On 13.12.2022 20:50, Smith, Jackson wrote:
> This commit introduces a new vmf_op hypercall. If desired, it could be merged
> into an existing hypercall.
> 
> Also, introduce a VMF Kconfig option and xen/vmf.h, defining the arch-specific
> functions that must be implemented to support vmf.

Neither here nor in the public interface header do you describe what this is
intended to do (including both the present sub-ops and future ones, which -
judging from the numbering - appear to exist somewhere). Therefore there is
too little context here to make any judgement.

Jan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-13 23:05     ` Julien Grall
  2022-12-14  1:28       ` Demi Marie Obenour
@ 2022-12-14 14:06       ` Julien Grall
  1 sibling, 0 replies; 33+ messages in thread
From: Julien Grall @ 2022-12-14 14:06 UTC (permalink / raw)
  To: Demi Marie Obenour, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

Hi,

On 13/12/2022 23:05, Julien Grall wrote:
> On 13/12/2022 22:22, Demi Marie Obenour wrote:
>> On Tue, Dec 13, 2022 at 08:55:28PM +0000, Julien Grall wrote:
>>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>>> Hi Xen Developers,
>>>
>>> Hi Jackson,
>>>
>>> Thanks for sharing the prototype with the community. Some 
>>> questions/remarks
>>> below.
>>
>> [snip]
>>
>>>> With this technique, we protect the integrity and confidentiality of
>>>> guest memory. However, a compromised hypervisor can still read/write
>>>> register state during traps, or refuse to schedule a guest, denying
>>>> service. We also recognize that because this technique precludes
>>>> modifying Xen's page tables after startup, it may not be compatible
>>>> with all of Xen's potential use cases. On the other hand, there are
>>>> some use cases (in particular statically defined embedded systems)
>>>> where our technique could be adopted with minimal friction.
>>>
>>>  From what you wrote, this sounds very much like the project Citrix and
>>> Amazon worked on called "Secret-free hypervisor" with a twist. In 
>>> your case,
>>> you want to prevent the hypervisor from mapping/unmapping the guest memory.
>>>
>>> You can find some details in [1]. The code is x86 only, but I don't 
>>> see any
>>> major blocker to port it on arm64.
>>
>> Is there any way the secret-free hypervisor code could be upstreamed?
> This has been on my todo list for more than a year, but I haven't yet found
> anyone to finish the work.
> 
> I need to have a look at how much of the original work is left to do.

I have looked at the series. It looks like there are only 16 patches 
left to be reviewed.

They are two years old but the code hasn't changed too much. So I will 
look at porting them over the next few days and hopefully I can respin 
the series before Christmas.

Cheers,
-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-13 20:55 ` [RFC 0/4] Adding Virtual Memory Fuses to Xen Julien Grall
  2022-12-13 22:22   ` Demi Marie Obenour
@ 2022-12-15 19:27   ` Smith, Jackson
  2022-12-15 22:00     ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Smith, Jackson @ 2022-12-15 19:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

[-- Attachment #1: Type: text/plain, Size: 10625 bytes --]

Hi Julien,

-----Original Message-----
From: Julien Grall <julien@xen.org>
Sent: Tuesday, December 13, 2022 3:55 PM
To: Smith, Jackson <rsmith@RiversideResearch.org>
>
> On 13/12/2022 19:48, Smith, Jackson wrote:
> > Hi Xen Developers,
>
> Hi Jackson,
>
> Thanks for sharing the prototype with the community. Some
> questions/remarks below.
>
> > My team at Riverside Research is currently spending IRAD funding to
> > prototype next-generation secure hypervisor design ideas on Xen. In
> > particular, we are prototyping the idea of Virtual Memory Fuses for
> > Software Enclaves, as described in this paper:
> > https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that
> > that paper talks about OS/Process while we have implemented the idea
> > for Hypervisor/VM.
> >
> > Our goal is to emulate something akin to Intel SGX or AMD SEV, but
> > using only existing virtual memory features common in all processors.
> > The basic idea is not to map guest memory into the hypervisor so that
> > a compromised hypervisor cannot compromise (e.g. read/write) the
> > guest. This idea has been proposed before, however, Virtual Memory
> > Fuses go one step further; they delete the hypervisor's mappings to
> > its own page tables, essentially locking the virtual memory
> > configuration for the lifetime of the system. This creates what we
> > call "Software Enclaves", ensuring that an adversary with arbitrary
> > code execution in the hypervisor STILL cannot read/write guest memory.
>
> I am confused: if the attacker is able to execute arbitrary code, then
> what prevents them from writing code to map/unmap the page?
>
> Skimming through the paper (pages 5-6), it looks like you would need
> to implement extra defenses in Xen to be able to prevent mapping/unmapping
> a page.
>

The key piece is deleting all virtual mappings to Xen's page table
structures. From the paper (4.4.1 last paragraph), "Because all memory
accesses operate through the MMU, even page table memory needs
corresponding page table entries in order to be written to." Without a
virtual mapping to the page table, no code can modify the page table
because it cannot read or write the table. Therefore the mappings to the
guest cannot be restored even with arbitrary code execution.

> >
> > With this technique, we protect the integrity and confidentiality of
> > guest memory. However, a compromised hypervisor can still read/write
> > register state during traps, or refuse to schedule a guest, denying
> > service. We also recognize that because this technique precludes
> > modifying Xen's page tables after startup, it may not be compatible
> > with all of Xen's potential use cases. On the other hand, there are
> > some use cases (in particular statically defined embedded systems)
> > where our technique could be adopted with minimal friction.
>
>  From what you wrote, this sounds very much like the project Citrix
> and
> Amazon worked on called "Secret-free hypervisor" with a twist. In your
> case, you want to prevent the hypervisor from mapping/unmapping the guest
> memory.
>
> You can find some details in [1]. The code is x86 only, but I don't
> see
> any major blocker to port it on arm64.
>

Yes, we are familiar with the "secret-free hypervisor" work. As you
point out, both our work and the secret-free hypervisor remove the
directmap region to mitigate the risk of leaking sensitive guest
secrets. However, our work is slightly different because it additionally
prevents attackers from tricking Xen into remapping a guest. 

We see our goals and the secret-free hypervisor goals as orthogonal.
While the secret-free hypervisor views guests as untrusted and wants to
keep compromised guests from leaking secrets, our work comes from the
perspective of an individual guest trying to protect its secrets from
the rest of the stack. So it wouldn't be unreasonable to say "I want a
hypervisor that is 'secret-free' and implements VMF". We see them as 
different techniques with overlapping implementations.

> >
> > With this in mind our goal is to work with the Xen community to
> > upstream this work as an optional feature. At this point, we have a
> > prototype implementation of VMF on Xen (the contents of this RFC patch
> > series) that supports dom0less guests on arm64. By sharing our
> > prototype, we hope to socialize our idea, gauge interest, and
> > hopefully gain useful feedback as we work toward upstreaming.
> >
> > ** IMPLEMENTATION **
> > In our current setup we have a static configuration with dom0 and one
> > or two domUs. Soon after boot, Dom0 issues a hypercall through the
> > xenctrl interface to blow the fuse for the domU. In the future, we
> > could also add code to support blowing the fuse automatically on
> > startup, before any domains are un-paused.
> >
> > Our Xen/arm64 prototype creates Software Enclaves in two steps,
> > represented by these two functions defined in xen/vmf.h:
> > void vmf_unmap_guest(struct domain *d);
> > void vmf_lock_xen_pgtables(void);
> >
> > In the first, Xen removes mappings to the guest(s). On arm64, Xen
> > keeps a reference to all of guest memory in the directmap. Right now,
> > we simply walk all of the guest second stage tables and remove them
> > from the directmap, although there is probably a more elegant method
> > for this.
>
> IIUC, you first map all the RAM and then remove the pages. What you
> could do instead is to map only the memory required for Xen use. The
> rest would be left unmapped.
>
> This would be similar to what we are doing on arm32. We have a split
> heap. Only the xenheap is mapped. The pages from the domheap will
> be mapped on demand.

Yes, I think that would work. Xen can temporarily map guest memory
in the domheap when loading guests. When the system finishes booting, we
can prevent the hypervisor from mapping pages by unmapping the domheap
root tables. We could start by adding an option to enable split xenheap
on arm64.

> Another approach would be to have a single heap where pages used
> by Xen are mapped in the page-tables when allocated (this is what the
> secret-free hypervisor is doing).
>
> If you don't map to keep the page-tables around, then it sounds like
> you want the first approach.
>
> >
> > Second, the Xen removes mappings to its own page tables.
> > On arm64, this also involves manipulating the directmap. One challenge
> > here is that as we start to unmap our tables from the directmap, we
> > can't use the directmap to walk them. Our solution here is also bit
> > less elegant, we temporarily insert a recursive mapping and use that
> > to remove page table entries.
>
> See above.

Using the split xenheap approach means we don't have to worry about
unmapping guest pagetables or Xen's dynamically allocated tables.

We still need to unmap the handful of static pagetables that are
declared at the top of xen/arch/arm/mm.c. Remember our goal is to
prevent Xen from reading or writing its own page tables. We can't just
unmap these static tables without shattering because they end up part of
the superpages that map the xen binary. We're probably only shattering a
single superpage for this right now. Maybe we can move the static tables
to a superpage-aligned region of the binary and pad that region so we
can unmap an entire superpage without shattering? In the future we might
adjust the boot code to avoid the dependency on static page table
locations.

>
> >
> > ** LIMITATIONS and other closing thoughts ** The current Xen code has
> > obviously been implemented under the assumption that new pages can be
> > mapped, and that guest virtual addresses can be read, so this
> > technique will break some Xen features. However, in the general case
>
> Can you clarify your definition of "general case"? From my PoV, it is a
> lot more common to have guests with PV emulated devices rather than
> with devices attached. So it will be mandatory to access part of the
> memory (e.g. grant table).

Yes "general case" may have been poor wording on my part. I wanted to
say that configurations exist that do not require reading guest memory,
not that this was the most common (or even a common) case.

>
> > (in particular for static
> > workloads where the number of guests is not changed after boot)
>
> That very much depends on how you configure your guest. If they have
> devices assigned then possibly yes. Otherwise see above.

Yes right now we are assuming only assigned devices, no PV or emulated
ones.

>
> > Finally, our initial testing suggests that Xen never reads guest
> > memory (in a static, non-dom0-enhanced configuration), but we have not
> > really explored this thoroughly.
> > We know at least these things work:
> > 	Dom0less virtual serial terminal
> > 	Domain scheduling
> > We are aware that these things currently depend on accessible guest
> > memory:
> > 	Some hypercalls take guest pointers as arguments
>
> There are not many hypercalls that don't take guest pointers.
>
> > 	Virtualized MMIO on arm needs to decode certain load/store
> > 	instructions
>
> On Arm, this can be avoided of the guest OS is not using such
> instruction. In fact they were only added to cater "broken" guest OS.
>

What do you mean by "broken" guests?

I see in the Arm ARM where it discusses interpreting the syndrome
register. But I'm not understanding which instructions populate the
syndrome register and which do not. Why are guests using instructions
that don't populate the syndrome register considered "broken"? Is there
somewhere I can look to learn more?

> Also, this will probably be a lot more difficult on x86 as, AFAIK,
there
> is
> no instruction syndrome. So you will need to decode the instruction in
> order to emulate the access.
>
> >
> > It's likely that other Xen features require guest memory access.
>
> For Arm, guest memory access is also needed when using the GICv3 ITS
> and/or second-level SMMU (still in RFC).
>

Thanks for pointing this out. We will be sure to make note of these
limitations going forward.

>
> For x86, if you don't want to access the guest memory, then you may
> need to restrict to PVH as for HVM we need to emulate some devices in
> QEMU.
> That said, I am not sure PVH is even feasible.
>

Is that mostly in reference to the need to decode instructions on x86,
or are there other reasons why you feel it might not be feasible to
apply this to Xen on x86?

Thanks for taking the time to consider our work. I think our next step
is to rethink the implementation in terms of the split xenheap design
and try to avoid the need for superpage shattering, so I'll work on
that before pushing the idea further.

Thanks,
Jackson

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5317 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-15 19:27   ` Smith, Jackson
@ 2022-12-15 22:00     ` Julien Grall
  2022-12-16  1:46       ` Stefano Stabellini
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-12-15 22:00 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark



On 15/12/2022 19:27, Smith, Jackson wrote:
> Hi Julien,

Hi Jackson,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Tuesday, December 13, 2022 3:55 PM
> To: Smith, Jackson <rsmith@RiversideResearch.org>
>>
>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>> Hi Xen Developers,
>>
>> Hi Jackson,
>>
>> Thanks for sharing the prototype with the community. Some
>> questions/remarks below.
>>
>>> My team at Riverside Research is currently spending IRAD funding to
>>> prototype next-generation secure hypervisor design ideas on Xen. In
>>> particular, we are prototyping the idea of Virtual Memory Fuses for
>>> Software Enclaves, as described in this paper:
>>> https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note
>> that that
>>> paper talks about OS/Process while we have implemented the idea
>> for
>>> Hypervisor/VM.
>>>
>>> Our goal is to emulate something akin to Intel SGX or AMD SEV, but
>>> using only existing virtual memory features common in all
> processors.
>>> The basic idea is not to map guest memory into the hypervisor so
>> that
>>> a compromised hypervisor cannot compromise (e.g. read/write) the
>>> guest. This idea has been proposed before, however, Virtual Memory
>>> Fuses go one step further; they delete the hypervisor's mappings to
>>> its own page tables, essentially locking the virtual memory
>>> configuration for the lifetime of the system. This creates what we
>>> call "Software Enclaves", ensuring that an adversary with arbitrary
>>> code execution in the hypervisor STILL cannot read/write guest
>> memory.
>>
>> I am confused, if the attacker is able to execute arbitrary code, then
>> what prevent them to write code to map/unmap the page?
>>
>> Skimming through the paper (pages 5-6), it looks like you would need
>> to implement extra defense in Xen to be able to prevent map/unmap a
>> page.
>>
> 
> The key piece is deleting all virtual mappings to Xen's page table
> structures. From the paper (4.4.1 last paragraph), "Because all memory
> accesses operate through the MMU, even page table memory needs
> corresponding page table entries in order to be written to." Without a
> virtual mapping to the page table, no code can modify the page table
> because it cannot read or write the table. Therefore the mappings to the
> guest cannot be restored even with arbitrary code execution.
I don't think this is sufficient. Even if the page-tables are not part of 
the virtual mapping, an attacker could still modify TTBR0_EL2 (that's a 
system register holding a host physical address). So, with a bit more work, 
you can gain access to everything (see more below).

AFAICT, this problem is pointed out in the paper (section 4.4.1):

"The remaining attack vector. Unfortunately, deleting the page
table mappings does not stop the kernel from creating an entirely
new page table with the necessary mappings and switching to it
as the active context. Although this would be very difficult for
an attacker, switching to a new context with a carefully crafted
new page table structure could compromise the VMFE."

I believe this will be easier to do in Xen because the virtual layout 
is not very complex.

It would be a matter of inserting a new entry in the root table you 
control. A rough sequence would be:
    1) Allocate a page
    2) Prepare the page to act as a root (e.g. mapping of your code...)
    3) Map the "existing" root as writable.
    4) Update TTBR0_EL2 to point to your new root
    5) Add a mapping in the "old" root
    6) Switch to the old root
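Spelled out as hypothetical pseudocode (assuming arbitrary code execution at EL2; all helper names are illustrative, none of this is real Xen code), the sequence amounts to:

```
new_root = alloc_page()                     # 1) allocate a page
prepare_root(new_root)                      # 2) map attacker code, stack, ...
map_writable(new_root, pa_of(old_root))     # 3) map the existing root writable
write_ttbr0_el2(pa_of(new_root)); isb()     # 4) switch to the new root
old_root.add_mapping(guest_pa, READ_WRITE)  # 5) add a mapping in the old root
write_ttbr0_el2(pa_of(old_root)); isb()     # 6) switch back: guest now mapped
```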

So can you outline how you plan to prevent/mitigate it?

> 
>>>
>>> With this technique, we protect the integrity and confidentiality of
>>> guest memory. However, a compromised hypervisor can still
>> read/write
>>> register state during traps, or refuse to schedule a guest, denying
>>> service. We also recognize that because this technique precludes
>>> modifying Xen's page tables after startup, it may not be compatible
>>> with all of Xen's potential use cases. On the other hand, there are
>>> some uses cases (in particular statically defined embedded systems)
>>> where our technique could be adopted with minimal friction.
>>
>>   From what you wrote, this sounds very much like the project Citrix
> and
>> Amazon worked on called "Secret-free hypervisor" with a twist. In your
>> case, you want to prevent the hypervisor to map/unmap the guest
>> memory.
>>
>> You can find some details in [1]. The code is x86 only, but I don't
> see
>> any major blocker to port it on arm64.
>>
> 
> Yes, we are familiar with the "secret-free hypervisor" work. As you
> point out, both our work and the secret-free hypervisor remove the
> directmap region to mitigate the risk of leaking sensitive guest
> secrets. However, our work is slightly different because it additionally
> prevents attackers from tricking Xen into remapping a guest.

I understand your goal, but I don't think this is achieved (see above). 
You would need an entity to prevent writes to TTBR0_EL2 in order to fully 
protect it.

> 
> We see our goals and the secret-free hypervisor goals as orthogonal.
> While the secret-free hypervisor views guests as untrusted and wants to
> keep compromised guests from leaking secrets, our work comes from the
> perspective of an individual guest trying to protect its secrets from
> the rest of the stack. So it wouldn't be unreasonable to say "I want a
> hypervisor that is 'secret-free' and implements VMF". We see them as
> different techniques with overlapping implementations.

I can see why you want to divide them. But to me, if you have VMF, then 
you have a secret-free hypervisor in terms of implementation.

The major difference is how the xenheap is dealt with. At the moment, 
for the implementation we are looking to still use the same heap.

However, there are a few drawbacks in terms of page usage:
   * A page can be allocated anywhere in the memory map, so you can end 
up allocating an L1 (Arm) or L3 (x86) table just for a single page
   * Contiguous pages may be allocated at different times
   * Page-tables can be empty

x86 has some logic to handle the last two points, but Arm doesn't have it 
yet. I feel this is quite complex (in particular because of the 
break-before-make requirement).

So one solution would be to use a split heap. The trouble is that 
xenheap memory would be more "limited". That might be OK for VMF; I need 
to think a bit more for the secret-free hypervisor.

Another solution would be to use the vmap() (which would not be possible 
for VMF).

> Using the split xenheap approach means we don't have to worry about
> unmapping guest pagetables or xen's dynamically allocated tables.
> 
> We still need to unmap the handful of static pagetables that are
> declared at the top of xen/arch/arm/mm.c. Remember our goal is to
> prevent Xen from reading or writing its own page tables. We can't just
> unmap these static tables without shattering because they end up part of
> the superpages that map the xen binary. We're probably only shattering a
> single superpage for this right now. Maybe we can move the static tables
> to a superpage aligned region of the binary and pad that region so we
> can unmap an entire superpage without shattering?

For static pages you don't even need to shatter superpages because Xen 
is mapped with 4KB pages.

> In the future we might
> adjust the boot code to avoid the dependency on static page table
> locations.

You will always need at least a few static page tables for initially 
switching the MMU on. Now, you could possibly allocate a new set out of Xen 
and then switch to it.

But I am not sure this is worth the trouble if you can easily unmap the 
static version afterwards.

>>
>>> Finally, our initial testing suggests that Xen never reads guest
>>> memory (in a static, non-dom0-enchanced configuration), but have
>> not
>>> really explored this thoroughly.
>>> We know at least these things work:
>>> 	Dom0less virtual serial terminal
>>> 	Domain scheduling
>>> We are aware that these things currently depend on accessible guest
>>> memory:
>>> 	Some hypercalls take guest pointers as arguments
>>
>> There are not many hypercalls that don't take guest pointers.
>>
>>> 	Virtualized MMIO on arm needs to decode certain load/store
>>> 	instructions
>>
>> On Arm, this can be avoided of the guest OS is not using such
>> instruction. In fact they were only added to cater "broken" guest OS.
>>
> 
> What do you mean by "broken" guests?
> 
> I see in the arm ARM where it discusses interpreting the syndrome
> register. But I'm not understanding which instructions populate the
> syndrome register and which do not. Why are guests using instructions
> that don't populate the syndrome register considered "broken"?

The short answer is that they can't be easily/safely decoded, as Xen reads 
the instruction through the data cache while the processor fetches it 
through the instruction cache. There are situations where the two could 
mismatch. For more details...

> Is there
> somewhere I can look to learn more?
... you can read [1], [2].


> 
>> Also, this will probably be a lot more difficult on x86 as, AFAIK,
> there
>> is
>> no instruction syndrome. So you will need to decode the instruction in
>> order to emulate the access.
>>
>>>
>>> It's likely that other Xen features require guest memory access.
>>
>> For Arm, guest memory access is also needed when using the GICv3 ITS
>> and/or second-level SMMU (still in RFC).
>>
> 
> Thanks for pointing this out. We will be sure to make note of these
> limitations going forward.
> 
>>
>> For x86, if you don't want to access the guest memory, then you may
>> need to restrict to PVH as for HVM we need to emulate some devices in
>> QEMU.
>> That said, I am not sure PVH is even feasible.
>>
> 
> Is that mostly in reference to the need decode instructions on x86 or
> are there other reasons why you feel it might not be feasible to apply
> this to Xen on x86?

I am not aware of any other. But it would probably be best to ask 
someone more knowledgeable than me on x86.

Cheers,

[1] 
https://lore.kernel.org/xen-devel/e2d041b2-3b38-f19b-2d8e-3a255b0ac07e@amd.com/
[2] 
https://lore.kernel.org/xen-devel/20211126131459.2bbc81ad@donnerap.cambridge.arm.com


-- 
Julien Grall



* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-15 22:00     ` Julien Grall
@ 2022-12-16  1:46       ` Stefano Stabellini
  2022-12-16  8:38         ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Stefano Stabellini @ 2022-12-16  1:46 UTC (permalink / raw)
  To: Julien Grall
  Cc: Smith, Jackson, Brookes, Scott, Xen-devel, Stefano Stabellini,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

On Thu, 15 Dec 2022, Julien Grall wrote:
> > > On 13/12/2022 19:48, Smith, Jackson wrote:
> > > > Hi Xen Developers,
> > > 
> > > Hi Jackson,
> > > 
> > > Thanks for sharing the prototype with the community. Some
> > > questions/remarks below.
> > > 
> > > > My team at Riverside Research is currently spending IRAD funding to
> > > > prototype next-generation secure hypervisor design ideas on Xen. In
> > > > particular, we are prototyping the idea of Virtual Memory Fuses for
> > > > Software Enclaves, as described in this paper:
> > > > https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note
> > > that that
> > > > paper talks about OS/Process while we have implemented the idea
> > > for
> > > > Hypervisor/VM.
> > > > 
> > > > Our goal is to emulate something akin to Intel SGX or AMD SEV, but
> > > > using only existing virtual memory features common in all
> > processors.
> > > > The basic idea is not to map guest memory into the hypervisor so
> > > that
> > > > a compromised hypervisor cannot compromise (e.g. read/write) the
> > > > guest. This idea has been proposed before, however, Virtual Memory
> > > > Fuses go one step further; they delete the hypervisor's mappings to
> > > > its own page tables, essentially locking the virtual memory
> > > > configuration for the lifetime of the system. This creates what we
> > > > call "Software Enclaves", ensuring that an adversary with arbitrary
> > > > code execution in the hypervisor STILL cannot read/write guest
> > > memory.
> > > 
> > > I am confused, if the attacker is able to execute arbitrary code, then
> > > what prevent them to write code to map/unmap the page?
> > > 
> > > Skimming through the paper (pages 5-6), it looks like you would need
> > > to implement extra defense in Xen to be able to prevent map/unmap a
> > > page.
> > > 
> > 
> > The key piece is deleting all virtual mappings to Xen's page table
> > structures. From the paper (4.4.1 last paragraph), "Because all memory
> > accesses operate through the MMU, even page table memory needs
> > corresponding page table entries in order to be written to." Without a
> > virtual mapping to the page table, no code can modify the page table
> > because it cannot read or write the table. Therefore the mappings to the
> > guest cannot be restored even with arbitrary code execution.
>
> I don't think this is sufficient. Even if the page-tables not part of the
> virtual mapping, an attacker could still modify TTBR0_EL2 (that's a system
> register hold a host physical address). So, with a bit more work, you can gain
> access to everything (see more below).
> 
> AFAICT, this problem is pointed out in the paper (section 4.4.1):
> 
> "The remaining attack vector. Unfortunately, deleting the page
> table mappings does not stop the kernel from creating an entirely
> new page table with the necessary mappings and switching to it
> as the active context. Although this would be very difficult for
> an attacker, switching to a new context with a carefully crafted
> new page table structure could compromise the VMFE."
> 
> I believe this will be easier to do it in Xen because the virtual layout is
> not very complex.
> 
> It would be a matter of inserting a new entry in the root table you control. A
> rough sequence would be:
>    1) Allocate a page
>    2) Prepare the page to act as a root (e.g. mapping of your code...)
>    3) Map the "existing" root as a writable.
>    4) Update TTBR0_EL2 to point to your new root
>    5) Add a mapping in the "old" root
>    6) Switch to the old root
> 
> So can you outline how you plan to prevent/mitigate it?

[...]

> > Yes, we are familiar with the "secret-free hypervisor" work. As you
> > point out, both our work and the secret-free hypervisor remove the
> > directmap region to mitigate the risk of leaking sensitive guest
> > secrets. However, our work is slightly different because it additionally
> > prevents attackers from tricking Xen into remapping a guest.
> 
> I understand your goal, but I don't think this is achieved (see above). You
> would need an entity to prevent write to TTBR0_EL2 in order to fully protect
> it.

Without a way to stop Xen from reading/writing TTBR0_EL2, we cannot
claim that the guest's secrets are 100% safe.

But the attacker would have to follow the sequence you outlined above to
change Xen's pagetables and remap guest memory before accessing it. It
is an additional obstacle for attackers that want to steal other guests'
secrets. The code that the attacker would need to inject into Xen would
need to be bigger and more complex.

Every little helps :-)



* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-16  1:46       ` Stefano Stabellini
@ 2022-12-16  8:38         ` Julien Grall
  2022-12-20 22:17           ` Smith, Jackson
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-12-16  8:38 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Smith, Jackson, Brookes, Scott, Xen-devel, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

Hi Stefano,

On 16/12/2022 01:46, Stefano Stabellini wrote:
> On Thu, 15 Dec 2022, Julien Grall wrote:
>>>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>> Yes, we are familiar with the "secret-free hypervisor" work. As you
>>> point out, both our work and the secret-free hypervisor remove the
>>> directmap region to mitigate the risk of leaking sensitive guest
>>> secrets. However, our work is slightly different because it additionally
>>> prevents attackers from tricking Xen into remapping a guest.
>>
>> I understand your goal, but I don't think this is achieved (see above). You
>> would need an entity to prevent write to TTBR0_EL2 in order to fully protect
>> it.
> 
> Without a way to stop Xen from reading/writing TTBR0_EL2, we cannot
> claim that the guest's secrets are 100% safe.
> 
> But the attacker would have to follow the sequence you outlines above to
> change Xen's pagetables and remap guest memory before accessing it. It
> is an additional obstacle for attackers that want to steal other guests'
> secrets. The size of the code that the attacker would need to inject in
> Xen would need to be bigger and more complex.

Right, that's why I wrote "with a bit more work". However, the nuance you 
mention doesn't seem to be present in the cover letter:

"This creates what we call "Software Enclaves", ensuring that an 
adversary with arbitrary code execution in the hypervisor STILL cannot 
read/write guest memory."

So if the end goal is really to protect against *all* sorts of arbitrary 
code, then I think we should have a rough idea how this will look like 
in Xen.

 From a brief look, it doesn't look like it would be possible to prevent 
modification to TTBR0_EL2 (even from EL3). We would need to investigate 
if there are other bits in the architecture to help us.

> 
> Every little helps :-)

I can see how making the life of the attacker more difficult is 
appealing. Yet, the goal needs to be clarified and the risk with the 
approach acknowledged (see above).

Cheers,

-- 
Julien Grall



* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-13 22:22   ` Demi Marie Obenour
  2022-12-13 23:05     ` Julien Grall
@ 2022-12-16 11:58     ` Julien Grall
  1 sibling, 0 replies; 33+ messages in thread
From: Julien Grall @ 2022-12-16 11:58 UTC (permalink / raw)
  To: Demi Marie Obenour, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, Stefano Stabellini, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

Hi Demi,

On 13/12/2022 22:22, Demi Marie Obenour wrote:
> On Tue, Dec 13, 2022 at 08:55:28PM +0000, Julien Grall wrote:
>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>> Hi Xen Developers,
>>
>> Hi Jackson,
>>
>> Thanks for sharing the prototype with the community. Some questions/remarks
>> below.
> 
> [snip]
> 
>>> With this technique, we protect the integrity and confidentiality of
>>> guest memory. However, a compromised hypervisor can still read/write
>>> register state during traps, or refuse to schedule a guest, denying
>>> service. We also recognize that because this technique precludes
>>> modifying Xen's page tables after startup, it may not be compatible
>>> with all of Xen's potential use cases. On the other hand, there are
>>> some uses cases (in particular statically defined embedded systems)
>>> where our technique could be adopted with minimal friction.
>>
>>  From what you wrote, this sounds very much like the project Citrix and
>> Amazon worked on called "Secret-free hypervisor" with a twist. In your case,
>> you want to prevent the hypervisor to map/unmap the guest memory.
>>
>> You can find some details in [1]. The code is x86 only, but I don't see any
>> major blocker to port it on arm64.
> 
> Is there any way the secret-free hypervisor code could be upstreamed?

I have posted a new version with also a PoC for arm64:

https://lore.kernel.org/xen-devel/20221216114853.8227-1-julien@xen.org/T/#t

For convenience, I have also pushed a branch to my personal git:

https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=summary

branch no-directmap-v1

Cheers,

-- 
Julien Grall



* RE: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-16  8:38         ` Julien Grall
@ 2022-12-20 22:17           ` Smith, Jackson
  2022-12-20 22:30             ` Demi Marie Obenour
  2022-12-22  0:38             ` Stefano Stabellini
  0 siblings, 2 replies; 33+ messages in thread
From: Smith, Jackson @ 2022-12-20 22:17 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Brookes, Scott, Xen-devel, bertrand.marquis, jbeulich,
	Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark


-----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Friday, December 16, 2022 3:39 AM
>
> Hi Stefano,
>
> On 16/12/2022 01:46, Stefano Stabellini wrote:
> > On Thu, 15 Dec 2022, Julien Grall wrote:
> >>>> On 13/12/2022 19:48, Smith, Jackson wrote:
> >>> Yes, we are familiar with the "secret-free hypervisor" work. As
you
> >>> point out, both our work and the secret-free hypervisor remove the
> >>> directmap region to mitigate the risk of leaking sensitive guest
> >>> secrets. However, our work is slightly different because it
> >>> additionally prevents attackers from tricking Xen into remapping a
> guest.
> >>
> >> I understand your goal, but I don't think this is achieved (see
> >> above). You would need an entity to prevent write to TTBR0_EL2 in
> >> order to fully protect it.
> >
> > Without a way to stop Xen from reading/writing TTBR0_EL2, we
> cannot
> > claim that the guest's secrets are 100% safe.
> >
> > But the attacker would have to follow the sequence you outlines
> above
> > to change Xen's pagetables and remap guest memory before
> accessing it.
> > It is an additional obstacle for attackers that want to steal other
> guests'
> > secrets. The size of the code that the attacker would need to inject
> > in Xen would need to be bigger and more complex.
>
> Right, that's why I wrote with a bit more work. However, the nuance
> you mention doesn't seem to be present in the cover letter:
>
> "This creates what we call "Software Enclaves", ensuring that an
> adversary with arbitrary code execution in the hypervisor STILL cannot
> read/write guest memory."
>
> So if the end goal if really to protect against *all* sort of
arbitrary 
> code,
> then I think we should have a rough idea how this will look like in
Xen.
>
>  From a brief look, it doesn't look like it would be possible to
prevent
> modification to TTBR0_EL2 (even from EL3). We would need to
> investigate if there are other bits in the architecture to help us.
>
> >
> > Every little helps :-)
>
> I can see how making the life of the attacker more difficult is 
> appealing.
> Yet, the goal needs to be clarified and the risk with the approach
> acknowledged (see above).
>

You're right, we should have mentioned this weakness in our first email.
Sorry about the oversight! This is definitely still a limitation that we
have not yet overcome. However, we do think that the increase in
attacker workload that you and Stefano are discussing could still be
valuable to security conscious Xen users.

It would be nice to find additional architecture features that we can use
to close this hole on Arm, but there aren't any that stand out to me
either.

With this limitation in mind, what are the next steps we should take to
support this feature for the Xen community? Is this increase in attacker
workload meaningful enough to justify the inclusion of VMF in Xen?

Thanks,
Jackson




* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-20 22:17           ` Smith, Jackson
@ 2022-12-20 22:30             ` Demi Marie Obenour
  2022-12-22  0:53               ` Stefano Stabellini
  2022-12-22  0:38             ` Stefano Stabellini
  1 sibling, 1 reply; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-20 22:30 UTC (permalink / raw)
  To: Smith, Jackson, Julien Grall, Stefano Stabellini
  Cc: Brookes, Scott, Xen-devel, bertrand.marquis, jbeulich,
	Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark


On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote:
> -----Original Message-----
> > From: Julien Grall <julien@xen.org>
> > Sent: Friday, December 16, 2022 3:39 AM
> >
> > Hi Stefano,
> >
> > On 16/12/2022 01:46, Stefano Stabellini wrote:
> > > On Thu, 15 Dec 2022, Julien Grall wrote:
> > >>>> On 13/12/2022 19:48, Smith, Jackson wrote:
> > >>> Yes, we are familiar with the "secret-free hypervisor" work. As
> you
> > >>> point out, both our work and the secret-free hypervisor remove the
> > >>> directmap region to mitigate the risk of leaking sensitive guest
> > >>> secrets. However, our work is slightly different because it
> > >>> additionally prevents attackers from tricking Xen into remapping a
> > guest.
> > >>
> > >> I understand your goal, but I don't think this is achieved (see
> > >> above). You would need an entity to prevent write to TTBR0_EL2 in
> > >> order to fully protect it.
> > >
> > > Without a way to stop Xen from reading/writing TTBR0_EL2, we
> > cannot
> > > claim that the guest's secrets are 100% safe.
> > >
> > > But the attacker would have to follow the sequence you outlines
> > above
> > > to change Xen's pagetables and remap guest memory before
> > accessing it.
> > > It is an additional obstacle for attackers that want to steal other
> > guests'
> > > secrets. The size of the code that the attacker would need to inject
> > > in Xen would need to be bigger and more complex.
> >
> > Right, that's why I wrote with a bit more work. However, the nuance
> > you mention doesn't seem to be present in the cover letter:
> >
> > "This creates what we call "Software Enclaves", ensuring that an
> > adversary with arbitrary code execution in the hypervisor STILL cannot
> > read/write guest memory."
> >
> > So if the end goal if really to protect against *all* sort of
> arbitrary 
> > code,
> > then I think we should have a rough idea how this will look like in
> Xen.
> >
> >  From a brief look, it doesn't look like it would be possible to
> prevent
> > modification to TTBR0_EL2 (even from EL3). We would need to
> > investigate if there are other bits in the architecture to help us.
> >
> > >
> > > Every little helps :-)
> >
> > I can see how making the life of the attacker more difficult is 
> > appealing.
> > Yet, the goal needs to be clarified and the risk with the approach
> > acknowledged (see above).
> >
> 
> You're right, we should have mentioned this weakness in our first email.
> Sorry about the oversight! This is definitely still a limitation that we
> have not yet overcome. However, we do think that the increase in
> attacker workload that you and Stefano are discussing could still be
> valuable to security conscious Xen users.
> 
> It would nice to find additional architecture features that we can use
> to close this hole on arm, but there aren't any that stand out to me
> either.
> 
> With this limitation in mind, what are the next steps we should take to
> support this feature for the xen community? Is this increase in attacker
> workload meaningful enough to justify the inclusion of VMF in Xen?

Personally, I don’t think so.  The kinds of workloads VMF is usable
for (no hypercalls) are likely easily portable to other hypervisors,
including formally verified microkernels such as seL4 that provide a
significantly higher level of assurance.  seL4’s proofs do need to be
ported to each particular board, but this is fairly simple.  Conversely,
workloads that need Xen’s features cannot use VMF, so VMF again is not
suitable.

Have you considered other approaches to improving security, such as
fuzzing Xen’s hypercall interface or even using formal methods?  Those
would benefit all users of Xen, not merely a small subset who already
have alternatives available.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



* RE: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-20 22:17           ` Smith, Jackson
  2022-12-20 22:30             ` Demi Marie Obenour
@ 2022-12-22  0:38             ` Stefano Stabellini
  2022-12-22  9:52               ` Julien Grall
  1 sibling, 1 reply; 33+ messages in thread
From: Stefano Stabellini @ 2022-12-22  0:38 UTC (permalink / raw)
  To: Smith, Jackson
  Cc: Julien Grall, Stefano Stabellini, Brookes, Scott, Xen-devel,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

On Tue, 20 Dec 2022, Smith, Jackson wrote:
> > Hi Stefano,
> >
> > On 16/12/2022 01:46, Stefano Stabellini wrote:
> > > On Thu, 15 Dec 2022, Julien Grall wrote:
> > >>>> On 13/12/2022 19:48, Smith, Jackson wrote:
> > >>> Yes, we are familiar with the "secret-free hypervisor" work. As
> you
> > >>> point out, both our work and the secret-free hypervisor remove the
> > >>> directmap region to mitigate the risk of leaking sensitive guest
> > >>> secrets. However, our work is slightly different because it
> > >>> additionally prevents attackers from tricking Xen into remapping a
> > guest.
> > >>
> > >> I understand your goal, but I don't think this is achieved (see
> > >> above). You would need an entity to prevent write to TTBR0_EL2 in
> > >> order to fully protect it.
> > >
> > > Without a way to stop Xen from reading/writing TTBR0_EL2, we
> > cannot
> > > claim that the guest's secrets are 100% safe.
> > >
> > > But the attacker would have to follow the sequence you outlines
> > above
> > > to change Xen's pagetables and remap guest memory before
> > accessing it.
> > > It is an additional obstacle for attackers that want to steal other
> > guests'
> > > secrets. The size of the code that the attacker would need to inject
> > > in Xen would need to be bigger and more complex.
> >
> > Right, that's why I wrote with a bit more work. However, the nuance
> > you mention doesn't seem to be present in the cover letter:
> >
> > "This creates what we call "Software Enclaves", ensuring that an
> > adversary with arbitrary code execution in the hypervisor STILL cannot
> > read/write guest memory."
> >
> > So if the end goal if really to protect against *all* sort of
> arbitrary 
> > code,
> > then I think we should have a rough idea how this will look like in
> Xen.
> >
> >  From a brief look, it doesn't look like it would be possible to
> prevent
> > modification to TTBR0_EL2 (even from EL3). We would need to
> > investigate if there are other bits in the architecture to help us.
> >
> > >
> > > Every little helps :-)
> >
> > I can see how making the life of the attacker more difficult is 
> > appealing.
> > Yet, the goal needs to be clarified and the risk with the approach
> > acknowledged (see above).
> >
> 
> You're right, we should have mentioned this weakness in our first email.
> Sorry about the oversight! This is definitely still a limitation that we
> have not yet overcome. However, we do think that the increase in
> attacker workload that you and Stefano are discussing could still be
> valuable to security conscious Xen users.
> 
> It would nice to find additional architecture features that we can use
> to close this hole on arm, but there aren't any that stand out to me
> either.
> 
> With this limitation in mind, what are the next steps we should take to
> support this feature for the xen community? Is this increase in attacker
> workload meaningful enough to justify the inclusion of VMF in Xen?

I think it could be valuable as an additional obstacle for the attacker
to overcome. The next step would be to port your series on top of
Julien's "Remove the directmap" patch series
https://marc.info/?l=xen-devel&m=167119090721116

Julien, what do you think?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-20 22:30             ` Demi Marie Obenour
@ 2022-12-22  0:53               ` Stefano Stabellini
  2022-12-22  4:33                 ` Demi Marie Obenour
  2022-12-22  9:31                 ` Julien Grall
  0 siblings, 2 replies; 33+ messages in thread
From: Stefano Stabellini @ 2022-12-22  0:53 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Smith, Jackson, Julien Grall, Stefano Stabellini, Brookes, Scott,
	Xen-devel, bertrand.marquis, jbeulich, Andrew Cooper,
	Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

[-- Attachment #1: Type: text/plain, Size: 4702 bytes --]

On Tue, 20 Dec 2022, Demi Marie Obenour wrote:
> On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote:
> > > Hi Stefano,
> > >
> > > On 16/12/2022 01:46, Stefano Stabellini wrote:
> > > > On Thu, 15 Dec 2022, Julien Grall wrote:
> > > >>>> On 13/12/2022 19:48, Smith, Jackson wrote:
> > > >>> Yes, we are familiar with the "secret-free hypervisor" work. As
> > you
> > > >>> point out, both our work and the secret-free hypervisor remove the
> > > >>> directmap region to mitigate the risk of leaking sensitive guest
> > > >>> secrets. However, our work is slightly different because it
> > > >>> additionally prevents attackers from tricking Xen into remapping a
> > > guest.
> > > >>
> > > >> I understand your goal, but I don't think this is achieved (see
> > > >> above). You would need an entity to prevent write to TTBR0_EL2 in
> > > >> order to fully protect it.
> > > >
> > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we
> > > cannot
> > > > claim that the guest's secrets are 100% safe.
> > > >
> > > > But the attacker would have to follow the sequence you outlines
> > > above
> > > > to change Xen's pagetables and remap guest memory before
> > > accessing it.
> > > > It is an additional obstacle for attackers that want to steal other
> > > guests'
> > > > secrets. The size of the code that the attacker would need to inject
> > > > in Xen would need to be bigger and more complex.
> > >
> > > Right, that's why I wrote with a bit more work. However, the nuance
> > > you mention doesn't seem to be present in the cover letter:
> > >
> > > "This creates what we call "Software Enclaves", ensuring that an
> > > adversary with arbitrary code execution in the hypervisor STILL cannot
> > > read/write guest memory."
> > >
> > > So if the end goal if really to protect against *all* sort of
> > arbitrary 
> > > code,
> > > then I think we should have a rough idea how this will look like in
> > Xen.
> > >
> > >  From a brief look, it doesn't look like it would be possible to
> > prevent
> > > modification to TTBR0_EL2 (even from EL3). We would need to
> > > investigate if there are other bits in the architecture to help us.
> > >
> > > >
> > > > Every little helps :-)
> > >
> > > I can see how making the life of the attacker more difficult is 
> > > appealing.
> > > Yet, the goal needs to be clarified and the risk with the approach
> > > acknowledged (see above).
> > >
> > 
> > You're right, we should have mentioned this weakness in our first email.
> > Sorry about the oversight! This is definitely still a limitation that we
> > have not yet overcome. However, we do think that the increase in
> > attacker workload that you and Stefano are discussing could still be
> > valuable to security conscious Xen users.
> > 
> > It would nice to find additional architecture features that we can use
> > to close this hole on arm, but there aren't any that stand out to me
> > either.
> > 
> > With this limitation in mind, what are the next steps we should take to
> > support this feature for the xen community? Is this increase in attacker
> > workload meaningful enough to justify the inclusion of VMF in Xen?
> 
> Personally, I don’t think so.  The kinds of workloads VMF is usable
> for (no hypercalls) are likely easily portable to other hypervisors,
> including formally verified microkernels such as seL4 that provide... 

What other hypervisors might or might not do should not be a factor in
this discussion and it would be best to leave it aside.

From an AMD/Xilinx point of view, most of our customers using Xen in
production today don't use any hypercalls in one or more of their VMs.
Xen is great for these use-cases and it is rather common in embedded.
It is certainly a different configuration from what most have come to
expect from Xen on the server/desktop x86 side. There is no question
that guests without hypercalls are important for Xen on ARM.

As a Xen community we have a long history and strong interest in making
Xen more secure and also, more recently, safer (in the ISO 26262
safety-certification sense). The VMF work is very well aligned with both
of these efforts, and any additional burden on attackers is certainly
good for Xen.

Now the question is what changes are necessary and how to make them to
the codebase. And if it turns out that some of the changes are not
applicable or too complex to accept, the decision will be made purely
from a code maintenance point of view and will have nothing to do with
VMs making no hypercalls being unimportant (i.e. if we don't accept one
or more patches, it is not going to have anything to do with the use-case
being unimportant or with what other hypervisors might or might not do).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22  0:53               ` Stefano Stabellini
@ 2022-12-22  4:33                 ` Demi Marie Obenour
  2022-12-22  9:31                 ` Julien Grall
  1 sibling, 0 replies; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-22  4:33 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Smith, Jackson, Julien Grall, Brookes, Scott, Xen-devel,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

[-- Attachment #1: Type: text/plain, Size: 5215 bytes --]

On Wed, Dec 21, 2022 at 04:53:46PM -0800, Stefano Stabellini wrote:
> On Tue, 20 Dec 2022, Demi Marie Obenour wrote:
> > On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote:
> > > > Hi Stefano,
> > > >
> > > > On 16/12/2022 01:46, Stefano Stabellini wrote:
> > > > > On Thu, 15 Dec 2022, Julien Grall wrote:
> > > > >>>> On 13/12/2022 19:48, Smith, Jackson wrote:
> > > > >>> Yes, we are familiar with the "secret-free hypervisor" work. As
> > > you
> > > > >>> point out, both our work and the secret-free hypervisor remove the
> > > > >>> directmap region to mitigate the risk of leaking sensitive guest
> > > > >>> secrets. However, our work is slightly different because it
> > > > >>> additionally prevents attackers from tricking Xen into remapping a
> > > > guest.
> > > > >>
> > > > >> I understand your goal, but I don't think this is achieved (see
> > > > >> above). You would need an entity to prevent write to TTBR0_EL2 in
> > > > >> order to fully protect it.
> > > > >
> > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we
> > > > cannot
> > > > > claim that the guest's secrets are 100% safe.
> > > > >
> > > > > But the attacker would have to follow the sequence you outlines
> > > > above
> > > > > to change Xen's pagetables and remap guest memory before
> > > > accessing it.
> > > > > It is an additional obstacle for attackers that want to steal other
> > > > guests'
> > > > > secrets. The size of the code that the attacker would need to inject
> > > > > in Xen would need to be bigger and more complex.
> > > >
> > > > Right, that's why I wrote with a bit more work. However, the nuance
> > > > you mention doesn't seem to be present in the cover letter:
> > > >
> > > > "This creates what we call "Software Enclaves", ensuring that an
> > > > adversary with arbitrary code execution in the hypervisor STILL cannot
> > > > read/write guest memory."
> > > >
> > > > So if the end goal if really to protect against *all* sort of
> > > arbitrary 
> > > > code,
> > > > then I think we should have a rough idea how this will look like in
> > > Xen.
> > > >
> > > >  From a brief look, it doesn't look like it would be possible to
> > > prevent
> > > > modification to TTBR0_EL2 (even from EL3). We would need to
> > > > investigate if there are other bits in the architecture to help us.
> > > >
> > > > >
> > > > > Every little helps :-)
> > > >
> > > > I can see how making the life of the attacker more difficult is 
> > > > appealing.
> > > > Yet, the goal needs to be clarified and the risk with the approach
> > > > acknowledged (see above).
> > > >
> > > 
> > > You're right, we should have mentioned this weakness in our first email.
> > > Sorry about the oversight! This is definitely still a limitation that we
> > > have not yet overcome. However, we do think that the increase in
> > > attacker workload that you and Stefano are discussing could still be
> > > valuable to security conscious Xen users.
> > > 
> > > It would nice to find additional architecture features that we can use
> > > to close this hole on arm, but there aren't any that stand out to me
> > > either.
> > > 
> > > With this limitation in mind, what are the next steps we should take to
> > > support this feature for the xen community? Is this increase in attacker
> > > workload meaningful enough to justify the inclusion of VMF in Xen?
> > 
> > Personally, I don’t think so.  The kinds of workloads VMF is usable
> > for (no hypercalls) are likely easily portable to other hypervisors,
> > including formally verified microkernels such as seL4 that provide... 
> 
> What other hypervisors might or might not do should not be a factor in
> this discussion and it would be best to leave it aside.

Indeed so, sorry.

> From an AMD/Xilinx point of view, most of our customers using Xen in
> productions today don't use any hypercalls in one or more of their VMs.
> Xen is great for these use-cases and it is rather common in embedded.
> It is certainly a different configuration from what most are come to
> expect from Xen on the server/desktop x86 side. There is no question
> that guests without hypercalls are important for Xen on ARM.

I was completely unaware of this.

> As a Xen community we have a long history and strong interest in making
> Xen more secure and also, more recently, safer (in the ISO 26262
> safety-certification sense). The VMF work is very well aligned with both
> of these efforts and any additional burder to attackers is certainly
> good for Xen.

That it is.

> Now the question is what changes are necessary and how to make them to
> the codebase. And if it turns out that some of the changes are not
> applicable or too complex to accept, the decision will be made purely
> from a code maintenance point of view and will have nothing to do with
> VMs making no hypercalls being unimportant (i.e. if we don't accept one
> or more patches is not going to have anything to do with the use-case
> being unimportant or what other hypervisors might or might not do).

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22  0:53               ` Stefano Stabellini
  2022-12-22  4:33                 ` Demi Marie Obenour
@ 2022-12-22  9:31                 ` Julien Grall
  2022-12-22 21:28                   ` Stefano Stabellini
  1 sibling, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-12-22  9:31 UTC (permalink / raw)
  To: Stefano Stabellini, Demi Marie Obenour
  Cc: Smith, Jackson, Brookes, Scott, Xen-devel, bertrand.marquis,
	jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

Hi Stefano,

On 22/12/2022 00:53, Stefano Stabellini wrote:
> On Tue, 20 Dec 2022, Demi Marie Obenour wrote:
>> On Tue, Dec 20, 2022 at 10:17:24PM +0000, Smith, Jackson wrote:
>>>> Hi Stefano,
>>>>
>>>> On 16/12/2022 01:46, Stefano Stabellini wrote:
>>>>> On Thu, 15 Dec 2022, Julien Grall wrote:
>>>>>>>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>>>>>> Yes, we are familiar with the "secret-free hypervisor" work. As
>>> you
>>>>>>> point out, both our work and the secret-free hypervisor remove the
>>>>>>> directmap region to mitigate the risk of leaking sensitive guest
>>>>>>> secrets. However, our work is slightly different because it
>>>>>>> additionally prevents attackers from tricking Xen into remapping a
>>>> guest.
>>>>>>
>>>>>> I understand your goal, but I don't think this is achieved (see
>>>>>> above). You would need an entity to prevent write to TTBR0_EL2 in
>>>>>> order to fully protect it.
>>>>>
>>>>> Without a way to stop Xen from reading/writing TTBR0_EL2, we
>>>> cannot
>>>>> claim that the guest's secrets are 100% safe.
>>>>>
>>>>> But the attacker would have to follow the sequence you outlines
>>>> above
>>>>> to change Xen's pagetables and remap guest memory before
>>>> accessing it.
>>>>> It is an additional obstacle for attackers that want to steal other
>>>> guests'
>>>>> secrets. The size of the code that the attacker would need to inject
>>>>> in Xen would need to be bigger and more complex.
>>>>
>>>> Right, that's why I wrote with a bit more work. However, the nuance
>>>> you mention doesn't seem to be present in the cover letter:
>>>>
>>>> "This creates what we call "Software Enclaves", ensuring that an
>>>> adversary with arbitrary code execution in the hypervisor STILL cannot
>>>> read/write guest memory."
>>>>
>>>> So if the end goal if really to protect against *all* sort of
>>> arbitrary
>>>> code,
>>>> then I think we should have a rough idea how this will look like in
>>> Xen.
>>>>
>>>>   From a brief look, it doesn't look like it would be possible to
>>> prevent
>>>> modification to TTBR0_EL2 (even from EL3). We would need to
>>>> investigate if there are other bits in the architecture to help us.
>>>>
>>>>>
>>>>> Every little helps :-)
>>>>
>>>> I can see how making the life of the attacker more difficult is
>>>> appealing.
>>>> Yet, the goal needs to be clarified and the risk with the approach
>>>> acknowledged (see above).
>>>>
>>>
>>> You're right, we should have mentioned this weakness in our first email.
>>> Sorry about the oversight! This is definitely still a limitation that we
>>> have not yet overcome. However, we do think that the increase in
>>> attacker workload that you and Stefano are discussing could still be
>>> valuable to security conscious Xen users.
>>>
>>> It would nice to find additional architecture features that we can use
>>> to close this hole on arm, but there aren't any that stand out to me
>>> either.
>>>
>>> With this limitation in mind, what are the next steps we should take to
>>> support this feature for the xen community? Is this increase in attacker
>>> workload meaningful enough to justify the inclusion of VMF in Xen?
>>
>> Personally, I don’t think so.  The kinds of workloads VMF is usable
>> for (no hypercalls) are likely easily portable to other hypervisors,
>> including formally verified microkernels such as seL4 that provide...
> 
> What other hypervisors might or might not do should not be a factor in
> this discussion and it would be best to leave it aside.

To be honest, Demi has a point. At the moment, VMF is a very niche 
use-case (see more below). So you would end up using less than 10% of 
the normal Xen on Arm code. A lot of people will likely wonder why use 
Xen at all in this case.

> 
>  From an AMD/Xilinx point of view, most of our customers using Xen in
> productions today don't use any hypercalls in one or more of their VMs.
This suggests a mix of guests is running (some using hypercalls and 
others not). That would not be possible if you were using VMF.

> Xen is great for these use-cases and it is rather common in embedded.
> It is certainly a different configuration from what most are come to
> expect from Xen on the server/desktop x86 side. There is no question
> that guests without hypercalls are important for Xen on ARM.
>
> As a Xen community we have a long history and strong interest in making
> Xen more secure and also, more recently, safer (in the ISO 26262
> safety-certification sense). The VMF work is very well aligned with both
> of these efforts and any additional burder to attackers is certainly
> good for Xen.

I agree that we have a strong focus on making Xen more secure. However, 
we also need to look at the use cases for it. As it stands, there will be no:
   - IOREQ use (don't think about emulating TPM)
   - GICv3 ITS
   - stage-1 SMMUv3
   - decoding of instructions when there is no syndrome
   - hypercalls (including event channels)
   - dom0

That's a lot of Xen features that can't be used. Effectively, you will 
make Xen more "secure" for very few users.

> 
> Now the question is what changes are necessary and how to make them to
> the codebase. And if it turns out that some of the changes are not
> applicable or too complex to accept, the decision will be made purely
> from a code maintenance point of view and will have nothing to do with
> VMs making no hypercalls being unimportant (i.e. if we don't accept one
> or more patches is not going to have anything to do with the use-case
> being unimportant or what other hypervisors might or might not do).
I disagree; I think this is also about use cases. On paper, VMF looks 
great, but so far it still has a big flaw (the TTBR can be changed), 
and it would greatly restrict what you can do.

To me, if you can't secure the TTBR, then there are other ways to improve 
the security of Xen for the same setup and more.

The biggest attack surface of Xen on Arm today is the hypercalls. So if 
you remove hypercall access from the guest (or even compile hypercalls 
out), then there is much less chance for an attacker to compromise Xen.

This is not exactly the same guarantee as VMF. But as I wrote before, if 
the attacker has access to Xen, then you are already doomed because you 
have to assume they can switch the TTBR.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22  0:38             ` Stefano Stabellini
@ 2022-12-22  9:52               ` Julien Grall
  2022-12-22 10:14                 ` Demi Marie Obenour
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-12-22  9:52 UTC (permalink / raw)
  To: Stefano Stabellini, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, bertrand.marquis, jbeulich,
	Andrew Cooper, Roger Pau Monné,
	George Dunlap, demi, Daniel P. Smith, christopher.w.clark

Hi Stefano,

On 22/12/2022 00:38, Stefano Stabellini wrote:
> On Tue, 20 Dec 2022, Smith, Jackson wrote:
>>> Hi Stefano,
>>>
>>> On 16/12/2022 01:46, Stefano Stabellini wrote:
>>>> On Thu, 15 Dec 2022, Julien Grall wrote:
>>>>>>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>>>>> Yes, we are familiar with the "secret-free hypervisor" work. As
>> you
>>>>>> point out, both our work and the secret-free hypervisor remove the
>>>>>> directmap region to mitigate the risk of leaking sensitive guest
>>>>>> secrets. However, our work is slightly different because it
>>>>>> additionally prevents attackers from tricking Xen into remapping a
>>> guest.
>>>>>
>>>>> I understand your goal, but I don't think this is achieved (see
>>>>> above). You would need an entity to prevent write to TTBR0_EL2 in
>>>>> order to fully protect it.
>>>>
>>>> Without a way to stop Xen from reading/writing TTBR0_EL2, we
>>> cannot
>>>> claim that the guest's secrets are 100% safe.
>>>>
>>>> But the attacker would have to follow the sequence you outlines
>>> above
>>>> to change Xen's pagetables and remap guest memory before
>>> accessing it.
>>>> It is an additional obstacle for attackers that want to steal other
>>> guests'
>>>> secrets. The size of the code that the attacker would need to inject
>>>> in Xen would need to be bigger and more complex.
>>>
>>> Right, that's why I wrote with a bit more work. However, the nuance
>>> you mention doesn't seem to be present in the cover letter:
>>>
>>> "This creates what we call "Software Enclaves", ensuring that an
>>> adversary with arbitrary code execution in the hypervisor STILL cannot
>>> read/write guest memory."
>>>
>>> So if the end goal if really to protect against *all* sort of
>> arbitrary
>>> code,
>>> then I think we should have a rough idea how this will look like in
>> Xen.
>>>
>>>   From a brief look, it doesn't look like it would be possible to
>> prevent
>>> modification to TTBR0_EL2 (even from EL3). We would need to
>>> investigate if there are other bits in the architecture to help us.
>>>
>>>>
>>>> Every little helps :-)
>>>
>>> I can see how making the life of the attacker more difficult is
>>> appealing.
>>> Yet, the goal needs to be clarified and the risk with the approach
>>> acknowledged (see above).
>>>
>>
>> You're right, we should have mentioned this weakness in our first email.
>> Sorry about the oversight! This is definitely still a limitation that we
>> have not yet overcome. However, we do think that the increase in
>> attacker workload that you and Stefano are discussing could still be
>> valuable to security conscious Xen users.
>>
>> It would nice to find additional architecture features that we can use
>> to close this hole on arm, but there aren't any that stand out to me
>> either.
>>
>> With this limitation in mind, what are the next steps we should take to
>> support this feature for the xen community? Is this increase in attacker
>> workload meaningful enough to justify the inclusion of VMF in Xen?
> 
> I think it could be valuable as an additional obstacle for the attacker
> to overcome. The next step would be to port your series on top of
> Julien's "Remove the directmap" patch series
> https://marc.info/?l=xen-devel&m=167119090721116
> 
> Julien, what do you think?

If we want Xen to be used in confidential compute, then we need a 
compelling story and to prove that we are at least as secure as other 
hypervisors.

So I think we need to investigate a few areas:
    * Can we protect the TTBR? I don't think this can be done with the 
HW, but maybe I have overlooked something.
    * Can VMF be extended to more use-cases? For instance, for 
hypercalls, we could use bounce buffers.
    * If we can't fully secure VMF, can the attack surface be reduced 
(e.g. disabling hypercalls at runtime/compile time)? Could we use a 
different architecture (I am thinking of something like pKVM [1])?

Cheers,

[1] https://lwn.net/Articles/836693/

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22  9:52               ` Julien Grall
@ 2022-12-22 10:14                 ` Demi Marie Obenour
  2022-12-22 10:21                   ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 10:14 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, bertrand.marquis, jbeulich,
	Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

[-- Attachment #1: Type: text/plain, Size: 4784 bytes --]

On Thu, Dec 22, 2022 at 09:52:11AM +0000, Julien Grall wrote:
> Hi Stefano,
> 
> On 22/12/2022 00:38, Stefano Stabellini wrote:
> > On Tue, 20 Dec 2022, Smith, Jackson wrote:
> > > > Hi Stefano,
> > > > 
> > > > On 16/12/2022 01:46, Stefano Stabellini wrote:
> > > > > On Thu, 15 Dec 2022, Julien Grall wrote:
> > > > > > > > On 13/12/2022 19:48, Smith, Jackson wrote:
> > > > > > > Yes, we are familiar with the "secret-free hypervisor" work. As
> > > you
> > > > > > > point out, both our work and the secret-free hypervisor remove the
> > > > > > > directmap region to mitigate the risk of leaking sensitive guest
> > > > > > > secrets. However, our work is slightly different because it
> > > > > > > additionally prevents attackers from tricking Xen into remapping a
> > > > guest.
> > > > > > 
> > > > > > I understand your goal, but I don't think this is achieved (see
> > > > > > above). You would need an entity to prevent write to TTBR0_EL2 in
> > > > > > order to fully protect it.
> > > > > 
> > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we
> > > > cannot
> > > > > claim that the guest's secrets are 100% safe.
> > > > > 
> > > > > But the attacker would have to follow the sequence you outlines
> > > > above
> > > > > to change Xen's pagetables and remap guest memory before
> > > > accessing it.
> > > > > It is an additional obstacle for attackers that want to steal other
> > > > guests'
> > > > > secrets. The size of the code that the attacker would need to inject
> > > > > in Xen would need to be bigger and more complex.
> > > > 
> > > > Right, that's why I wrote with a bit more work. However, the nuance
> > > > you mention doesn't seem to be present in the cover letter:
> > > > 
> > > > "This creates what we call "Software Enclaves", ensuring that an
> > > > adversary with arbitrary code execution in the hypervisor STILL cannot
> > > > read/write guest memory."
> > > > 
> > > > So if the end goal if really to protect against *all* sort of
> > > arbitrary
> > > > code,
> > > > then I think we should have a rough idea how this will look like in
> > > Xen.
> > > > 
> > > >   From a brief look, it doesn't look like it would be possible to
> > > prevent
> > > > modification to TTBR0_EL2 (even from EL3). We would need to
> > > > investigate if there are other bits in the architecture to help us.
> > > > 
> > > > > 
> > > > > Every little helps :-)
> > > > 
> > > > I can see how making the life of the attacker more difficult is
> > > > appealing.
> > > > Yet, the goal needs to be clarified and the risk with the approach
> > > > acknowledged (see above).
> > > > 
> > > 
> > > You're right, we should have mentioned this weakness in our first email.
> > > Sorry about the oversight! This is definitely still a limitation that we
> > > have not yet overcome. However, we do think that the increase in
> > > attacker workload that you and Stefano are discussing could still be
> > > valuable to security conscious Xen users.
> > > 
> > > It would nice to find additional architecture features that we can use
> > > to close this hole on arm, but there aren't any that stand out to me
> > > either.
> > > 
> > > With this limitation in mind, what are the next steps we should take to
> > > support this feature for the xen community? Is this increase in attacker
> > > workload meaningful enough to justify the inclusion of VMF in Xen?
> > 
> > I think it could be valuable as an additional obstacle for the attacker
> > to overcome. The next step would be to port your series on top of
> > Julien's "Remove the directmap" patch series
> > https://marc.info/?l=xen-devel&m=167119090721116
> > 
> > Julien, what do you think?
> 
> If we want Xen to be used in confidential compute, then we need a compelling
> story and prove that we are at least as secure as other hypervisors.
> 
> So I think we need to investigate a few areas:
>    * Can we protect the TTBR? I don't think this can be done with the HW.
> But maybe I overlook it.

This can be done by running most of Xen at a lower EL, and having only a
small trusted (and hopefully formally verified) kernel run at EL2.

>    * Can VMF be extended to more use-cases? For instances, for hypercalls,
> we could have bounce buffer.
>    * If we can't fully secure VMF, can the attack surface be reduced (e.g.
> disable hypercalls at runtime/compile time)? Could we use a different
> architecture (I am thinking something like pKVM [1])?
> 
> Cheers,
> 
> [1] https://lwn.net/Articles/836693/

pKVM has been formally verified already, in the form of seKVM.  So there
very much is precedent for this.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22 10:14                 ` Demi Marie Obenour
@ 2022-12-22 10:21                   ` Julien Grall
  2022-12-22 10:28                     ` Demi Marie Obenour
  0 siblings, 1 reply; 33+ messages in thread
From: Julien Grall @ 2022-12-22 10:21 UTC (permalink / raw)
  To: Demi Marie Obenour, Stefano Stabellini, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, bertrand.marquis, jbeulich,
	Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark



On 22/12/2022 10:14, Demi Marie Obenour wrote:
> On Thu, Dec 22, 2022 at 09:52:11AM +0000, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 22/12/2022 00:38, Stefano Stabellini wrote:
>>> On Tue, 20 Dec 2022, Smith, Jackson wrote:
>>>>> Hi Stefano,
>>>>>
>>>>> On 16/12/2022 01:46, Stefano Stabellini wrote:
>>>>>> On Thu, 15 Dec 2022, Julien Grall wrote:
>>>>>>>>> On 13/12/2022 19:48, Smith, Jackson wrote:
>>>>>>>> Yes, we are familiar with the "secret-free hypervisor" work. As
>>>> you
>>>>>>>> point out, both our work and the secret-free hypervisor remove the
>>>>>>>> directmap region to mitigate the risk of leaking sensitive guest
>>>>>>>> secrets. However, our work is slightly different because it
>>>>>>>> additionally prevents attackers from tricking Xen into remapping a
>>>>> guest.
>>>>>>>
>>>>>>> I understand your goal, but I don't think this is achieved (see
>>>>>>> above). You would need an entity to prevent write to TTBR0_EL2 in
>>>>>>> order to fully protect it.
>>>>>>
>>>>>> Without a way to stop Xen from reading/writing TTBR0_EL2, we
>>>>> cannot
>>>>>> claim that the guest's secrets are 100% safe.
>>>>>>
>>>>>> But the attacker would have to follow the sequence you outlines
>>>>> above
>>>>>> to change Xen's pagetables and remap guest memory before
>>>>> accessing it.
>>>>>> It is an additional obstacle for attackers that want to steal other
>>>>> guests'
>>>>>> secrets. The size of the code that the attacker would need to inject
>>>>>> in Xen would need to be bigger and more complex.
>>>>>
>>>>> Right, that's why I wrote with a bit more work. However, the nuance
>>>>> you mention doesn't seem to be present in the cover letter:
>>>>>
>>>>> "This creates what we call "Software Enclaves", ensuring that an
>>>>> adversary with arbitrary code execution in the hypervisor STILL cannot
>>>>> read/write guest memory."
>>>>>
>>>>> So if the end goal if really to protect against *all* sort of
>>>> arbitrary
>>>>> code,
>>>>> then I think we should have a rough idea how this will look like in
>>>> Xen.
>>>>>
>>>>>    From a brief look, it doesn't look like it would be possible to
>>>> prevent
>>>>> modification to TTBR0_EL2 (even from EL3). We would need to
>>>>> investigate if there are other bits in the architecture to help us.
>>>>>
>>>>>>
>>>>>> Every little helps :-)
>>>>>
>>>>> I can see how making the life of the attacker more difficult is
>>>>> appealing.
>>>>> Yet, the goal needs to be clarified and the risk with the approach
>>>>> acknowledged (see above).
>>>>>
>>>>
>>>> You're right, we should have mentioned this weakness in our first email.
>>>> Sorry about the oversight! This is definitely still a limitation that we
>>>> have not yet overcome. However, we do think that the increase in
>>>> attacker workload that you and Stefano are discussing could still be
>>>> valuable to security conscious Xen users.
>>>>
>>>> It would nice to find additional architecture features that we can use
>>>> to close this hole on arm, but there aren't any that stand out to me
>>>> either.
>>>>
>>>> With this limitation in mind, what are the next steps we should take to
>>>> support this feature for the xen community? Is this increase in attacker
>>>> workload meaningful enough to justify the inclusion of VMF in Xen?
>>>
>>> I think it could be valuable as an additional obstacle for the attacker
>>> to overcome. The next step would be to port your series on top of
>>> Julien's "Remove the directmap" patch series
>>> https://marc.info/?l=xen-devel&m=167119090721116
>>>
>>> Julien, what do you think?
>>
>> If we want Xen to be used in confidential compute, then we need a compelling
>> story and prove that we are at least as secure as other hypervisors.
>>
>> So I think we need to investigate a few areas:
>>     * Can we protect the TTBR? I don't think this can be done with the HW.
>> But maybe I overlook it.
> 
> This can be done by running most of Xen at a lower EL, and having only a
> small trusted (and hopefully formally verified) kernel run at EL2.

This is what I hinted in my 3rd bullet. :) I didn't consider this for 
the first bullet because the goal of this question is to figure out 
whether we can leave all Xen running in EL2 and still have the same 
guarantee.

Cheers,

-- 
Julien Grall



* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22 10:21                   ` Julien Grall
@ 2022-12-22 10:28                     ` Demi Marie Obenour
  0 siblings, 0 replies; 33+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 10:28 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, Smith, Jackson
  Cc: Brookes, Scott, Xen-devel, bertrand.marquis, jbeulich,
	Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

On Thu, Dec 22, 2022 at 10:21:57AM +0000, Julien Grall wrote:
> 
> 
> On 22/12/2022 10:14, Demi Marie Obenour wrote:
> > On Thu, Dec 22, 2022 at 09:52:11AM +0000, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 22/12/2022 00:38, Stefano Stabellini wrote:
> > > > On Tue, 20 Dec 2022, Smith, Jackson wrote:
> > > > > > Hi Stefano,
> > > > > > 
> > > > > > On 16/12/2022 01:46, Stefano Stabellini wrote:
> > > > > > > On Thu, 15 Dec 2022, Julien Grall wrote:
> > > > > > > > > > On 13/12/2022 19:48, Smith, Jackson wrote:
> > > > > > > > > Yes, we are familiar with the "secret-free hypervisor" work. As
> > > > > you
> > > > > > > > > point out, both our work and the secret-free hypervisor remove the
> > > > > > > > > directmap region to mitigate the risk of leaking sensitive guest
> > > > > > > > > secrets. However, our work is slightly different because it
> > > > > > > > > additionally prevents attackers from tricking Xen into remapping a
> > > > > > guest.
> > > > > > > > 
> > > > > > > > I understand your goal, but I don't think this is achieved (see
> > > > > > > > above). You would need an entity to prevent write to TTBR0_EL2 in
> > > > > > > > order to fully protect it.
> > > > > > > 
> > > > > > > Without a way to stop Xen from reading/writing TTBR0_EL2, we
> > > > > > cannot
> > > > > > > claim that the guest's secrets are 100% safe.
> > > > > > > 
> > > > > > > But the attacker would have to follow the sequence you outlines
> > > > > > above
> > > > > > > to change Xen's pagetables and remap guest memory before
> > > > > > accessing it.
> > > > > > > It is an additional obstacle for attackers that want to steal other
> > > > > > guests'
> > > > > > > secrets. The size of the code that the attacker would need to inject
> > > > > > > in Xen would need to be bigger and more complex.
> > > > > > 
> > > > > > Right, that's why I wrote with a bit more work. However, the nuance
> > > > > > you mention doesn't seem to be present in the cover letter:
> > > > > > 
> > > > > > "This creates what we call "Software Enclaves", ensuring that an
> > > > > > adversary with arbitrary code execution in the hypervisor STILL cannot
> > > > > > read/write guest memory."
> > > > > > 
> > > > > > So if the end goal if really to protect against *all* sort of
> > > > > arbitrary
> > > > > > code,
> > > > > > then I think we should have a rough idea how this will look like in
> > > > > Xen.
> > > > > > 
> > > > > >    From a brief look, it doesn't look like it would be possible to
> > > > > prevent
> > > > > > modification to TTBR0_EL2 (even from EL3). We would need to
> > > > > > investigate if there are other bits in the architecture to help us.
> > > > > > 
> > > > > > > 
> > > > > > > Every little helps :-)
> > > > > > 
> > > > > > I can see how making the life of the attacker more difficult is
> > > > > > appealing.
> > > > > > Yet, the goal needs to be clarified and the risk with the approach
> > > > > > acknowledged (see above).
> > > > > > 
> > > > > 
> > > > > You're right, we should have mentioned this weakness in our first email.
> > > > > Sorry about the oversight! This is definitely still a limitation that we
> > > > > have not yet overcome. However, we do think that the increase in
> > > > > attacker workload that you and Stefano are discussing could still be
> > > > > valuable to security conscious Xen users.
> > > > > 
> > > > > It would nice to find additional architecture features that we can use
> > > > > to close this hole on arm, but there aren't any that stand out to me
> > > > > either.
> > > > > 
> > > > > With this limitation in mind, what are the next steps we should take to
> > > > > support this feature for the xen community? Is this increase in attacker
> > > > > workload meaningful enough to justify the inclusion of VMF in Xen?
> > > > 
> > > > I think it could be valuable as an additional obstacle for the attacker
> > > > to overcome. The next step would be to port your series on top of
> > > > Julien's "Remove the directmap" patch series
> > > > https://marc.info/?l=xen-devel&m=167119090721116
> > > > 
> > > > Julien, what do you think?
> > > 
> > > If we want Xen to be used in confidential compute, then we need a compelling
> > > story and prove that we are at least as secure as other hypervisors.
> > > 
> > > So I think we need to investigate a few areas:
> > >     * Can we protect the TTBR? I don't think this can be done with the HW.
> > > But maybe I overlook it.
> > 
> > This can be done by running most of Xen at a lower EL, and having only a
> > small trusted (and hopefully formally verified) kernel run at EL2.
> 
> This is what I hinted in my 3rd bullet. :) I didn't consider this for the
> first bullet because the goal of this question is to figure out whether we
> can leave all Xen running in EL2 and still have the same guarantee.

It should be possible (see Google Native Client) but whether or not it
is useful is questionable.  I expect the complexity of the needed
compiler patches and binary-level static analysis to be greater than
that of running most of Xen at a lower exception level.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab



* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22  9:31                 ` Julien Grall
@ 2022-12-22 21:28                   ` Stefano Stabellini
  2023-01-08 16:30                     ` Julien Grall
  0 siblings, 1 reply; 33+ messages in thread
From: Stefano Stabellini @ 2022-12-22 21:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Demi Marie Obenour, Smith, Jackson, Brookes,
	Scott, Xen-devel, bertrand.marquis, jbeulich, Andrew Cooper,
	Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

On Thu, 22 Dec 2022, Julien Grall wrote:
> > What other hypervisors might or might not do should not be a factor in
> > this discussion and it would be best to leave it aside.
> 
> To be honest, Demi has a point. At the moment, VMF is a very niche use-case
> (see more below). So you would end up to use less than 10% of the normal Xen
> on Arm code. A lot of people will likely wonder why using Xen in this case?

[...]

> >  From an AMD/Xilinx point of view, most of our customers using Xen in
> > productions today don't use any hypercalls in one or more of their VMs.
> This suggests a mix of guests are running (some using hypercalls and other
> not). It would not be possible if you were using VMF.

It is true that the current limitations are very restrictive.

In embedded, we have a few pure static partitioning deployments where no
hypercalls are required (Linux is using hypercalls today but it could do
without), so maybe VMF could be enabled, but admittedly in those cases
the main focus today is safety and fault tolerance, rather than
confidential computing.


> > Xen is great for these use-cases and it is rather common in embedded.
> > It is certainly a different configuration from what most are come to
> > expect from Xen on the server/desktop x86 side. There is no question
> > that guests without hypercalls are important for Xen on ARM. >
> > As a Xen community we have a long history and strong interest in making
> > Xen more secure and also, more recently, safer (in the ISO 26262
> > safety-certification sense). The VMF work is very well aligned with both
> > of these efforts and any additional burder to attackers is certainly
> > good for Xen.
> 
> I agree that we have a strong focus on making Xen more secure. However, we
> also need to look at the use cases for it. As it stands, there will no:
>   - IOREQ use (don't think about emulating TPM)
>   - GICv3 ITS
>   - stage-1 SMMUv3
>   - decoding of instructions when there is no syndrome
>   - hypercalls (including event channels)
>   - dom0
> 
> That's a lot of Xen features that can't be used. Effectively you will make Xen
> more "secure" for a very few users.

Among these, the main problems affecting AMD/Xilinx users today would be:
- decoding of instructions
- hypercalls, especially event channels

Decoding of instructions would affect all our deployments. For
hypercalls, even in static partitioning deployments, sometimes event
channels are used for VM-to-VM notifications.


> > Now the question is what changes are necessary and how to make them to
> > the codebase. And if it turns out that some of the changes are not
> > applicable or too complex to accept, the decision will be made purely
> > from a code maintenance point of view and will have nothing to do with
> > VMs making no hypercalls being unimportant (i.e. if we don't accept one
> > or more patches is not going to have anything to do with the use-case
> > being unimportant or what other hypervisors might or might not do).
> I disagree, I think this is also about use cases. On the paper VMF look very
> great, but so far it still has a big flaw (the TTBR can be changed) and it
> would restrict a lot what you can do.

We would need to be very clear in the commit messages and documentation
that with the current version of VMF we do *not* achieve confidential
computing and we do *not* offer protections comparable to AMD SEV. It is
still possible for Xen to access guest data, it is just a bit harder.

From an implementation perspective, if we can find a way to implement it
that would be easy to maintain, then it might still be worth it. It
would probably take only a small amount of changes on top of the "Remove
the directmap" series to make it so "map_domain_page" doesn't work
anymore after boot.

That might be worth exploring if you and Jackson agree?


One thing that would make it much more widely applicable is your idea of
hypercall bounce buffers. VMF might work with hypercalls if the guest
always uses the same buffer to pass hypercall parameters to Xen. That
one buffer could remain mapped in Xen for the lifetime of the VM and the
VM would know to use it only to pass parameters to Xen.



* Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen
  2022-12-22 21:28                   ` Stefano Stabellini
@ 2023-01-08 16:30                     ` Julien Grall
  0 siblings, 0 replies; 33+ messages in thread
From: Julien Grall @ 2023-01-08 16:30 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Demi Marie Obenour, Smith, Jackson, Brookes, Scott, Xen-devel,
	bertrand.marquis, jbeulich, Andrew Cooper, Roger Pau Monné,
	George Dunlap, Daniel P. Smith, christopher.w.clark

Hi Stefano,

On 22/12/2022 21:28, Stefano Stabellini wrote:
> On Thu, 22 Dec 2022, Julien Grall wrote:
>>> What other hypervisors might or might not do should not be a factor in
>>> this discussion and it would be best to leave it aside.
>>
>> To be honest, Demi has a point. At the moment, VMF is a very niche use-case
>> (see more below). So you would end up to use less than 10% of the normal Xen
>> on Arm code. A lot of people will likely wonder why using Xen in this case?
> 
> [...]
> 
>>>   From an AMD/Xilinx point of view, most of our customers using Xen in
>>> productions today don't use any hypercalls in one or more of their VMs.
>> This suggests a mix of guests are running (some using hypercalls and other
>> not). It would not be possible if you were using VMF.
> 
> It is true that the current limitations are very restrictive.
> 
> In embedded, we have a few pure static partitioning deployments where no
> hypercalls are required (Linux is using hypercalls today but it could do
> without), so maybe VMF could be enabled, but admittedly in those cases
> the main focus today is safety and fault tolerance, rather than
> confidential computing.
> 
> 
>>> Xen is great for these use-cases and it is rather common in embedded.
>>> It is certainly a different configuration from what most are come to
>>> expect from Xen on the server/desktop x86 side. There is no question
>>> that guests without hypercalls are important for Xen on ARM. >
>>> As a Xen community we have a long history and strong interest in making
>>> Xen more secure and also, more recently, safer (in the ISO 26262
>>> safety-certification sense). The VMF work is very well aligned with both
>>> of these efforts and any additional burder to attackers is certainly
>>> good for Xen.
>>
>> I agree that we have a strong focus on making Xen more secure. However, we
>> also need to look at the use cases for it. As it stands, there will no:
>>    - IOREQ use (don't think about emulating TPM)
>>    - GICv3 ITS
>>    - stage-1 SMMUv3
>>    - decoding of instructions when there is no syndrome
>>    - hypercalls (including event channels)
>>    - dom0
>>
>> That's a lot of Xen features that can't be used. Effectively you will make Xen
>> more "secure" for a very few users.
> 
> Among these, the main problems affecting AMD/Xilinx users today would be:
> - decoding of instructions
> - hypercalls, especially event channels
> 
> Decoding of instructions would affect all our deployments. For
> hypercalls, even in static partitioning deployments, sometimes event
> channels are used for VM-to-VM notifications.
> 
> 
>>> Now the question is what changes are necessary and how to make them to
>>> the codebase. And if it turns out that some of the changes are not
>>> applicable or too complex to accept, the decision will be made purely
>>> from a code maintenance point of view and will have nothing to do with
>>> VMs making no hypercalls being unimportant (i.e. if we don't accept one
>>> or more patches is not going to have anything to do with the use-case
>>> being unimportant or what other hypervisors might or might not do).
>> I disagree, I think this is also about use cases. On the paper VMF look very
>> great, but so far it still has a big flaw (the TTBR can be changed) and it
>> would restrict a lot what you can do.
> 
> We would need to be very clear in the commit messages and documentation
> that with the current version of VMF we do *not* achieve confidential
> computing and we do *not* offer protections comparable to AMD SEV. It is
> still possible for Xen to access guest data, it is just a bit harder.
> 
>  From an implementation perspective, if we can find a way to implement it
> that would be easy to maintain, then it might still be worth it. It
> would probably take only a small amount of changes on top of the "Remove
> the directmap" series to make it so "map_domain_page" doesn't work
> anymore after boot.

None of the callers of map_domain_page() expect the function to fail. So 
some treewide changes will be needed in order to deal with 
map_domain_page() not working. This is not something I am willing to 
accept if the only user is VMF (at the moment I can't think of any other).

So instead, we would need to come up with a way where map_domain_page() 
will never be called at runtime when VMF is in use (maybe by compiling 
out some code?). I haven't really looked in detail to say whether 
that's feasible.

> 
> That might be worth exploring if you and Jackson agree?

I am OK to continue exploring it because I think some bits will still 
be useful for general use. As for the full solution, I will wait and 
see the results before deciding whether this is something that I would 
be happy to merge/maintain.

Cheers,

-- 
Julien Grall



end of thread, other threads:[~2023-01-08 16:30 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-13 19:48 [RFC 0/4] Adding Virtual Memory Fuses to Xen Smith, Jackson
2022-12-13 19:50 ` [RFC 1/4] Add VMF Hypercall Smith, Jackson
2022-12-14  9:29   ` Jan Beulich
2022-12-13 19:53 ` [RFC 2/4] Add VMF tool Smith, Jackson
2022-12-13 19:54 ` [RFC 3/4] Add xen superpage splitting support to arm Smith, Jackson
2022-12-13 21:15   ` Julien Grall
2022-12-13 22:17     ` Demi Marie Obenour
2022-12-13 23:07       ` Julien Grall
2022-12-14  1:38         ` Demi Marie Obenour
2022-12-14  9:09           ` Julien Grall
2022-12-13 19:55 ` [RFC 4/4] Implement VMF for arm64 Smith, Jackson
2022-12-13 20:55 ` [RFC 0/4] Adding Virtual Memory Fuses to Xen Julien Grall
2022-12-13 22:22   ` Demi Marie Obenour
2022-12-13 23:05     ` Julien Grall
2022-12-14  1:28       ` Demi Marie Obenour
2022-12-14 14:06       ` Julien Grall
2022-12-16 11:58     ` Julien Grall
2022-12-15 19:27   ` Smith, Jackson
2022-12-15 22:00     ` Julien Grall
2022-12-16  1:46       ` Stefano Stabellini
2022-12-16  8:38         ` Julien Grall
2022-12-20 22:17           ` Smith, Jackson
2022-12-20 22:30             ` Demi Marie Obenour
2022-12-22  0:53               ` Stefano Stabellini
2022-12-22  4:33                 ` Demi Marie Obenour
2022-12-22  9:31                 ` Julien Grall
2022-12-22 21:28                   ` Stefano Stabellini
2023-01-08 16:30                     ` Julien Grall
2022-12-22  0:38             ` Stefano Stabellini
2022-12-22  9:52               ` Julien Grall
2022-12-22 10:14                 ` Demi Marie Obenour
2022-12-22 10:21                   ` Julien Grall
2022-12-22 10:28                     ` Demi Marie Obenour
