linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
@ 2020-09-22  0:40 Vipin Sharma
  2020-09-22  0:40 ` [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller Vipin Sharma
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Vipin Sharma @ 2020-09-22  0:40 UTC (permalink / raw)
  To: thomas.lendacky, pbonzini, sean.j.christopherson, tj, lizefan
  Cc: joro, corbet, brijesh.singh, jon.grimm, eric.vantassell, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Vipin Sharma

Hello,

This patch series adds a new SEV controller for tracking and limiting
the usage of SEV ASIDs on the AMD SVM platform.

SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
but this resource is in very limited quantity on a host.

This limited quantity creates issues like SEV ASID starvation and
unoptimized scheduling in the cloud infrastructure.

SEV controller provides SEV ASID tracking and resource control
mechanisms.

Patch 1 - Overview, motivation, and implementation details of the SEV
          controller.
Patch 2 - Kernel documentation of the SEV controller for both
	  cgroup v1 and v2.

Thanks

Vipin Sharma (2):
  KVM: SVM: Create SEV cgroup controller.
  KVM: SVM: SEV cgroup controller documentation

 Documentation/admin-guide/cgroup-v1/sev.rst |  94 +++++
 Documentation/admin-guide/cgroup-v2.rst     |  56 ++-
 arch/x86/kvm/Makefile                       |   1 +
 arch/x86/kvm/svm/sev.c                      |  16 +-
 arch/x86/kvm/svm/sev_cgroup.c               | 414 ++++++++++++++++++++
 arch/x86/kvm/svm/sev_cgroup.h               |  40 ++
 include/linux/cgroup_subsys.h               |   3 +
 init/Kconfig                                |  14 +
 8 files changed, 634 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/sev.rst
 create mode 100644 arch/x86/kvm/svm/sev_cgroup.c
 create mode 100644 arch/x86/kvm/svm/sev_cgroup.h

-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller.
  2020-09-22  0:40 [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Vipin Sharma
@ 2020-09-22  0:40 ` Vipin Sharma
  2020-09-22  1:04   ` Randy Dunlap
  2020-09-22  0:40 ` [RFC Patch 2/2] KVM: SVM: SEV cgroup controller documentation Vipin Sharma
  2020-09-22  1:48 ` [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Sean Christopherson
  2 siblings, 1 reply; 29+ messages in thread
From: Vipin Sharma @ 2020-09-22  0:40 UTC (permalink / raw)
  To: thomas.lendacky, pbonzini, sean.j.christopherson, tj, lizefan
  Cc: joro, corbet, brijesh.singh, jon.grimm, eric.vantassell, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Vipin Sharma, Dionna Glaze, Erdem Aktas

Create SEV cgroup controller for SEV ASIDs on the AMD platform.

SEV ASIDs are used to encrypt virtual machines memory and isolate the
guests from the hypervisor. However, number of SEV ASIDs are limited on
a platform. This leads to the resource constraints and cause issues
like:

1. Some applications exhausting all of the SEV ASIDs and depriving
   others on a host.
2. No capability with the system admin to allocate and limit the number
   of SEV ASIDs used by tasks.
3. Difficult for the cloud service providers to optimally schedule VMs
   and sandboxes across its fleet without knowing the overall picture of
   SEV ASIDs usage.

SEV controller tracks the usage and provides capability to limit SEV
ASIDs used by tasks.

Controller is enabled by CGROUP_SEV config option, it is dependent on
KVM_AMD_SEV option in the config file.

SEV Controller has 3 interface files:

1. max - Sets the max limit of the SEV ASIDs in the cgroup.

2. current - Shows the current count of the SEV ASIDs in the cgroup.

3. events - Event file to show the SEV ASIDs allocation denied in the
	    cgroup.

When kvm-amd module is installed it calls SEV controller API and informs
how many SEV ASIDs are available on the platform. Controller use this
value to allocate an array which stores ASID to cgroup mapping.

New SEV ASID allocation gets charged to the task's SEV cgroup. Migration
of charge is not supported, so, a charged ASID remains charged to the
same cgroup until that SEV ASID is freed. This feature is similar to the
memory cgroup as it is a stateful resource

On deletion of an empty cgroup whose tasks have moved to some other
cgroup but a SEV ASID is still charged to it, the SEV ASID gets mapped
to the parent cgroup.

Mapping array tells which cgroup to uncharge, and update mapping when
the cgroup is deleted. Mapping array is freed when kvm-amd module is
unloaded.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Rientjes <rientjes@google.com>
Reviewed-by: Dionna Glaze <dionnaglaze@google.com>
Reviewed-by: Erdem Aktas <erdemaktas@google.com>
---
 arch/x86/kvm/Makefile         |   1 +
 arch/x86/kvm/svm/sev.c        |  16 +-
 arch/x86/kvm/svm/sev_cgroup.c | 414 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/sev_cgroup.h |  40 ++++
 include/linux/cgroup_subsys.h |   3 +
 init/Kconfig                  |  14 ++
 6 files changed, 487 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kvm/svm/sev_cgroup.c
 create mode 100644 arch/x86/kvm/svm/sev_cgroup.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 4a3081e9f4b5..bbbf10fc1b50 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -16,6 +16,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
 			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
+kvm-$(CONFIG_CGROUP_SEV)	+= svm/sev_cgroup.o
 
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7bf7bf734979..2cc0bea21a76 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,6 +17,7 @@
 
 #include "x86.h"
 #include "svm.h"
+#include "sev_cgroup.h"
 
 static int sev_flush_asids(void);
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -80,7 +81,7 @@ static bool __sev_recycle_asids(void)
 static int sev_asid_new(void)
 {
 	bool retry = true;
-	int pos;
+	int pos, ret;
 
 	mutex_lock(&sev_bitmap_lock);
 
@@ -98,6 +99,12 @@ static int sev_asid_new(void)
 		return -EBUSY;
 	}
 
+	ret = sev_asid_try_charge(pos);
+	if (ret) {
+		mutex_unlock(&sev_bitmap_lock);
+		return ret;
+	}
+
 	__set_bit(pos, sev_asid_bitmap);
 
 	mutex_unlock(&sev_bitmap_lock);
@@ -127,6 +134,8 @@ static void sev_asid_free(int asid)
 		sd->sev_vmcbs[pos] = NULL;
 	}
 
+	sev_asid_uncharge(pos);
+
 	mutex_unlock(&sev_bitmap_lock);
 }
 
@@ -1143,6 +1152,9 @@ int __init sev_hardware_setup(void)
 	if (!status)
 		return 1;
 
+	if (sev_cgroup_setup(max_sev_asid))
+		return 1;
+
 	/*
 	 * Check SEV platform status.
 	 *
@@ -1157,6 +1169,7 @@ int __init sev_hardware_setup(void)
 	pr_info("SEV supported\n");
 
 err:
+	sev_cgroup_teardown();
 	kfree(status);
 	return rc;
 }
@@ -1170,6 +1183,7 @@ void sev_hardware_teardown(void)
 	bitmap_free(sev_reclaim_asid_bitmap);
 
 	sev_flush_asids();
+	sev_cgroup_teardown();
 }
 
 void pre_sev_run(struct vcpu_svm *svm, int cpu)
diff --git a/arch/x86/kvm/svm/sev_cgroup.c b/arch/x86/kvm/svm/sev_cgroup.c
new file mode 100644
index 000000000000..f76a934b8cf2
--- /dev/null
+++ b/arch/x86/kvm/svm/sev_cgroup.c
@@ -0,0 +1,414 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SEV cgroup controller
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+
+#include <linux/cgroup.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/lockdep.h>
+
+#define MAX_SEV_ASIDS_STR "max"
+
+/**
+ * struct sev_cgroup - Stores SEV ASID related cgroup data.
+ * @css: cgroup subsys state object.
+ * @max: Max limit of the count of the SEV ASIDs in the cgroup.
+ * @usage: Current count of the SEV ASIDs in the cgroup.
+ * @allocation_failure_event: Number of times the SEV ASIDs allocation denied.
+ * @events_file: File handle for sev.events file.
+ */
+struct sev_cgroup {
+	struct cgroup_subsys_state css;
+	unsigned int max;
+	unsigned int usage;
+	unsigned long allocation_failure_event;
+	struct cgroup_file events_file;
+};
+
+/* Maximum number of sev asids supported in the platform */
+static unsigned int sev_max_asids;
+
+/* Global array to store which ASID is charged to which cgroup */
+static struct sev_cgroup **sev_asids_cgroup_array;
+
+/*
+ * To synchronize sev_asids_cgroup_array changes from charging/uncharging,
+ * css_offline, max, and printing used ASIDs.
+ */
+static DEFINE_MUTEX(sev_cgroup_lock);
+
+/**
+ * css_sev() - Get sev_cgroup from the css.
+ * @css: cgroup subsys state object.
+ *
+ * Context: Any context.
+ * Return:
+ * * %NULL - If @css is null.
+ * * struct sev_cgroup * - SEV cgroup of the specified css.
+ */
+static struct sev_cgroup *css_sev(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct sev_cgroup, css) : NULL;
+}
+
+/**
+ * parent_sev_cgroup() - Get the parent sev cgroup in the cgroup hierarchy
+ * @sevcg: sev cgroup node whose parent is needed.
+ *
+ * Context: Any context.
+ * Return:
+ * * struct sev_cgroup * - Parent sev cgroup in the hierarchy.
+ * * %NULL - If @sevcg is null or it is the root in the hierarchy.
+ */
+static struct sev_cgroup *parent_sev_cgroup(struct sev_cgroup *sevcg)
+{
+	return sevcg ? css_sev(sevcg->css.parent) : NULL;
+}
+
+/*
+ * sev_asid_cgroup_dec() - Decrement the SEV ASID usage in the cgroup.
+ * @sevcg: SEV cgroup.
+ *
+ * Context: Any context. Expects sev_cgroup_lock mutex to be held by the
+ *	    caller.
+ */
+static void sev_asid_cgroup_dec(struct sev_cgroup *sevcg)
+{
+	lockdep_assert_held(&sev_cgroup_lock);
+	sevcg->usage--;
+	/*
+	 * If this ever becomes max then there is a bug in the SEV cgroup code.
+	 */
+	WARN_ON_ONCE(sevcg->usage == UINT_MAX);
+}
+
+/**
+ * sev_asid_try_charge() - Try charging an SEV ASID to the cgroup.
+ * @pos: Index of SEV ASID in the SEV ASIDs bitmap.
+ *
+ * Try charging an SEV ASID to the current task's cgroup and all its ancestors
+ * up to the root. If charging is not possible due to the limit constraint,
+ * then notify the event file and return -errorno.
+ *
+ * Context: Process context. Takes and release sev_cgroup_lock mutex.
+ * Return:
+ * * 0 - If successfully charged the cgroup.
+ * * -EINVAL - If pos is not valid.
+ * * -EBUSY - If usage has already reached the limit.
+ */
+int sev_asid_try_charge(int pos)
+{
+	struct sev_cgroup *start, *i, *j;
+	int ret = 0;
+
+	mutex_lock(&sev_cgroup_lock);
+
+	start = css_sev(task_css(current, sev_cgrp_id));
+
+	for (i = start; i; i = parent_sev_cgroup(i)) {
+		if (i->usage == i->max)
+			goto e_limit;
+
+		i->usage++;
+	}
+
+	sev_asids_cgroup_array[pos] = start;
+exit:
+	mutex_unlock(&sev_cgroup_lock);
+	return ret;
+
+e_limit:
+	for (j = start; j != i; j = parent_sev_cgroup(j))
+		sev_asid_cgroup_dec(j);
+
+	start->allocation_failure_event++;
+	cgroup_file_notify(&start->events_file);
+
+	ret = -EBUSY;
+	goto exit;
+}
+EXPORT_SYMBOL(sev_asid_try_charge);
+
+/**
+ * sev_asid_uncharge() - Uncharge an SEV ASID from the cgroup.
+ * @pos: Index of SEV ASID in the SEV ASIDs bitmap.
+ *
+ * Uncharge an SEV ASID from the cgroup to which it was charged in
+ * sev_asid_try_charge().
+ *
+ * Context: Process context. Takes and release sev_cgroup_lock mutex.
+ */
+void sev_asid_uncharge(int pos)
+{
+	struct sev_cgroup *i;
+
+	mutex_lock(&sev_cgroup_lock);
+
+	for (i = sev_asids_cgroup_array[pos]; i; i = parent_sev_cgroup(i))
+		sev_asid_cgroup_dec(i);
+
+	sev_asids_cgroup_array[pos] = NULL;
+
+	mutex_unlock(&sev_cgroup_lock);
+}
+EXPORT_SYMBOL(sev_asid_uncharge);
+
+/**
+ * sev_cgroup_setup() - Setup the sev cgroup before charging.
+ * @max: Maximum number of SEV ASIDs supported by the platform.
+ *
+ * Initialize the sev_asids_cgroup_array which stores ASID to cgroup mapping.
+ *
+ * Context: Process context. Takes and release sev_cgroup_lock mutex.
+ * Return:
+ * * 0 - If setup was successful.
+ * * -ENOMEM - If memory not available to allocate the array.
+ */
+int sev_cgroup_setup(unsigned int max)
+{
+	int ret = 0;
+
+	mutex_lock(&sev_cgroup_lock);
+
+	sev_max_asids = max;
+	sev_asids_cgroup_array = kcalloc(sev_max_asids,
+					 sizeof(struct sev_cgroup *),
+					 GFP_KERNEL);
+	if (!sev_asids_cgroup_array) {
+		sev_max_asids = 0;
+		ret = -ENOMEM;
+	}
+
+	mutex_unlock(&sev_cgroup_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL(sev_cgroup_setup);
+
+/**
+ * sev_cgroup_teardown() - Release resources, no more charging/uncharging will
+ *			   happen.
+ *
+ * Context: Process context. Takes and release sev_cgroup_lock mutex.
+ */
+void sev_cgroup_teardown(void)
+{
+	mutex_lock(&sev_cgroup_lock);
+
+	kfree(sev_asids_cgroup_array);
+	sev_asids_cgroup_array = NULL;
+	sev_max_asids = 0;
+
+	mutex_unlock(&sev_cgroup_lock);
+}
+EXPORT_SYMBOL(sev_cgroup_teardown);
+
+/**
+ * sev_max_write() - Take user supplied max value limit for the cgroup.
+ * @of: Handler for the file.
+ * @buf: Data from the user.
+ * @nbytes: Number of bytes of the data.
+ * @off: Offset in the file.
+ *
+ * Context: Process context. Takes and release sev_cgroup_lock mutex.
+ * Return:
+ * * >= 0 - Number of bytes read in the buffer.
+ * * -EINVAL - If @buf is lower than the current usage, negative, exceeds max
+ *	       value of u32, or not a number.
+ */
+static ssize_t sev_max_write(struct kernfs_open_file *of, char *buf,
+			     size_t nbytes, loff_t off)
+{
+	struct sev_cgroup *sevcg;
+	unsigned int max;
+	int err;
+
+	buf = strstrip(buf);
+	if (!strcmp(buf, MAX_SEV_ASIDS_STR)) {
+		max = UINT_MAX;
+	} else {
+		err = kstrtouint(buf, 0, &max);
+		if (err)
+			return err;
+	}
+
+	sevcg = css_sev(of_css(of));
+
+	mutex_lock(&sev_cgroup_lock);
+
+	if (max < sevcg->usage) {
+		mutex_unlock(&sev_cgroup_lock);
+		return -EINVAL;
+	}
+
+	sevcg->max = max;
+
+	mutex_unlock(&sev_cgroup_lock);
+	return nbytes;
+}
+
+/**
+ * sev_max_show() - Print the current max limit in the cgroup.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * @Return: 0 to denote successful print.
+ */
+static int sev_max_show(struct seq_file *sf, void *v)
+{
+	unsigned int max = css_sev(seq_css(sf))->max;
+
+	if (max == UINT_MAX)
+		seq_printf(sf, "%s\n", MAX_SEV_ASIDS_STR);
+	else
+		seq_printf(sf, "%u\n", max);
+
+	return 0;
+}
+
+/**
+ * sev_current() - Get the current usage of SEV ASIDs in the cgroup.
+ * @css: cgroup subsys state object
+ * @cft: Handler for cgroup interface file
+ *
+ * Context: Any context.
+ * Return: Current count of SEV ASIDs used in the cgroup.
+ */
+static u64 sev_current(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+	return css_sev(css)->usage;
+}
+
+/**
+ * sev_events() - Show the tally of events that occurred in the SEV cgroup.
+ * @sf: Interface file.
+ * @v: Arguments passed.
+ *
+ * Context: Any context.
+ * Return: 0 to denote the successful print.
+ */
+static int sev_events(struct seq_file *sf, void *v)
+{
+	struct cgroup_subsys_state *css = seq_css(sf);
+
+	seq_printf(sf, "max %lu\n", css_sev(css)->allocation_failure_event);
+	return 0;
+}
+
+/* sev cgroup interface files */
+static struct cftype sev_files[] = {
+	{
+		/* Maximum count of SEV ASIDs allowed */
+		.name = "max",
+		.write = sev_max_write,
+		.seq_show = sev_max_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		/* Current usage of SEV ASIDs */
+		.name = "current",
+		.read_u64 = sev_current,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		/*
+		 * Flat keyed event file.
+		 *
+		 * max %allocation_failure_event
+		 *    Number of times SEV ASIDs not allocated because current
+		 *    usage reached the max limit
+		 */
+		.name = "events",
+		.file_offset = offsetof(struct sev_cgroup, events_file),
+		.seq_show = sev_events,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{}
+};
+
+/**
+ * sev_css_alloc() - Allocate a sev cgroup node in the cgroup hieararchy.
+ * @parent_css: cgroup subsys state of the parent cgroup node.
+ *
+ * Context: Process context.
+ * Return:
+ * * struct cgroup_subsys_state * - Pointer to css field of struct sev_cgroup.
+ * * ERR_PTR(-ENOMEM) - No memory available to create sev_cgroup node.
+ */
+static struct cgroup_subsys_state *
+sev_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct sev_cgroup *sevcg;
+
+	sevcg = kzalloc(sizeof(*sevcg), GFP_KERNEL);
+	if (!sevcg)
+		return ERR_PTR(-ENOMEM);
+
+	sevcg->max = UINT_MAX;
+	sevcg->usage = 0;
+	sevcg->allocation_failure_event = 0;
+
+	return &sevcg->css;
+}
+
+/**
+ * sev_css_free() - Free the sev_cgroup that @css belongs to.
+ * @css: cgroup subsys state object
+ *
+ * Context: Any context.
+ */
+static void sev_css_free(struct cgroup_subsys_state *css)
+{
+	kfree(css_sev(css));
+}
+
+/**
+ * sev_css_offline() - cgroup is killed, move charges to parent.
+ * @css: css of the killed cgroup.
+ *
+ * Since charges do not migrate when the task moves, a killed css might have
+ * charges. Update the sev_asids_cgroup_array to point to the @css->parent.
+ * Parent is already charged in sev_asid_try_charge(), so its usage need not
+ * change.
+ *
+ * Context: Process context. Takes and release sev_cgroup_lock mutex.
+ */
+static void sev_css_offline(struct cgroup_subsys_state *css)
+{
+	struct sev_cgroup *sevcg, *parentcg;
+	int i;
+
+	if (!css->parent)
+		return;
+
+	sevcg = css_sev(css);
+
+	mutex_lock(&sev_cgroup_lock);
+
+	if (!sevcg->usage) {
+		mutex_unlock(&sev_cgroup_lock);
+		return;
+	}
+
+	parentcg = parent_sev_cgroup(sevcg);
+
+	for (i = 0; i < sev_max_asids; i++) {
+		if (sev_asids_cgroup_array[i] == sevcg)
+			sev_asids_cgroup_array[i] = parentcg;
+	}
+
+	mutex_unlock(&sev_cgroup_lock);
+}
+
+struct cgroup_subsys sev_cgrp_subsys = {
+	.css_alloc = sev_css_alloc,
+	.css_free = sev_css_free,
+	.css_offline = sev_css_offline,
+	.legacy_cftypes = sev_files,
+	.dfl_cftypes = sev_files
+};
diff --git a/arch/x86/kvm/svm/sev_cgroup.h b/arch/x86/kvm/svm/sev_cgroup.h
new file mode 100644
index 000000000000..d2d69870a005
--- /dev/null
+++ b/arch/x86/kvm/svm/sev_cgroup.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * SEV cgroup interface for charging and uncharging the cgroup.
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+
+#ifndef _SEV_CGROUP_H_
+#define _SEV_CGROUP_H_
+
+#ifdef CONFIG_CGROUP_SEV
+
+int sev_asid_try_charge(int pos);
+void sev_asid_uncharge(int pos);
+int sev_cgroup_setup(unsigned int max);
+void sev_cgroup_teardown(void);
+
+#else /* CONFIG_CGROUP_SEV */
+
+static inline int sev_asid_try_charge(int pos)
+{
+	return 0;
+}
+
+static inline void sev_asid_uncharge(int pos)
+{
+}
+
+static inline int sev_cgroup_setup(unsigned int max)
+{
+	return 0;
+}
+
+static inline void sev_cgroup_teardown(void)
+{
+}
+#endif /* CONFIG_CGROUP_SEV */
+
+#endif /* _SEV_CGROUP_H_ */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..d21a5b4a2037 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,9 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_SEV)
+SUBSYS(sev)
+#endif
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index d6a0b31b13dc..1a57c362b803 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1101,6 +1101,20 @@ config CGROUP_BPF
 	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
 	  inet sockets.
 
+config CGROUP_SEV
+	bool "SEV ASID controller"
+	depends on KVM_AMD_SEV
+	default n
+	help
+	  Provides a controller for AMD SEV ASIDs. This controller limits and
+	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
+	  processors. Whenever a new encrypted VM is created using SEV on an
+	  AMD processor, this controller will check the current limit in the
+	  cgroup to which the task belongs and will deny the SEV ASID if the
+	  cgroup has already reached its limit.
+
+	  Say N if unsure.
+
 config CGROUP_DEBUG
 	bool "Debug controller"
 	default n
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC Patch 2/2] KVM: SVM: SEV cgroup controller documentation
  2020-09-22  0:40 [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Vipin Sharma
  2020-09-22  0:40 ` [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller Vipin Sharma
@ 2020-09-22  0:40 ` Vipin Sharma
  2020-09-22  1:48 ` [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Sean Christopherson
  2 siblings, 0 replies; 29+ messages in thread
From: Vipin Sharma @ 2020-09-22  0:40 UTC (permalink / raw)
  To: thomas.lendacky, pbonzini, sean.j.christopherson, tj, lizefan
  Cc: joro, corbet, brijesh.singh, jon.grimm, eric.vantassell, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Vipin Sharma, Dionna Glaze, Erdem Aktas

SEV cgroup controller documentation.

Documentation for both cgroup versions, v1 and v2, of SEV cgroup
controller. SEV controller is used to distribute and account SEV ASIDs
usage by KVM on AMD processor.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Rientjes <rientjes@google.com>
Reviewed-by: Dionna Glaze <dionnaglaze@google.com>
Reviewed-by: Erdem Aktas <erdemaktas@google.com>
---
 Documentation/admin-guide/cgroup-v1/sev.rst | 94 +++++++++++++++++++++
 Documentation/admin-guide/cgroup-v2.rst     | 56 +++++++++++-
 2 files changed, 147 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/sev.rst

diff --git a/Documentation/admin-guide/cgroup-v1/sev.rst b/Documentation/admin-guide/cgroup-v1/sev.rst
new file mode 100644
index 000000000000..04d0024360a1
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/sev.rst
@@ -0,0 +1,94 @@
+==============
+SEV Controller
+==============
+
+Overview
+========
+
+The SEV controller regulates the distribution of SEV ASIDs. SEV ASIDs are used
+in creating encrypted VMs on AMD processors. SEV ASIDs are stateful and one
+ASID is only used in one KVM object at a time. It cannot be used with other KVM
+before unbinding it from the previous KVM.
+
+All SEV ASIDs are tracked by this controller and it allows for accounting and
+distribution of this resource.
+
+How to Enable Controller
+========================
+
+- Enable memory encryption on AMD platform::
+
+        CONFIG_KVM_AMD_SEV=y
+
+- Enable SEV controller::
+
+        CONFIG_CGROUP_SEV=y
+
+- Above options will build SEV controller support in the kernel.
+  To mount sev controller::
+
+        mount -t cgroup -o sev none /sys/fs/cgroup/sev
+
+Interface Files
+==============
+
+  sev.current
+        A read-only single value file which exists on non-root cgroups.
+
+        The total number of SEV ASIDs currently in use by the cgroup and its
+        descendants.
+
+  sev.max
+        A read-write single value file which exists on non-root cgroups. The
+        default is "max".
+
+        SEV ASIDs usage hard limit. If the cgroup's current SEV ASIDs usage
+        reach this limit then the new SEV VMs creation will return error
+        -EBUSY.  This limit cannot be set lower than sev.current.
+
+  sev.events
+        A read-only flat-keyed single value file which exists on non-root
+        cgroups. A value change in this file generates a file modified event.
+
+          max
+                 The number of times the cgroup's SEV ASIDs usage was about to
+                 go over the max limit. This is a tally of SEV VM creation
+                 failures in the cgroup.
+
+Hierarchy
+=========
+
+SEV controller supports hierarchical accounting. It supports following
+features:
+
+1. SEV ASID usage in the cgroup includes itself and its descendent cgroups.
+2. SEV ASID usage can never exceed the max limit set in the cgroup and its
+   ancestor's chain up to the root.
+3. SEV events keep a tally of SEV VM creation failures in the cgroup and not in
+   its child subtree.
+
+Suppose the following example hierarchy::
+
+                        root
+                        /  \
+                       A    B
+                       |
+                       C
+
+1. A will show the count of SEV ASID used in A and C.
+2. C's SEV ASID usage may not exceed any of the max limits set in C, A, or
+   root.
+3. A's event file lists only SEV VM creation failed in A, and not the ones in
+   C.
+
+Migration and SEV ASID ownership
+================================
+
+An SEV ASID is charged to the cgroup which instantiated it, and stays charged
+to that cgroup until that SEV ASID is freed. Migrating a process to a different
+cgroup do not move the SEV ASID charge to the destination cgroup where the
+process has moved.
+
+Deletion of a cgroup with existing ASIDs charges will migrate those ASIDs to
+the parent cgroup.
+
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6be43781ec7f..66b8bdee8ff3 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -63,8 +63,11 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
        5-7-1. RDMA Interface Files
      5-8. HugeTLB
        5.8-1. HugeTLB Interface Files
-     5-8. Misc
-       5-8-1. perf_event
+     5-9 SEV
+       5-9-1 SEV Interface Files
+       5-9-2 SEV ASIDs Ownership
+     5-10. Misc
+       5-10-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -2109,6 +2112,54 @@ HugeTLB Interface Files
 	are local to the cgroup i.e. not hierarchical. The file modified event
 	generated on this file reflects only the local events.
 
+SEV
+---
+
+The SEV controller regulates the distribution of SEV ASIDs. SEV ASIDs are used
+in creating encrypted VMs on AMD processors. SEV ASIDs are stateful and one
+ASID is only used in one KVM object at a time. It cannot be used with other KVM
+before unbinding it from the previous KVM.
+
+All SEV ASIDs are tracked by this controller and it allows for accounting and
+distribution of this resource.
+
+SEV Interface Files
+~~~~~~~~~~~~~~~~~~~
+
+  sev.current
+        A read-only single value file which exists on non-root cgroups.
+
+        The total number of SEV ASIDs currently in use by the cgroup and its
+        descendants.
+
+  sev.max
+        A read-write single value file which exists on non-root cgroups. The
+        default is "max".
+
+        SEV ASIDs usage hard limit. If the cgroup's current SEV ASIDs usage
+        reach this limit then the new SEV VMs creation will return error
+        -EBUSY.  This limit cannot be set lower than sev.current.
+
+  sev.events
+        A read-only flat-keyed single value file which exists on non-root
+        cgroups. A value change in this file generates a file modified event.
+
+          max
+                 The number of times the cgroup's SEV ASIDs usage was about to
+                 go over the max limit. This is a tally of SEV VM creation
+                 failures in the cgroup.
+
+SEV ASIDs Ownership
+~~~~~~~~~~~~~~~~~~~
+
+An SEV ASID is charged to the cgroup which instantiated it, and stays charged
+to the cgroup until the ASID is freed. Migrating a process to a different
+cgroup do not move the SEV ASID charge to the destination cgroup where the
+process has moved.
+
+Deletion of a cgroup with existing ASIDs charges will migrate those ASIDs to
+the parent cgroup.
+
 Misc
 ----
 
@@ -2120,7 +2171,6 @@ automatically enabled on the v2 hierarchy so that perf events can
 always be filtered by cgroup v2 path.  The controller can still be
 moved to a legacy hierarchy after v2 hierarchy is populated.
 
-
 Non-normative information
 -------------------------
 
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller.
  2020-09-22  0:40 ` [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller Vipin Sharma
@ 2020-09-22  1:04   ` Randy Dunlap
  2020-09-22  1:22     ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Randy Dunlap @ 2020-09-22  1:04 UTC (permalink / raw)
  To: Vipin Sharma, thomas.lendacky, pbonzini, sean.j.christopherson,
	tj, lizefan
  Cc: joro, corbet, brijesh.singh, jon.grimm, eric.vantassell, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Dionna Glaze, Erdem Aktas

Hi,

On 9/21/20 5:40 PM, Vipin Sharma wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index d6a0b31b13dc..1a57c362b803 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1101,6 +1101,20 @@ config CGROUP_BPF
>  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
>  	  inet sockets.
>  
> +config CGROUP_SEV
> +	bool "SEV ASID controller"
> +	depends on KVM_AMD_SEV
> +	default n
> +	help
> +	  Provides a controller for AMD SEV ASIDs. This controller limits and
> +	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
> +	  processors. Whenever a new encrypted VM is created using SEV on an
> +	  AMD processor, this controller will check the current limit in the
> +	  cgroup to which the task belongs and will deny the SEV ASID if the
> +	  cgroup has already reached its limit.
> +
> +	  Say N if unsure.

Something here (either in the bool prompt string or the help text) should
let a reader know w.t.h. SEV means.

Without having to look in other places...

thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller.
  2020-09-22  1:04   ` Randy Dunlap
@ 2020-09-22  1:22     ` Sean Christopherson
  2020-09-22 16:05       ` Vipin Sharma
  2020-11-03 16:39       ` James Bottomley
  0 siblings, 2 replies; 29+ messages in thread
From: Sean Christopherson @ 2020-09-22  1:22 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Vipin Sharma, thomas.lendacky, pbonzini, tj, lizefan, joro,
	corbet, brijesh.singh, jon.grimm, eric.vantassell, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Dionna Glaze, Erdem Aktas

On Mon, Sep 21, 2020 at 06:04:04PM -0700, Randy Dunlap wrote:
> Hi,
> 
> On 9/21/20 5:40 PM, Vipin Sharma wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index d6a0b31b13dc..1a57c362b803 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1101,6 +1101,20 @@ config CGROUP_BPF
> >  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
> >  	  inet sockets.
> >  
> > +config CGROUP_SEV
> > +	bool "SEV ASID controller"
> > +	depends on KVM_AMD_SEV
> > +	default n
> > +	help
> > +	  Provides a controller for AMD SEV ASIDs. This controller limits and
> > +	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
> > +	  processors. Whenever a new encrypted VM is created using SEV on an
> > +	  AMD processor, this controller will check the current limit in the
> > +	  cgroup to which the task belongs and will deny the SEV ASID if the
> > +	  cgroup has already reached its limit.
> > +
> > +	  Say N if unsure.
> 
> Something here (either in the bool prompt string or the help text) should
> let a reader know w.t.h. SEV means.
> 
> Without having to look in other places...

ASIDs too.  I'd also love to see more info in the docs and/or cover letter
to explain why ASID management on SEV requires a cgroup.  I know what an
ASID is, and have a decent idea of how KVM manages ASIDs for legacy VMs, but
I know nothing about why ASIDs are limited for SEV and not legacy VMs.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-22  0:40 [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Vipin Sharma
  2020-09-22  0:40 ` [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller Vipin Sharma
  2020-09-22  0:40 ` [RFC Patch 2/2] KVM: SVM: SEV cgroup controller documentation Vipin Sharma
@ 2020-09-22  1:48 ` Sean Christopherson
  2020-09-22 21:14   ` Vipin Sharma
  2020-09-23 12:47   ` Paolo Bonzini
  2 siblings, 2 replies; 29+ messages in thread
From: Sean Christopherson @ 2020-09-22  1:48 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: thomas.lendacky, pbonzini, tj, lizefan, joro, corbet,
	brijesh.singh, jon.grimm, eric.vantassell, gingell, rientjes,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> Hello,
> 
> This patch series adds a new SEV controller for tracking and limiting
> the usage of SEV ASIDs on the AMD SVM platform.
> 
> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> but this resource is in very limited quantity on a host.
> 
> This limited quantity creates issues like SEV ASID starvation and
> unoptimized scheduling in the cloud infrastructure.
> 
> SEV controller provides SEV ASID tracking and resource control
> mechanisms.

This should be genericized to not be SEV specific.  TDX has a similar
scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
(gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
would change anything, I think it'd just be a bunch of renaming.  The hardest
part would probably be figuring out a name :-).

Another idea would be to go even more generic and implement a KVM cgroup
that accounts the number of VMs of a particular type, e.g. legacy, SEV,
SEV-ES?, and TDX.  That has potential future problems though as it falls
apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
light of day.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller.
  2020-09-22  1:22     ` Sean Christopherson
@ 2020-09-22 16:05       ` Vipin Sharma
  2020-11-03 16:39       ` James Bottomley
  1 sibling, 0 replies; 29+ messages in thread
From: Vipin Sharma @ 2020-09-22 16:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Randy Dunlap, thomas.lendacky, pbonzini, tj, lizefan, joro,
	corbet, brijesh.singh, jon.grimm, eric.vantassell, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Dionna Glaze, Erdem Aktas

On Mon, Sep 21, 2020 at 06:22:28PM -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 06:04:04PM -0700, Randy Dunlap wrote:
> > Hi,
> > 
> > On 9/21/20 5:40 PM, Vipin Sharma wrote:
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d6a0b31b13dc..1a57c362b803 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1101,6 +1101,20 @@ config CGROUP_BPF
> > >  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
> > >  	  inet sockets.
> > >  
> > > +config CGROUP_SEV
> > > +	bool "SEV ASID controller"
> > > +	depends on KVM_AMD_SEV
> > > +	default n
> > > +	help
> > > +	  Provides a controller for AMD SEV ASIDs. This controller limits and
> > > +	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
> > > +	  processors. Whenever a new encrypted VM is created using SEV on an
> > > +	  AMD processor, this controller will check the current limit in the
> > > +	  cgroup to which the task belongs and will deny the SEV ASID if the
> > > +	  cgroup has already reached its limit.
> > > +
> > > +	  Say N if unsure.
> > 
> > Something here (either in the bool prompt string or the help text) should
> > let a reader know w.t.h. SEV means.
> > 
> > Without having to look in other places...
> 
> ASIDs too.  I'd also love to see more info in the docs and/or cover letter
> to explain why ASID management on SEV requires a cgroup.  I know what an
> ASID is, and have a decent idea of how KVM manages ASIDs for legacy VMs, but
> I know nothing about why ASIDs are limited for SEV and not legacy VMs.

Thanks for the feedback, I will add more details in the Kconfig and the
documentation about SEV and ASID.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-22  1:48 ` [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Sean Christopherson
@ 2020-09-22 21:14   ` Vipin Sharma
       [not found]     ` <20200924192116.GC9649@linux.intel.com>
  2020-09-23 12:47   ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Vipin Sharma @ 2020-09-22 21:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: thomas.lendacky, pbonzini, tj, lizefan, joro, corbet,
	brijesh.singh, jon.grimm, eric.vantassell, gingell, rientjes,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> > Hello,
> > 
> > This patch series adds a new SEV controller for tracking and limiting
> > the usage of SEV ASIDs on the AMD SVM platform.
> > 
> > SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> > but this resource is in very limited quantity on a host.
> > 
> > This limited quantity creates issues like SEV ASID starvation and
> > unoptimized scheduling in the cloud infrastructure.
> > 
> > SEV controller provides SEV ASID tracking and resource control
> > mechanisms.
> 
> This should be genericized to not be SEV specific.  TDX has a similar
> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> would change anything, I think it'd just be a bunch of renaming.  The hardest
> part would probably be figuring out a name :-).
> 
> Another idea would be to go even more generic and implement a KVM cgroup
> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> SEV-ES?, and TDX.  That has potential future problems though as it falls
> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> light of day.

I read about the TDX and its use of the KeyID for encrypting VMs. TDX
has two kinds of KeyIDs private and shared.

On AMD platform there are two types of ASIDs for encryption.
1. SEV ASID - Normal runtime guest memory encryption.
2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
		 integrity.

Both types of ASIDs have their own maximum value which is provisioned in
the firmware

So, we are talking about 4 different types of resources:
1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
3. Intel TDX private KeyID
4. Intel TDX shared KeyID

TDX private KeyID is similar to SEV and SEV-ES ASID. I think coming up
with the same name which can be used by both platforms will not be easy,
and extensible with the future enhancements. This will get even more
difficult if Arm also comes up with something similar but with different
nuances.

I like the idea of the KVM cgroup and when it is mounted it will have
different files based on the hardware platform.

1. KVM cgroup on AMD will have:
sev.max & sev.current.
sev_es.max & sev_es.current.

2. KVM cgroup mounted on Intel:
tdx_private_keys.max
tdx_shared_keys.max

The KVM cgroup can be used to have control files which are generic (no
use case in my mind right now) and hardware platform specific files
also.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-22  1:48 ` [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Sean Christopherson
  2020-09-22 21:14   ` Vipin Sharma
@ 2020-09-23 12:47   ` Paolo Bonzini
  2020-09-28  9:12     ` Janosch Frank
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2020-09-23 12:47 UTC (permalink / raw)
  To: Sean Christopherson, Vipin Sharma
  Cc: thomas.lendacky, tj, lizefan, joro, corbet, brijesh.singh,
	jon.grimm, eric.vantassell, gingell, rientjes, kvm, x86, cgroups,
	linux-doc, linux-kernel, kvm-ppc, linux-s390, Paul Mackerras,
	Janosch Frank

On 22/09/20 03:48, Sean Christopherson wrote:
> This should be genericized to not be SEV specific.  TDX has a similar
> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> would change anything, I think it'd just be a bunch of renaming.  The hardest
> part would probably be figuring out a name :-).
> 
> Another idea would be to go even more generic and implement a KVM cgroup
> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> SEV-ES?, and TDX.  That has potential future problems though as it falls
> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> light of day.

Or also MANY:1 (we are thinking of having multiple VMs share the same
SEV ASID).

It might even be the same on s390 and PPC, in which case we probably
want to implement this in virt/kvm.  Paul, Janosch, do you think this
would make sense for you?  The original commit message is below.

Paolo

> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>> Hello,
>>
>> This patch series adds a new SEV controller for tracking and limiting
>> the usage of SEV ASIDs on the AMD SVM platform.
>>
>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>> but this resource is in very limited quantity on a host.
>>
>> This limited quantity creates issues like SEV ASID starvation and
>> unoptimized scheduling in the cloud infrastructure.
>>
>> SEV controller provides SEV ASID tracking and resource control
>> mechanisms.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
       [not found]     ` <20200924192116.GC9649@linux.intel.com>
@ 2020-09-24 19:55       ` Tom Lendacky
  2020-09-25 22:22         ` Vipin Sharma
  2020-10-01 18:08         ` Peter Gonda
  0 siblings, 2 replies; 29+ messages in thread
From: Tom Lendacky @ 2020-09-24 19:55 UTC (permalink / raw)
  To: Sean Christopherson, Vipin Sharma
  Cc: pbonzini, tj, lizefan, joro, corbet, brijesh.singh, jon.grimm,
	eric.vantassell, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel

On 9/24/20 2:21 PM, Sean Christopherson wrote:
> On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
>> On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
>>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>>> Hello,
>>>>
>>>> This patch series adds a new SEV controller for tracking and limiting
>>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>>
>>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>>> but this resource is in very limited quantity on a host.
>>>>
>>>> This limited quantity creates issues like SEV ASID starvation and
>>>> unoptimized scheduling in the cloud infrastructure.
>>>>
>>>> SEV controller provides SEV ASID tracking and resource control
>>>> mechanisms.
>>>
>>> This should be genericized to not be SEV specific.  TDX has a similar
>>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>>> part would probably be figuring out a name :-).
>>>
>>> Another idea would be to go even more generic and implement a KVM cgroup
>>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>>> light of day.
>>
>> I read about the TDX and its use of the KeyID for encrypting VMs. TDX
>> has two kinds of KeyIDs private and shared.
> 
> To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> because those KeyIDs can be used without TDX or KVM in the picture.
> 
>> On AMD platform there are two types of ASIDs for encryption.
>> 1. SEV ASID - Normal runtime guest memory encryption.
>> 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
>> 		 integrity.
>>
>> Both types of ASIDs have their own maximum value which is provisioned in
>> the firmware
> 
> Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?

SEV-SNP and SEV-ES share the same ASID range.

Thanks,
Tom

> 
>> So, we are talking about 4 different types of resources:
>> 1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
>> 2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
>> 3. Intel TDX private KeyID
>> 4. Intel TDX shared KeyID
>>
>> TDX private KeyID is similar to SEV and SEV-ES ASID. I think coming up
>> with the same name which can be used by both platforms will not be easy,
>> and extensible with the future enhancements. This will get even more
>> difficult if Arm also comes up with something similar but with different
>> nuances.
> 
> Honest question, what's easier for userspace/orchestration layers?  Having an
> abstract but common name, or conrete but different names?  My gut reaction is
> to provide a common interface, but I can see how that could do more harm than
> good, e.g. some amount of hardware capabilitiy discovery is possible with
> concrete names.  And I'm guessing there's already a fair amount of vendor
> specific knowledge bleeding into userspace for these features...
> 
> And if SNP is adding another ASID namespace, trying to abstract the types is
> probably a lost cause.
> 
>  From a code perspective, I doubt it will matter all that much, e.g. it should
> be easy enough to provide helpers for exposing a new asid/key type.
> 
>> I like the idea of the KVM cgroup and when it is mounted it will have
>> different files based on the hardware platform.
> 
> I don't think a KVM cgroup is the correct approach, e.g. there are potential
> use cases for "legacy" MKTME without KVM.  Maybe something like Encryption
> Keys cgroup?
> 
>> 1. KVM cgroup on AMD will have:
>> sev.max & sev.current.
>> sev_es.max & sev_es.current.
>>
>> 2. KVM cgroup mounted on Intel:
>> tdx_private_keys.max
>> tdx_shared_keys.max
>>
>> The KVM cgroup can be used to have control files which are generic (no
>> use case in my mind right now) and hardware platform specific files
>> also.
> 
> My "generic KVM cgroup" suggestion was probably a pretty bad suggestion.
> Except for ASIDs/KeyIDs, KVM itself doesn't manage any constrained resources,
> e.g. memory, logical CPUs, time slices, etc... are all generic resources that
> are consumed by KVM but managed elsewhere.  We definitely don't want to change
> that, nor do I think we want to do anything, such as creating a KVM cgroup,
> that would imply that having KVM manage resources is a good idea.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-24 19:55       ` Tom Lendacky
@ 2020-09-25 22:22         ` Vipin Sharma
  2020-10-02 20:48           ` Vipin Sharma
  2020-10-01 18:08         ` Peter Gonda
  1 sibling, 1 reply; 29+ messages in thread
From: Vipin Sharma @ 2020-09-25 22:22 UTC (permalink / raw)
  To: Tom Lendacky, Sean Christopherson, pbonzini
  Cc: tj, lizefan, joro, corbet, brijesh.singh, jon.grimm,
	eric.vantassell, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel

On Thu, Sep 24, 2020 at 02:55:18PM -0500, Tom Lendacky wrote:
> On 9/24/20 2:21 PM, Sean Christopherson wrote:
> > On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
> > > On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> > > > On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> > > > > Hello,
> > > > > 
> > > > > This patch series adds a new SEV controller for tracking and limiting
> > > > > the usage of SEV ASIDs on the AMD SVM platform.
> > > > > 
> > > > > SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> > > > > but this resource is in very limited quantity on a host.
> > > > > 
> > > > > This limited quantity creates issues like SEV ASID starvation and
> > > > > unoptimized scheduling in the cloud infrastructure.
> > > > > 
> > > > > SEV controller provides SEV ASID tracking and resource control
> > > > > mechanisms.
> > > > 
> > > > This should be genericized to not be SEV specific.  TDX has a similar
> > > > scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> > > > (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> > > > would change anything, I think it'd just be a bunch of renaming.  The hardest
> > > > part would probably be figuring out a name :-).
> > > > 
> > > > Another idea would be to go even more generic and implement a KVM cgroup
> > > > that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> > > > SEV-ES?, and TDX.  That has potential future problems though as it falls
> > > > apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> > > > account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> > > > light of day.
> > > 
> > > I read about the TDX and its use of the KeyID for encrypting VMs. TDX
> > > has two kinds of KeyIDs private and shared.
> > 
> > To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> > because those KeyIDs can be used without TDX or KVM in the picture.
> > 
> > > On AMD platform there are two types of ASIDs for encryption.
> > > 1. SEV ASID - Normal runtime guest memory encryption.
> > > 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
> > > 		 integrity.
> > > 
> > > Both types of ASIDs have their own maximum value which is provisioned in
> > > the firmware
> > 
> > Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> > or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> > expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> > SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
> 
> SEV-SNP and SEV-ES share the same ASID range.
> 
> Thanks,
> Tom
> 
> > 
> > > So, we are talking about 4 different types of resources:
> > > 1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
> > > 2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
> > > 3. Intel TDX private KeyID
> > > 4. Intel TDX shared KeyID
> > > 
> > > TDX private KeyID is similar to SEV and SEV-ES ASID. I think coming up
> > > with the same name which can be used by both platforms will not be easy,
> > > and extensible with the future enhancements. This will get even more
> > > difficult if Arm also comes up with something similar but with different
> > > nuances.
> > 
> > Honest question, what's easier for userspace/orchestration layers?  Having an
> > abstract but common name, or conrete but different names?  My gut reaction is
> > to provide a common interface, but I can see how that could do more harm than
> > good, e.g. some amount of hardware capabilitiy discovery is possible with
> > concrete names.  And I'm guessing there's already a fair amount of vendor
> > specific knowledge bleeding into userspace for these features...

I agree with you that the abstract name is better than the concrete
name, I also feel that we must provide HW extensions. Here is one
approach:

Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
suggestions)

Control files: slots.{max, current, events}

Contents of the slot.max:
default max
	default: Corresponds to all kinds of encryption capabilities on
		 a platform. For AMD, it will be SEV and SEV-ES.  For
		 Intel, it will be TDX and MKTME. This can also be used
		 by other devices not just CPU.

	max: max or any number to denote limit on the cgroup.

A user who wants the finer control, then they need to know about the
capabilities a platform provides and use them, e.g. on AMD:

$ echo "sev-es 1000" > slot.max
$ cat slots.max
default max sev-es 1000

This means that in the cgroup maximum SEV-ES ASIDs which can be used is
1000 and SEV ASIDs is max (default, no limit).  Each platform can
provide their own extensions which can be overwritten by a user,
otherwise extensions will have the default limit.

This is kind of similar to the IO and the rdma controller.

I think it is keeping abstraction for userspace and also providing finer
control for HW specific features.

What do you think about the above approach?  

> > 
> > And if SNP is adding another ASID namespace, trying to abstract the types is
> > probably a lost cause.
> > 
> >  From a code perspective, I doubt it will matter all that much, e.g. it should
> > be easy enough to provide helpers for exposing a new asid/key type.
> > 
> > > I like the idea of the KVM cgroup and when it is mounted it will have
> > > different files based on the hardware platform.
> > 
> > I don't think a KVM cgroup is the correct approach, e.g. there are potential
> > use cases for "legacy" MKTME without KVM.  Maybe something like Encryption
> > Keys cgroup?

Added some suggestion in the above approach. I think we should not add
key in the name because here limitation is on the number of keys that
can be used simultaneously. 

> > 
> > > 1. KVM cgroup on AMD will have:
> > > sev.max & sev.current.
> > > sev_es.max & sev_es.current.
> > > 
> > > 2. KVM cgroup mounted on Intel:
> > > tdx_private_keys.max
> > > tdx_shared_keys.max
> > > 
> > > The KVM cgroup can be used to have control files which are generic (no
> > > use case in my mind right now) and hardware platform specific files
> > > also.
> > 
> > My "generic KVM cgroup" suggestion was probably a pretty bad suggestion.
> > Except for ASIDs/KeyIDs, KVM itself doesn't manage any constrained resources,
> > e.g. memory, logical CPUs, time slices, etc... are all generic resources that
> > are consumed by KVM but managed elsewhere.  We definitely don't want to change
> > that, nor do I think we want to do anything, such as creating a KVM cgroup,
> > that would imply that having KVM manage resources is a good idea.
> > 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-23 12:47   ` Paolo Bonzini
@ 2020-09-28  9:12     ` Janosch Frank
  2020-09-28  9:21       ` Christian Borntraeger
  0 siblings, 1 reply; 29+ messages in thread
From: Janosch Frank @ 2020-09-28  9:12 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vipin Sharma
  Cc: thomas.lendacky, tj, lizefan, joro, corbet, brijesh.singh,
	jon.grimm, eric.vantassell, gingell, rientjes, kvm, x86, cgroups,
	linux-doc, linux-kernel, kvm-ppc, linux-s390, Paul Mackerras,
	Christian Borntraeger


[-- Attachment #1.1: Type: text/plain, Size: 2056 bytes --]

On 9/23/20 2:47 PM, Paolo Bonzini wrote:
> On 22/09/20 03:48, Sean Christopherson wrote:
>> This should be genericized to not be SEV specific.  TDX has a similar
>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>> part would probably be figuring out a name :-).
>>
>> Another idea would be to go even more generic and implement a KVM cgroup
>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>> light of day.
> 
> Or also MANY:1 (we are thinking of having multiple VMs share the same
> SEV ASID).
> 
> It might even be the same on s390 and PPC, in which case we probably
> want to implement this in virt/kvm.  Paul, Janosch, do you think this
> would make sense for you?  The original commit message is below.
> 
> Paolo
> 
>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>> Hello,
>>>
>>> This patch series adds a new SEV controller for tracking and limiting
>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>
>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>> but this resource is in very limited quantity on a host.
>>>
>>> This limited quantity creates issues like SEV ASID starvation and
>>> unoptimized scheduling in the cloud infrastructure.
>>>
>>> SEV controller provides SEV ASID tracking and resource control
>>> mechanisms.
> 

On s390 we currently support a few million protected guests per LPAR so
guest IDs are not exactly scarce. However having accounting for them
might add some value nevertheless, especially when having large amount
of protected containers.

@Christian: Any thoughts on this?


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-28  9:12     ` Janosch Frank
@ 2020-09-28  9:21       ` Christian Borntraeger
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Borntraeger @ 2020-09-28  9:21 UTC (permalink / raw)
  To: Janosch Frank, Paolo Bonzini, Sean Christopherson, Vipin Sharma
  Cc: thomas.lendacky, tj, lizefan, joro, corbet, brijesh.singh,
	jon.grimm, eric.vantassell, gingell, rientjes, kvm, x86, cgroups,
	linux-doc, linux-kernel, kvm-ppc, linux-s390, Paul Mackerras

On 28.09.20 11:12, Janosch Frank wrote:
> On 9/23/20 2:47 PM, Paolo Bonzini wrote:
>> On 22/09/20 03:48, Sean Christopherson wrote:
>>> This should be genericized to not be SEV specific.  TDX has a similar
>>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>>> part would probably be figuring out a name :-).
>>>
>>> Another idea would be to go even more generic and implement a KVM cgroup
>>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>>> light of day.
>>
>> Or also MANY:1 (we are thinking of having multiple VMs share the same
>> SEV ASID).
>>
>> It might even be the same on s390 and PPC, in which case we probably
>> want to implement this in virt/kvm.  Paul, Janosch, do you think this
>> would make sense for you?  The original commit message is below.
>>
>> Paolo
>>
>>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>>> Hello,
>>>>
>>>> This patch series adds a new SEV controller for tracking and limiting
>>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>>
>>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>>> but this resource is in very limited quantity on a host.
>>>>
>>>> This limited quantity creates issues like SEV ASID starvation and
>>>> unoptimized scheduling in the cloud infrastructure.
>>>>
>>>> SEV controller provides SEV ASID tracking and resource control
>>>> mechanisms.
>>
> 
> On s390 we currently support a few million protected guests per LPAR so
> guest IDs are not exactly scarce. However having accounting for them
> might add some value nevertheless, especially when having large amount
> of protected containers.
> 
> @Christian: Any thoughts on this?

Yes, maybe it is a good idea to limit containers to only have a sane number
of secure guests, so that a malicious pod cannot consume all IDs by calling
CREATE_VM and KVM_PV_ENABLE million times or so.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-24 19:55       ` Tom Lendacky
  2020-09-25 22:22         ` Vipin Sharma
@ 2020-10-01 18:08         ` Peter Gonda
  2020-10-01 22:44           ` Tom Lendacky
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Gonda @ 2020-10-01 18:08 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Sean Christopherson, Vipin Sharma, Paolo Bonzini, tj, lizefan,
	Joerg Roedel, corbet, Singh, Brijesh, Grimm, Jon,
	eric.vantassell, Matt Gingell, David Rientjes, kvm list, x86,
	cgroups, linux-doc, linux-kernel

On Thu, Sep 24, 2020 at 1:55 PM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 9/24/20 2:21 PM, Sean Christopherson wrote:
> > On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
> >> On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> >>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> >>>> Hello,
> >>>>
> >>>> This patch series adds a new SEV controller for tracking and limiting
> >>>> the usage of SEV ASIDs on the AMD SVM platform.
> >>>>
> >>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> >>>> but this resource is in very limited quantity on a host.
> >>>>
> >>>> This limited quantity creates issues like SEV ASID starvation and
> >>>> unoptimized scheduling in the cloud infrastructure.
> >>>>
> >>>> SEV controller provides SEV ASID tracking and resource control
> >>>> mechanisms.
> >>>
> >>> This should be genericized to not be SEV specific.  TDX has a similar
> >>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> >>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> >>> would change anything, I think it'd just be a bunch of renaming.  The hardest
> >>> part would probably be figuring out a name :-).
> >>>
> >>> Another idea would be to go even more generic and implement a KVM cgroup
> >>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> >>> SEV-ES?, and TDX.  That has potential future problems though as it falls
> >>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> >>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> >>> light of day.
> >>
> >> I read about the TDX and its use of the KeyID for encrypting VMs. TDX
> >> has two kinds of KeyIDs private and shared.
> >
> > To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> > because those KeyIDs can be used without TDX or KVM in the picture.
> >
> >> On AMD platform there are two types of ASIDs for encryption.
> >> 1. SEV ASID - Normal runtime guest memory encryption.
> >> 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
> >>               integrity.
> >>
> >> Both types of ASIDs have their own maximum value which is provisioned in
> >> the firmware
> >
> > Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> > or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> > expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> > SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
>
> SEV-SNP and SEV-ES share the same ASID range.

Where is this documented? From the SEV-SNP FW ABI Spec 0.8 "The
firmware checks that ASID is an encryption capable ASID. If not, the
firmware returns INVALID_ASID." that doesn't seem clear that an SEV-ES
ASID is required. Should this document be more clear?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-10-01 18:08         ` Peter Gonda
@ 2020-10-01 22:44           ` Tom Lendacky
  0 siblings, 0 replies; 29+ messages in thread
From: Tom Lendacky @ 2020-10-01 22:44 UTC (permalink / raw)
  To: Peter Gonda
  Cc: Sean Christopherson, Vipin Sharma, Paolo Bonzini, tj, lizefan,
	Joerg Roedel, corbet, Singh, Brijesh, Grimm, Jon,
	eric.vantassell, Matt Gingell, David Rientjes, kvm list, x86,
	cgroups, linux-doc, linux-kernel

On 10/1/20 1:08 PM, Peter Gonda wrote:
> On Thu, Sep 24, 2020 at 1:55 PM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 9/24/20 2:21 PM, Sean Christopherson wrote:
>>> On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
>>>> On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
>>>>> On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This patch series adds a new SEV controller for tracking and limiting
>>>>>> the usage of SEV ASIDs on the AMD SVM platform.
>>>>>>
>>>>>> SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
>>>>>> but this resource is in very limited quantity on a host.
>>>>>>
>>>>>> This limited quantity creates issues like SEV ASID starvation and
>>>>>> unoptimized scheduling in the cloud infrastructure.
>>>>>>
>>>>>> SEV controller provides SEV ASID tracking and resource control
>>>>>> mechanisms.
>>>>>
>>>>> This should be genericized to not be SEV specific.  TDX has a similar
>>>>> scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
>>>>> (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
>>>>> would change anything, I think it'd just be a bunch of renaming.  The hardest
>>>>> part would probably be figuring out a name :-).
>>>>>
>>>>> Another idea would be to go even more generic and implement a KVM cgroup
>>>>> that accounts the number of VMs of a particular type, e.g. legacy, SEV,
>>>>> SEV-ES?, and TDX.  That has potential future problems though as it falls
>>>>> apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
>>>>> account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
>>>>> light of day.
>>>>
>>>> I read about the TDX and its use of the KeyID for encrypting VMs. TDX
>>>> has two kinds of KeyIDs private and shared.
>>>
>>> To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
>>> because those KeyIDs can be used without TDX or KVM in the picture.
>>>
>>>> On AMD platform there are two types of ASIDs for encryption.
>>>> 1. SEV ASID - Normal runtime guest memory encryption.
>>>> 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
>>>>                integrity.
>>>>
>>>> Both types of ASIDs have their own maximum value which is provisioned in
>>>> the firmware
>>>
>>> Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
>>> or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
>>> expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
>>> SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
>>
>> SEV-SNP and SEV-ES share the same ASID range.
> 
> Where is this documented? From the SEV-SNP FW ABI Spec 0.8 "The
> firmware checks that ASID is an encryption capable ASID. If not, the
> firmware returns INVALID_ASID." that doesn't seem clear that an SEV-ES
> ASID is required. Should this document be more clear?

I let the owner of the spec know and it will be updated.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-09-25 22:22         ` Vipin Sharma
@ 2020-10-02 20:48           ` Vipin Sharma
  2020-11-03  2:06             ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Vipin Sharma @ 2020-10-02 20:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: thomas.lendacky, pbonzini, tj, lizefan, joro, corbet,
	brijesh.singh, jon.grimm, eric.vantassell, gingell, rientjes,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:
> On Thu, Sep 24, 2020 at 02:55:18PM -0500, Tom Lendacky wrote:
> > On 9/24/20 2:21 PM, Sean Christopherson wrote:
> > > On Tue, Sep 22, 2020 at 02:14:04PM -0700, Vipin Sharma wrote:
> > > > On Mon, Sep 21, 2020 at 06:48:38PM -0700, Sean Christopherson wrote:
> > > > > On Mon, Sep 21, 2020 at 05:40:22PM -0700, Vipin Sharma wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > This patch series adds a new SEV controller for tracking and limiting
> > > > > > the usage of SEV ASIDs on the AMD SVM platform.
> > > > > > 
> > > > > > SEV ASIDs are used in creating encrypted VM and lightweight sandboxes
> > > > > > but this resource is in very limited quantity on a host.
> > > > > > 
> > > > > > This limited quantity creates issues like SEV ASID starvation and
> > > > > > unoptimized scheduling in the cloud infrastructure.
> > > > > > 
> > > > > > SEV controller provides SEV ASID tracking and resource control
> > > > > > mechanisms.
> > > > > 
> > > > > This should be genericized to not be SEV specific.  TDX has a similar
> > > > > scarcity issue in the form of key IDs, which IIUC are analogous to SEV ASIDs
> > > > > (gave myself a quick crash course on SEV ASIDs).  Functionally, I doubt it
> > > > > would change anything, I think it'd just be a bunch of renaming.  The hardest
> > > > > part would probably be figuring out a name :-).
> > > > > 
> > > > > Another idea would be to go even more generic and implement a KVM cgroup
> > > > > that accounts the number of VMs of a particular type, e.g. legacy, SEV,
> > > > > SEV-ES?, and TDX.  That has potential future problems though as it falls
> > > > > apart if hardware every supports 1:MANY VMs:KEYS, or if there is a need to
> > > > > account keys outside of KVM, e.g. if MKTME for non-KVM cases ever sees the
> > > > > light of day.
> > > > 
> > > > I read about the TDX and its use of the KeyID for encrypting VMs. TDX
> > > > has two kinds of KeyIDs private and shared.
> > > 
> > > To clarify, "shared" KeyIDs are simply legacy MKTME KeyIDs.  This is relevant
> > > because those KeyIDs can be used without TDX or KVM in the picture.
> > > 
> > > > On AMD platform there are two types of ASIDs for encryption.
> > > > 1. SEV ASID - Normal runtime guest memory encryption.
> > > > 2. SEV-ES ASID - Extends SEV ASID by adding register state encryption with
> > > > 		 integrity.
> > > > 
> > > > Both types of ASIDs have their own maximum value which is provisioned in
> > > > the firmware
> > > 
> > > Ugh, I missed that detail in the SEV-ES RFC.  Does SNP add another ASID type,
> > > or does it reuse SEV-ES ASIDs?  If it does add another type, is that trend
> > > expected to continue, i.e. will SEV end up with SEV, SEV-ES, SEV-ES-SNP,
> > > SEV-ES-SNP-X, SEV-ES-SNP-X-Y, etc...?
> > 
> > SEV-SNP and SEV-ES share the same ASID range.
> > 
> > Thanks,
> > Tom
> > 
> > > 
> > > > So, we are talking about 4 different types of resources:
> > > > 1. AMD SEV ASID (implemented in this patch as sev.* files in SEV cgroup)
> > > > 2. AMD SEV-ES ASID (in future, adding files like sev_es.*)
> > > > 3. Intel TDX private KeyID
> > > > 4. Intel TDX shared KeyID
> > > > 
> > > > TDX private KeyID is similar to SEV and SEV-ES ASID. I think coming up
> > > > with the same name which can be used by both platforms will not be easy,
> > > > and extensible with the future enhancements. This will get even more
> > > > difficult if Arm also comes up with something similar but with different
> > > > nuances.
> > > 
> > > Honest question, what's easier for userspace/orchestration layers?  Having an
> > > abstract but common name, or conrete but different names?  My gut reaction is
> > > to provide a common interface, but I can see how that could do more harm than
> > > good, e.g. some amount of hardware capabilitiy discovery is possible with
> > > concrete names.  And I'm guessing there's already a fair amount of vendor
> > > specific knowledge bleeding into userspace for these features...
> 
> I agree with you that the abstract name is better than the concrete
> name, I also feel that we must provide HW extensions. Here is one
> approach:
> 
> Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
> suggestions)
> 
> Control files: slots.{max, current, events}
> 
> Contents of the slot.max:
> default max
> 	default: Corresponds to all kinds of encryption capabilities on
> 		 a platform. For AMD, it will be SEV and SEV-ES.  For
> 		 Intel, it will be TDX and MKTME. This can also be used
> 		 by other devices not just CPU.
> 
> 	max: max or any number to denote limit on the cgroup.
> 
> A user who wants the finer control, then they need to know about the
> capabilities a platform provides and use them, e.g. on AMD:
> 
> $ echo "sev-es 1000" > slot.max
> $ cat slots.max
> default max sev-es 1000
> 
> This means that in the cgroup maximum SEV-ES ASIDs which can be used is
> 1000 and SEV ASIDs is max (default, no limit).  Each platform can
> provide their own extensions which can be overwritten by a user,
> otherwise extensions will have the default limit.
> 
> This is kind of similar to the IO and the rdma controller.
> 
> I think it is keeping abstraction for userspace and also providing finer
> control for HW specific features.
> 
> What do you think about the above approach?  
> 
Hi Sean,

Any feedback/concern for the above abstraction approach?

Thanks

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-10-02 20:48           ` Vipin Sharma
@ 2020-11-03  2:06             ` Sean Christopherson
  2020-11-14  0:26               ` David Rientjes
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2020-11-03  2:06 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: thomas.lendacky, pbonzini, tj, lizefan, joro, corbet,
	brijesh.singh, jon.grimm, eric.vantassell, gingell, rientjes,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:
> On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:
> > I agree with you that the abstract name is better than the concrete
> > name, I also feel that we must provide HW extensions. Here is one
> > approach:
> > 
> > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
> > suggestions)
> > 
> > Control files: slots.{max, current, events}

I don't particularly like the "slots" name, mostly because it could be confused
with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
love those names either, but "encryption" and "IDs" are the two obvious
commonalities betwee TDX's encryption key IDs and SEV's encryption address
space IDs.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller.
  2020-09-22  1:22     ` Sean Christopherson
  2020-09-22 16:05       ` Vipin Sharma
@ 2020-11-03 16:39       ` James Bottomley
  2020-11-03 18:10         ` Sean Christopherson
  1 sibling, 1 reply; 29+ messages in thread
From: James Bottomley @ 2020-11-03 16:39 UTC (permalink / raw)
  To: Sean Christopherson, Randy Dunlap
  Cc: Vipin Sharma, thomas.lendacky, pbonzini, tj, lizefan, joro,
	corbet, brijesh.singh, jon.grimm, eric.vantassell, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Dionna Glaze, Erdem Aktas

On Mon, 2020-09-21 at 18:22 -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 06:04:04PM -0700, Randy Dunlap wrote:
> > Hi,
> > 
> > On 9/21/20 5:40 PM, Vipin Sharma wrote:
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d6a0b31b13dc..1a57c362b803 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1101,6 +1101,20 @@ config CGROUP_BPF
> > >  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path
> > > of
> > >  	  inet sockets.
> > >  
> > > +config CGROUP_SEV
> > > +	bool "SEV ASID controller"
> > > +	depends on KVM_AMD_SEV
> > > +	default n
> > > +	help
> > > +	  Provides a controller for AMD SEV ASIDs. This controller
> > > limits and
> > > +	  shows the total usage of SEV ASIDs used in encrypted VMs on
> > > AMD
> > > +	  processors. Whenever a new encrypted VM is created using SEV
> > > on an
> > > +	  AMD processor, this controller will check the current limit
> > > in the
> > > +	  cgroup to which the task belongs and will deny the SEV ASID
> > > if the
> > > +	  cgroup has already reached its limit.
> > > +
> > > +	  Say N if unsure.
> > 
> > Something here (either in the bool prompt string or the help text)
> > should let a reader know w.t.h. SEV means.
> > 
> > Without having to look in other places...
> 
> ASIDs too.  I'd also love to see more info in the docs and/or cover
> letter to explain why ASID management on SEV requires a cgroup.  I
> know what an ASID is, and have a decent idea of how KVM manages ASIDs
> for legacy VMs, but I know nothing about why ASIDs are limited for
> SEV and not legacy VMs.

Well, also, why would we only have a cgroup for ASIDs but not MSIDs?

For the reader at home a Space ID (SID) is simply a tag that can be
placed on a cache line to control things like flushing.  Intel and AMD
use MSIDs which are allocated per process to allow fast context
switching by flushing all the process pages using a flush by SID. 
ASIDs are also used by both Intel and AMD to control nested/extended
paging of virtual machines, so ASIDs are allocated per VM.  So far it's
universal.

AMD invented a mechanism for tying their memory encryption technology
to the ASID asserted on the memory bus, so now they can do encrypted
virtual machines since each VM is tagged by ASID which the memory
encryptor sees.  It is suspected that the forthcoming intel TDX
technology to encrypt VMs will operate in the same way as well.  This
isn't everything you have to do to get an encrypted VM, but it's a core
part of it.

The problem with SIDs (both A and M) is that they get crammed into
spare bits in the CPU (like the upper bits of %CR3 for MSID) so we
don't have enough of them to do a 1:1 mapping of MSID to process or
ASID to VM.  Thus we have to ration them somewhat, which is what I
assume this patch is about?

James



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller.
  2020-11-03 16:39       ` James Bottomley
@ 2020-11-03 18:10         ` Sean Christopherson
  2020-11-03 22:43           ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2020-11-03 18:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Randy Dunlap, Vipin Sharma, thomas.lendacky, pbonzini, tj,
	lizefan, joro, corbet, brijesh.singh, jon.grimm, eric.vantassell,
	gingell, rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Dionna Glaze, Erdem Aktas

On Tue, Nov 03, 2020 at 08:39:12AM -0800, James Bottomley wrote:
> On Mon, 2020-09-21 at 18:22 -0700, Sean Christopherson wrote:
> > ASIDs too.  I'd also love to see more info in the docs and/or cover
> > letter to explain why ASID management on SEV requires a cgroup.  I
> > know what an ASID is, and have a decent idea of how KVM manages ASIDs
> > for legacy VMs, but I know nothing about why ASIDs are limited for
> > SEV and not legacy VMs.
> 
> Well, also, why would we only have a cgroup for ASIDs but not MSIDs?

Assuming MSID==PCID in Intel terminology, which may be a bad assumption, the
answer is that rationing PCIDs is a fools errand, at least on Intel CPUs.

> For the reader at home a Space ID (SID) is simply a tag that can be
> placed on a cache line to control things like flushing.  Intel and AMD
> use MSIDs which are allocated per process to allow fast context
> switching by flushing all the process pages using a flush by SID. 
> ASIDs are also used by both Intel and AMD to control nested/extended
> paging of virtual machines, so ASIDs are allocated per VM.  So far it's
> universal.

On Intel CPUs, multiple things factor into the actual ASID that is used to tag
TLB entries.  And underneath the hood, there are a _very_ limited number of
ASIDs that are globally shared, i.e. a process in the host has an ASID, same
as a process in the guest, and the CPU only supports tagging translations for
N ASIDs at any given time.

E.g. with TDX, the hardware/real ASID is derived from:

   VPID + PCID + SEAM + EPTP

where VPID=0 for host, PCID=0 if PCID is disabled, SEAM=1 for the TDX-Module
and TDX VMs, and obviously EPTP is invalid/ignored when EPT is disabled.

> AMD invented a mechanism for tying their memory encryption technology
> to the ASID asserted on the memory bus, so now they can do encrypted
> virtual machines since each VM is tagged by ASID which the memory
> encryptor sees.  It is suspected that the forthcoming intel TDX
> technology to encrypt VMs will operate in the same way as well.  This

TDX uses MKTME keys, which are not tied to the ASID.  The KeyID is part of the
physical address, at least in the initial hardware implementations, which means
that from a memory perspective, each KeyID is a unique physical address.  This
is completely orthogonal to ASIDs, e.g. a given KeyID+PA combo can have
mutliple TLB entries if it's accessed by multiple ASIDs.

> isn't everything you have to do to get an encrypted VM, but it's a core
> part of it.
> 
> The problem with SIDs (both A and M) is that they get crammed into
> spare bits in the CPU (like the upper bits of %CR3 for MSID) so we

This CR3 reference is why I assume MSID==PCID, but the PCID is carved out of
the lower bits (11:0) of CR3, which is why I'm unsure I interpreted this
correctly.

> don't have enough of them to do a 1:1 mapping of MSID to process or
> ASID to VM.  Thus we have to ration them somewhat, which is what I
> assume this patch is about?

This cgroup is more about a hard limitation than about performance.

With PCIDs, VPIDs, and AMD's ASIDs, there is always the option of recycling an
existing ID (used for PCIDs and ASIDs), or simply disabling the feature (used
for VPIDs).  In both cases, exhausting the resource affects performance due to
incurring TLB flushes at transition points, but doesn't prevent creating new
processes/VMs.

And due to the way PCID=>ASID derivation works on Intel CPUs, the kernel
doesn't even bother trying to use a large number of PCIDs.  IIRC, the current
number of PCIDs used by the kernel is 5, i.e. the kernel intentionally
recycles PCIDs long before it's forced to do so by the architectural
limitation of 4k PCIDs, because using more than 5 PCIDs actually hurts
performance (forced PCID recycling allows the kernel to keep *its* ASID live
by flushing userspace PCIDs, whereas CPU recycling of ASIDs is indiscriminate).

MKTME KeyIDs and SEV ASIDs are different.  There is a hard, relatively low
limit on the number of IDs that are available, and exhausting that pool
effectively prevents creating a new encrypted VM[*].  E.g. with TDX, on first
gen hardware there is a hard limit of 127 KeyIDs that can be used to create
TDX VMs.  IIRC, SEV-ES is capped 512 or so ASIDs.  Hitting that cap means no
more protected VMs can be created.

[*] KeyID exhaustion for TDX is a hard restriction, the old VM _must_ be torn
    down to reuse the KeyID.  ASID exhaustion for SEV is not technically a
    hard limit, e.g. KVM could theoretically park a VM to reuse its ASID, but
    for all intents and purposes that VM is no longer live.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller.
  2020-11-03 18:10         ` Sean Christopherson
@ 2020-11-03 22:43           ` James Bottomley
  0 siblings, 0 replies; 29+ messages in thread
From: James Bottomley @ 2020-11-03 22:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Randy Dunlap, Vipin Sharma, thomas.lendacky, pbonzini, tj,
	lizefan, joro, corbet, brijesh.singh, jon.grimm, eric.vantassell,
	gingell, rientjes, kvm, x86, cgroups, linux-doc, linux-kernel,
	Dionna Glaze, Erdem Aktas

On Tue, 2020-11-03 at 10:10 -0800, Sean Christopherson wrote:
> On Tue, Nov 03, 2020 at 08:39:12AM -0800, James Bottomley wrote:
> > On Mon, 2020-09-21 at 18:22 -0700, Sean Christopherson wrote:
> > > ASIDs too.  I'd also love to see more info in the docs and/or
> > > cover letter to explain why ASID management on SEV requires a
> > > cgroup.  I know what an ASID is, and have a decent idea of how
> > > KVM manages ASIDs for legacy VMs, but I know nothing about why
> > > ASIDs are limited for SEV and not legacy VMs.
> > 
> > Well, also, why would we only have a cgroup for ASIDs but not
> > MSIDs?
> 
> Assuming MSID==PCID in Intel terminology, which may be a bad
> assumption, the answer is that rationing PCIDs is a fools errand, at
> least on Intel CPUs.

Yes, sorry, I should probably have confessed that I'm most used to
parisc SIDs, which are additional 32 bit qualifiers the CPU explicitly
adds to every virtual address.  The perform exactly the same function,
though except they're a bit more explicit (and we have more bits).  On
PA every virtual address is actually a GVA consisting of 32 bit of SID
and 64 bits of VA and we use this 96 byte address for virtual indexing
and things.  And parisc doesn't have virtualization acceleration so we
only have one type of SID.

Thanks for the rest of the elaboration.

James



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-03  2:06             ` Sean Christopherson
@ 2020-11-14  0:26               ` David Rientjes
  2020-11-24 19:16                 ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: David Rientjes @ 2020-11-14  0:26 UTC (permalink / raw)
  To: Sean Christopherson, Janosch Frank, Christian Borntraeger
  Cc: Vipin Sharma, Lendacky, Thomas, pbonzini, tj, lizefan, joro,
	corbet, Singh, Brijesh, Grimm, Jon, Van Tassell, Eric, gingell,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Mon, 2 Nov 2020, Sean Christopherson wrote:

> On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:
> > On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:
> > > I agree with you that the abstract name is better than the concrete
> > > name, I also feel that we must provide HW extensions. Here is one
> > > approach:
> > > 
> > > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to
> > > suggestions)
> > > 
> > > Control files: slots.{max, current, events}
> 
> I don't particularly like the "slots" name, mostly because it could be confused
> with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
> love those names either, but "encryption" and "IDs" are the two obvious
> commonalities betwee TDX's encryption key IDs and SEV's encryption address
> space IDs.
> 

Looping Janosch and Christian back into the thread.

I interpret this suggestion as
encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel 
offerings, which was my thought on this as well.

Certainly the kernel could provide a single interface for all of these and 
key value pairs depending on the underlying encryption technology but it 
seems to only introduce additional complexity in the kernel in string 
parsing that can otherwise be avoided.  I think we all agree that a single 
interface for all encryption keys or one-value-per-file could be done in 
the kernel and handled by any userspace agent that is configuring these 
values.

I think Vipin is adding a root level file that describes how many keys we 
have available on the platform for each technology.  So I think this comes 
down to, for example, a single encryption.max file vs 
encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned 
separately so we treat them as their own resource here.

So which is easier?

$ cat encryption.sev.max
10
$ echo -n 15 > encryption.sev.max

or

$ cat encryption.max
sev 10
sev_es 10
keyid 0
$ echo -n "sev 10" > encryption.max

I would argue the former is simplest (always preferring 
one-value-per-file) and avoids any string parsing or resource controller 
lookups that need to match on that string in the kernel.

The set of encryption.{sev,sev_es,keyid} files that exist would depend on
CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or 
CONFIG_INTEL_TDX is configured.  Both can be configured so we have all 
three files, but the root file will obviously indicate 0 keys available 
for one of them (can't run on AMD and Intel at the same time :).

So I'm inclined to suggest that the one-value-per-file format is the ideal 
way to go unless there are objections to it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-14  0:26               ` David Rientjes
@ 2020-11-24 19:16                 ` Sean Christopherson
  2020-11-24 19:49                   ` Vipin Sharma
  2020-11-27 18:01                   ` Christian Borntraeger
  0 siblings, 2 replies; 29+ messages in thread
From: Sean Christopherson @ 2020-11-24 19:16 UTC (permalink / raw)
  To: David Rientjes
  Cc: Janosch Frank, Christian Borntraeger, Vipin Sharma, Lendacky,
	Thomas, pbonzini, tj, lizefan, joro, corbet, Singh, Brijesh,
	Grimm, Jon, VanTassell, Eric, gingell, kvm, x86, cgroups,
	linux-doc, linux-kernel

On Fri, Nov 13, 2020, David Rientjes wrote:                                     
>                                                                               
> On Mon, 2 Nov 2020, Sean Christopherson wrote:                                
>                                                                               
> > On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:               
> > > On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:             
> > > > I agree with you that the abstract name is better than the concrete     
> > > > name, I also feel that we must provide HW extensions. Here is one       
> > > > approach:                                                               
> > > >                                                                         
> > > > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to     
> > > > suggestions)                                                            
> > > >                                                                         
> > > > Control files: slots.{max, current, events}                             
> >                                                                             
> > I don't particularly like the "slots" name, mostly because it could be confused
> > with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
> > love those names either, but "encryption" and "IDs" are the two obvious     
> > commonalities betwee TDX's encryption key IDs and SEV's encryption address  
> > space IDs.                                                                  
> >                                                                             
>                                                                               
> Looping Janosch and Christian back into the thread.                           
>                                                                               
> I interpret this suggestion as                                                
> encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         

I think it makes sense to use encryption_ids instead of simply encryption, that
way it's clear the cgroup is accounting ids as opposed to restricting what
techs can be used on yes/no basis.

> offerings, which was my thought on this as well.                              
>                                                                               
> Certainly the kernel could provide a single interface for all of these and    
> key value pairs depending on the underlying encryption technology but it      
> seems to only introduce additional complexity in the kernel in string         
> parsing that can otherwise be avoided.  I think we all agree that a single    
> interface for all encryption keys or one-value-per-file could be done in      
> the kernel and handled by any userspace agent that is configuring these       
> values.                                                                       
>                                                                               
> I think Vipin is adding a root level file that describes how many keys we     
> have available on the platform for each technology.  So I think this comes    
> down to, for example, a single encryption.max file vs                         
> encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      

Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
no need to enumerate platform total, but not knowing how many of the allowed IDs
have been allocated seems problematic.

> separately so we treat them as their own resource here.                       
>                                                                               
> So which is easier?                                                           
>                                                                               
> $ cat encryption.sev.max                                                      
> 10                                                                            
> $ echo -n 15 > encryption.sev.max                                             
>                                                                               
> or                                                                            
>                                                                               
> $ cat encryption.max                                                          
> sev 10                                                                        
> sev_es 10                                                                     
> keyid 0                                                                       
> $ echo -n "sev 10" > encryption.max                                           
>                                                                               
> I would argue the former is simplest (always preferring                       
> one-value-per-file) and avoids any string parsing or resource controller      
> lookups that need to match on that string in the kernel.                      

Ya, I prefer individual files as well.

I don't think "keyid" is the best name for TDX, it doesn't leave any wiggle room
if there are other flavors of key IDs on Intel platform, e.g. private vs. shared
in the future.  It's also inconsistent with the SEV names, e.g. "asid" isn't
mentioned anywhere.  And "keyid" sort of reads as "max key id", rather than "max
number of keyids".  Maybe "tdx_private", or simply "tdx"?  Doesn't have to be
solved now though, there's plenty of time before TDX will be upstream. :-)

> The set of encryption.{sev,sev_es,keyid} files that exist would depend on     
> CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or                
> CONFIG_INTEL_TDX is configured.  Both can be configured so we have all        
> three files, but the root file will obviously indicate 0 keys available       
> for one of them (can't run on AMD and Intel at the same time :).              
>                                                                               
> So I'm inclined to suggest that the one-value-per-file format is the ideal    
> way to go unless there are objections to it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-24 19:16                 ` Sean Christopherson
@ 2020-11-24 19:49                   ` Vipin Sharma
  2020-11-24 20:18                     ` David Rientjes
  2020-11-27 18:01                   ` Christian Borntraeger
  1 sibling, 1 reply; 29+ messages in thread
From: Vipin Sharma @ 2020-11-24 19:49 UTC (permalink / raw)
  To: Sean Christopherson, rientjes
  Cc: Janosch Frank, Christian Borntraeger, Lendacky, Thomas, pbonzini,
	tj, lizefan, joro, corbet, Singh, Brijesh, Grimm, Jon,
	VanTassell, Eric, gingell, kvm, x86, cgroups, linux-doc,
	linux-kernel

On Tue, Nov 24, 2020 at 07:16:29PM +0000, Sean Christopherson wrote:
> On Fri, Nov 13, 2020, David Rientjes wrote:                                     
> >                                                                               
> > On Mon, 2 Nov 2020, Sean Christopherson wrote:                                
> >                                                                               
> > > On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:               
> > > > On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:             
> > > > > I agree with you that the abstract name is better than the concrete     
> > > > > name, I also feel that we must provide HW extensions. Here is one       
> > > > > approach:                                                               
> > > > >                                                                         
> > > > > Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to     
> > > > > suggestions)                                                            
> > > > >                                                                         
> > > > > Control files: slots.{max, current, events}                             
> > >                                                                             
> > > I don't particularly like the "slots" name, mostly because it could be confused
> > > with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
> > > love those names either, but "encryption" and "IDs" are the two obvious     
> > > commonalities betwee TDX's encryption key IDs and SEV's encryption address  
> > > space IDs.                                                                  
> > >                                                                             
> >                                                                               
> > Looping Janosch and Christian back into the thread.                           
> >                                                                               
> > I interpret this suggestion as                                                
> > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> 
> I think it makes sense to use encryption_ids instead of simply encryption, that
> way it's clear the cgroup is accounting ids as opposed to restricting what
> techs can be used on yes/no basis.
> 
> > offerings, which was my thought on this as well.                              
> >                                                                               
> > Certainly the kernel could provide a single interface for all of these and    
> > key value pairs depending on the underlying encryption technology but it      
> > seems to only introduce additional complexity in the kernel in string         
> > parsing that can otherwise be avoided.  I think we all agree that a single    
> > interface for all encryption keys or one-value-per-file could be done in      
> > the kernel and handled by any userspace agent that is configuring these       
> > values.                                                                       
> >                                                                               
> > I think Vipin is adding a root level file that describes how many keys we     
> > have available on the platform for each technology.  So I think this comes    
> > down to, for example, a single encryption.max file vs                         
> > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> 
> Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> no need to enumerate platform total, but not knowing how many of the allowed IDs
> have been allocated seems problematic.
> 

We will be showing encryption_ids.{sev,sev_es}.{max,current}
I am inclined to not provide "events" as I am not using it, let me know
if this file is required, I can provide it then.

I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
total available ids on the platform. This one will be useful for
scheduling jobs in the cloud infrastructure based on total supported
capacity.

> > separately so we treat them as their own resource here.                       
> >                                                                               
> > So which is easier?                                                           
> >                                                                               
> > $ cat encryption.sev.max                                                      
> > 10                                                                            
> > $ echo -n 15 > encryption.sev.max                                             
> >                                                                               
> > or                                                                            
> >                                                                               
> > $ cat encryption.max                                                          
> > sev 10                                                                        
> > sev_es 10                                                                     
> > keyid 0                                                                       
> > $ echo -n "sev 10" > encryption.max                                           
> >                                                                               
> > I would argue the former is simplest (always preferring                       
> > one-value-per-file) and avoids any string parsing or resource controller      
> > lookups that need to match on that string in the kernel.                      
> 
> Ya, I prefer individual files as well.
> 
> I don't think "keyid" is the best name for TDX, it doesn't leave any wiggle room
> if there are other flavors of key IDs on Intel platform, e.g. private vs. shared
> in the future.  It's also inconsistent with the SEV names, e.g. "asid" isn't
> mentioned anywhere.  And "keyid" sort of reads as "max key id", rather than "max
> number of keyids".  Maybe "tdx_private", or simply "tdx"?  Doesn't have to be
> solved now though, there's plenty of time before TDX will be upstream. :-)
> 
> > The set of encryption.{sev,sev_es,keyid} files that exist would depend on     
> > CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or                
> > CONFIG_INTEL_TDX is configured.  Both can be configured so we have all        
> > three files, but the root file will obviously indicate 0 keys available       
> > for one of them (can't run on AMD and Intel at the same time :).              
> >                                                                               
> > So I'm inclined to suggest that the one-value-per-file format is the ideal    
> > way to go unless there are objections to it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-24 19:49                   ` Vipin Sharma
@ 2020-11-24 20:18                     ` David Rientjes
  2020-11-24 21:08                       ` Vipin Sharma
  0 siblings, 1 reply; 29+ messages in thread
From: David Rientjes @ 2020-11-24 20:18 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: Sean Christopherson, Janosch Frank, Christian Borntraeger,
	Lendacky, Thomas, pbonzini, tj, lizefan, joro, corbet, Singh,
	Brijesh, Grimm, Jon, VanTassell, Eric, gingell, kvm, x86,
	cgroups, linux-doc, linux-kernel

On Tue, 24 Nov 2020, Vipin Sharma wrote:

> > > Looping Janosch and Christian back into the thread.                           
> > >                                                                               
> > > I interpret this suggestion as                                                
> > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > 
> > I think it makes sense to use encryption_ids instead of simply encryption, that
> > way it's clear the cgroup is accounting ids as opposed to restricting what
> > techs can be used on yes/no basis.
> > 

Agreed.

> > > offerings, which was my thought on this as well.                              
> > >                                                                               
> > > Certainly the kernel could provide a single interface for all of these and    
> > > key value pairs depending on the underlying encryption technology but it      
> > > seems to only introduce additional complexity in the kernel in string         
> > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > interface for all encryption keys or one-value-per-file could be done in      
> > > the kernel and handled by any userspace agent that is configuring these       
> > > values.                                                                       
> > >                                                                               
> > > I think Vipin is adding a root level file that describes how many keys we     
> > > have available on the platform for each technology.  So I think this comes    
> > > down to, for example, a single encryption.max file vs                         
> > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > 
> > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > have been allocated seems problematic.
> > 
> 
> We will be showing encryption_ids.{sev,sev_es}.{max,current}
> I am inclined to not provide "events" as I am not using it, let me know
> if this file is required, I can provide it then.
> 
> I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> total available ids on the platform. This one will be useful for
> scheduling jobs in the cloud infrastructure based on total supported
> capacity.
> 

Makes sense.  I assume the stat file is only at the cgroup root level 
since it would otherwise be duplicating its contents in every cgroup under 
it.  Probably not very helpful for child cgroup to see stat = 509 ASIDs 
but max = 100 :)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-24 20:18                     ` David Rientjes
@ 2020-11-24 21:08                       ` Vipin Sharma
  2020-11-24 21:27                         ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Vipin Sharma @ 2020-11-24 21:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: Sean Christopherson, Janosch Frank, Christian Borntraeger,
	Thomas, pbonzini, tj, lizefan, joro, corbet, Brijesh, Jon, Eric,
	gingell, kvm, x86, cgroups, linux-doc, linux-kernel

On Tue, Nov 24, 2020 at 12:18:45PM -0800, David Rientjes wrote:
> On Tue, 24 Nov 2020, Vipin Sharma wrote:
> 
> > > > Looping Janosch and Christian back into the thread.                           
> > > >                                                                               
> > > > I interpret this suggestion as                                                
> > > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > > 
> > > I think it makes sense to use encryption_ids instead of simply encryption, that
> > > way it's clear the cgroup is accounting ids as opposed to restricting what
> > > techs can be used on yes/no basis.
> > > 
> 
> Agreed.
> 
> > > > offerings, which was my thought on this as well.                              
> > > >                                                                               
> > > > Certainly the kernel could provide a single interface for all of these and    
> > > > key value pairs depending on the underlying encryption technology but it      
> > > > seems to only introduce additional complexity in the kernel in string         
> > > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > > interface for all encryption keys or one-value-per-file could be done in      
> > > > the kernel and handled by any userspace agent that is configuring these       
> > > > values.                                                                       
> > > >                                                                               
> > > > I think Vipin is adding a root level file that describes how many keys we     
> > > > have available on the platform for each technology.  So I think this comes    
> > > > down to, for example, a single encryption.max file vs                         
> > > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > > 
> > > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > > have been allocated seems problematic.
> > > 
> > 
> > We will be showing encryption_ids.{sev,sev_es}.{max,current}
> > I am inclined to not provide "events" as I am not using it, let me know
> > if this file is required, I can provide it then.
> > 
> > I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> > total available ids on the platform. This one will be useful for
> > scheduling jobs in the cloud infrastructure based on total supported
> > capacity.
> > 
> 
> Makes sense.  I assume the stat file is only at the cgroup root level 
> since it would otherwise be duplicating its contents in every cgroup under 
> it.  Probably not very helpful for child cgroup to see stat = 509 ASIDs 
> but max = 100 :)

Yes, only at root.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-24 21:08                       ` Vipin Sharma
@ 2020-11-24 21:27                         ` Sean Christopherson
  2020-11-24 22:21                           ` Vipin Sharma
  0 siblings, 1 reply; 29+ messages in thread
From: Sean Christopherson @ 2020-11-24 21:27 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: David Rientjes, Janosch Frank, Christian Borntraeger, Thomas,
	pbonzini, tj, lizefan, joro, corbet, Brijesh, Jon, Eric, gingell,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Tue, Nov 24, 2020, Vipin Sharma wrote:
> On Tue, Nov 24, 2020 at 12:18:45PM -0800, David Rientjes wrote:
> > On Tue, 24 Nov 2020, Vipin Sharma wrote:
> > 
> > > > > Looping Janosch and Christian back into the thread.                           
> > > > >                                                                               
> > > > > I interpret this suggestion as                                                
> > > > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > > > 
> > > > I think it makes sense to use encryption_ids instead of simply encryption, that
> > > > way it's clear the cgroup is accounting ids as opposed to restricting what
> > > > techs can be used on yes/no basis.
> > > > 
> > 
> > Agreed.
> > 
> > > > > offerings, which was my thought on this as well.                              
> > > > >                                                                               
> > > > > Certainly the kernel could provide a single interface for all of these and    
> > > > > key value pairs depending on the underlying encryption technology but it      
> > > > > seems to only introduce additional complexity in the kernel in string         
> > > > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > > > interface for all encryption keys or one-value-per-file could be done in      
> > > > > the kernel and handled by any userspace agent that is configuring these       
> > > > > values.                                                                       
> > > > >                                                                               
> > > > > I think Vipin is adding a root level file that describes how many keys we     
> > > > > have available on the platform for each technology.  So I think this comes    
> > > > > down to, for example, a single encryption.max file vs                         
> > > > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > > > 
> > > > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > > > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > > > have been allocated seems problematic.
> > > > 
> > > 
> > > We will be showing encryption_ids.{sev,sev_es}.{max,current}
> > > I am inclined to not provide "events" as I am not using it, let me know
> > > if this file is required, I can provide it then.

I've no objection to omitting current until it's needed.

> > > I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> > > total available ids on the platform. This one will be useful for
> > > scheduling jobs in the cloud infrastructure based on total supported
> > > capacity.
> > > 
> > 
> > Makes sense.  I assume the stat file is only at the cgroup root level 
> > since it would otherwise be duplicating its contents in every cgroup under 
> > it.  Probably not very helpful for child cgroup to see stat = 509 ASIDs 
> > but max = 100 :)
> 
> Yes, only at root.

Is a root level stat file needed?  Can't the infrastructure do .max - .current
on the root cgroup to calculate the number of available ids in the system?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-24 21:27                         ` Sean Christopherson
@ 2020-11-24 22:21                           ` Vipin Sharma
  2020-11-24 23:18                             ` Sean Christopherson
  0 siblings, 1 reply; 29+ messages in thread
From: Vipin Sharma @ 2020-11-24 22:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: David Rientjes, Janosch Frank, Christian Borntraeger, Thomas,
	pbonzini, tj, lizefan, joro, corbet, Brijesh, Jon, Eric, gingell,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Tue, Nov 24, 2020 at 09:27:25PM +0000, Sean Christopherson wrote:
> On Tue, Nov 24, 2020, Vipin Sharma wrote:
> > On Tue, Nov 24, 2020 at 12:18:45PM -0800, David Rientjes wrote:
> > > On Tue, 24 Nov 2020, Vipin Sharma wrote:
> > > 
> > > > > > Looping Janosch and Christian back into the thread.                           
> > > > > >                                                                               
> > > > > > I interpret this suggestion as                                                
> > > > > > encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> > > > > 
> > > > > I think it makes sense to use encryption_ids instead of simply encryption, that
> > > > > way it's clear the cgroup is accounting ids as opposed to restricting what
> > > > > techs can be used on yes/no basis.
> > > > > 
> > > 
> > > Agreed.
> > > 
> > > > > > offerings, which was my thought on this as well.                              
> > > > > >                                                                               
> > > > > > Certainly the kernel could provide a single interface for all of these and    
> > > > > > key value pairs depending on the underlying encryption technology but it      
> > > > > > seems to only introduce additional complexity in the kernel in string         
> > > > > > parsing that can otherwise be avoided.  I think we all agree that a single    
> > > > > > interface for all encryption keys or one-value-per-file could be done in      
> > > > > > the kernel and handled by any userspace agent that is configuring these       
> > > > > > values.                                                                       
> > > > > >                                                                               
> > > > > > I think Vipin is adding a root level file that describes how many keys we     
> > > > > > have available on the platform for each technology.  So I think this comes    
> > > > > > down to, for example, a single encryption.max file vs                         
> > > > > > encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> > > > > 
> > > > > Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> > > > > no need to enumerate platform total, but not knowing how many of the allowed IDs
> > > > > have been allocated seems problematic.
> > > > > 
> > > > 
> > > > We will be showing encryption_ids.{sev,sev_es}.{max,current}
> > > > I am inclined to not provide "events" as I am not using it, let me know
> > > > if this file is required, I can provide it then.
> 
> I've no objection to omitting current until it's needed.
> 
> > > > I will provide an encryption_ids.{sev,sev_es}.stat file, which shows
> > > > total available ids on the platform. This one will be useful for
> > > > scheduling jobs in the cloud infrastructure based on total supported
> > > > capacity.
> > > > 
> > > 
> > > Makes sense.  I assume the stat file is only at the cgroup root level 
> > > since it would otherwise be duplicating its contents in every cgroup under 
> > > it.  Probably not very helpful for child cgroup to see stat = 509 ASIDs 
> > > but max = 100 :)
> > 
> > Yes, only at root.
> 
> Is a root level stat file needed?  Can't the infrastructure do .max - .current
> on the root cgroup to calculate the number of available ids in the system?

For an efficient scheduling of workloads in the cloud infrastructure, a
scheduler needs to know the total capacity supported and the current
usage of the host to get the overall picture. There are some issues with
.max -.current approach:

1. Cgroup v2 convention is to not put resource control files in the
   root. This will mean we need to sum (.max -.current) in all of the
   immediate children of the root.

2. .max can have any limit unless we add a check to not allow a user to
   set any value more than the supported one. This will theoretically
   change the encryption_ids cgroup resource distribution model from the
   limit based to the allocation based. It will require the same
   validations in the children cgroups. I think providing separate file on
   the root is a simpler approach.

For someone to set the max limit, they need to know what is the
supported capacity. In the case of SEV and SEV-ES it is not shown
anywhere and the only way to know this is to use a CPUID instructions.
The "stat" file will provide an easy way to know it.

Since current approach is not migrating charges, this means when a
process migrates to an another cgroup and the old cgroup is deleted
(user won't see it but it will be present in the cgroup hierarchy
internally), we cannot get the correct usage by going through other
cgroup directories in this case.

I am suggesting that the root stat file should show both available and
used information.
$ cat encryption_ids.sev.stat
total 509
used 102

It will be very easy for a cloud scheduler to retrieve the system state
quickly.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-24 22:21                           ` Vipin Sharma
@ 2020-11-24 23:18                             ` Sean Christopherson
  0 siblings, 0 replies; 29+ messages in thread
From: Sean Christopherson @ 2020-11-24 23:18 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: David Rientjes, Janosch Frank, Christian Borntraeger, Thomas,
	pbonzini, tj, lizefan, joro, corbet, Brijesh, Jon, Eric, gingell,
	kvm, x86, cgroups, linux-doc, linux-kernel

On Tue, Nov 24, 2020, Vipin Sharma wrote:
> On Tue, Nov 24, 2020 at 09:27:25PM +0000, Sean Christopherson wrote:
> > Is a root level stat file needed?  Can't the infrastructure do .max - .current
> > on the root cgroup to calculate the number of available ids in the system?
> 
> For an efficient scheduling of workloads in the cloud infrastructure, a
> scheduler needs to know the total capacity supported and the current
> usage of the host to get the overall picture. There are some issues with
> .max -.current approach:
> 
> 1. Cgroup v2 convention is to not put resource control files in the
>    root. This will mean we need to sum (.max -.current) in all of the
>    immediate children of the root.

Ah, that's annoying.  Now that you mention it, I do vaguely recall this behavior.
 
> 2. .max can have any limit unless we add a check to not allow a user to
>    set any value more than the supported one. 

Duh, didn't think that one through.

Thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs
  2020-11-24 19:16                 ` Sean Christopherson
  2020-11-24 19:49                   ` Vipin Sharma
@ 2020-11-27 18:01                   ` Christian Borntraeger
  1 sibling, 0 replies; 29+ messages in thread
From: Christian Borntraeger @ 2020-11-27 18:01 UTC (permalink / raw)
  To: Sean Christopherson, David Rientjes
  Cc: Janosch Frank, Vipin Sharma, Lendacky, Thomas, pbonzini, tj,
	lizefan, joro, corbet, Singh, Brijesh, Grimm, Jon, VanTassell,
	Eric, gingell, kvm, x86, cgroups, linux-doc, linux-kernel



On 24.11.20 20:16, Sean Christopherson wrote:
> On Fri, Nov 13, 2020, David Rientjes wrote:                                     
>>                                                                               
>> On Mon, 2 Nov 2020, Sean Christopherson wrote:                                
>>                                                                               
>>> On Fri, Oct 02, 2020 at 01:48:10PM -0700, Vipin Sharma wrote:               
>>>> On Fri, Sep 25, 2020 at 03:22:20PM -0700, Vipin Sharma wrote:             
>>>>> I agree with you that the abstract name is better than the concrete     
>>>>> name, I also feel that we must provide HW extensions. Here is one       
>>>>> approach:                                                               
>>>>>                                                                         
>>>>> Cgroup name: cpu_encryption, encryption_slots, or memcrypt (open to     
>>>>> suggestions)                                                            
>>>>>                                                                         
>>>>> Control files: slots.{max, current, events}                             
>>>                                                                             
>>> I don't particularly like the "slots" name, mostly because it could be confused
>>> with KVM's memslots.  Maybe encryption_ids.ids.{max, current, events}?  I don't
>>> love those names either, but "encryption" and "IDs" are the two obvious     
>>> commonalities betwee TDX's encryption key IDs and SEV's encryption address  
>>> space IDs.                                                                  
>>>                                                                             
>>                                                                               
>> Looping Janosch and Christian back into the thread.                           
>>                                                                               
>> I interpret this suggestion as                                                
>> encryption.{sev,sev_es,keyids}.{max,current,events} for AMD and Intel         
> 
> I think it makes sense to use encryption_ids instead of simply encryption, that
> way it's clear the cgroup is accounting ids as opposed to restricting what
> techs can be used on yes/no basis.

For what its worth the IDs for s390x are called SEIDs (secure execution IDs)

> 
>> offerings, which was my thought on this as well.                              
>>                                                                               
>> Certainly the kernel could provide a single interface for all of these and    
>> key value pairs depending on the underlying encryption technology but it      
>> seems to only introduce additional complexity in the kernel in string         
>> parsing that can otherwise be avoided.  I think we all agree that a single    
>> interface for all encryption keys or one-value-per-file could be done in      
>> the kernel and handled by any userspace agent that is configuring these       
>> values.                                                                       
>>                                                                               
>> I think Vipin is adding a root level file that describes how many keys we     
>> have available on the platform for each technology.  So I think this comes    
>> down to, for example, a single encryption.max file vs                         
>> encryption.{sev,sev_es,keyid}.max.  SEV and SEV-ES ASIDs are provisioned      
> 
> Are you suggesting that the cgroup omit "current" and "events"?  I agree there's
> no need to enumerate platform total, but not knowing how many of the allowed IDs
> have been allocated seems problematic.
> 
>> separately so we treat them as their own resource here.                       
>>                                                                               
>> So which is easier?                                                           
>>                                                                               
>> $ cat encryption.sev.max                                                      
>> 10                                                                            
>> $ echo -n 15 > encryption.sev.max                                             
>>                                                                               
>> or                                                                            
>>                                                                               
>> $ cat encryption.max                                                          
>> sev 10                                                                        
>> sev_es 10                                                                     
>> keyid 0                                                                       
>> $ echo -n "sev 10" > encryption.max                                           
>>                                                                               
>> I would argue the former is simplest (always preferring                       
>> one-value-per-file) and avoids any string parsing or resource controller      
>> lookups that need to match on that string in the kernel.                      

I like the idea of having encryption_ids.max for all platforms. 
If we go for individual files using "seid" for s390 seems the best name.

> 
> Ya, I prefer individual files as well.
> 
> I don't think "keyid" is the best name for TDX, it doesn't leave any wiggle room
> if there are other flavors of key IDs on Intel platform, e.g. private vs. shared
> in the future.  It's also inconsistent with the SEV names, e.g. "asid" isn't
> mentioned anywhere.  And "keyid" sort of reads as "max key id", rather than "max
> number of keyids".  Maybe "tdx_private", or simply "tdx"?  Doesn't have to be
> solved now though, there's plenty of time before TDX will be upstream. :-)
> 
>> The set of encryption.{sev,sev_es,keyid} files that exist would depend on     
>> CONFIG_CGROUP_ENCRYPTION and whether CONFIG_AMD_MEM_ENCRYPT or                
>> CONFIG_INTEL_TDX is configured.  Both can be configured so we have all        
>> three files, but the root file will obviously indicate 0 keys available       
>> for one of them (can't run on AMD and Intel at the same time :).              
>>                                                                               
>> So I'm inclined to suggest that the one-value-per-file format is the ideal    
>> way to go unless there are objections to it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-11-27 18:01 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22  0:40 [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Vipin Sharma
2020-09-22  0:40 ` [RFC Patch 1/2] KVM: SVM: Create SEV cgroup controller Vipin Sharma
2020-09-22  1:04   ` Randy Dunlap
2020-09-22  1:22     ` Sean Christopherson
2020-09-22 16:05       ` Vipin Sharma
2020-11-03 16:39       ` James Bottomley
2020-11-03 18:10         ` Sean Christopherson
2020-11-03 22:43           ` James Bottomley
2020-09-22  0:40 ` [RFC Patch 2/2] KVM: SVM: SEV cgroup controller documentation Vipin Sharma
2020-09-22  1:48 ` [RFC Patch 0/2] KVM: SVM: Cgroup support for SVM SEV ASIDs Sean Christopherson
2020-09-22 21:14   ` Vipin Sharma
     [not found]     ` <20200924192116.GC9649@linux.intel.com>
2020-09-24 19:55       ` Tom Lendacky
2020-09-25 22:22         ` Vipin Sharma
2020-10-02 20:48           ` Vipin Sharma
2020-11-03  2:06             ` Sean Christopherson
2020-11-14  0:26               ` David Rientjes
2020-11-24 19:16                 ` Sean Christopherson
2020-11-24 19:49                   ` Vipin Sharma
2020-11-24 20:18                     ` David Rientjes
2020-11-24 21:08                       ` Vipin Sharma
2020-11-24 21:27                         ` Sean Christopherson
2020-11-24 22:21                           ` Vipin Sharma
2020-11-24 23:18                             ` Sean Christopherson
2020-11-27 18:01                   ` Christian Borntraeger
2020-10-01 18:08         ` Peter Gonda
2020-10-01 22:44           ` Tom Lendacky
2020-09-23 12:47   ` Paolo Bonzini
2020-09-28  9:12     ` Janosch Frank
2020-09-28  9:21       ` Christian Borntraeger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).