* [PATCH v4 0/3] cgroup: New misc cgroup controller
@ 2021-03-30  4:42 ` Vipin Sharma
  0 siblings, 0 replies; 14+ messages in thread
From: Vipin Sharma @ 2021-03-30  4:42 UTC (permalink / raw)
  To: tj, mkoutny, jacob.jun.pan, rdunlap, thomas.lendacky,
	brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes,
	frankja, borntraeger
  Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, hpa, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel, Vipin Sharma

Hello,

This patch series creates a new misc cgroup controller for limiting and
tracking resources which are not abstract like those managed by other
cgroup controllers.

This controller was initially proposed as encryption_id, but after
feedback and use cases for other resources it has been changed to misc
cgroup.
https://lore.kernel.org/lkml/20210108012846.4134815-2-vipinsh@google.com/

Most cloud infrastructure uses cgroups to know the host state, track
resource usage, enforce limits, etc. It uses this information to
optimize work allocation in the fleet and to make sure no rogue job
consumes more than it needs and starves others.

Some resources on a system are not abstract like those of other cgroup
controllers and are available only in a limited quantity on a host.

One of them is the Secure Encrypted Virtualization (SEV) ASID on AMD
CPUs. SEV ASIDs are used for creating encrypted VMs. SEV is mostly used
by cloud providers to provide confidential VMs. Since SEV ASIDs are
limited, there is a need to schedule encrypted VMs in a cloud
infrastructure based on SEV ASID availability and also to limit their
usage.

There are similar requirements for other resource types like TDX keys,
IOASIDs and SEID.

Adding these resources to a cgroup controller is a natural choice with
the least amount of friction. Cgroup describes itself as a mechanism to
distribute system resources along the hierarchy in a controlled and
configurable manner. Most of the resources in cgroups are abstracted
enough, but there are still some resources which are not abstract but
have limited availability or specific use cases.

Misc controller is a generic controller which can be used by these
kinds of resources.

One suggestion was to use BPF for this purpose. However, there are a
couple of things which might not be addressed with BPF:
1. Which controller to use in the v1 case? These are not abstract
   resources, so in v1, where each controller has its own hierarchy, it
   might not be easy to identify the best controller to use for BPF.

2. Abstracting out a single BPF program which can help with all of the
   resource types might not be possible, because the resources we are
   working with are not similar and abstract enough (unlike, for
   example, network packets), and they are used in different places in
   the source code.

A new cgroup controller tends to give a much easier and better
integrated solution when it comes to scheduling and limiting a resource
with existing tools in a cloud infrastructure.

Changes in RFC v4:
1. Misc controller patch is split into two patches. One for generic misc
   controller and second for adding SEV and SEV-ES resource.
2. Using READ_ONCE and WRITE_ONCE for variable accesses.
3. Updated documentation.
4. Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL.
5. Included cgroup header in misc_cgroup.h.
6. misc_cg_reduce_charge changed to misc_cg_cancel_charge.
7. misc_cg set to NULL after uncharge.
8. Added WARN_ON if misc_cg not NULL before charging in SEV/SEV-ES.

Changes in RFC v3:
1. Changed implementation to support 64 bit counters.
2. Print kernel logs only once per resource per cgroup.
3. Capacity can be set less than the current usage.

Changes in RFC v2:
1. Documentation fixes.
2. Added kernel log messages.
3. Changed charge API to treat misc_cg as input parameter.
4. Added helper APIs to get and release references on the cgroup.

[1] https://lore.kernel.org/lkml/20210218195549.1696769-1-vipinsh@google.com
[2] https://lore.kernel.org/lkml/20210302081705.1990283-1-vipinsh@google.com/
[3] https://lore.kernel.org/lkml/20210304231946.2766648-1-vipinsh@google.com/

Vipin Sharma (3):
  cgroup: Add misc cgroup controller
  cgroup: Miscellaneous cgroup documentation.
  svm/sev: Register SEV and SEV-ES ASIDs to the misc controller

 Documentation/admin-guide/cgroup-v1/index.rst |   1 +
 Documentation/admin-guide/cgroup-v1/misc.rst  |   4 +
 Documentation/admin-guide/cgroup-v2.rst       |  73 +++-
 arch/x86/kvm/svm/sev.c                        |  70 ++-
 arch/x86/kvm/svm/svm.h                        |   1 +
 include/linux/cgroup_subsys.h                 |   4 +
 include/linux/misc_cgroup.h                   | 132 ++++++
 init/Kconfig                                  |  14 +
 kernel/cgroup/Makefile                        |   1 +
 kernel/cgroup/misc.c                          | 407 ++++++++++++++++++
 10 files changed, 695 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/misc.rst
 create mode 100644 include/linux/misc_cgroup.h
 create mode 100644 kernel/cgroup/misc.c

-- 
2.31.0.291.g576ba9dcdaf-goog


* [PATCH v4 1/3] cgroup: Add misc cgroup controller
@ 2021-03-30  4:42   ` Vipin Sharma
  0 siblings, 0 replies; 14+ messages in thread
From: Vipin Sharma @ 2021-03-30  4:42 UTC (permalink / raw)
  To: tj, mkoutny, jacob.jun.pan, rdunlap, thomas.lendacky,
	brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes,
	frankja, borntraeger
  Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, hpa, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel, Vipin Sharma

The Miscellaneous cgroup provides a resource limiting and tracking
mechanism for scalar resources which cannot be abstracted like the
other cgroup resources. The controller is enabled by the
CONFIG_CGROUP_MISC config option.

A resource can be added to the controller via enum misc_res_type{} in
the include/linux/misc_cgroup.h file and the corresponding name via
misc_res_name[] in the kernel/cgroup/misc.c file. The provider of the
resource must set its capacity prior to using the resource by calling
misc_cg_set_capacity().
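
As an illustration of the registration steps above, here is a minimal
userspace sketch. It is not part of this patch: MISC_CG_RES_DEMO, the
"demo" name, res_name() and set_capacity() are hypothetical stand-ins
(the patch itself leaves the enum and the name array empty), and
set_capacity() only mimics the validity check of misc_cg_set_capacity().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical resource list; the patch leaves this enum empty. */
enum misc_res_type {
	MISC_CG_RES_DEMO,	/* assumed example entry, not in the patch */
	MISC_CG_RES_TYPES
};

/* Names, kept in the same order as the enum above. */
static const char *const misc_res_name[] = {
	"demo",
};

/* Fail the build if an enum entry is added without a matching name. */
static_assert(sizeof(misc_res_name) / sizeof(misc_res_name[0]) ==
	      MISC_CG_RES_TYPES, "misc_res_name[] out of sync with enum");

/* Per-type capacity; 0 means "not initialized or not present". */
static unsigned long capacity[MISC_CG_RES_TYPES];

/* Look up the name of a resource type, NULL if the type is invalid. */
static const char *res_name(int type)
{
	if (type < 0 || type >= MISC_CG_RES_TYPES)
		return NULL;
	return misc_res_name[type];
}

/* Userspace stand-in for misc_cg_set_capacity(). */
static int set_capacity(int type, unsigned long cap)
{
	if (type < 0 || type >= MISC_CG_RES_TYPES)
		return -EINVAL;
	capacity[type] = cap;
	return 0;
}
```

A provider would call set_capacity(MISC_CG_RES_DEMO, 50) once at
initialization, before any charge is attempted. The build-time check is
an addition of this sketch; the patch relies on a comment asking to keep
the two lists in sync.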

Once a capacity is set, the resource usage can be updated using the
charge and uncharge APIs. All of the APIs to interact with the misc
controller are in include/linux/misc_cgroup.h.
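
The charge and uncharge behaviour can be modelled in userspace roughly
as follows. This is a single-threaded sketch for one resource type, not
the kernel code: struct cg, try_charge() and uncharge() are illustrative
names, plain integers replace the atomic counters, and the capacity is
passed in as a parameter instead of being read from a global array.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Userspace model of one cgroup for a single resource type. */
struct cg {
	struct cg *parent;	/* NULL for the root */
	unsigned long max;	/* limit; ULONG_MAX stands for "max" */
	unsigned long usage;	/* charged by this cgroup and its children */
};

/* Charge @amount to @cg and all of its ancestors, or to none of them. */
static int try_charge(struct cg *cg, unsigned long capacity,
		      unsigned long amount)
{
	struct cg *i, *j;

	if (!cg || !capacity)
		return -EINVAL;

	for (i = cg; i; i = i->parent) {
		i->usage += amount;
		if (i->usage > i->max || i->usage > capacity) {
			/* Roll back every level charged so far, @i included. */
			for (j = cg; j != i; j = j->parent)
				j->usage -= amount;
			i->usage -= amount;
			return -EBUSY;
		}
	}
	return 0;
}

/* Undo a successful try_charge() on @cg and all of its ancestors. */
static void uncharge(struct cg *cg, unsigned long amount)
{
	for (; cg; cg = cg->parent) {
		/* The kernel code warns instead of asserting here. */
		assert(cg->usage >= amount);
		cg->usage -= amount;
	}
}
```

The point of the rollback is that a rejected charge leaves every level
of the hierarchy unchanged, so the caller only has to uncharge what was
successfully charged, and against the same cgroup.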

The Miscellaneous controller provides 3 interface files. If two misc
resources (res_a and res_b) are registered then:

misc.capacity
A read-only flat-keyed file shown only in the root cgroup.  It shows
miscellaneous scalar resources available on the platform along with
their quantities::

    $ cat misc.capacity
    res_a 50
    res_b 10

misc.current
A read-only flat-keyed file shown in the non-root cgroups.  It shows
the current usage of the resources in the cgroup and its children::

    $ cat misc.current
    res_a 3
    res_b 0

misc.max
A read-write flat-keyed file shown in the non-root cgroups. It shows
the allowed maximum usage of the resources in the cgroup and its
children::

    $ cat misc.max
    res_a max
    res_b 4

Limit can be set by::

    # echo res_a 1 > misc.max

Limit can be set to max by::

    # echo res_a max > misc.max

Limits can be set higher than the capacity value in the misc.capacity
file.
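
The two write forms accepted by misc.max ("<name> <number>" and
"<name> max") can be sketched in userspace as shown below.
parse_max_line() is a hypothetical helper, not a kernel function. It
assumes the input has already been stripped of surrounding whitespace,
and it uses strchr() and strtoul() where the kernel code uses strsep()
and kstrtoul().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "<name> <value>", where <value> is "max" or an unsigned integer.
 * On success, @name points into @line and @max holds the limit, with
 * ULONG_MAX standing for "max". @line is modified in place.
 */
static int parse_max_line(char *line, const char **name, unsigned long *max)
{
	char *value, *end;

	assert(line && name && max);

	value = strchr(line, ' ');
	if (!value || value == line)
		return -EINVAL;		/* missing value or empty name */
	*value++ = '\0';		/* split key and value in place */

	if (!strcmp(value, "max")) {
		*max = ULONG_MAX;
	} else {
		/* Reject signs, blanks and empty strings up front. */
		if (*value < '0' || *value > '9')
			return -EINVAL;
		errno = 0;
		*max = strtoul(value, &end, 0);
		if (*end != '\0')
			return -EINVAL;	/* trailing garbage */
		if (errno == ERANGE)
			return -ERANGE;	/* does not fit in unsigned long */
	}

	*name = line;
	return 0;
}
```

A caller would still have to match the returned name against the
registered resource names and reject resources whose capacity is 0, as
the write handler in this patch does.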

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Rientjes <rientjes@google.com>
---
 include/linux/cgroup_subsys.h |   4 +
 include/linux/misc_cgroup.h   | 126 +++++++++++
 init/Kconfig                  |  14 ++
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/misc.c          | 401 ++++++++++++++++++++++++++++++++++
 5 files changed, 546 insertions(+)
 create mode 100644 include/linux/misc_cgroup.h
 create mode 100644 kernel/cgroup/misc.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..445235487230 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_MISC)
+SUBSYS(misc)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
new file mode 100644
index 000000000000..1195d36558b4
--- /dev/null
+++ b/include/linux/misc_cgroup.h
@@ -0,0 +1,126 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Miscellaneous cgroup controller.
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+#ifndef _MISC_CGROUP_H_
+#define _MISC_CGROUP_H_
+
+/**
+ * enum misc_res_type - Types of misc cgroup entries supported by the host.
+ */
+enum misc_res_type {
+	MISC_CG_RES_TYPES
+};
+
+struct misc_cg;
+
+#ifdef CONFIG_CGROUP_MISC
+
+#include <linux/cgroup.h>
+
+/**
+ * struct misc_res - Per cgroup per misc type resource.
+ * @max: Maximum limit on the resource.
+ * @usage: Current usage of the resource.
+ * @failed: True if a charge failed for the resource in a cgroup.
+ */
+struct misc_res {
+	unsigned long max;
+	atomic_long_t usage;
+	bool failed;
+};
+
+/**
+ * struct misc_cg - Miscellaneous controller's cgroup structure.
+ * @css: cgroup subsys state object.
+ * @res: Array of misc resources usage in the cgroup.
+ */
+struct misc_cg {
+	struct cgroup_subsys_state css;
+	struct misc_res res[MISC_CG_RES_TYPES];
+};
+
+unsigned long misc_cg_res_total_usage(enum misc_res_type type);
+int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity);
+int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
+		       unsigned long amount);
+void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg,
+		      unsigned long amount);
+
+/**
+ * css_misc() - Get misc cgroup from the css.
+ * @css: cgroup subsys state object.
+ *
+ * Context: Any context.
+ * Return:
+ * * %NULL - If @css is null.
+ * * struct misc_cg* - misc cgroup pointer of the passed css.
+ */
+static inline struct misc_cg *css_misc(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct misc_cg, css) : NULL;
+}
+
+/*
+ * get_current_misc_cg() - Find and get the misc cgroup of the current task.
+ *
+ * Returned cgroup has its ref count increased by 1. Caller must call
+ * put_misc_cg() to return the reference.
+ *
+ * Return: Misc cgroup to which the current task belongs.
+ */
+static inline struct misc_cg *get_current_misc_cg(void)
+{
+	return css_misc(task_get_css(current, misc_cgrp_id));
+}
+
+/*
+ * put_misc_cg() - Put the misc cgroup and reduce its ref count.
+ * @cg: cgroup to put.
+ */
+static inline void put_misc_cg(struct misc_cg *cg)
+{
+	if (cg)
+		css_put(&cg->css);
+}
+
+#else /* !CONFIG_CGROUP_MISC */
+
+static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type)
+{
+	return 0;
+}
+
+static inline int misc_cg_set_capacity(enum misc_res_type type,
+				       unsigned long capacity)
+{
+	return 0;
+}
+
+static inline int misc_cg_try_charge(enum misc_res_type type,
+				     struct misc_cg *cg,
+				     unsigned long amount)
+{
+	return 0;
+}
+
+static inline void misc_cg_uncharge(enum misc_res_type type,
+				    struct misc_cg *cg,
+				    unsigned long amount)
+{
+}
+
+static inline struct misc_cg *get_current_misc_cg(void)
+{
+	return NULL;
+}
+
+static inline void put_misc_cg(struct misc_cg *cg)
+{
+}
+
+#endif /* CONFIG_CGROUP_MISC */
+#endif /* _MISC_CGROUP_H_ */
diff --git a/init/Kconfig b/init/Kconfig
index 5f5c776ef192..18ece598a297 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1110,6 +1110,20 @@ config CGROUP_BPF
 	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
 	  inet sockets.
 
+config CGROUP_MISC
+	bool "Misc resource controller"
+	default n
+	help
+	  Provides a controller for miscellaneous resources on a host.
+
+	  Miscellaneous scalar resources are the resources on the host system
+	  which cannot be abstracted like the other cgroups. This controller
+	  tracks and limits the miscellaneous resources used by a process
+	  attached to a cgroup hierarchy.
+
+	  For more information, please check the misc cgroup section in
+	  Documentation/admin-guide/cgroup-v2.rst.
+
 config CGROUP_DEBUG
 	bool "Debug controller"
 	default n
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 5d7a76bfbbb7..12f8457ad1f9 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
+obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
new file mode 100644
index 000000000000..4352bc4a3bd5
--- /dev/null
+++ b/kernel/cgroup/misc.c
@@ -0,0 +1,401 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Miscellaneous cgroup controller
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+
+#include <linux/limits.h>
+#include <linux/cgroup.h>
+#include <linux/errno.h>
+#include <linux/atomic.h>
+#include <linux/slab.h>
+#include <linux/misc_cgroup.h>
+
+#define MAX_STR "max"
+#define MAX_NUM ULONG_MAX
+
+/* Miscellaneous res name, keep it in sync with enum misc_res_type */
+static const char *const misc_res_name[] = {
+};
+
+/* Root misc cgroup */
+static struct misc_cg root_cg;
+
+/*
+ * Miscellaneous resources capacity for the entire machine. 0 capacity means
+ * resource is not initialized or not present in the host.
+ *
+ * root_cg.max and capacity are independent of each other. root_cg.max can be
+ * more than the actual capacity. We are using the Limits resource distribution
+ * model of cgroup for the miscellaneous controller.
+ */
+static unsigned long misc_res_capacity[MISC_CG_RES_TYPES];
+
+/**
+ * parent_misc() - Get the parent of the passed misc cgroup.
+ * @cgroup: cgroup whose parent needs to be fetched.
+ *
+ * Context: Any context.
+ * Return:
+ * * struct misc_cg* - Parent of the @cgroup.
+ * * %NULL - If @cgroup is null or the passed cgroup does not have a parent.
+ */
+static struct misc_cg *parent_misc(struct misc_cg *cgroup)
+{
+	return cgroup ? css_misc(cgroup->css.parent) : NULL;
+}
+
+/**
+ * valid_type() - Check if @type is valid or not.
+ * @type: misc res type.
+ *
+ * Context: Any context.
+ * Return:
+ * * true - If valid type.
+ * * false - If not valid type.
+ */
+static inline bool valid_type(enum misc_res_type type)
+{
+	return type >= 0 && type < MISC_CG_RES_TYPES;
+}
+
+/**
+ * misc_cg_res_total_usage() - Get the current total usage of the resource.
+ * @type: misc res type.
+ *
+ * Context: Any context.
+ * Return: Current total usage of the resource.
+ */
+unsigned long misc_cg_res_total_usage(enum misc_res_type type)
+{
+	if (valid_type(type))
+		return atomic_long_read(&root_cg.res[type].usage);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(misc_cg_res_total_usage);
+
+/**
+ * misc_cg_set_capacity() - Set the capacity of the misc cgroup res.
+ * @type: Type of the misc res.
+ * @capacity: Supported capacity of the misc res on the host.
+ *
+ * If capacity is 0 then charging a misc cgroup fails for that type.
+ *
+ * Context: Any context.
+ * Return:
+ * * %0 - Successfully registered the capacity.
+ * * %-EINVAL - If @type is invalid.
+ */
+int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity)
+{
+	if (!valid_type(type))
+		return -EINVAL;
+
+	WRITE_ONCE(misc_res_capacity[type], capacity);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(misc_cg_set_capacity);
+
+/**
+ * misc_cg_cancel_charge() - Cancel the charge from the misc cgroup.
+ * @type: Misc res type in misc cg to cancel the charge from.
+ * @cg: Misc cgroup to cancel charge from.
+ * @amount: Amount to cancel.
+ *
+ * Context: Any context.
+ */
+static void misc_cg_cancel_charge(enum misc_res_type type, struct misc_cg *cg,
+				  unsigned long amount)
+{
+	WARN_ONCE(atomic_long_add_negative(-amount, &cg->res[type].usage),
+		  "misc cgroup resource %s became less than 0",
+		  misc_res_name[type]);
+}
+
+/**
+ * misc_cg_try_charge() - Try charging the misc cgroup.
+ * @type: Misc res type to charge.
+ * @cg: Misc cgroup which will be charged.
+ * @amount: Amount to charge.
+ *
+ * Charge @amount to the misc cgroup. Caller must use the same cgroup during
+ * the uncharge call.
+ *
+ * Context: Any context.
+ * Return:
+ * * %0 - If successfully charged.
+ * * -EINVAL - If @type is invalid or misc res has 0 capacity.
+ * * -EBUSY - If max limit will be crossed or total usage will be more than the
+ *	      capacity.
+ */
+int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
+		       unsigned long amount)
+{
+	struct misc_cg *i, *j;
+	int ret;
+	struct misc_res *res;
+	unsigned long new_usage;
+
+	if (!(valid_type(type) && cg && READ_ONCE(misc_res_capacity[type])))
+		return -EINVAL;
+
+	if (!amount)
+		return 0;
+
+	for (i = cg; i; i = parent_misc(i)) {
+		res = &i->res[type];
+
+		new_usage = atomic_long_add_return(amount, &res->usage);
+		if (new_usage > READ_ONCE(res->max) ||
+		    new_usage > READ_ONCE(misc_res_capacity[type])) {
+			if (!res->failed) {
+				pr_info("cgroup: charge rejected by the misc controller for %s resource in ",
+					misc_res_name[type]);
+				pr_cont_cgroup_path(i->css.cgroup);
+				pr_cont("\n");
+				res->failed = true;
+			}
+			ret = -EBUSY;
+			goto err_charge;
+		}
+	}
+	return 0;
+
+err_charge:
+	for (j = cg; j != i; j = parent_misc(j))
+		misc_cg_cancel_charge(type, j, amount);
+	misc_cg_cancel_charge(type, i, amount);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(misc_cg_try_charge);
+
+/**
+ * misc_cg_uncharge() - Uncharge the misc cgroup.
+ * @type: Misc res type which was charged.
+ * @cg: Misc cgroup which will be uncharged.
+ * @amount: Charged amount.
+ *
+ * Context: Any context.
+ */
+void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg,
+		      unsigned long amount)
+{
+	struct misc_cg *i;
+
+	if (!(amount && valid_type(type) && cg))
+		return;
+
+	for (i = cg; i; i = parent_misc(i))
+		misc_cg_cancel_charge(type, i, amount);
+}
+EXPORT_SYMBOL_GPL(misc_cg_uncharge);
+
+/**
+ * misc_cg_max_show() - Show the misc cgroup max limit.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_max_show(struct seq_file *sf, void *v)
+{
+	int i;
+	struct misc_cg *cg = css_misc(seq_css(sf));
+	unsigned long max;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		if (READ_ONCE(misc_res_capacity[i])) {
+			max = READ_ONCE(cg->res[i].max);
+			if (max == MAX_NUM)
+				seq_printf(sf, "%s max\n", misc_res_name[i]);
+			else
+				seq_printf(sf, "%s %lu\n", misc_res_name[i],
+					   max);
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * misc_cg_max_write() - Update the maximum limit of the cgroup.
+ * @of: Handler for the file.
+ * @buf: Data from the user. It should be either "max", 0, or a positive
+ *	 integer.
+ * @nbytes: Number of bytes of the data.
+ * @off: Offset in the file.
+ *
+ * User can pass data like:
+ * echo sev 23 > misc.max, OR
+ * echo sev max > misc.max
+ *
+ * Context: Any context.
+ * Return:
+ * * >= 0 - Number of bytes processed in the input.
+ * * -EINVAL - If buf is not valid.
+ * * -ERANGE - If number is bigger than the unsigned long capacity.
+ */
+static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
+				 size_t nbytes, loff_t off)
+{
+	struct misc_cg *cg;
+	unsigned long max;
+	int ret = 0, i;
+	enum misc_res_type type = MISC_CG_RES_TYPES;
+	char *token;
+
+	buf = strstrip(buf);
+	token = strsep(&buf, " ");
+
+	if (!token || !buf)
+		return -EINVAL;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		if (!strcmp(misc_res_name[i], token)) {
+			type = i;
+			break;
+		}
+	}
+
+	if (type == MISC_CG_RES_TYPES)
+		return -EINVAL;
+
+	if (!strcmp(MAX_STR, buf)) {
+		max = MAX_NUM;
+	} else {
+		ret = kstrtoul(buf, 0, &max);
+		if (ret)
+			return ret;
+	}
+
+	cg = css_misc(of_css(of));
+
+	if (READ_ONCE(misc_res_capacity[type]))
+		WRITE_ONCE(cg->res[type].max, max);
+	else
+		ret = -EINVAL;
+
+	return ret ? ret : nbytes;
+}
+
+/**
+ * misc_cg_current_show() - Show the current usage of the misc cgroup.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_current_show(struct seq_file *sf, void *v)
+{
+	int i;
+	unsigned long usage;
+	struct misc_cg *cg = css_misc(seq_css(sf));
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		usage = atomic_long_read(&cg->res[i].usage);
+		if (READ_ONCE(misc_res_capacity[i]) || usage)
+			seq_printf(sf, "%s %lu\n", misc_res_name[i], usage);
+	}
+
+	return 0;
+}
+
+/**
+ * misc_cg_capacity_show() - Show the total capacity of misc res on the host.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Only present in the root cgroup directory.
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_capacity_show(struct seq_file *sf, void *v)
+{
+	int i;
+	unsigned long cap;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		cap = READ_ONCE(misc_res_capacity[i]);
+		if (cap)
+			seq_printf(sf, "%s %lu\n", misc_res_name[i], cap);
+	}
+
+	return 0;
+}
+
+/* Misc cgroup interface files */
+static struct cftype misc_cg_files[] = {
+	{
+		.name = "max",
+		.write = misc_cg_max_write,
+		.seq_show = misc_cg_max_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		.name = "current",
+		.seq_show = misc_cg_current_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		.name = "capacity",
+		.seq_show = misc_cg_capacity_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+	},
+	{}
+};
+
+/**
+ * misc_cg_alloc() - Allocate misc cgroup.
+ * @parent_css: Parent cgroup.
+ *
+ * Context: Process context.
+ * Return:
+ * * struct cgroup_subsys_state* - css of the allocated cgroup.
+ * * ERR_PTR(-ENOMEM) - No memory available to allocate.
+ */
+static struct cgroup_subsys_state *
+misc_cg_alloc(struct cgroup_subsys_state *parent_css)
+{
+	enum misc_res_type i;
+	struct misc_cg *cg;
+
+	if (!parent_css) {
+		cg = &root_cg;
+	} else {
+		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
+		if (!cg)
+			return ERR_PTR(-ENOMEM);
+	}
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		WRITE_ONCE(cg->res[i].max, MAX_NUM);
+		atomic_long_set(&cg->res[i].usage, 0);
+	}
+
+	return &cg->css;
+}
+
+/**
+ * misc_cg_free() - Free the misc cgroup.
+ * @css: cgroup subsys object.
+ *
+ * Context: Any context.
+ */
+static void misc_cg_free(struct cgroup_subsys_state *css)
+{
+	kfree(css_misc(css));
+}
+
+/* Cgroup controller callbacks */
+struct cgroup_subsys misc_cgrp_subsys = {
+	.css_alloc = misc_cg_alloc,
+	.css_free = misc_cg_free,
+	.legacy_cftypes = misc_cg_files,
+	.dfl_cftypes = misc_cg_files,
+};
-- 
2.31.0.291.g576ba9dcdaf-goog


* [PATCH v4 1/3] cgroup: Add misc cgroup controller
@ 2021-03-30  4:42   ` Vipin Sharma
  0 siblings, 0 replies; 14+ messages in thread
From: Vipin Sharma @ 2021-03-30  4:42 UTC (permalink / raw)
  To: tj-DgEjT+Ai2ygdnm+yROfE0A, mkoutny-IBi9RG/b67k,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w,
	rdunlap-wEGCiKHe2LqWVfeAwA7xHQ, thomas.lendacky-5C7GfCeVMHo,
	brijesh.singh-5C7GfCeVMHo, jon.grimm-5C7GfCeVMHo,
	eric.vantassell-5C7GfCeVMHo, pbonzini-H+wXaHxf7aLQT0dZR+AlfA,
	hannes-druUgvl0LCNAfugRpC6u6w, frankja-tEXmvtCZX7AybS5Ee8rs3A,
	borntraeger-tA70FqPdS9bQT0dZR+AlfA
  Cc: corbet-T1hC0tSOHrs, seanjc-hpIqsD4AKlfQT0dZR+AlfA,
	vkuznets-H+wXaHxf7aLQT0dZR+AlfA,
	wanpengli-1Nz4purKYjRBDgjK7y7TUQ,
	jmattson-hpIqsD4AKlfQT0dZR+AlfA, joro-zLv9SwRftAIdnm+yROfE0A,
	tglx-hfZtesqFncYOwBW4kG4KsQ, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	bp-Gina5bIWoIWzQB+pC5nmwQ, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	gingell-hpIqsD4AKlfQT0dZR+AlfA, rientjes-hpIqsD4AKlfQT0dZR+AlfA,
	kvm-u79uwXL29TY76Z2rM5mHXA, x86-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-doc-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Vipin Sharma

The Miscellaneous cgroup provides the resource limiting and tracking
mechanism for the scalar resources which cannot be abstracted like the
other cgroup resources. Controller is enabled by the CONFIG_CGROUP_MISC
config option.

A resource can be added to the controller via enum misc_res_type{} in
the include/linux/misc_cgroup.h file and the corresponding name via
misc_res_name[] in the kernel/cgroup/misc.c file. Provider of the
resource must set its capacity prior to using the resource by calling
misc_cg_set_capacity().

Once a capacity is set then the resource usage can be updated using
charge and uncharge APIs. All of the APIs to interact with misc
controller are in include/linux/misc_cgroup.h.

Miscellaneous controller provides 3 interface files. If two misc
resources (res_a and res_b) are registered then:

misc.capacity
A read-only flat-keyed file shown only in the root cgroup.  It shows
miscellaneous scalar resources available on the platform along with
their quantities::

    $ cat misc.capacity
    res_a 50
    res_b 10

misc.current
A read-only flat-keyed file shown in the non-root cgroups.  It shows
the current usage of the resources in the cgroup and its children::

    $ cat misc.current
    res_a 3
    res_b 0

misc.max
A read-write flat-keyed file shown in the non root cgroups. Allowed
maximum usage of the resources in the cgroup and its children.::

    $ cat misc.max
    res_a max
    res_b 4

Limit can be set by::

    # echo res_a 1 > misc.max

Limit can be set to max by::

    # echo res_a max > misc.max

Limits can be set more than the capacity value in the misc.capacity
file.

Signed-off-by: Vipin Sharma <vipinsh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Reviewed-by: David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 include/linux/cgroup_subsys.h |   4 +
 include/linux/misc_cgroup.h   | 126 +++++++++++
 init/Kconfig                  |  14 ++
 kernel/cgroup/Makefile        |   1 +
 kernel/cgroup/misc.c          | 401 ++++++++++++++++++++++++++++++++++
 5 files changed, 546 insertions(+)
 create mode 100644 include/linux/misc_cgroup.h
 create mode 100644 kernel/cgroup/misc.c

diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..445235487230 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,10 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_MISC)
+SUBSYS(misc)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
new file mode 100644
index 000000000000..1195d36558b4
--- /dev/null
+++ b/include/linux/misc_cgroup.h
@@ -0,0 +1,126 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Miscellaneous cgroup controller.
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+#ifndef _MISC_CGROUP_H_
+#define _MISC_CGROUP_H_
+
+/**
+ * Types of misc cgroup entries supported by the host.
+ */
+enum misc_res_type {
+	MISC_CG_RES_TYPES
+};
+
+struct misc_cg;
+
+#ifdef CONFIG_CGROUP_MISC
+
+#include <linux/cgroup.h>
+
+/**
+ * struct misc_res: Per cgroup per misc type resource
+ * @max: Maximum limit on the resource.
+ * @usage: Current usage of the resource.
+ * @failed: True if charge failed for the resource in a cgroup.
+ */
+struct misc_res {
+	unsigned long max;
+	atomic_long_t usage;
+	bool failed;
+};
+
+/**
+ * struct misc_cg - Miscellaneous controller's cgroup structure.
+ * @css: cgroup subsys state object.
+ * @res: Array of misc resources usage in the cgroup.
+ */
+struct misc_cg {
+	struct cgroup_subsys_state css;
+	struct misc_res res[MISC_CG_RES_TYPES];
+};
+
+unsigned long misc_cg_res_total_usage(enum misc_res_type type);
+int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity);
+int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
+		       unsigned long amount);
+void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg,
+		      unsigned long amount);
+
+/**
+ * css_misc() - Get misc cgroup from the css.
+ * @css: cgroup subsys state object.
+ *
+ * Context: Any context.
+ * Return:
+ * * %NULL - If @css is null.
+ * * struct misc_cg* - misc cgroup pointer of the passed css.
+ */
+static inline struct misc_cg *css_misc(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct misc_cg, css) : NULL;
+}
+
+/*
+ * get_current_misc_cg() - Find and get the misc cgroup of the current task.
+ *
+ * Returned cgroup has its ref count increased by 1. Caller must call
+ * put_misc_cg() to return the reference.
+ *
+ * Return: Misc cgroup to which the current task belongs.
+ */
+static inline struct misc_cg *get_current_misc_cg(void)
+{
+	return css_misc(task_get_css(current, misc_cgrp_id));
+}
+
+/*
+ * put_misc_cg() - Put the misc cgroup and reduce its ref count.
+ * @cg: cgroup to put.
+ */
+static inline void put_misc_cg(struct misc_cg *cg)
+{
+	if (cg)
+		css_put(&cg->css);
+}
+
+#else /* !CONFIG_CGROUP_MISC */
+
+static inline unsigned long misc_cg_res_total_usage(enum misc_res_type type)
+{
+	return 0;
+}
+
+static inline int misc_cg_set_capacity(enum misc_res_type type,
+				       unsigned long capacity)
+{
+	return 0;
+}
+
+static inline int misc_cg_try_charge(enum misc_res_type type,
+				     struct misc_cg *cg,
+				     unsigned long amount)
+{
+	return 0;
+}
+
+static inline void misc_cg_uncharge(enum misc_res_type type,
+				    struct misc_cg *cg,
+				    unsigned long amount)
+{
+}
+
+static inline struct misc_cg *get_current_misc_cg(void)
+{
+	return NULL;
+}
+
+static inline void put_misc_cg(struct misc_cg *cg)
+{
+}
+
+#endif /* CONFIG_CGROUP_MISC */
+#endif /* _MISC_CGROUP_H_ */
diff --git a/init/Kconfig b/init/Kconfig
index 5f5c776ef192..18ece598a297 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1110,6 +1110,20 @@ config CGROUP_BPF
 	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
 	  inet sockets.
 
+config CGROUP_MISC
+	bool "Misc resource controller"
+	default n
+	help
+	  Provides a controller for miscellaneous resources on a host.
+
+	  Miscellaneous scalar resources are the resources on the host system
+	  which cannot be abstracted like the other cgroups. This controller
+	  tracks and limits the miscellaneous resources used by a process
+	  attached to a cgroup hierarchy.
+
+	  For more information, please check misc cgroup section in
+	  /Documentation/admin-guide/cgroup-v2.rst.
+
 config CGROUP_DEBUG
 	bool "Debug controller"
 	default n
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 5d7a76bfbbb7..12f8457ad1f9 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
+obj-$(CONFIG_CGROUP_MISC) += misc.o
 obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
new file mode 100644
index 000000000000..4352bc4a3bd5
--- /dev/null
+++ b/kernel/cgroup/misc.c
@@ -0,0 +1,401 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Miscellaneous cgroup controller
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+
+#include <linux/limits.h>
+#include <linux/cgroup.h>
+#include <linux/errno.h>
+#include <linux/atomic.h>
+#include <linux/slab.h>
+#include <linux/misc_cgroup.h>
+
+#define MAX_STR "max"
+#define MAX_NUM ULONG_MAX
+
+/* Miscellaneous res name, keep it in sync with enum misc_res_type */
+static const char *const misc_res_name[] = {
+};
+
+/* Root misc cgroup */
+static struct misc_cg root_cg;
+
+/*
+ * Miscellaneous resources capacity for the entire machine. 0 capacity means
+ * resource is not initialized or not present in the host.
+ *
+ * root_cg.max and capacity are independent of each other. root_cg.max can be
+ * more than the actual capacity. We are using Limits resource distribution
+ * model of cgroup for miscellaneous controller.
+ */
+static unsigned long misc_res_capacity[MISC_CG_RES_TYPES];
+
+/**
+ * parent_misc() - Get the parent of the passed misc cgroup.
+ * @cgroup: cgroup whose parent needs to be fetched.
+ *
+ * Context: Any context.
+ * Return:
+ * * struct misc_cg* - Parent of the @cgroup.
+ * * %NULL - If @cgroup is null or the passed cgroup does not have a parent.
+ */
+static struct misc_cg *parent_misc(struct misc_cg *cgroup)
+{
+	return cgroup ? css_misc(cgroup->css.parent) : NULL;
+}
+
+/**
+ * valid_type() - Check if @type is valid or not.
+ * @type: misc res type.
+ *
+ * Context: Any context.
+ * Return:
+ * * true - If valid type.
+ * * false - If not valid type.
+ */
+static inline bool valid_type(enum misc_res_type type)
+{
+	return type >= 0 && type < MISC_CG_RES_TYPES;
+}
+
+/**
+ * misc_cg_res_total_usage() - Get the current total usage of the resource.
+ * @type: misc res type.
+ *
+ * Context: Any context.
+ * Return: Current total usage of the resource.
+ */
+unsigned long misc_cg_res_total_usage(enum misc_res_type type)
+{
+	if (valid_type(type))
+		return atomic_long_read(&root_cg.res[type].usage);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(misc_cg_res_total_usage);
+
+/**
+ * misc_cg_set_capacity() - Set the capacity of the misc cgroup res.
+ * @type: Type of the misc res.
+ * @capacity: Supported capacity of the misc res on the host.
+ *
+ * If capacity is 0 then charging a misc cgroup fails for that type.
+ *
+ * Context: Any context.
+ * Return:
+ * * %0 - Successfully registered the capacity.
+ * * %-EINVAL - If @type is invalid.
+ */
+int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity)
+{
+	if (!valid_type(type))
+		return -EINVAL;
+
+	WRITE_ONCE(misc_res_capacity[type], capacity);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(misc_cg_set_capacity);
+
+/**
+ * misc_cg_cancel_charge() - Cancel the charge from the misc cgroup.
+ * @type: Misc res type in misc cg to cancel the charge from.
+ * @cg: Misc cgroup to cancel charge from.
+ * @amount: Amount to cancel.
+ *
+ * Context: Any context.
+ */
+static void misc_cg_cancel_charge(enum misc_res_type type, struct misc_cg *cg,
+				  unsigned long amount)
+{
+	WARN_ONCE(atomic_long_add_negative(-amount, &cg->res[type].usage),
+		  "misc cgroup resource %s became less than 0",
+		  misc_res_name[type]);
+}
+
+/**
+ * misc_cg_try_charge() - Try charging the misc cgroup.
+ * @type: Misc res type to charge.
+ * @cg: Misc cgroup which will be charged.
+ * @amount: Amount to charge.
+ *
+ * Charge @amount to the misc cgroup. Caller must use the same cgroup during
+ * the uncharge call.
+ *
+ * Context: Any context.
+ * Return:
+ * * %0 - If successfully charged.
+ * * -EINVAL - If @type is invalid or misc res has 0 capacity.
+ * * -EBUSY - If max limit will be crossed or total usage will be more than the
+ *	      capacity.
+ */
+int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
+		       unsigned long amount)
+{
+	struct misc_cg *i, *j;
+	int ret;
+	struct misc_res *res;
+	unsigned long new_usage;
+
+	if (!(valid_type(type) && cg && READ_ONCE(misc_res_capacity[type])))
+		return -EINVAL;
+
+	if (!amount)
+		return 0;
+
+	for (i = cg; i; i = parent_misc(i)) {
+		res = &i->res[type];
+
+		new_usage = atomic_long_add_return(amount, &res->usage);
+		if (new_usage > READ_ONCE(res->max) ||
+		    new_usage > READ_ONCE(misc_res_capacity[type])) {
+			if (!res->failed) {
+				pr_info("cgroup: charge rejected by the misc controller for %s resource in ",
+					misc_res_name[type]);
+				pr_cont_cgroup_path(i->css.cgroup);
+				pr_cont("\n");
+				res->failed = true;
+			}
+			ret = -EBUSY;
+			goto err_charge;
+		}
+	}
+	return 0;
+
+err_charge:
+	for (j = cg; j != i; j = parent_misc(j))
+		misc_cg_cancel_charge(type, j, amount);
+	misc_cg_cancel_charge(type, i, amount);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(misc_cg_try_charge);
+
+/**
+ * misc_cg_uncharge() - Uncharge the misc cgroup.
+ * @type: Misc res type which was charged.
+ * @cg: Misc cgroup which will be uncharged.
+ * @amount: Charged amount.
+ *
+ * Context: Any context.
+ */
+void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg,
+		      unsigned long amount)
+{
+	struct misc_cg *i;
+
+	if (!(amount && valid_type(type) && cg))
+		return;
+
+	for (i = cg; i; i = parent_misc(i))
+		misc_cg_cancel_charge(type, i, amount);
+}
+EXPORT_SYMBOL_GPL(misc_cg_uncharge);
+
+/**
+ * misc_cg_max_show() - Show the misc cgroup max limit.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_max_show(struct seq_file *sf, void *v)
+{
+	int i;
+	struct misc_cg *cg = css_misc(seq_css(sf));
+	unsigned long max;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		if (READ_ONCE(misc_res_capacity[i])) {
+			max = READ_ONCE(cg->res[i].max);
+			if (max == MAX_NUM)
+				seq_printf(sf, "%s max\n", misc_res_name[i]);
+			else
+				seq_printf(sf, "%s %lu\n", misc_res_name[i],
+					   max);
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * misc_cg_max_write() - Update the maximum limit of the cgroup.
+ * @of: Handler for the file.
+ * @buf: Data from the user. It should be either "max", 0, or a positive
+ *	 integer.
+ * @nbytes: Number of bytes of the data.
+ * @off: Offset in the file.
+ *
+ * User can pass data like:
+ * echo sev 23 > misc.max, OR
+ * echo sev max > misc.max
+ *
+ * Context: Any context.
+ * Return:
+ * * >= 0 - Number of bytes processed in the input.
+ * * -EINVAL - If buf is not valid.
+ * * -ERANGE - If number is bigger than the unsigned long capacity.
+ */
+static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
+				 size_t nbytes, loff_t off)
+{
+	struct misc_cg *cg;
+	unsigned long max;
+	int ret = 0, i;
+	enum misc_res_type type = MISC_CG_RES_TYPES;
+	char *token;
+
+	buf = strstrip(buf);
+	token = strsep(&buf, " ");
+
+	if (!token || !buf)
+		return -EINVAL;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		if (!strcmp(misc_res_name[i], token)) {
+			type = i;
+			break;
+		}
+	}
+
+	if (type == MISC_CG_RES_TYPES)
+		return -EINVAL;
+
+	if (!strcmp(MAX_STR, buf)) {
+		max = MAX_NUM;
+	} else {
+		ret = kstrtoul(buf, 0, &max);
+		if (ret)
+			return ret;
+	}
+
+	cg = css_misc(of_css(of));
+
+	if (READ_ONCE(misc_res_capacity[type]))
+		WRITE_ONCE(cg->res[type].max, max);
+	else
+		ret = -EINVAL;
+
+	return ret ? ret : nbytes;
+}
+
+/**
+ * misc_cg_current_show() - Show the current usage of the misc cgroup.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_current_show(struct seq_file *sf, void *v)
+{
+	int i;
+	unsigned long usage;
+	struct misc_cg *cg = css_misc(seq_css(sf));
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		usage = atomic_long_read(&cg->res[i].usage);
+		if (READ_ONCE(misc_res_capacity[i]) || usage)
+			seq_printf(sf, "%s %lu\n", misc_res_name[i], usage);
+	}
+
+	return 0;
+}
+
+/**
+ * misc_cg_capacity_show() - Show the total capacity of misc res on the host.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Only present in the root cgroup directory.
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int misc_cg_capacity_show(struct seq_file *sf, void *v)
+{
+	int i;
+	unsigned long cap;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		cap = READ_ONCE(misc_res_capacity[i]);
+		if (cap)
+			seq_printf(sf, "%s %lu\n", misc_res_name[i], cap);
+	}
+
+	return 0;
+}
+
+/* Misc cgroup interface files */
+static struct cftype misc_cg_files[] = {
+	{
+		.name = "max",
+		.write = misc_cg_max_write,
+		.seq_show = misc_cg_max_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		.name = "current",
+		.seq_show = misc_cg_current_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		.name = "capacity",
+		.seq_show = misc_cg_capacity_show,
+		.flags = CFTYPE_ONLY_ON_ROOT,
+	},
+	{}
+};
+
+/**
+ * misc_cg_alloc() - Allocate misc cgroup.
+ * @parent_css: Parent cgroup.
+ *
+ * Context: Process context.
+ * Return:
+ * * struct cgroup_subsys_state* - css of the allocated cgroup.
+ * * ERR_PTR(-ENOMEM) - No memory available to allocate.
+ */
+static struct cgroup_subsys_state *
+misc_cg_alloc(struct cgroup_subsys_state *parent_css)
+{
+	enum misc_res_type i;
+	struct misc_cg *cg;
+
+	if (!parent_css) {
+		cg = &root_cg;
+	} else {
+		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
+		if (!cg)
+			return ERR_PTR(-ENOMEM);
+	}
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		WRITE_ONCE(cg->res[i].max, MAX_NUM);
+		atomic_long_set(&cg->res[i].usage, 0);
+	}
+
+	return &cg->css;
+}
+
+/**
+ * misc_cg_free() - Free the misc cgroup.
+ * @css: cgroup subsys object.
+ *
+ * Context: Any context.
+ */
+static void misc_cg_free(struct cgroup_subsys_state *css)
+{
+	kfree(css_misc(css));
+}
+
+/* Cgroup controller callbacks */
+struct cgroup_subsys misc_cgrp_subsys = {
+	.css_alloc = misc_cg_alloc,
+	.css_free = misc_cg_free,
+	.legacy_cftypes = misc_cg_files,
+	.dfl_cftypes = misc_cg_files,
+};
-- 
2.31.0.291.g576ba9dcdaf-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 2/3] cgroup: Miscellaneous cgroup documentation.
  2021-03-30  4:42 ` Vipin Sharma
  (?)
  (?)
@ 2021-03-30  4:42 ` Vipin Sharma
  -1 siblings, 0 replies; 14+ messages in thread
From: Vipin Sharma @ 2021-03-30  4:42 UTC (permalink / raw)
  To: tj, mkoutny, jacob.jun.pan, rdunlap, thomas.lendacky,
	brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes,
	frankja, borntraeger
  Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, hpa, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel, Vipin Sharma

Documentation for the miscellaneous cgroup controller. This new controller
is used to track and limit the usage of scalar resources.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Rientjes <rientjes@google.com>
---
 Documentation/admin-guide/cgroup-v1/index.rst |  1 +
 Documentation/admin-guide/cgroup-v1/misc.rst  |  4 +
 Documentation/admin-guide/cgroup-v2.rst       | 73 ++++++++++++++++++-
 3 files changed, 76 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/misc.rst

diff --git a/Documentation/admin-guide/cgroup-v1/index.rst b/Documentation/admin-guide/cgroup-v1/index.rst
index 226f64473e8e..99fbc8a64ba9 100644
--- a/Documentation/admin-guide/cgroup-v1/index.rst
+++ b/Documentation/admin-guide/cgroup-v1/index.rst
@@ -17,6 +17,7 @@ Control Groups version 1
     hugetlb
     memcg_test
     memory
+    misc
     net_cls
     net_prio
     pids
diff --git a/Documentation/admin-guide/cgroup-v1/misc.rst b/Documentation/admin-guide/cgroup-v1/misc.rst
new file mode 100644
index 000000000000..661614c24df3
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/misc.rst
@@ -0,0 +1,4 @@
+===============
+Misc controller
+===============
+Please refer to the "Misc" documentation in Documentation/admin-guide/cgroup-v2.rst
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 64c62b979f2f..b1e81aa8598a 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -65,8 +65,11 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
        5-7-1. RDMA Interface Files
      5-8. HugeTLB
        5.8-1. HugeTLB Interface Files
-     5-8. Misc
-       5-8-1. perf_event
+     5-9. Misc
+       5.9-1 Miscellaneous cgroup Interface Files
+       5.9-2 Migration and Ownership
+     5-10. Others
+       5-10-1. perf_event
      5-N. Non-normative information
        5-N-1. CPU controller root cgroup process behaviour
        5-N-2. IO controller root cgroup process behaviour
@@ -2171,6 +2174,72 @@ HugeTLB Interface Files
 Misc
 ----
 
+The Miscellaneous cgroup provides a resource limiting and tracking
+mechanism for scalar resources which cannot be abstracted like the other
+cgroup resources. The controller is enabled by the CONFIG_CGROUP_MISC config
+option.
+
+A resource can be added to the controller via enum misc_res_type{} in the
+include/linux/misc_cgroup.h file and the corresponding name via misc_res_name[]
+in the kernel/cgroup/misc.c file. The provider of the resource must set its
+capacity prior to using the resource by calling misc_cg_set_capacity().
+
+Once the capacity is set, resource usage can be updated using the charge and
+uncharge APIs. All of the APIs for interacting with the misc controller are
+declared in include/linux/misc_cgroup.h.
+
+Misc Interface Files
+~~~~~~~~~~~~~~~~~~~~
+
+The miscellaneous controller provides 3 interface files. If two misc resources (res_a and res_b) are registered then:
+
+  misc.capacity
+        A read-only flat-keyed file shown only in the root cgroup.  It shows
+        miscellaneous scalar resources available on the platform along with
+        their quantities::
+
+	  $ cat misc.capacity
+	  res_a 50
+	  res_b 10
+
+  misc.current
+        A read-only flat-keyed file shown in the non-root cgroups.  It shows
+        the current usage of the resources in the cgroup and its children::
+
+	  $ cat misc.current
+	  res_a 3
+	  res_b 0
+
+  misc.max
+        A read-write flat-keyed file shown in the non-root cgroups. It shows the
+        allowed maximum usage of the resources in the cgroup and its children::
+
+	  $ cat misc.max
+	  res_a max
+	  res_b 4
+
+	Limit can be set by::
+
+	  # echo res_a 1 > misc.max
+
+	Limit can be set to max by::
+
+	  # echo res_a max > misc.max
+
+        Limits can be set higher than the capacity value in the misc.capacity
+        file.
+
+Migration and Ownership
+~~~~~~~~~~~~~~~~~~~~~~~
+
+A miscellaneous scalar resource is charged to the cgroup in which it is used
+first, and stays charged to that cgroup until that resource is freed. Migrating
+a process to a different cgroup does not move the charge to the destination
+cgroup where the process has moved.
+
+Others
+------
+
 perf_event
 ~~~~~~~~~~
 
-- 
2.31.0.291.g576ba9dcdaf-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4 3/3] svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
  2021-03-30  4:42 ` Vipin Sharma
                   ` (2 preceding siblings ...)
  (?)
@ 2021-03-30  4:42 ` Vipin Sharma
  -1 siblings, 0 replies; 14+ messages in thread
From: Vipin Sharma @ 2021-03-30  4:42 UTC (permalink / raw)
  To: tj, mkoutny, jacob.jun.pan, rdunlap, thomas.lendacky,
	brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes,
	frankja, borntraeger
  Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, hpa, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel, Vipin Sharma

Secure Encrypted Virtualization (SEV) and Secure Encrypted
Virtualization - Encrypted State (SEV-ES) ASIDs are used to encrypt KVM
guests on AMD platforms. These ASIDs are available in limited quantities
on a host.

Register their capacity and usage to the misc controller for tracking
via cgroups.
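
For illustration, the two capacities registered in sev_hardware_setup()
follow from how the ASID range is split between SEV and SEV-ES. The
sketch below restates only that arithmetic in userspace; the struct and
function names are hypothetical, and the sample values in the note after
it are made up rather than read from real hardware:

```c
/* Hypothetical container for the two capacities handed to the controller. */
struct model_asid_counts {
	unsigned int sev;     /* ASIDs in [min_sev_asid, max_sev_asid] */
	unsigned int sev_es;  /* ASIDs in [1, min_sev_asid - 1] */
};

/*
 * Mirror the counting done in the patch: SEV guests use the upper part
 * of the ASID range, SEV-ES guests use the ASIDs below min_sev_asid.
 * Assumes 1 <= min_sev_asid <= max_sev_asid.
 */
static struct model_asid_counts model_count_asids(unsigned int min_sev_asid,
						  unsigned int max_sev_asid)
{
	struct model_asid_counts counts;

	counts.sev = max_sev_asid - min_sev_asid + 1;
	counts.sev_es = min_sev_asid - 1;
	return counts;
}
```

With min_sev_asid = 5 and max_sev_asid = 509 this gives 505 SEV ASIDs and
4 SEV-ES ASIDs. When min_sev_asid is 1 the SEV-ES count is 0, which is
the case where the patch skips registering SEV-ES capacity.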

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Rientjes <rientjes@google.com>
---
 arch/x86/kvm/svm/sev.c      | 70 +++++++++++++++++++++++++++++++------
 arch/x86/kvm/svm/svm.h      |  1 +
 include/linux/misc_cgroup.h |  6 ++++
 kernel/cgroup/misc.c        |  6 ++++
 4 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 874ea309279f..214eefb20414 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -14,6 +14,7 @@
 #include <linux/psp-sev.h>
 #include <linux/pagemap.h>
 #include <linux/swap.h>
+#include <linux/misc_cgroup.h>
 #include <linux/processor.h>
 #include <linux/trace_events.h>
 #include <asm/fpu/internal.h>
@@ -28,6 +29,21 @@
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 
+#ifndef CONFIG_KVM_AMD_SEV
+/*
+ * When this config is not defined, SEV feature is not supported and APIs in
+ * this file are not used but this file still gets compiled into the KVM AMD
+ * module.
+ *
+ * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum
+ * misc_res_type {} defined in linux/misc_cgroup.h.
+ *
+ * Below macros allow compilation to succeed.
+ */
+#define MISC_CG_RES_SEV MISC_CG_RES_TYPES
+#define MISC_CG_RES_SEV_ES MISC_CG_RES_TYPES
+#endif
+
 static u8 sev_enc_bit;
 static int sev_flush_asids(void);
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -89,8 +105,19 @@ static bool __sev_recycle_asids(int min_asid, int max_asid)
 
 static int sev_asid_new(struct kvm_sev_info *sev)
 {
-	int pos, min_asid, max_asid;
+	int pos, min_asid, max_asid, ret;
 	bool retry = true;
+	enum misc_res_type type;
+
+	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	WARN_ON(sev->misc_cg);
+	sev->misc_cg = get_current_misc_cg();
+	ret = misc_cg_try_charge(type, sev->misc_cg, 1);
+	if (ret) {
+		put_misc_cg(sev->misc_cg);
+		sev->misc_cg = NULL;
+		return ret;
+	}
 
 	mutex_lock(&sev_bitmap_lock);
 
@@ -108,7 +135,8 @@ static int sev_asid_new(struct kvm_sev_info *sev)
 			goto again;
 		}
 		mutex_unlock(&sev_bitmap_lock);
-		return -EBUSY;
+		ret = -EBUSY;
+		goto e_uncharge;
 	}
 
 	__set_bit(pos, sev_asid_bitmap);
@@ -116,6 +144,11 @@ static int sev_asid_new(struct kvm_sev_info *sev)
 	mutex_unlock(&sev_bitmap_lock);
 
 	return pos + 1;
+e_uncharge:
+	misc_cg_uncharge(type, sev->misc_cg, 1);
+	put_misc_cg(sev->misc_cg);
+	sev->misc_cg = NULL;
+	return ret;
 }
 
 static int sev_get_asid(struct kvm *kvm)
@@ -125,14 +158,15 @@ static int sev_get_asid(struct kvm *kvm)
 	return sev->asid;
 }
 
-static void sev_asid_free(int asid)
+static void sev_asid_free(struct kvm_sev_info *sev)
 {
 	struct svm_cpu_data *sd;
 	int cpu, pos;
+	enum misc_res_type type;
 
 	mutex_lock(&sev_bitmap_lock);
 
-	pos = asid - 1;
+	pos = sev->asid - 1;
 	__set_bit(pos, sev_reclaim_asid_bitmap);
 
 	for_each_possible_cpu(cpu) {
@@ -141,6 +175,11 @@ static void sev_asid_free(int asid)
 	}
 
 	mutex_unlock(&sev_bitmap_lock);
+
+	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
+	misc_cg_uncharge(type, sev->misc_cg, 1);
+	put_misc_cg(sev->misc_cg);
+	sev->misc_cg = NULL;
 }
 
 static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
@@ -188,19 +227,20 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	asid = sev_asid_new(sev);
 	if (asid < 0)
 		return ret;
+	sev->asid = asid;
 
 	ret = sev_platform_init(&argp->error);
 	if (ret)
 		goto e_free;
 
 	sev->active = true;
-	sev->asid = asid;
 	INIT_LIST_HEAD(&sev->regions_list);
 
 	return 0;
 
 e_free:
-	sev_asid_free(asid);
+	sev_asid_free(sev);
+	sev->asid = 0;
 	return ret;
 }
 
@@ -1315,12 +1355,12 @@ void sev_vm_destroy(struct kvm *kvm)
 	mutex_unlock(&kvm->lock);
 
 	sev_unbind_asid(kvm, sev->handle);
-	sev_asid_free(sev->asid);
+	sev_asid_free(sev);
 }
 
 void __init sev_hardware_setup(void)
 {
-	unsigned int eax, ebx, ecx, edx;
+	unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
 	bool sev_es_supported = false;
 	bool sev_supported = false;
 
@@ -1352,7 +1392,11 @@ void __init sev_hardware_setup(void)
 	if (!sev_reclaim_asid_bitmap)
 		goto out;
 
-	pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);
+	sev_asid_count = max_sev_asid - min_sev_asid + 1;
+	if (misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count))
+		goto out;
+
+	pr_info("SEV supported: %u ASIDs\n", sev_asid_count);
 	sev_supported = true;
 
 	/* SEV-ES support requested? */
@@ -1367,7 +1411,11 @@ void __init sev_hardware_setup(void)
 	if (min_sev_asid == 1)
 		goto out;
 
-	pr_info("SEV-ES supported: %u ASIDs\n", min_sev_asid - 1);
+	sev_es_asid_count = min_sev_asid - 1;
+	if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count))
+		goto out;
+
+	pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count);
 	sev_es_supported = true;
 
 out:
@@ -1382,6 +1430,8 @@ void sev_hardware_teardown(void)
 
 	bitmap_free(sev_asid_bitmap);
 	bitmap_free(sev_reclaim_asid_bitmap);
+	misc_cg_set_capacity(MISC_CG_RES_SEV, 0);
+	misc_cg_set_capacity(MISC_CG_RES_SEV_ES, 0);
 
 	sev_flush_asids();
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 39e071fdab0c..9806aaebc37f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -65,6 +65,7 @@ struct kvm_sev_info {
 	unsigned long pages_locked; /* Number of pages locked */
 	struct list_head regions_list;  /* List of registered regions */
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
+	struct misc_cg *misc_cg; /* For misc cgroup accounting */
 };
 
 struct kvm_svm {
diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index 1195d36558b4..c5af592481c0 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -12,6 +12,12 @@
  * Types of misc cgroup entries supported by the host.
  */
 enum misc_res_type {
+#ifdef CONFIG_KVM_AMD_SEV
+	/* AMD SEV ASIDs resource */
+	MISC_CG_RES_SEV,
+	/* AMD SEV-ES ASIDs resource */
+	MISC_CG_RES_SEV_ES,
+#endif
 	MISC_CG_RES_TYPES
 };
 
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 4352bc4a3bd5..ec02d963cad1 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -18,6 +18,12 @@
 
 /* Miscellaneous res name, keep it in sync with enum misc_res_type */
 static const char *const misc_res_name[] = {
+#ifdef CONFIG_KVM_AMD_SEV
+	/* AMD SEV ASIDs resource */
+	"sev",
+	/* AMD SEV-ES ASIDs resource */
+	"sev_es",
+#endif
 };
 
 /* Root misc cgroup */
-- 
2.31.0.291.g576ba9dcdaf-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/3] cgroup: New misc cgroup controller
  2021-03-30  4:42 ` Vipin Sharma
                   ` (3 preceding siblings ...)
  (?)
@ 2021-04-04 17:35 ` Tejun Heo
  2021-04-05  0:29     ` Vipin Sharma
  -1 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2021-04-04 17:35 UTC (permalink / raw)
  To: Vipin Sharma
  Cc: mkoutny, jacob.jun.pan, rdunlap, thomas.lendacky, brijesh.singh,
	jon.grimm, eric.vantassell, pbonzini, hannes, frankja,
	borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro,
	tglx, mingo, bp, hpa, gingell, rientjes, kvm, x86, cgroups,
	linux-doc, linux-kernel

Applied to cgroup/for-5.13. If there are further issues, let's address them
incrementally.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/3] cgroup: New misc cgroup controller
  2021-04-04 17:35 ` [PATCH v4 0/3] cgroup: New misc cgroup controller Tejun Heo
@ 2021-04-05  0:29     ` Vipin Sharma
  0 siblings, 0 replies; 14+ messages in thread
From: Vipin Sharma @ 2021-04-05  0:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Michal Koutný,
	Jacob Pan, Randy Dunlap, Tom Lendacky, Brijesh, Jon, Eric,
	pbonzini, hannes, Janosch Frank, Christian Borntraeger, corbet,
	Sean Christopherson, vkuznets, wanpengli, Jim Mattson, joro,
	tglx, mingo, bp, hpa, Matt Gingell, David Rientjes, kvm, x86,
	cgroups, linux-doc, linux-kernel

On Sun, Apr 4, 2021 at 10:35 AM Tejun Heo <tj@kernel.org> wrote:
>
> Applied to cgroup/for-5.13. If there are further issues, let's address them
> incrementally.
>
> Thanks.
>
> --
> tejun

Thanks, Tejun, for accepting this patch series and for guiding me through
each version of it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/3] cgroup: New misc cgroup controller
@ 2021-09-23 15:35   ` Xingyou Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Xingyou Chen @ 2021-09-23 15:35 UTC (permalink / raw)
  To: Vipin Sharma, tj, mkoutny, jacob.jun.pan, rdunlap,
	thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell,
	pbonzini, hannes, frankja, borntraeger, brian.welty
  Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, hpa, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel



On 2021/3/30 12:42, Vipin Sharma wrote:
> Hello,
> 
> This patch series is creating a new misc cgroup controller for limiting
> and tracking of resources which are not abstract like other cgroup
> controllers.
> 
> This controller was initially proposed as encryption_id but after the
> feedbacks and use cases for other resources, it is now changed to misc
> cgroup.
> https://lore.kernel.org/lkml/20210108012846.4134815-2-vipinsh@google.com/
> 
> Most of the cloud infrastructure use cgroups for knowing the host state,
> track the resources usage, enforce limits on them, etc. They use this
> info to optimize work allocation in the fleet and make sure no rogue job
> consumes more than it needs and starves others.
> 
> There are resources on a system which are not abstract enough like other
> cgroup controllers and are available in a limited quantity on a host.
> 
> One of them is Secure Encrypted Virtualization (SEV) ASID on AMD CPU.
> SEV ASIDs are used for creating encrypted VMs. SEV is mostly be used by
> the cloud providers for providing confidential VMs. Since SEV ASIDs are
> limited, there is a need to schedule encrypted VMs in a cloud
> infrastructure based on SEV ASIDs availability and also to limit its
> usage.
> 
> There are similar requirements for other resource types like TDX keys,
> IOASIDs and SEID.
> 
> Adding these resources to a cgroup controller is a natural choice with
> least amount of friction. Cgroup itself says it is a mechanism to
> distribute system resources along the hierarchy in a controlled
> mechanism and configurable manner. Most of the resources in cgroups are
> abstracted enough but there are still some resources which are not
> abstract but have limited availability or have specific use cases.
> 
> Misc controller is a generic controller which can be used by these
> kinds of resources.

Could this be made dynamic? Resources could be registered via something
like misc_cg_res_{register,unregister}, at compile time or at runtime,
instead of being hard-coded into misc_res_name/misc_res_capacity etc.

There is a need for this, as noted in the drmcg session earlier this
year. We could make the misc cgroup stable and let device drivers
register their own resources.

This may make the misc cgroup controller more complex than expected, but
it would be simpler than adding multiple similar controllers.
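
A rough sketch of what such a registration interface might look like,
written as plain userspace C for illustration only. Apart from the
misc_cg_res_{register,unregister} names suggested above, everything here
(the table, the slot struct, MISC_CG_RES_MAX, the return conventions) is
a hypothetical assumption and is not part of the posted series:

```c
/* Hypothetical sketch: none of this exists in the posted patch series. */
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MISC_CG_RES_MAX 8	/* arbitrary table size for this sketch */

struct misc_res_slot {
	const char *name;	/* resource name; NULL marks a free slot */
	unsigned long capacity;	/* total units available on the host */
};

/* Replaces the fixed misc_res_name/misc_res_capacity arrays. */
static struct misc_res_slot misc_res_table[MISC_CG_RES_MAX];

/* Register a resource. Returns its slot index, or a negative errno. */
int misc_cg_res_register(const char *name, unsigned long capacity)
{
	int i, free_slot = -1;

	if (!name || !*name)
		return -EINVAL;

	for (i = 0; i < MISC_CG_RES_MAX; i++) {
		if (!misc_res_table[i].name) {
			/* Remember the first free slot, keep scanning for duplicates. */
			if (free_slot < 0)
				free_slot = i;
		} else if (!strcmp(misc_res_table[i].name, name)) {
			return -EEXIST;
		}
	}
	if (free_slot < 0)
		return -ENOSPC;

	misc_res_table[free_slot].name = name;
	misc_res_table[free_slot].capacity = capacity;
	return free_slot;
}

/* Unregister a resource by slot index. Returns 0 or a negative errno. */
int misc_cg_res_unregister(int id)
{
	if (id < 0 || id >= MISC_CG_RES_MAX || !misc_res_table[id].name)
		return -ENOENT;

	misc_res_table[id].name = NULL;
	misc_res_table[id].capacity = 0;
	return 0;
}
```

A driver would call misc_cg_res_register() at load time and keep the
returned index for later charge/uncharge calls. A real implementation
would also need locking and a way to deal with cgroups that still hold
charges when a resource is unregistered; both are left out here.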

> 
> One suggestion was to use BPF for this purpose, however, there are
> couple of things which might not be addressed with BPF:
> 1. Which controller to use in v1 case? These are not abstract resources
>     so in v1 where each controller have their own hierarchy it might not
>     be easy to identify the best controller to use for BPF.
> 
> 2. Abstracting out a single BPF program which can help with all of the
>     resources types might not be possible, because resources we are
>     working with are not similar and abstract enough, for example network
>     packets, and there will be different places in the source code to use
>     these resources.
> 
> A new cgroup controller tends to give much easier and well integrated
> solution when it comes to scheduling and limiting a resource with
> existing tools in a cloud infrastructure.
> 
> Changes in RFC v4:
> 1. Misc controller patch is split into two patches. One for generic misc
>     controller and second for adding SEV and SEV-ES resource.
> 2. Using READ_ONCE and WRITE_ONCE for variable accesses.
> 3. Updated documentation.
> 4. Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL.
> 5. Included cgroup header in misc_cgroup.h.
> 6. misc_cg_reduce_charge changed to misc_cg_cancel_charge.
> 7. misc_cg set to NULL after uncharge.
> 8. Added WARN_ON if misc_cg not NULL before charging in SEV/SEV-ES.
> 
> Changes in RFC v3:
> 1. Changed implementation to support 64 bit counters.
> 2. Print kernel logs only once per resource per cgroup.
> 3. Capacity can be set less than the current usage.
> 
> Changes in RFC v2:
> 1. Documentation fixes.
> 2. Added kernel log messages.
> 3. Changed charge API to treat misc_cg as input parameter.
> 4. Added helper APIs to get and release references on the cgroup.
> 
> [1] https://lore.kernel.org/lkml/20210218195549.1696769-1-vipinsh@google.com
> [2] https://lore.kernel.org/lkml/20210302081705.1990283-1-vipinsh@google.com/
> [3] https://lore.kernel.org/lkml/20210304231946.2766648-1-vipinsh@google.com/
> 
> Vipin Sharma (3):
>    cgroup: Add misc cgroup controller
>    cgroup: Miscellaneous cgroup documentation.
>    svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
> 
>   Documentation/admin-guide/cgroup-v1/index.rst |   1 +
>   Documentation/admin-guide/cgroup-v1/misc.rst  |   4 +
>   Documentation/admin-guide/cgroup-v2.rst       |  73 +++-
>   arch/x86/kvm/svm/sev.c                        |  70 ++-
>   arch/x86/kvm/svm/svm.h                        |   1 +
>   include/linux/cgroup_subsys.h                 |   4 +
>   include/linux/misc_cgroup.h                   | 132 ++++++
>   init/Kconfig                                  |  14 +
>   kernel/cgroup/Makefile                        |   1 +
>   kernel/cgroup/misc.c                          | 407 ++++++++++++++++++
>   10 files changed, 695 insertions(+), 12 deletions(-)
>   create mode 100644 Documentation/admin-guide/cgroup-v1/misc.rst
>   create mode 100644 include/linux/misc_cgroup.h
>   create mode 100644 kernel/cgroup/misc.c
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/3] cgroup: New misc cgroup controller
@ 2021-09-23 15:38   ` Xingyou Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Xingyou Chen @ 2021-09-23 15:38 UTC (permalink / raw)
  To: Vipin Sharma, tj, mkoutny, jacob.jun.pan, rdunlap,
	thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell,
	pbonzini, hannes, frankja, borntraeger, brian.welty
  Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo,
	bp, hpa, gingell, rientjes, kvm, x86, cgroups, linux-doc,
	linux-kernel

On 2021/3/30 12:42, Vipin Sharma wrote:
> Hello,
> 
> This patch series is creating a new misc cgroup controller for limiting
> and tracking of resources which are not abstract like other cgroup
> controllers.
> 
> This controller was initially proposed as encryption_id but after the
> feedbacks and use cases for other resources, it is now changed to misc
> cgroup.
> https://lore.kernel.org/lkml/20210108012846.4134815-2-vipinsh@google.com/
> 
> Most of the cloud infrastructure use cgroups for knowing the host state,
> track the resources usage, enforce limits on them, etc. They use this
> info to optimize work allocation in the fleet and make sure no rogue job
> consumes more than it needs and starves others.
> 
> There are resources on a system which are not abstract enough like other
> cgroup controllers and are available in a limited quantity on a host.
> 
> One of them is Secure Encrypted Virtualization (SEV) ASID on AMD CPU.
> SEV ASIDs are used for creating encrypted VMs. SEV is mostly be used by
> the cloud providers for providing confidential VMs. Since SEV ASIDs are
> limited, there is a need to schedule encrypted VMs in a cloud
> infrastructure based on SEV ASIDs availability and also to limit its
> usage.
> 
> There are similar requirements for other resource types like TDX keys,
> IOASIDs and SEID.
> 
> Adding these resources to a cgroup controller is a natural choice with
> least amount of friction. Cgroup itself says it is a mechanism to
> distribute system resources along the hierarchy in a controlled
> mechanism and configurable manner. Most of the resources in cgroups are
> abstracted enough but there are still some resources which are not
> abstract but have limited availability or have specific use cases.
> 
> Misc controller is a generic controller which can be used by these
> kinds of resources.

Could this be made dynamic? Resources could be registered via something
like misc_cg_res_{register,unregister}, at compile time or at runtime,
instead of being hard-coded into misc_res_name/misc_res_capacity etc.

There is a need for this, as noted in the drmcg session earlier this
year. We could make the misc cgroup stable and let device drivers
register their own resources.

This may make the misc cgroup controller more complex than expected,
but it would be simpler than adding multiple similar controllers.

> 
> One suggestion was to use BPF for this purpose, however, there are
> couple of things which might not be addressed with BPF:
> 1. Which controller to use in v1 case? These are not abstract resources
>     so in v1 where each controller have their own hierarchy it might not
>     be easy to identify the best controller to use for BPF.
> 
> 2. Abstracting out a single BPF program which can help with all of the
>     resources types might not be possible, because resources we are
>     working with are not similar and abstract enough, for example network
>     packets, and there will be different places in the source code to use
>     these resources.
> 
> A new cgroup controller tends to give much easier and well integrated
> solution when it comes to scheduling and limiting a resource with
> existing tools in a cloud infrastructure.
> 
> Changes in RFC v4:
> 1. Misc controller patch is split into two patches. One for generic misc
>     controller and second for adding SEV and SEV-ES resource.
> 2. Using READ_ONCE and WRITE_ONCE for variable accesses.
> 3. Updated documentation.
> 4. Changed EXPORT_SYMBOL to EXPORT_SYMBOL_GPL.
> 5. Included cgroup header in misc_cgroup.h.
> 6. misc_cg_reduce_charge changed to misc_cg_cancel_charge.
> 7. misc_cg set to NULL after uncharge.
> 8. Added WARN_ON if misc_cg not NULL before charging in SEV/SEV-ES.
> 
> Changes in RFC v3:
> 1. Changed implementation to support 64 bit counters.
> 2. Print kernel logs only once per resource per cgroup.
> 3. Capacity can be set less than the current usage.
> 
> Changes in RFC v2:
> 1. Documentation fixes.
> 2. Added kernel log messages.
> 3. Changed charge API to treat misc_cg as input parameter.
> 4. Added helper APIs to get and release references on the cgroup.
> 
> [1] https://lore.kernel.org/lkml/20210218195549.1696769-1-vipinsh@google.com
> [2] https://lore.kernel.org/lkml/20210302081705.1990283-1-vipinsh@google.com/
> [3] https://lore.kernel.org/lkml/20210304231946.2766648-1-vipinsh@google.com/
> 
> Vipin Sharma (3):
>    cgroup: Add misc cgroup controller
>    cgroup: Miscellaneous cgroup documentation.
>    svm/sev: Register SEV and SEV-ES ASIDs to the misc controller
> 
>   Documentation/admin-guide/cgroup-v1/index.rst |   1 +
>   Documentation/admin-guide/cgroup-v1/misc.rst  |   4 +
>   Documentation/admin-guide/cgroup-v2.rst       |  73 +++-
>   arch/x86/kvm/svm/sev.c                        |  70 ++-
>   arch/x86/kvm/svm/svm.h                        |   1 +
>   include/linux/cgroup_subsys.h                 |   4 +
>   include/linux/misc_cgroup.h                   | 132 ++++++
>   init/Kconfig                                  |  14 +
>   kernel/cgroup/Makefile                        |   1 +
>   kernel/cgroup/misc.c                          | 407 ++++++++++++++++++
>   10 files changed, 695 insertions(+), 12 deletions(-)
>   create mode 100644 Documentation/admin-guide/cgroup-v1/misc.rst
>   create mode 100644 include/linux/misc_cgroup.h
>   create mode 100644 kernel/cgroup/misc.c
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/3] cgroup: New misc cgroup controller
  2021-09-23 15:38   ` Xingyou Chen
  (?)
@ 2021-09-23 16:01   ` Tejun Heo
  -1 siblings, 0 replies; 14+ messages in thread
From: Tejun Heo @ 2021-09-23 16:01 UTC (permalink / raw)
  To: Xingyou Chen
  Cc: Vipin Sharma, mkoutny, jacob.jun.pan, rdunlap, thomas.lendacky,
	brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes,
	frankja, borntraeger, brian.welty, corbet, seanjc, vkuznets,
	wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell,
	rientjes, kvm, x86, cgroups, linux-doc, linux-kernel

On Thu, Sep 23, 2021 at 11:38:49PM +0800, Xingyou Chen wrote:
> > Misc controller is a generic controller which can be used by these
> > kinds of resources.
> 
> Could this be made dynamic? Resources could be registered via something
> like misc_cg_res_{register,unregister}, at compile time or at runtime,
> instead of being hard-coded into misc_res_name/misc_res_capacity etc.
> 
> There is a need for this, as noted in the drmcg session earlier this
> year. We could make the misc cgroup stable and let device drivers
> register their own resources.

Not too likely, given that the need for one-off resources for a specific
driver seems to indicate a lack of proper abstraction and control
mechanisms more than anything else. Even for cases where there are genuine
needs for per-hardware knobs, I think it's prudent to enforce a review
cycle which involves people who aren't directly working on the specific
driver.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-09-23 16:01 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-30  4:42 [PATCH v4 0/3] cgroup: New misc cgroup controller Vipin Sharma
2021-03-30  4:42 ` Vipin Sharma
2021-03-30  4:42 ` [PATCH v4 1/3] cgroup: Add " Vipin Sharma
2021-03-30  4:42   ` Vipin Sharma
2021-03-30  4:42 ` [PATCH v4 2/3] cgroup: Miscellaneous cgroup documentation Vipin Sharma
2021-03-30  4:42 ` [PATCH v4 3/3] svm/sev: Register SEV and SEV-ES ASIDs to the misc controller Vipin Sharma
2021-04-04 17:35 ` [PATCH v4 0/3] cgroup: New misc cgroup controller Tejun Heo
2021-04-05  0:29   ` Vipin Sharma
2021-04-05  0:29     ` Vipin Sharma
2021-09-23 15:35 ` Xingyou Chen
2021-09-23 15:35   ` Xingyou Chen
2021-09-23 15:38 ` Xingyou Chen
2021-09-23 15:38   ` Xingyou Chen
2021-09-23 16:01   ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.