* [Patch v3 0/2] cgroup: New misc cgroup controller
From: Vipin Sharma @ 2021-03-04 23:19 UTC
To: tj, mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger
Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel, Vipin Sharma

Hello,

This patch series creates a new misc cgroup controller for limiting and
tracking resources which are not abstract like those of the other cgroup
controllers.

This controller was initially proposed as encryption_id, but after
feedback it has been changed to the misc cgroup.
https://lore.kernel.org/lkml/20210108012846.4134815-2-vipinsh@google.com/

Changes in RFC v3:
1. Changed implementation to support 64 bit counters.
2. Print kernel logs only once per resource per cgroup.
3. Capacity can be set less than the current usage.

Changes in RFC v2:
1. Documentation fixes.
2. Added kernel log messages.
3. Changed charge API to treat misc_cg as input parameter.
4. Added helper APIs to get and release references on the cgroup.

[1] https://lore.kernel.org/lkml/20210218195549.1696769-1-vipinsh@google.com
[2] https://lore.kernel.org/lkml/20210302081705.1990283-1-vipinsh@google.com/

Vipin Sharma (2):
  cgroup: sev: Add misc cgroup controller
  cgroup: sev: Miscellaneous cgroup documentation.

 Documentation/admin-guide/cgroup-v1/index.rst |   1 +
 Documentation/admin-guide/cgroup-v1/misc.rst  |   4 +
 Documentation/admin-guide/cgroup-v2.rst       |  69 ++-
 arch/x86/kvm/svm/sev.c                        |  65 ++-
 arch/x86/kvm/svm/svm.h                        |   1 +
 include/linux/cgroup_subsys.h                 |   4 +
 include/linux/misc_cgroup.h                   | 130 ++++++
 init/Kconfig                                  |  14 +
 kernel/cgroup/Makefile                        |   1 +
 kernel/cgroup/misc.c                          | 402 ++++++++++++++++++
 10 files changed, 679 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/misc.rst
 create mode 100644 include/linux/misc_cgroup.h
 create mode 100644 kernel/cgroup/misc.c

--
2.30.1.766.gb4fecdf3b7-goog

* [Patch v3 1/2] cgroup: sev: Add misc cgroup controller
From: Vipin Sharma @ 2021-03-04 23:19 UTC
To: tj, mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger
Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel, Vipin Sharma

The Miscellaneous cgroup provides the resource limiting and tracking
mechanism for the scalar resources which cannot be abstracted like the
other cgroup resources. The controller is enabled by the
CONFIG_CGROUP_MISC config option.

The first two resources added to the miscellaneous controller are Secure
Encrypted Virtualization (SEV) ASIDs and SEV - Encrypted State (SEV-ES)
ASIDs. These limited ASIDs are used for encrypting virtual machines'
memory on the AMD platform.

The Miscellaneous controller provides 3 interface files:

misc.capacity
  A read-only flat-keyed file shown only in the root cgroup. It shows
  miscellaneous scalar resources available on the platform along with
  their quantities::

    $ cat misc.capacity
    sev 50
    sev_es 10

misc.current
  A read-only flat-keyed file shown in the non-root cgroups. It shows
  the current usage of the resources in the cgroup and its children::

    $ cat misc.current
    sev 3
    sev_es 0

misc.max
  A read-write flat-keyed file shown in the non-root cgroups.
Allowed maximum usage of the resources in the cgroup and its children.:: $ cat misc.max sev max sev_es 4 Limit can be set by:: # echo sev 1 > misc.max Limit can be set to max by:: # echo sev max > misc.max Limits can be set more than the capacity value in the misc.capacity file. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Rientjes <rientjes@google.com> --- arch/x86/kvm/svm/sev.c | 65 +++++- arch/x86/kvm/svm/svm.h | 1 + include/linux/cgroup_subsys.h | 4 + include/linux/misc_cgroup.h | 130 +++++++++++ init/Kconfig | 14 ++ kernel/cgroup/Makefile | 1 + kernel/cgroup/misc.c | 402 ++++++++++++++++++++++++++++++++++ 7 files changed, 607 insertions(+), 10 deletions(-) create mode 100644 include/linux/misc_cgroup.h create mode 100644 kernel/cgroup/misc.c diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 48017fef1cd9..dd05a1522862 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -14,6 +14,7 @@ #include <linux/psp-sev.h> #include <linux/pagemap.h> #include <linux/swap.h> +#include <linux/misc_cgroup.h> #include <linux/processor.h> #include <linux/trace_events.h> #include <asm/fpu/internal.h> @@ -27,6 +28,21 @@ #define __ex(x) __kvm_handle_fault_on_reboot(x) +#ifndef CONFIG_KVM_AMD_SEV +/* + * When this config is not defined, SEV feature is not supported and APIs in + * this file are not used but this file still gets compiled into the KVM AMD + * module. + * + * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum + * misc_res_type {} defined in linux/misc_cgroup.h. + * + * Below macros allow compilation to succeed. 
+ */ +#define MISC_CG_RES_SEV MISC_CG_RES_TYPES +#define MISC_CG_RES_SEV_ES MISC_CG_RES_TYPES +#endif + static u8 sev_enc_bit; static int sev_flush_asids(void); static DECLARE_RWSEM(sev_deactivate_lock); @@ -88,8 +104,17 @@ static bool __sev_recycle_asids(int min_asid, int max_asid) static int sev_asid_new(struct kvm_sev_info *sev) { - int pos, min_asid, max_asid; + int pos, min_asid, max_asid, ret; bool retry = true; + enum misc_res_type type; + + type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV; + sev->misc_cg = get_current_misc_cg(); + ret = misc_cg_try_charge(type, sev->misc_cg, 1); + if (ret) { + put_misc_cg(sev->misc_cg); + return ret; + } mutex_lock(&sev_bitmap_lock); @@ -107,7 +132,8 @@ static int sev_asid_new(struct kvm_sev_info *sev) goto again; } mutex_unlock(&sev_bitmap_lock); - return -EBUSY; + ret = -EBUSY; + goto e_uncharge; } __set_bit(pos, sev_asid_bitmap); @@ -115,6 +141,10 @@ static int sev_asid_new(struct kvm_sev_info *sev) mutex_unlock(&sev_bitmap_lock); return pos + 1; +e_uncharge: + misc_cg_uncharge(type, sev->misc_cg, 1); + put_misc_cg(sev->misc_cg); + return ret; } static int sev_get_asid(struct kvm *kvm) @@ -124,14 +154,15 @@ static int sev_get_asid(struct kvm *kvm) return sev->asid; } -static void sev_asid_free(int asid) +static void sev_asid_free(struct kvm_sev_info *sev) { struct svm_cpu_data *sd; int cpu, pos; + enum misc_res_type type; mutex_lock(&sev_bitmap_lock); - pos = asid - 1; + pos = sev->asid - 1; __set_bit(pos, sev_reclaim_asid_bitmap); for_each_possible_cpu(cpu) { @@ -140,6 +171,10 @@ static void sev_asid_free(int asid) } mutex_unlock(&sev_bitmap_lock); + + type = sev->es_active ? 
MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV; + misc_cg_uncharge(type, sev->misc_cg, 1); + put_misc_cg(sev->misc_cg); } static void sev_unbind_asid(struct kvm *kvm, unsigned int handle) @@ -187,19 +222,19 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp) asid = sev_asid_new(sev); if (asid < 0) return ret; + sev->asid = asid; ret = sev_platform_init(&argp->error); if (ret) goto e_free; sev->active = true; - sev->asid = asid; INIT_LIST_HEAD(&sev->regions_list); return 0; e_free: - sev_asid_free(asid); + sev_asid_free(sev); return ret; } @@ -1243,12 +1278,12 @@ void sev_vm_destroy(struct kvm *kvm) mutex_unlock(&kvm->lock); sev_unbind_asid(kvm, sev->handle); - sev_asid_free(sev->asid); + sev_asid_free(sev); } void __init sev_hardware_setup(void) { - unsigned int eax, ebx, ecx, edx; + unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count; bool sev_es_supported = false; bool sev_supported = false; @@ -1280,7 +1315,11 @@ void __init sev_hardware_setup(void) if (!sev_reclaim_asid_bitmap) goto out; - pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1); + sev_asid_count = max_sev_asid - min_sev_asid + 1; + if (misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count)) + goto out; + + pr_info("SEV supported: %u ASIDs\n", sev_asid_count); sev_supported = true; /* SEV-ES support requested? 
*/ @@ -1295,7 +1334,11 @@ void __init sev_hardware_setup(void) if (min_sev_asid == 1) goto out; - pr_info("SEV-ES supported: %u ASIDs\n", min_sev_asid - 1); + sev_es_asid_count = min_sev_asid - 1; + if (misc_cg_set_capacity(MISC_CG_RES_SEV_ES, sev_es_asid_count)) + goto out; + + pr_info("SEV-ES supported: %u ASIDs\n", sev_es_asid_count); sev_es_supported = true; out: @@ -1310,6 +1353,8 @@ void sev_hardware_teardown(void) bitmap_free(sev_asid_bitmap); bitmap_free(sev_reclaim_asid_bitmap); + misc_cg_set_capacity(MISC_CG_RES_SEV, 0); + misc_cg_set_capacity(MISC_CG_RES_SEV_ES, 0); sev_flush_asids(); } diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 6e7d070f8b86..8ed6ebf47885 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -79,6 +79,7 @@ struct kvm_sev_info { unsigned long pages_locked; /* Number of pages locked */ struct list_head regions_list; /* List of registered regions */ u64 ap_jump_table; /* SEV-ES AP Jump Table address */ + struct misc_cg *misc_cg; /* For misc cgroup accounting */ }; struct kvm_svm { diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index acb77dcff3b4..445235487230 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -61,6 +61,10 @@ SUBSYS(pids) SUBSYS(rdma) #endif +#if IS_ENABLED(CONFIG_CGROUP_MISC) +SUBSYS(misc) +#endif + /* * The following subsystems are not supported on the default hierarchy. */ diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h new file mode 100644 index 000000000000..c807840c9238 --- /dev/null +++ b/include/linux/misc_cgroup.h @@ -0,0 +1,130 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Miscellaneous cgroup controller. + * + * Copyright 2020 Google LLC + * Author: Vipin Sharma <vipinsh@google.com> + */ +#ifndef _MISC_CGROUP_H_ +#define _MISC_CGROUP_H_ + +/** + * Types of misc cgroup entries supported by the host. 
+ */ +enum misc_res_type { +#ifdef CONFIG_KVM_AMD_SEV + /* AMD SEV ASIDs resource */ + MISC_CG_RES_SEV, + /* AMD SEV-ES ASIDs resource */ + MISC_CG_RES_SEV_ES, +#endif + MISC_CG_RES_TYPES +}; + +struct misc_cg; + +#ifdef CONFIG_CGROUP_MISC + +/** + * struct misc_res: Per cgroup per misc type resource + * @max: Maximum limit on the resource. + * @usage: Current usage of the resource. + * @failed: True if charged failed for the resource in a cgroup. + */ +struct misc_res { + unsigned long max; + atomic_long_t usage; + bool failed; +}; + +/** + * struct misc_cg - Miscellaneous controller's cgroup structure. + * @css: cgroup subsys state object. + * @res: Array of misc resources usage in the cgroup. + */ +struct misc_cg { + struct cgroup_subsys_state css; + struct misc_res res[MISC_CG_RES_TYPES]; +}; + +unsigned long misc_cg_res_total_usage(enum misc_res_type type); +int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity); +int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, + unsigned long amount); +void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, + unsigned long amount); + +/** + * css_misc() - Get misc cgroup from the css. + * @css: cgroup subsys state object. + * + * Context: Any context. + * Return: + * * %NULL - If @css is null. + * * struct misc_cg* - misc cgroup pointer of the passed css. + */ +static inline struct misc_cg *css_misc(struct cgroup_subsys_state *css) +{ + return css ? container_of(css, struct misc_cg, css) : NULL; +} + +/* + * get_current_misc_cg() - Find and get the misc cgroup of the current task. + * + * Returned cgroup has its ref count increased by 1. Caller must call + * put_misc_cg() to return the reference. + * + * Return: Misc cgroup to which the current task belongs to. + */ +static inline struct misc_cg *get_current_misc_cg(void) +{ + return css_misc(task_get_css(current, misc_cgrp_id)); +} + +/* + * put_misc_cg() - Put the misc cgroup and reduce its ref count. 
+ * @cg - cgroup to put. + */ +static inline void put_misc_cg(struct misc_cg *cg) +{ + if (cg) + css_put(&cg->css); +} + +#else /* !CONFIG_CGROUP_MISC */ + +unsigned long misc_cg_res_total_usage(enum misc_res_type type) +{ + return 0; +} + +static inline int misc_cg_set_capacity(enum misc_res_type type, + unsigned long capacity) +{ + return 0; +} + +static inline int misc_cg_try_charge(enum misc_res_type type, + struct misc_cg *cg, + unsigned long amount) +{ + return 0; +} + +static inline void misc_cg_uncharge(enum misc_res_type type, + struct misc_cg *cg, + unsigned long amount) +{ +} + +static inline struct misc_cg *get_current_misc_cg(void) +{ + return NULL; +} + +static inline void put_misc_cg(struct misc_cg *cg) +{ +} + +#endif /* CONFIG_CGROUP_MISC */ +#endif /* _MISC_CGROUP_H_ */ diff --git a/init/Kconfig b/init/Kconfig index 29ad68325028..0b392135e555 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1105,6 +1105,20 @@ config CGROUP_BPF BPF_CGROUP_INET_INGRESS will be executed on the ingress path of inet sockets. +config CGROUP_MISC + bool "Misc resource controller" + default n + help + Provides a controller for miscellaneous resources on a host. + + Miscellaneous scalar resources are the resources on the host system + which cannot be abstracted like the other cgroups. This controller + tracks and limits the miscellaneous resources used by a process + attached to a cgroup hierarchy. + + For more information, please check misc cgroup section in + /Documentation/admin-guide/cgroup-v2.rst. 
+ config CGROUP_DEBUG bool "Debug controller" default n diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile index 5d7a76bfbbb7..12f8457ad1f9 100644 --- a/kernel/cgroup/Makefile +++ b/kernel/cgroup/Makefile @@ -5,4 +5,5 @@ obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o obj-$(CONFIG_CGROUP_PIDS) += pids.o obj-$(CONFIG_CGROUP_RDMA) += rdma.o obj-$(CONFIG_CPUSETS) += cpuset.o +obj-$(CONFIG_CGROUP_MISC) += misc.o obj-$(CONFIG_CGROUP_DEBUG) += debug.o diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c new file mode 100644 index 000000000000..393d66c0b933 --- /dev/null +++ b/kernel/cgroup/misc.c @@ -0,0 +1,402 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Miscellaneous cgroup controller + * + * Copyright 2020 Google LLC + * Author: Vipin Sharma <vipinsh@google.com> + */ + +#include <linux/limits.h> +#include <linux/cgroup.h> +#include <linux/errno.h> +#include <linux/atomic.h> +#include <linux/slab.h> +#include <linux/misc_cgroup.h> + +#define MAX_STR "max" +#define MAX_NUM ULONG_MAX + +/* Miscellaneous res name, keep it in sync with enum misc_res_type */ +static const char *const misc_res_name[] = { +#ifdef CONFIG_KVM_AMD_SEV + "sev", + "sev_es", +#endif +}; + +/* Root misc cgroup */ +static struct misc_cg root_cg; + +/* + * Miscellaneous resources capacity for the entire machine. 0 capacity means + * resource is not initialized or not present in the host. + * + * root_cg.max and capacity are independent of each other. root_cg.max can be + * more than the actual capacity. We are using Limits resource distribution + * model of cgroup for miscellaneous controller. + */ +static unsigned long misc_res_capacity[MISC_CG_RES_TYPES]; + +/** + * parent_misc() - Get the parent of the passed misc cgroup. + * @cgroup: cgroup whose parent needs to be fetched. + * + * Context: Any context. + * Return: + * * struct misc_cg* - Parent of the @cgroup. + * * %NULL - If @cgroup is null or the passed cgroup does not have a parent. 
+ */ +static struct misc_cg *parent_misc(struct misc_cg *cgroup) +{ + return cgroup ? css_misc(cgroup->css.parent) : NULL; +} + +/** + * valid_type() - Check if @type is valid or not. + * @type: misc res type. + * + * Context: Any context. + * Return: + * * true - If valid type. + * * false - If not valid type. + */ +static inline bool valid_type(enum misc_res_type type) +{ + return type >= 0 && type < MISC_CG_RES_TYPES; +} + +/** + * misc_cg_res_total_usage() - Get the current total usage of the resource. + * @type: misc res type. + * + * Context: Any context. + * Return: Current total usage of the resource. + */ +unsigned long misc_cg_res_total_usage(enum misc_res_type type) +{ + if (valid_type(type)) + return atomic_long_read(&root_cg.res[type].usage); + + return 0; +} +EXPORT_SYMBOL(misc_cg_res_total_usage); + +/** + * misc_cg_set_capacity() - Set the capacity of the misc cgroup res. + * @type: Type of the misc res. + * @capacity: Supported capacity of the misc res on the host. + * + * If capacity is 0 then the charging a misc cgroup fails for that type. + * + * Context: Any context. + * Return: + * * %0 - Successfully registered the capacity. + * * %-EINVAL - If @type is invalid. + */ +int misc_cg_set_capacity(enum misc_res_type type, unsigned long capacity) +{ + if (!valid_type(type)) + return -EINVAL; + + misc_res_capacity[type] = capacity; + return 0; +} +EXPORT_SYMBOL(misc_cg_set_capacity); + +/** + * misc_cg_reduce_charge() - Reduce the charge from the misc cgroup. + * @type: Misc res type in misc cg to reduce the charge from. + * @cg: Misc cgroup to reduce charge from. + * @amount: Amount to reduce. + * + * Context: Any context. + */ +static void misc_cg_reduce_charge(enum misc_res_type type, struct misc_cg *cg, + unsigned long amount) +{ + WARN_ONCE(atomic_long_add_negative(-amount, &cg->res[type].usage), + "misc cgroup resource %s became less than 0", + misc_res_name[type]); +} + +/** + * misc_cg_try_charge() - Try charging the misc cgroup. 
+ * @type: Misc res type to charge. + * @cg: Misc cgroup which will be charged. + * @amount: Amount to charge. + * + * Charge @amount to the misc cgroup. Caller must use the same cgroup during + * the uncharge call. + * + * Context: Any context. + * Return: + * * %0 - If successfully charged. + * * -EINVAL - If @type is invalid or misc res has 0 capacity. + * * -EBUSY - If max limit will be crossed or total usage will be more than the + * capacity. + */ +int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, + unsigned long amount) +{ + struct misc_cg *i, *j; + int ret; + struct misc_res *res; + int new_usage; + + if (!(valid_type(type) && cg && misc_res_capacity[type])) + return -EINVAL; + + if (!amount) + return 0; + + for (i = cg; i; i = parent_misc(i)) { + res = &i->res[type]; + + new_usage = atomic_long_add_return(amount, &res->usage); + if (new_usage > res->max || + new_usage > misc_res_capacity[type]) { + if (!res->failed) { + pr_info("cgroup: charge rejected by the misc controller for %s resource in ", + misc_res_name[type]); + pr_cont_cgroup_path(i->css.cgroup); + pr_cont("\n"); + res->failed = true; + } + ret = -EBUSY; + goto err_charge; + } + } + return 0; + +err_charge: + for (j = cg; j != i; j = parent_misc(j)) + misc_cg_reduce_charge(type, j, amount); + misc_cg_reduce_charge(type, i, amount); + return ret; +} +EXPORT_SYMBOL(misc_cg_try_charge); + +/** + * misc_cg_uncharge() - Uncharge the misc cgroup. + * @type: Misc res type which was charged. + * @cg: Misc cgroup which will be uncharged. + * @amount: Charged amount. + * + * Context: Any context. + */ +void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, + unsigned long amount) +{ + struct misc_cg *i; + + if (!(amount && valid_type(type) && cg)) + return; + + for (i = cg; i; i = parent_misc(i)) + misc_cg_reduce_charge(type, i, amount); +} +EXPORT_SYMBOL(misc_cg_uncharge); + +/** + * misc_cg_max_show() - Show the misc cgroup max limit. 
+ * @sf: Interface file + * @v: Arguments passed + * + * Context: Any context. + * Return: 0 to denote successful print. + */ +static int misc_cg_max_show(struct seq_file *sf, void *v) +{ + int i; + struct misc_cg *cg = css_misc(seq_css(sf)); + + for (i = 0; i < MISC_CG_RES_TYPES; i++) { + if (misc_res_capacity[i]) { + if (cg->res[i].max == MAX_NUM) + seq_printf(sf, "%s max\n", misc_res_name[i]); + else + seq_printf(sf, "%s %lu\n", misc_res_name[i], + cg->res[i].max); + } + } + + return 0; +} + +/** + * misc_cg_max_write() - Update the maximum limit of the cgroup. + * @of: Handler for the file. + * @buf: Data from the user. It should be either "max", 0, or a positive + * integer. + * @nbytes: Number of bytes of the data. + * @off: Offset in the file. + * + * User can pass data like: + * echo sev 23 > misc.max, OR + * echo sev max > misc.max + * + * Context: Any context. + * Return: + * * >= 0 - Number of bytes processed in the input. + * * -EINVAL - If buf is not valid. + * * -ERANGE - If number is bigger than the unsigned long capacity. + */ +static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf, + size_t nbytes, loff_t off) +{ + struct misc_cg *cg; + unsigned long max; + int ret = 0, i; + enum misc_res_type type = MISC_CG_RES_TYPES; + char *token; + + buf = strstrip(buf); + token = strsep(&buf, " "); + + if (!token || !buf) + return -EINVAL; + + for (i = 0; i < MISC_CG_RES_TYPES; i++) { + if (!strcmp(misc_res_name[i], token)) { + type = i; + break; + } + } + + if (type == MISC_CG_RES_TYPES) + return -EINVAL; + + if (!strcmp(MAX_STR, buf)) { + max = ULONG_MAX; + } else { + ret = kstrtoul(buf, 0, &max); + if (ret) + return ret; + } + + cg = css_misc(of_css(of)); + + if (misc_res_capacity[type]) + cg->res[type].max = max; + else + ret = -EINVAL; + + return ret ? ret : nbytes; +} + +/** + * misc_cg_current_show() - Show the current usage of the misc cgroup. + * @sf: Interface file + * @v: Arguments passed + * + * Context: Any context. 
+ * Return: 0 to denote successful print. + */ +static int misc_cg_current_show(struct seq_file *sf, void *v) +{ + int i; + struct misc_cg *cg = css_misc(seq_css(sf)); + + for (i = 0; i < MISC_CG_RES_TYPES; i++) { + if (misc_res_capacity[i]) + seq_printf(sf, "%s %lu\n", misc_res_name[i], + atomic_long_read(&cg->res[i].usage)); + } + + return 0; +} + +/** + * misc_cg_capacity_show() - Show the total capacity of misc res on the host. + * @sf: Interface file + * @v: Arguments passed + * + * Only present in the root cgroup directory. + * + * Context: Any context. + * Return: 0 to denote successful print. + */ +static int misc_cg_capacity_show(struct seq_file *sf, void *v) +{ + int i; + unsigned long cap; + + for (i = 0; i < MISC_CG_RES_TYPES; i++) { + cap = READ_ONCE(misc_res_capacity[i]); + if (cap) + seq_printf(sf, "%s %lu\n", misc_res_name[i], cap); + } + + return 0; +} + +/* Misc cgroup interface files */ +static struct cftype misc_cg_files[] = { + { + .name = "max", + .write = misc_cg_max_write, + .seq_show = misc_cg_max_show, + .flags = CFTYPE_NOT_ON_ROOT, + }, + { + .name = "current", + .seq_show = misc_cg_current_show, + .flags = CFTYPE_NOT_ON_ROOT, + }, + { + .name = "capacity", + .seq_show = misc_cg_capacity_show, + .flags = CFTYPE_ONLY_ON_ROOT, + }, + {} +}; + +/** + * misc_cg_alloc() - Allocate misc cgroup. + * @parent_css: Parent cgroup. + * + * Context: Process context. + * Return: + * * struct cgroup_subsys_state* - css of the allocated cgroup. + * * ERR_PTR(-ENOMEM) - No memory available to allocate. 
+ */ +static struct cgroup_subsys_state * +misc_cg_alloc(struct cgroup_subsys_state *parent_css) +{ + enum misc_res_type i; + struct misc_cg *cg; + + if (!parent_css) { + cg = &root_cg; + } else { + cg = kzalloc(sizeof(*cg), GFP_KERNEL); + if (!cg) + return ERR_PTR(-ENOMEM); + } + + for (i = 0; i < MISC_CG_RES_TYPES; i++) { + cg->res[i].max = MAX_NUM; + atomic_long_set(&cg->res[i].usage, 0); + } + + return &cg->css; +} + +/** + * misc_cg_free() - Free the misc cgroup. + * @css: cgroup subsys object. + * + * Context: Any context. + */ +static void misc_cg_free(struct cgroup_subsys_state *css) +{ + kfree(css_misc(css)); +} + +/* Cgroup controller callbacks */ +struct cgroup_subsys misc_cgrp_subsys = { + .css_alloc = misc_cg_alloc, + .css_free = misc_cg_free, + .legacy_cftypes = misc_cg_files, + .dfl_cftypes = misc_cg_files, +}; -- 2.30.1.766.gb4fecdf3b7-goog ^ permalink raw reply related [flat|nested] 19+ messages in thread
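The charge path in misc_cg_try_charge() above walks from the charged cgroup up to the root, adds the amount at every level, and undoes the additions if any level would exceed its max or the host capacity. What follows is a minimal single-threaded userspace sketch of that walk, not the kernel code: struct node, try_charge(), rollback() and the fixed capacity of 50 are hypothetical stand-ins, plain unsigned long arithmetic replaces atomic_long_add_return(), and overflow of the running total is not handled.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace model of one resource type in a cgroup tree. */
struct node {
	struct node *parent;	/* NULL for the root */
	unsigned long max;	/* ULONG_MAX stands for "max" */
	unsigned long usage;	/* current charge, including children */
};

/* Hypothetical stand-in for misc_res_capacity[type]. */
static unsigned long capacity = 50;

/* Undo a charge on every level from @from up to and including @last. */
static void rollback(struct node *from, struct node *last,
		     unsigned long amount)
{
	struct node *n;

	for (n = from; n; n = n->parent) {
		assert(n->usage >= amount);
		n->usage -= amount;
		if (n == last)
			break;
	}
}

/* Charge @amount to @cg and all of its ancestors, or to none of them. */
int try_charge(struct node *cg, unsigned long amount)
{
	struct node *n;
	unsigned long new_usage;

	if (!cg || !capacity)
		return -EINVAL;
	if (!amount)
		return 0;

	for (n = cg; n; n = n->parent) {
		new_usage = n->usage + amount;
		n->usage = new_usage;
		if (new_usage > n->max || new_usage > capacity) {
			/* @n has been charged too, so it is rolled back as well. */
			rollback(cg, n, amount);
			return -EBUSY;
		}
	}
	return 0;
}
```

With a root whose max is ULONG_MAX and a child whose max is 2, two try_charge(&child, 1) calls succeed and a third returns -EBUSY while leaving both usage counters at 2. The sketch keeps new_usage in an unsigned long so that it has the same width as the counters it is compared against.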
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller
From: Michal Koutný @ 2021-03-11 18:59 UTC
To: Vipin Sharma
Cc: tj, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel

Hi Vipin.

On Thu, Mar 04, 2021 at 03:19:45PM -0800, Vipin Sharma <vipinsh@google.com> wrote:
>  arch/x86/kvm/svm/sev.c        |  65 +++++-
>  arch/x86/kvm/svm/svm.h        |   1 +
>  include/linux/cgroup_subsys.h |   4 +
>  include/linux/misc_cgroup.h   | 130 +++++++++++
>  init/Kconfig                  |  14 ++
>  kernel/cgroup/Makefile        |   1 +
>  kernel/cgroup/misc.c          | 402 ++++++++++++++++++++++++++++++++++
Given the two-fold nature (SEV caller vs misc controller) of some
remarks below, I think it makes sense to split this into two patches:
a) generic controller implementation,
b) hooking the controller into SEV ASIDs management.

> +#ifndef CONFIG_KVM_AMD_SEV
> +/*
> + * When this config is not defined, SEV feature is not supported and APIs in
> + * this file are not used but this file still gets compiled into the KVM AMD
> + * module.
> + *
> + * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum
> + * misc_res_type {} defined in linux/misc_cgroup.h.
BTW, was there any progress on conditioning sev.c build on
CONFIG_KVM_AMD_SEV? (So that the defines workaround isn't needed.)

>  static int sev_asid_new(struct kvm_sev_info *sev)
>  {
> -	int pos, min_asid, max_asid;
> +	int pos, min_asid, max_asid, ret;
>  	bool retry = true;
> +	enum misc_res_type type;
> +
> +	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
> +	sev->misc_cg = get_current_misc_cg();
> +	ret = misc_cg_try_charge(type, sev->misc_cg, 1);
It may be safer to WARN_ON(sev->misc_cg) at this point (see below).

> [...]
> +e_uncharge:
> +	misc_cg_uncharge(type, sev->misc_cg, 1);
> +	put_misc_cg(sev->misc_cg);
> +	return ret;
vvv

> @@ -140,6 +171,10 @@ static void sev_asid_free(int asid)
>  	}
>
>  	mutex_unlock(&sev_bitmap_lock);
> +
> +	type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV;
> +	misc_cg_uncharge(type, sev->misc_cg, 1);
> +	put_misc_cg(sev->misc_cg);
It may be safer to set sev->misc_cg to NULL here.

(IIUC, with current asid_{new,free} calls it shouldn't matter but why
rely on it in the future.)

> +++ b/kernel/cgroup/misc.c
> [...]
> +static void misc_cg_reduce_charge(enum misc_res_type type, struct misc_cg *cg,
> +				  unsigned long amount)
misc_cg_cancel_charge seems to be a name more consistent with what we
already have in pids and memory controller.

> +static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
> +				 size_t nbytes, loff_t off)
> +{
> [...]
> +
> +	if (!strcmp(MAX_STR, buf)) {
> +		max = ULONG_MAX;
MAX_NUM for consistency with other places.

> +	} else {
> +		ret = kstrtoul(buf, 0, &max);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	cg = css_misc(of_css(of));
> +
> +	if (misc_res_capacity[type])
> +		cg->res[type].max = max;
In theory, parallel writers can clash here, so making the limit an
atomic type would resolve it. See also commit a713af394cf3
("cgroup: pids: use atomic64_t for pids->limit").

> +static int misc_cg_current_show(struct seq_file *sf, void *v)
> +{
> +	int i;
> +	struct misc_cg *cg = css_misc(seq_css(sf));
> +
> +	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
> +		if (misc_res_capacity[i])
Since there can be some residual charges after removing capacity (before
draining), maybe a condition along the lines of

	if (misc_res_capacity[i] || atomic_long_read(&cg->res[i].usage))

would be more informative for debugging.

> +static int misc_cg_capacity_show(struct seq_file *sf, void *v)
> +{
> +	int i;
> +	unsigned long cap;
> +
> +	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
> +		cap = READ_ONCE(misc_res_capacity[i]);
Why is READ_ONCE only here and not in other places that (actually) check
against the set capacity value? Also, there should be a paired
WRITE_ONCE in misc_cg_set_capacity().

Thanks,
Michal

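The misc.current remark above can be looked at in isolation as a predicate: a resource row is shown when the resource either has capacity or still has charges outstanding. The following is only a userspace sketch of that condition; show_in_current() and format_current_row() are hypothetical names, and snprintf() stands in for seq_printf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: should this resource appear in misc.current? */
static bool show_in_current(unsigned long capacity, unsigned long usage)
{
	/* Residual usage keeps the row visible after capacity drops to 0. */
	return capacity || usage;
}

/*
 * Format one "name usage" row into @buf.
 * Returns 0 when the row is hidden, otherwise the snprintf() result.
 */
int format_current_row(char *buf, size_t len, const char *name,
		       unsigned long capacity, unsigned long usage)
{
	if (!show_in_current(capacity, usage))
		return 0;
	return snprintf(buf, len, "%s %lu\n", name, usage);
}
```

With capacity 0 and usage 0 the row is suppressed; with capacity 0 and usage 3 the row "sev 3" is still produced, which is the debugging aid the remark is after.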
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller
From: Vipin Sharma @ 2021-03-12 19:07 UTC
To: Michal Koutný
Cc: tj, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel

On Thu, Mar 11, 2021 at 07:59:03PM +0100, Michal Koutný wrote:
> Given the two-fold nature (SEV caller vs misc controller) of some
> remarks below, I think it makes sense to split this into two patches:
> a) generic controller implementation,
> b) hooking the controller into SEV ASIDs management.

Sounds good. I will split it.

> > +	if (misc_res_capacity[type])
> > +		cg->res[type].max = max;
> In theory, parallel writers can clash here, so making the limit an
> atomic type would resolve it. See also commit a713af394cf3
> ("cgroup: pids: use atomic64_t for pids->limit").

We should be fine without atomic64_t because we are using unsigned
long and not 64 bit explicitly. This will work on both 32 and 64 bit
machines.

But I will add READ_ONCE and WRITE_ONCE because of potential chances of
load tearing and store tearing.

Do you agree?

> > +static int misc_cg_capacity_show(struct seq_file *sf, void *v)
> > +{
> > +	int i;
> > +	unsigned long cap;
> > +
> > +	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
> > +		cap = READ_ONCE(misc_res_capacity[i]);
> Why is READ_ONCE only here and not in other places that (actually) check
> against the set capacity value? Also, there should be a paired
> WRITE_ONCE in misc_cg_set_capacity().

This was only here to avoid multiple reads of capacity and making sure
if condition and seq_print will see the same value.

Also, I was not aware of load and store tearing of properly aligned
and machine word size variables. I will add READ_ONCE and WRITE_ONCE
at other places.

Thanks
Vipin

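The pairing discussed above, a WRITE_ONCE in the setter and a READ_ONCE wherever the capacity is checked, can be sketched in userspace as follows. The two macros are simplified stand-ins written for this example (the kernel's own definitions do more checking), __typeof__ is a GCC/Clang extension, and res_capacity, set_capacity() and fits_capacity() are hypothetical names.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros of the same name. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for one misc_res_capacity[] slot. */
static unsigned long res_capacity;

/* Writer side: a single store the compiler may not split or repeat. */
void set_capacity(unsigned long capacity)
{
	WRITE_ONCE(res_capacity, capacity);
}

/*
 * Reader side: the capacity is loaded once into a local, so the
 * "is it set" test and the comparison both see the same value.
 * Returns 1 if @new_usage fits, 0 otherwise.
 */
int fits_capacity(unsigned long new_usage)
{
	unsigned long cap = READ_ONCE(res_capacity);

	return cap && new_usage <= cap;
}
```

The volatile accesses only constrain the compiler; they provide no ordering against other variables, which matches the use described in the thread, namely annotating unsynchronized accesses to a single word-sized value.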
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller
From: Michal Koutný @ 2021-03-15 18:34 UTC
To: Vipin Sharma
Cc: tj, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel

On Fri, Mar 12, 2021 at 11:07:14AM -0800, Vipin Sharma <vipinsh@google.com> wrote:
> We should be fine without atomic64_t because we are using unsigned
> long and not 64 bit explicitly. This will work on both 32 and 64 bit
> machines.
I see.

> But I will add READ_ONCE and WRITE_ONCE because of potential chances of
> load tearing and store tearing.
>
> Do you agree?
Yes.

> This was only here to avoid multiple reads of capacity and making sure
> if condition and seq_print will see the same value.
Aha.

> Also, I was not aware of load and store tearing of properly aligned
> and machine word size variables. I will add READ_ONCE and WRITE_ONCE
> at other places.
Yeah, although it's theoretical, I think it also serves well to annotate
such unsynchronized accesses.

Thanks,
Michal

* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller 2021-03-11 18:59 ` Michal Koutný 2021-03-12 19:07 ` Vipin Sharma @ 2021-03-12 19:48 ` Vipin Sharma 2021-03-12 20:51 ` Sean Christopherson 1 sibling, 1 reply; 19+ messages in thread From: Vipin Sharma @ 2021-03-12 19:48 UTC (permalink / raw) To: Michal Koutný, thomas.lendacky, brijesh.singh Cc: tj, rdunlap, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel On Thu, Mar 11, 2021 at 07:59:03PM +0100, Michal Koutný wrote: > > +#ifndef CONFIG_KVM_AMD_SEV > > +/* > > + * When this config is not defined, SEV feature is not supported and APIs in > > + * this file are not used but this file still gets compiled into the KVM AMD > > + * module. > > + * > > + * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum > > + * misc_res_type {} defined in linux/misc_cgroup.h. > BTW, was there any progress on conditioning sev.c build on > CONFIG_KVM_AMD_SEV? (So that the defines workaround isn't needed.) Tom, Brijesh, Is this something you guys thought about or have some plans to do in the future? Basically to not include sev.c in compilation if CONFIG_KVM_AMD_SEV is disabled. Michal, This should not be a blocker for the misc controller. This can change independently of the misc cgroup. Thanks Vipin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller 2021-03-12 19:48 ` Vipin Sharma @ 2021-03-12 20:51 ` Sean Christopherson 2021-03-12 21:18 ` Tom Lendacky 0 siblings, 1 reply; 19+ messages in thread From: Sean Christopherson @ 2021-03-12 20:51 UTC (permalink / raw) To: Vipin Sharma Cc: Michal Koutný, thomas.lendacky, brijesh.singh, tj, rdunlap, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel On Fri, Mar 12, 2021, Vipin Sharma wrote: > On Thu, Mar 11, 2021 at 07:59:03PM +0100, Michal Koutný wrote: > > > +#ifndef CONFIG_KVM_AMD_SEV > > > +/* > > > + * When this config is not defined, SEV feature is not supported and APIs in > > > + * this file are not used but this file still gets compiled into the KVM AMD > > > + * module. > > > + * > > > + * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum > > > + * misc_res_type {} defined in linux/misc_cgroup.h. > > BTW, was there any progress on conditioning sev.c build on > > CONFIG_KVM_AMD_SEV? (So that the defines workaround isn't needed.) > > Tom, Brijesh, > Is this something you guys thought about or have some plans to do in the > future? Basically to not include sev.c in compilation if > CONFIG_KVM_AMD_SEV is disabled. It's crossed my mind, but the number of stubs needed made me back off. I'm certainly not opposed to the idea, it's just not a trivial change. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller 2021-03-12 20:51 ` Sean Christopherson @ 2021-03-12 21:18 ` Tom Lendacky 0 siblings, 0 replies; 19+ messages in thread From: Tom Lendacky @ 2021-03-12 21:18 UTC (permalink / raw) To: Sean Christopherson, Vipin Sharma Cc: Michal Koutný, brijesh.singh, tj, rdunlap, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel On 3/12/21 2:51 PM, Sean Christopherson wrote: > On Fri, Mar 12, 2021, Vipin Sharma wrote: >> On Thu, Mar 11, 2021 at 07:59:03PM +0100, Michal Koutný wrote: >>>> +#ifndef CONFIG_KVM_AMD_SEV >>>> +/* >>>> + * When this config is not defined, SEV feature is not supported and APIs in >>>> + * this file are not used but this file still gets compiled into the KVM AMD >>>> + * module. >>>> + * >>>> + * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum >>>> + * misc_res_type {} defined in linux/misc_cgroup.h. >>> BTW, was there any progress on conditioning sev.c build on >>> CONFIG_KVM_AMD_SEV? (So that the defines workaround isn't needed.) >> >> Tom, Brijesh, >> Is this something you guys thought about or have some plans to do in the >> future? Basically to not include sev.c in compilation if >> CONFIG_KVM_AMD_SEV is disabled. > > It's crossed my mind, but the number of stubs needed made me back off. I'm > certainly not opposed to the idea, it's just not a trivial change. Right, I looked at it when I was doing the SEV-ES work and came to the same conclusion. Thanks, Tom > ^ permalink raw reply [flat|nested] 19+ messages in thread
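For readers unfamiliar with the "stubs" being weighed here: if a source file is only built when a config option is set, every function the rest of the module calls needs an inline fallback for the disabled case. The following is a minimal sketch of that pattern only. All names (TOY_HAVE_SEV, toy_sev_guest(), toy_sev_asid_new(), toy_vm_init()) are hypothetical and do not correspond to functions in sev.c, and the -1 return value is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical build switch standing in for a Kconfig option. It is left
 * undefined here, so the inline stubs below are the ones that get used.
 */
/* #define TOY_HAVE_SEV 1 */

#ifdef TOY_HAVE_SEV
/* Real implementations would live in a separately compiled file. */
bool toy_sev_guest(int vm_id);
int toy_sev_asid_new(int vm_id);
#else
/* One stub per exported function; this is the per-function cost. */
static inline bool toy_sev_guest(int vm_id)
{
	(void)vm_id;
	return false;
}

static inline int toy_sev_asid_new(int vm_id)
{
	(void)vm_id;
	return -1;	/* placeholder error value for the sketch */
}
#endif

/* Common code calls the same names whether or not the feature is built. */
static int toy_vm_init(int vm_id)
{
	assert(vm_id >= 0);
	if (!toy_sev_guest(vm_id))
		return 0;
	return toy_sev_asid_new(vm_id);
}
```

With the switch undefined, toy_vm_init() compiles against the stubs and returns 0. Two functions need two stubs in this toy; the thread's concern is that the real file exposes many more.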
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller 2021-03-04 23:19 ` [Patch v3 1/2] cgroup: sev: Add " Vipin Sharma 2021-03-11 18:59 ` Michal Koutný @ 2021-03-19 21:28 ` Jacob Pan 2021-03-22 18:54 ` Vipin Sharma 1 sibling, 1 reply; 19+ messages in thread From: Jacob Pan @ 2021-03-19 21:28 UTC (permalink / raw) To: Vipin Sharma Cc: tj, mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel, Jacob Pan Hi Vipin, On Thu, 4 Mar 2021 15:19:45 -0800, Vipin Sharma <vipinsh@google.com> wrote: > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Miscellaneous cgroup controller. > + * > + * Copyright 2020 Google LLC > + * Author: Vipin Sharma <vipinsh@google.com> > + */ > +#ifndef _MISC_CGROUP_H_ > +#define _MISC_CGROUP_H_ > + nit: should you do #include <linux/cgroup.h>? Otherwise, css may be undefined. > +/** > + * Types of misc cgroup entries supported by the host. > + */ Thanks, Jacob ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller 2021-03-19 21:28 ` Jacob Pan @ 2021-03-22 18:54 ` Vipin Sharma 2021-03-24 16:17 ` Jacob Pan 0 siblings, 1 reply; 19+ messages in thread From: Vipin Sharma @ 2021-03-22 18:54 UTC (permalink / raw) To: Jacob Pan Cc: tj, mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel, Jacob Pan On Fri, Mar 19, 2021 at 02:28:01PM -0700, Jacob Pan wrote: > On Thu, 4 Mar 2021 15:19:45 -0800, Vipin Sharma <vipinsh@google.com> wrote: > > +#ifndef _MISC_CGROUP_H_ > > +#define _MISC_CGROUP_H_ > > + > nit: should you do #include <linux/cgroup.h>? > Otherwise, css may be undefined. User of this controller will use get_current_misc_cg() API which returns a pointer. Ideally the user should use this pointer and they shouldn't have any need to access "css" in their code. They also don't need to create an object of 'struct misc_cg{}', because that won't be correct misc cgroup object. They should just declare a pointer like we are doing here in 'struct kvm_sev_info {}'. If they do need to use "css" then they can include cgroup header in their code. Let me know if I am overlooking something here. Thanks Vipin Sharma ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller 2021-03-22 18:54 ` Vipin Sharma @ 2021-03-24 16:17 ` Jacob Pan 2021-03-24 22:09 ` Vipin Sharma 0 siblings, 1 reply; 19+ messages in thread From: Jacob Pan @ 2021-03-24 16:17 UTC (permalink / raw) To: Vipin Sharma Cc: tj, mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel, Jacob Pan, jacob.jun.pan Hi Vipin, On Mon, 22 Mar 2021 11:54:39 -0700, Vipin Sharma <vipinsh@google.com> wrote: > On Fri, Mar 19, 2021 at 02:28:01PM -0700, Jacob Pan wrote: > > On Thu, 4 Mar 2021 15:19:45 -0800, Vipin Sharma <vipinsh@google.com> > > wrote: > > > +#ifndef _MISC_CGROUP_H_ > > > +#define _MISC_CGROUP_H_ > > > + > > nit: should you do #include <linux/cgroup.h>? > > Otherwise, css may be undefined. > > User of this controller will use get_current_misc_cg() API which returns > a pointer. Ideally the user should use this pointer and they shouldn't > have any need to access "css" in their code. They also don't need to > create an object of 'struct misc_cg{}', because that won't be correct misc > cgroup object. They should just declare a pointer like we are doing here > in 'struct kvm_sev_info {}'. > > If they do need to use "css" then they can include cgroup header in their > code. > I didn't mean the users of misc_cgroup will use css directly. I meant if I want to use misc cgroup in ioasid.c, I have to do the following to avoid undefined css: #include <linux/cgroup.h> #include <linux/misc_cgroup.h> So it might be simpler if you do #include <linux/cgroup.h> inside misc_cgroup.h. Then in ioasid.c, I only need to do #include <linux/misc_cgroup.h>. > Let me know if I am overlooking something here. > > Thanks > Vipin Sharma Thanks, Jacob ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller 2021-03-24 16:17 ` Jacob Pan @ 2021-03-24 22:09 ` Vipin Sharma 0 siblings, 0 replies; 19+ messages in thread From: Vipin Sharma @ 2021-03-24 22:09 UTC (permalink / raw) To: Jacob Pan Cc: tj, mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel, Jacob Pan On Wed, Mar 24, 2021 at 09:17:01AM -0700, Jacob Pan wrote: > I didn't mean the users of misc_cgroup will use css directly. I meant if I > want to use misc cgroup in ioasid.c, I have to do the following to avoid > undefined css: > #include <linux/cgroup.h> > #include <linux/misc_cgroup.h> > > So it might be simpler if you do #include <linux/cgroup.h> inside > misc_cgroup.h. Then in ioasid.c, I only need to do > #include <linux/misc_cgroup.h>. Sorry, I misunderstood the comment the first time. I agree with you, I will add the cgroup header file in misc_cgroup.h after the #ifdef CONFIG_CGROUP_MISC statement. Thanks Vipin ^ permalink raw reply [flat|nested] 19+ messages in thread
* [Patch v3 2/2] cgroup: sev: Miscellaneous cgroup documentation. 2021-03-04 23:19 [Patch v3 0/2] cgroup: New misc cgroup controller Vipin Sharma 2021-03-04 23:19 ` [Patch v3 1/2] cgroup: sev: Add " Vipin Sharma @ 2021-03-04 23:19 ` Vipin Sharma 2021-03-07 12:48 ` [Patch v3 0/2] cgroup: New misc cgroup controller Tejun Heo 2 siblings, 0 replies; 19+ messages in thread From: Vipin Sharma @ 2021-03-04 23:19 UTC (permalink / raw) To: tj, mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger Cc: corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel, Vipin Sharma Documentation of miscellaneous cgroup controller. This new controller is used to track and limit the usage of scalar resources. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Rientjes <rientjes@google.com> --- Documentation/admin-guide/cgroup-v1/index.rst | 1 + Documentation/admin-guide/cgroup-v1/misc.rst | 4 ++ Documentation/admin-guide/cgroup-v2.rst | 69 ++++++++++++++++++- 3 files changed, 72 insertions(+), 2 deletions(-) create mode 100644 Documentation/admin-guide/cgroup-v1/misc.rst diff --git a/Documentation/admin-guide/cgroup-v1/index.rst b/Documentation/admin-guide/cgroup-v1/index.rst index 226f64473e8e..99fbc8a64ba9 100644 --- a/Documentation/admin-guide/cgroup-v1/index.rst +++ b/Documentation/admin-guide/cgroup-v1/index.rst @@ -17,6 +17,7 @@ Control Groups version 1 hugetlb memcg_test memory + misc net_cls net_prio pids diff --git a/Documentation/admin-guide/cgroup-v1/misc.rst b/Documentation/admin-guide/cgroup-v1/misc.rst new file mode 100644 index 000000000000..661614c24df3 --- /dev/null +++ b/Documentation/admin-guide/cgroup-v1/misc.rst @@ -0,0 +1,4 @@ +=============== +Misc controller +=============== +Please refer "Misc" documentation in Documentation/admin-guide/cgroup-v2.rst diff --git 
a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 1de8695c264b..74777323b7fd 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -63,8 +63,11 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou 5-7-1. RDMA Interface Files 5-8. HugeTLB 5.8-1. HugeTLB Interface Files - 5-8. Misc - 5-8-1. perf_event + 5-9. Misc + 5.9-1 Miscellaneous cgroup Interface Files + 5.9-2 Migration and Ownership + 5-10. Others + 5-10-1. perf_event 5-N. Non-normative information 5-N-1. CPU controller root cgroup process behaviour 5-N-2. IO controller root cgroup process behaviour @@ -2163,6 +2166,68 @@ HugeTLB Interface Files Misc ---- +The Miscellaneous cgroup provides the resource limiting and tracking +mechanism for the scalar resources which cannot be abstracted like the other +cgroup resources. Controller is enabled by the CONFIG_CGROUP_MISC config +option. + +The first two resources added to the miscellaneous controller are Secure +Encrypted Virtualization (SEV) ASIDs and SEV - Encrypted State (SEV-ES) ASIDs. +These limited ASIDs are used for encrypting virtual machines memory on the AMD +platform. + +Misc Interface Files +~~~~~~~~~~~~~~~~~~~~ + +Miscellaneous controller provides 3 interface files: + + misc.capacity + A read-only flat-keyed file shown only in the root cgroup. It shows + miscellaneous scalar resources available on the platform along with + their quantities:: + + $ cat misc.capacity + sev 50 + sev_es 10 + + misc.current + A read-only flat-keyed file shown in the non-root cgroups. It shows + the current usage of the resources in the cgroup and its children.:: + + $ cat misc.current + sev 3 + sev_es 0 + + misc.max + A read-write flat-keyed file shown in the non root cgroups. 
Allowed + maximum usage of the resources in the cgroup and its children.:: + + $ cat misc.max + sev max + sev_es 4 + + Limit can be set by:: + + # echo sev 1 > misc.max + + Limit can be set to max by:: + + # echo sev max > misc.max + + Limits can be set higher than the capacity value in the misc.capacity + file. + +Migration and Ownership +~~~~~~~~~~~~~~~~~~~~~~~ + +A miscellaneous scalar resource is charged to the cgroup in which it is used +first, and stays charged to that cgroup until that resource is freed. Migrating +a process to a different cgroup does not move the charge to the destination +cgroup where the process has moved. + +Others +------ + perf_event ~~~~~~~~~~ -- 2.30.1.766.gb4fecdf3b7-goog ^ permalink raw reply related [flat|nested] 19+ messages in thread
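The semantics documented in the patch above (usage counted in a cgroup and its children, a per-cgroup maximum that may be set above the platform capacity, and a charge that stays with the cgroup that took it) can be modeled in a few lines of userspace C. This is a toy model under stated assumptions, not the controller's implementation: struct toy_cg, toy_capacity, toy_try_charge() and toy_uncharge() are hypothetical names, locking and atomics are ignored, and only a single resource is tracked.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_MAX ULONG_MAX	/* stands in for the "max" keyword in misc.max */

/* Toy model of one resource (say "sev") in one cgroup; not the kernel struct. */
struct toy_cg {
	struct toy_cg *parent;	/* NULL for the root */
	unsigned long max;	/* value written to misc.max */
	unsigned long usage;	/* value shown in misc.current */
};

/* Platform-wide capacity, as misc.capacity would report in the root. */
static unsigned long toy_capacity = 50;

/* Overflow-safe check for usage + amount <= limit. */
static int toy_fits(unsigned long usage, unsigned long amount,
		    unsigned long limit)
{
	return amount <= limit && usage <= limit - amount;
}

/*
 * Charge `amount` to `cg` and every ancestor. Returns -1 and leaves all
 * counters unchanged if any level would exceed its max or the capacity.
 */
static int toy_try_charge(struct toy_cg *cg, unsigned long amount)
{
	struct toy_cg *i;

	for (i = cg; i; i = i->parent) {
		if (!toy_fits(i->usage, amount, i->max) ||
		    !toy_fits(i->usage, amount, toy_capacity))
			return -1;
	}
	for (i = cg; i; i = i->parent)
		i->usage += amount;
	return 0;
}

/* Release a charge from the cgroup that originally took it. */
static void toy_uncharge(struct toy_cg *cg, unsigned long amount)
{
	struct toy_cg *i;

	for (i = cg; i; i = i->parent) {
		assert(i->usage >= amount);
		i->usage -= amount;
	}
}
```

In this model a child whose max is 100 can still be refused by a parent whose max is 4, or by the capacity of 50, which mirrors the remark that a limit above capacity is allowed but not usable in full. The ownership rule falls out of the caller remembering which toy_cg it charged and passing that same pointer to toy_uncharge(), regardless of where the process lives by then.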
* Re: [Patch v3 0/2] cgroup: New misc cgroup controller 2021-03-04 23:19 [Patch v3 0/2] cgroup: New misc cgroup controller Vipin Sharma 2021-03-04 23:19 ` [Patch v3 1/2] cgroup: sev: Add " Vipin Sharma 2021-03-04 23:19 ` [Patch v3 2/2] cgroup: sev: Miscellaneous cgroup documentation Vipin Sharma @ 2021-03-07 12:48 ` Tejun Heo 2021-03-11 18:58 ` Michal Koutný 2 siblings, 1 reply; 19+ messages in thread From: Tejun Heo @ 2021-03-07 12:48 UTC (permalink / raw) To: Vipin Sharma Cc: mkoutny, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel Hello, On Thu, Mar 04, 2021 at 03:19:44PM -0800, Vipin Sharma wrote: > This patch series is creating a new misc cgroup controller for limiting > and tracking of resources which are not abstract like other cgroup > controllers. > > This controller was initially proposed as encryption_id but after > the feedbacks, it is now changed to misc cgroup. > https://lore.kernel.org/lkml/20210108012846.4134815-2-vipinsh@google.com/ Vipin, thank you very much for your persistence and patience. The patchset looks good to me. Michal, as you've been reviewing the series, can you please take another look and ack them if you don't find anything objectionable? -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 0/2] cgroup: New misc cgroup controller 2021-03-07 12:48 ` [Patch v3 0/2] cgroup: New misc cgroup controller Tejun Heo @ 2021-03-11 18:58 ` Michal Koutný 2021-03-11 19:39 ` Tejun Heo 2021-03-12 17:49 ` Vipin Sharma 0 siblings, 2 replies; 19+ messages in thread From: Michal Koutný @ 2021-03-11 18:58 UTC (permalink / raw) To: Tejun Heo Cc: Vipin Sharma, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel Hello. On Sun, Mar 07, 2021 at 07:48:40AM -0500, Tejun Heo <tj@kernel.org> wrote: > Vipin, thank you very much for your persistence and patience. Yes, and thanks for taking my remarks into account. > Michal, as you've been reviewing the series, can you please take > another look and ack them if you don't find anything objectionable? Honestly, I'm still sitting on the fence whether this needs a new controller and whether the miscontroller (:-p) is a good approach in the long term [1]. I admit, I didn't follow the past discussions completely, however, (Vipin) could it be in the cover letter/commit messages shortly summarized why cgroups and a controller were chosen to implement restrictions of these resources, what were the alternatives and why were they rejected? In the previous discussion, I saw the reasoning for the list of the resources to be hardwired in the controller itself in order to get some scrutiny of possible changes. That makes sense to me. But with that, is it necessary to commit to the new controller API via EXPORT_SYMBOL? (I don't mean this as a licensing question but what the external API should be (if any).) Besides the generic remarks above, I'd still suggest some slight implementation changes, posted inline to the patch. 
Thanks, Michal [1] Currently, only one thing comes to my mind -- the delegation via cgroup.subtree_control. The miscontroller may add possibly further resources whose delegation granularity is bunched up under one entry. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 0/2] cgroup: New misc cgroup controller 2021-03-11 18:58 ` Michal Koutný @ 2021-03-11 19:39 ` Tejun Heo 2021-03-12 17:49 ` Vipin Sharma 1 sibling, 0 replies; 19+ messages in thread From: Tejun Heo @ 2021-03-11 19:39 UTC (permalink / raw) To: Michal Koutný Cc: Vipin Sharma, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel Hello, On Thu, Mar 11, 2021 at 07:58:19PM +0100, Michal Koutný wrote: > > Michal, as you've been reviewing the series, can you please take > > another look and ack them if you don't find anything objectionable? > Honestly, I'm still sitting on the fence whether this needs a new > controller and whether the miscontroller (:-p) is a good approach in the > long term [1]. Yeah, it's a bit of cop-out. My take is that the underlying hardware feature isn't mature enough to have reasonable abstraction built on top of them. Given time, maybe future iterations will get there or maybe it's a passing fad and people will mostly forget about these. In the meantime, keeping them out of cgroup is one direction, a relatively high friction one but still viable. Or we can provide something of a halfway house so that people who have immediate needs can still leverage the existing infrastructure while controlling the amount of time, energy and future lock-ins they take. So, that's misc controller. I'm somewhat ambivalent but we've had multiple of these things popping up in the past several years and containment seems to be a reasonable approach at this point. > [1] Currently, only one thing comes to my mind -- the delegation via > cgroup.subtree_control. The miscontroller may add possibly further > resources whose delegation granularity is bunched up under one entry. 
Controller enabling and delegation in themselves aren't supposed to have resource or security implications, so I don't think it's a practical problem. Thanks. -- tejun ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 0/2] cgroup: New misc cgroup controller 2021-03-11 18:58 ` Michal Koutný 2021-03-11 19:39 ` Tejun Heo @ 2021-03-12 17:49 ` Vipin Sharma 2021-03-15 19:10 ` Michal Koutný 1 sibling, 1 reply; 19+ messages in thread From: Vipin Sharma @ 2021-03-12 17:49 UTC (permalink / raw) To: Michal Koutný Cc: Tejun Heo, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel On Thu, Mar 11, 2021 at 07:58:19PM +0100, Michal Koutný wrote: > I admit, I didn't follow the past discussions completely, however, > (Vipin) could it be in the cover letter/commit messages shortly > summarized why cgroups and a controller were chosen to implement > restrictions of these resources, what were the alternatives and why were > they rejected? I will add some more information in the cover letter of the next version. Basically, SEV will mostly be used by cloud providers for providing confidential VMs. Since SEV ASIDs are limited we need a good way to schedule these jobs in cloud infrastructure. To achieve this we could come up with some ioctl for "/dev/sev" to know about its usage, availability, etc. This requires the existing scheduling mechanism in the cloud to have an extension for this interaction. Now the same thing needs to be done for TDX. IBM SEID doesn't have scarcity of this resource but they are also interested in tracking and limiting the usage. Each one coming up with their own interaction is a duplicate effort when they all need similar thing. One can say that abstraction should be at KVM level but these resources can be used outside VM as well. Most cloud infrastructure uses cgroups to learn the host state, track resource usage, enforce limits on them, etc. 
They use this info to optimize work allocation in the fleet and make sure no rogue job consumes more than it needs and starves others. Adding these resources to cgroup is a natural choice with least friction. Cgroup itself says it is a mechanism to distribute system resources along the hierarchy in a controlled and configurable manner. Most of the resources in cgroups are abstracted enough but there are still resources which are not abstract but have limited availability or have specific use cases. > > In the previous discussion, I saw the reasoning for the list of the > resources to be hardwired in the controller itself in order to get some > scrutiny of possible changes. That makes sense to me. But with that, is > it necessary to commit to the new controller API via EXPORT_SYMBOL? (I > don't mean this as a licensing question but what the external API should > be (if any).) As per my understanding this is the only way for loadable modules (kvm-amd in this case) to access Kernel APIs. Let me know if there is a better way to do it. > > Besides the generic remarks above, I'd still suggest some slight > implementation changes, posted inline to the patch. I will work on them. I appreciate you guys taking the time and helping me out with this patch series. Thanks Vipin ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 0/2] cgroup: New misc cgroup controller 2021-03-12 17:49 ` Vipin Sharma @ 2021-03-15 19:10 ` Michal Koutný 2021-03-22 18:24 ` Vipin Sharma 0 siblings, 1 reply; 19+ messages in thread From: Michal Koutný @ 2021-03-15 19:10 UTC (permalink / raw) To: Vipin Sharma Cc: Tejun Heo, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel On Fri, Mar 12, 2021 at 09:49:26AM -0800, Vipin Sharma <vipinsh@google.com> wrote: > I will add some more information in the cover letter of the next version. Thanks. > Each one coming up with their own interaction is a duplicate effort > when they all need similar thing. Could this be expressed as a new BPF hook (when allocating/freeing such a resource unit)? The decision could be made based on the configured limit or even some other predicate. (I saw this proposed already but I haven't seen some more reasoning whether it's worse/better. IMO, BPF hooks are "cheaper" than full-blown controllers, though it's still new user API.) > As per my understanding this is the only way for loadable modules > (kvm-amd in this case) to access Kernel APIs. Let me know if there is a > better way to do it. I understood the symbols are exported for such modularized builds. However, making them non-GPL exposes them to any out-of-tree modules, although, the resource types are supposed to stay hardcoded in the misc controller. So my point was to make them EXPORT_SYMBOL_GPL to mark they're just a means of implementing the modularized builds and not an API. (But they'd remain API for out-of-tree GPL modules anyway, so take this reasoning of mine with a grain of salt.) 
Michal ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 0/2] cgroup: New misc cgroup controller 2021-03-15 19:10 ` Michal Koutný @ 2021-03-22 18:24 ` Vipin Sharma 0 siblings, 0 replies; 19+ messages in thread From: Vipin Sharma @ 2021-03-22 18:24 UTC (permalink / raw) To: Michal Koutný Cc: Tejun Heo, rdunlap, thomas.lendacky, brijesh.singh, jon.grimm, eric.vantassell, pbonzini, hannes, frankja, borntraeger, corbet, seanjc, vkuznets, wanpengli, jmattson, joro, tglx, mingo, bp, hpa, gingell, rientjes, dionnaglaze, kvm, x86, cgroups, linux-doc, linux-kernel On Mon, Mar 15, 2021 at 08:10:09PM +0100, Michal Koutný wrote: > On Fri, Mar 12, 2021 at 09:49:26AM -0800, Vipin Sharma <vipinsh@google.com> wrote: > > I will add some more information in the cover letter of the next version. > Thanks. > > > Each one coming up with their own interaction is a duplicate effort > > when they all need similar thing. > Could this be expressed as a new BPF hook (when allocating/freeing such > a resource unit)? > > The decision could be made based on the configured limit or even some > other predicate. > > (I saw this proposed already but I haven't seen some more reasoning > whether it's worse/better. IMO, BPF hooks are "cheaper" than full-blown > controllers, though it's still new user API.) I am not very knowledgeable about BPF, so I might be wrong here. There are a couple of things which might not be addressed with BPF: 1. Which controller to use in the v1 case? These are not abstract resources, so in v1, where each controller has its own hierarchy, it might not be easy to identify the best controller. 2. It seems to me that we won't be able to abstract out a single BPF program which can help with all of the resource types we are planning to use, again, because it is not an abstract type like network packets, and there will be different places in the source code to use these resources. 
To me a cgroup tends to give a much easier and well-integrated solution when it comes to scheduling and limiting a resource with existing tools in a cloud infrastructure. > > > > As per my understanding this is the only way for loadable modules > > (kvm-amd in this case) to access Kernel APIs. Let me know if there is a > > better way to do it. > I understood the symbols are exported for such modularized builds. > However, making them non-GPL exposes them to any out-of-tree modules, > although, the resource types are supposed to stay hardcoded in the misc > controller. So my point was to make them EXPORT_SYMBOL_GPL to mark > they're just a means of implementing the modularized builds and not an > API. (But they'd remain API for out-of-tree GPL modules anyway, so take > this reasoning of mine with a grain of salt.) > I see, I will change it to GPL. Thanks Vipin ^ permalink raw reply [flat|nested] 19+ messages in thread