Hi Vipin. On Thu, Mar 04, 2021 at 03:19:45PM -0800, Vipin Sharma wrote: > arch/x86/kvm/svm/sev.c | 65 +++++- > arch/x86/kvm/svm/svm.h | 1 + > include/linux/cgroup_subsys.h | 4 + > include/linux/misc_cgroup.h | 130 +++++++++++ > init/Kconfig | 14 ++ > kernel/cgroup/Makefile | 1 + > kernel/cgroup/misc.c | 402 ++++++++++++++++++++++++++++++++++ Given different two-fold nature (SEV caller vs misc controller) of some remarks below, I think it makes sense to split this into two patches: a) generic controller implementation, b) hooking the controller into SEV ASIDs management. > +#ifndef CONFIG_KVM_AMD_SEV > +/* > + * When this config is not defined, SEV feature is not supported and APIs in > + * this file are not used but this file still gets compiled into the KVM AMD > + * module. > + * > + * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum > + * misc_res_type {} defined in linux/misc_cgroup.h. BTW, was there any progress on conditioning sev.c build on CONFIG_KVM_AMD_SEV? (So that the defines workaround isn't needeed.) > static int sev_asid_new(struct kvm_sev_info *sev) > { > - int pos, min_asid, max_asid; > + int pos, min_asid, max_asid, ret; > bool retry = true; > + enum misc_res_type type; > + > + type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV; > + sev->misc_cg = get_current_misc_cg(); > + ret = misc_cg_try_charge(type, sev->misc_cg, 1); It may be safer to WARN_ON(sev->misc_cg) at this point (see below). > [...] > +e_uncharge: > + misc_cg_uncharge(type, sev->misc_cg, 1); > + put_misc_cg(sev->misc_cg); > + return ret; vvv > @@ -140,6 +171,10 @@ static void sev_asid_free(int asid) > } > > mutex_unlock(&sev_bitmap_lock); > + > + type = sev->es_active ? MISC_CG_RES_SEV_ES : MISC_CG_RES_SEV; > + misc_cg_uncharge(type, sev->misc_cg, 1); > + put_misc_cg(sev->misc_cg); It may be safer to set sev->misc_cg to NULL here. (IIUC, with current asid_{new,free} calls it shouldn't matter but why to rely on it in the future.) > +++ b/kernel/cgroup/misc.c > [...] > +static void misc_cg_reduce_charge(enum misc_res_type type, struct misc_cg *cg, > + unsigned long amount) misc_cg_cancel_charge seems to be a name more consistent with what we already have in pids and memory controller. > +static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf, > + size_t nbytes, loff_t off) > +{ > [...] > + > + if (!strcmp(MAX_STR, buf)) { > + max = ULONG_MAX; MAX_NUM for consistency with other places. > + } else { > + ret = kstrtoul(buf, 0, &max); > + if (ret) > + return ret; > + } > + > + cg = css_misc(of_css(of)); > + > + if (misc_res_capacity[type]) > + cg->res[type].max = max; In theory, parallel writers can clash here, so having the limit atomic type to prevent this would resolve it. See also commit a713af394cf3 ("cgroup: pids: use atomic64_t for pids->limit"). > +static int misc_cg_current_show(struct seq_file *sf, void *v) > +{ > + int i; > + struct misc_cg *cg = css_misc(seq_css(sf)); > + > + for (i = 0; i < MISC_CG_RES_TYPES; i++) { > + if (misc_res_capacity[i]) Since there can be some residual charges after removing capacity (before draining), maybe the condition along the line if (misc_res_capacity[i] || atomic_long_read(&cg->res[i].usage)) would be more informative for debugging. > +static int misc_cg_capacity_show(struct seq_file *sf, void *v) > +{ > + int i; > + unsigned long cap; > + > + for (i = 0; i < MISC_CG_RES_TYPES; i++) { > + cap = READ_ONCE(misc_res_capacity[i]); Why is READ_ONCE only here and not in other places that (actually) check against the set capacity value? Also, there should be a paired WRITE_ONCCE in misc_cg_set_capacity(). Thanks, Michal