From mboxrd@z Thu Jan 1 00:00:00 1970
From: Haggai Eran
Subject: Re: [PATCHv6 1/3] rdmacg: Added rdma cgroup controller
To: Parav Pandit
Date: Wed, 24 Feb 2016 15:13:30 +0200
Message-ID: <56CDAC7A.6030206@mellanox.com>
References: <1455966006-13774-1-git-send-email-pandit.parav@gmail.com>
 <1455966006-13774-2-git-send-email-pandit.parav@gmail.com>
In-Reply-To: <1455966006-13774-2-git-send-email-pandit.parav@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

Overall, the patch looks good to me. I have a few comments below.

On 20/02/2016 13:00, Parav Pandit wrote:
> Resource pool is created/destroyed dynamically whenever
> charging/uncharging occurs respectively and whenever user
> configuration is done. Its a tradeoff of memory vs little more code

Its -> It's

> space that creates resource pool whenever necessary,
> instead of creating them during cgroup creation and device registration
> time.
>
> Signed-off-by: Parav Pandit

> diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
> new file mode 100644
> index 0000000..b370733
> --- /dev/null
> +++ b/include/linux/cgroup_rdma.h

> +struct rdmacg_device {
> +        struct rdmacg_pool_info pool_info;
> +        struct list_head        rdmacg_list;
> +        struct list_head        rpool_head;
> +        spinlock_t              rpool_lock; /* protects rsource pool list */

rsource -> resource

> +        char                    *name;
> +};
> +
> +/* APIs for RDMA/IB stack to publish when a device wants to
> + * participate in resource accounting
> + */
> +int rdmacg_register_device(struct rdmacg_device *device);
> +void rdmacg_unregister_device(struct rdmacg_device *device);
> +
> +/* APIs for RDMA/IB stack to charge/uncharge pool specific resources */
> +int rdmacg_try_charge(struct rdma_cgroup **rdmacg,
> +                      struct rdmacg_device *device,
> +                      int resource_index,
> +                      int num);
> +void rdmacg_uncharge(struct rdma_cgroup *cg,
> +                     struct rdmacg_device *device,
> +                     int resource_index,
> +                     int num);
> +void rdmacg_query_limit(struct rdmacg_device *device,
> +                        int *limits, int max_count);

You could drop the max_count parameter and require the caller to always
provide pool_info->table_len items, couldn't you?
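
Something like this is what I have in mind, just as a sketch (the
caller-side contract would need to be spelled out in a comment):

    /*
     * Fill @limits with the limit of each resource of @device.
     * @limits must point to an array of device->pool_info.table_len
     * entries.
     */
    void rdmacg_query_limit(struct rdmacg_device *device, int *limits);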
> +
> +#endif /* CONFIG_CGROUP_RDMA */
> +#endif /* _CGROUP_RDMA_H */
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index 0df0336a..d0e597c 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -56,6 +56,10 @@ SUBSYS(hugetlb)
>  SUBSYS(pids)
>  #endif
>
> +#if IS_ENABLED(CONFIG_CGROUP_RDMA)
> +SUBSYS(rdma)
> +#endif
> +
>  /*
>   * The following subsystems are not supported on the default hierarchy.
>   */
> diff --git a/init/Kconfig b/init/Kconfig
> index 2232080..1741b65 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1054,6 +1054,16 @@ config CGROUP_PIDS
>           since the PIDs limit only affects a process's ability to fork, not to
>           attach to a cgroup.
>
> +config CGROUP_RDMA
> +        bool "RDMA controller"
> +        help
> +          Provides enforcement of RDMA resources defined by IB stack.
> +          It is fairly easy for consumers to exhaust RDMA resources, which
> +          can result into resource unavailibility to other consumers.

unavailibility -> unavailability

> +          RDMA controller is designed to stop this from happening.
> +          Attaching processes with active RDMA resources to the cgroup
> +          hierarchy is allowed even if can cross the hierarchy's limit.
> +
>  config CGROUP_FREEZER
>         bool "Freezer controller"
>         help
> diff --git a/kernel/Makefile b/kernel/Makefile
> index baa55e5..501f5df 100644

> +/**
> + * uncharge_cg_resource - uncharge resource for rdma cgroup
> + * @cg: pointer to cg to uncharge and all parents in hierarchy

It only uncharges a single cg, right?

> + * @device: pointer to ib device
> + * @index: index of the resource to uncharge in cg (resource pool)
> + * @num: the number of rdma resource to uncharge
> + *
> + * It also frees the resource pool in the hierarchy for the resource pool
> + * which was created as part of charing operation.

charing -> charging

> + */
> +static void uncharge_cg_resource(struct rdma_cgroup *cg,
> +                                 struct rdmacg_device *device,
> +                                 int index, int num)
> +{
> +        struct rdmacg_resource_pool *rpool;
> +        struct rdmacg_pool_info *pool_info = &device->pool_info;
> +
> +        spin_lock(&cg->rpool_list_lock);
> +        rpool = find_cg_rpool_locked(cg, device);

Is it possible for rpool to be NULL?

> +
> +        /*
> +         * A negative count (or overflow) is invalid,
> +         * it indicates a bug in the rdma controller.
> +         */
> +        rpool->resources[index].usage -= num;
> +
> +        WARN_ON_ONCE(rpool->resources[index].usage < 0);
> +        rpool->refcnt--;
> +        if (rpool->refcnt == 0 && rpool->num_max_cnt == pool_info->table_len) {
> +                /*
> +                 * No user of the rpool and all entries are set to max, so
> +                 * safe to delete this rpool.
> +                 */
> +                list_del(&rpool->cg_list);
> +                spin_unlock(&cg->rpool_list_lock);
> +
> +                free_cg_rpool(rpool);
> +                return;
> +        }
> +        spin_unlock(&cg->rpool_list_lock);
> +}

> +/**
> + * charge_cg_resource - charge resource for rdma cgroup
> + * @cg: pointer to cg to charge
> + * @device: pointer to rdmacg device
> + * @index: index of the resource to charge in cg (resource pool)
> + * @num: the number of rdma resource to charge
> + */
> +static int charge_cg_resource(struct rdma_cgroup *cg,
> +                              struct rdmacg_device *device,
> +                              int index, int num)
> +{
> +        struct rdmacg_resource_pool *rpool;
> +        s64 new;
> +        int ret = 0;
> +
> +retry:
> +        spin_lock(&cg->rpool_list_lock);
> +        rpool = find_cg_rpool_locked(cg, device);
> +        if (!rpool) {
> +                spin_unlock(&cg->rpool_list_lock);
> +                ret = alloc_cg_rpool(cg, device);
> +                if (ret)
> +                        goto err;
> +                else
> +                        goto retry;

Instead of retrying after allocation of a new rpool, why not just return
the newly allocated rpool (or the existing one) from alloc_cg_rpool?
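
Something along these lines, perhaps (untested sketch; the initialization
details and the name of the per-cgroup list head are my guesses from the
quoted code):

    static struct rdmacg_resource_pool *
    alloc_cg_rpool(struct rdma_cgroup *cg, struct rdmacg_device *device)
    {
            struct rdmacg_resource_pool *rpool, *old;

            rpool = kzalloc(sizeof(*rpool), GFP_KERNEL);
            if (!rpool)
                    return ERR_PTR(-ENOMEM);
            /* initialize rpool (all limits set to max, etc.) as today */

            spin_lock(&cg->rpool_list_lock);
            old = find_cg_rpool_locked(cg, device);
            if (old) {
                    /* lost a race; use the pool that already exists */
                    spin_unlock(&cg->rpool_list_lock);
                    free_cg_rpool(rpool);
                    return old;
            }
            list_add_tail(&rpool->cg_list, &cg->rpool_head);
            spin_unlock(&cg->rpool_list_lock);
            return rpool;
    }

The charge path could then use the returned pool directly instead of
looping back to the lookup. The caller still has to retake the lock, so
the lifetime of the returned pool in between needs a careful look.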
> +        }
> +        new = num + rpool->resources[index].usage;
> +        if (new > rpool->resources[index].max) {
> +                ret = -EAGAIN;
> +        } else {
> +                rpool->refcnt++;
> +                rpool->resources[index].usage = new;
> +        }
> +        spin_unlock(&cg->rpool_list_lock);
> +err:
> +        return ret;
> +}

> +static ssize_t rdmacg_resource_set_max(struct kernfs_open_file *of,
> +                                       char *buf, size_t nbytes, loff_t off)
> +{
> +        struct rdma_cgroup *cg = css_rdmacg(of_css(of));
> +        const char *dev_name;
> +        struct rdmacg_resource_pool *rpool;
> +        struct rdmacg_device *device;
> +        char *options = strstrip(buf);
> +        struct rdmacg_pool_info *pool_info;
> +        u64 enables = 0;

This limits the number of resources to 64. Sounds fine to me, but I think
there should be a check somewhere (maybe in rdmacg_register_device()?) to
make sure someone doesn't pass too many resources.

> +        int *new_limits;
> +        int i = 0, ret = 0;
> +
> +        /* extract the device name first */
> +        dev_name = strsep(&options, " ");
> +        if (!dev_name) {
> +                ret = -EINVAL;
> +                goto err;
> +        }
> +
> +        /* acquire lock to synchronize with hot plug devices */
> +        mutex_lock(&dev_mutex);
> +
> +        device = rdmacg_get_device_locked(dev_name);
> +        if (!device) {
> +                ret = -ENODEV;
> +                goto parse_err;
> +        }
> +
> +        pool_info = &device->pool_info;
> +
> +        new_limits = kcalloc(pool_info->table_len, sizeof(int), GFP_KERNEL);
> +        if (!new_limits) {
> +                ret = -ENOMEM;
> +                goto parse_err;
> +        }
> +
> +        ret = rdmacg_parse_limits(options, pool_info, new_limits, &enables);
> +        if (ret)
> +                goto opt_err;
> +
> +retry:
> +        spin_lock(&cg->rpool_list_lock);
> +        rpool = find_cg_rpool_locked(cg, device);
> +        if (!rpool) {
> +                spin_unlock(&cg->rpool_list_lock);
> +                ret = alloc_cg_rpool(cg, device);
> +                if (ret)
> +                        goto opt_err;
> +                else
> +                        goto retry;

You can avoid the retry here too. Perhaps this can go into a function.

> +        }
> +
> +        /* now set the new limits of the rpool */
> +        while (enables) {
> +                /* if user set the limit, enables bit is set */
> +                if (enables & BIT(i)) {
> +                        enables &= ~BIT(i);
> +                        set_resource_limit(rpool, i, new_limits[i]);
> +                }
> +                i++;
> +        }
> +        if (rpool->refcnt == 0 &&
> +            rpool->num_max_cnt == pool_info->table_len) {
> +                /*
> +                 * No user of the rpool and all entries are
> +                 * set to max, so safe to delete this rpool.
> +                 */
> +                list_del(&rpool->cg_list);
> +                spin_unlock(&cg->rpool_list_lock);
> +                free_cg_rpool(rpool);
> +        } else {
> +                spin_unlock(&cg->rpool_list_lock);
> +        }

You should consider putting this piece of code in a function (the check
of the reference count and the release of the rpool).

> +
> +opt_err:
> +        kfree(new_limits);
> +parse_err:
> +        mutex_unlock(&dev_mutex);
> +err:
> +        return ret ?: nbytes;
> +}
> +
> +
> +static int print_rpool_values(struct seq_file *sf,

This can return void.

> +                              struct rdmacg_pool_info *pool_info,
> +                              u32 *value_tbl)
> +{
> +        int i;
> +
> +        for (i = 0; i < pool_info->table_len; i++) {
> +                seq_puts(sf, pool_info->resource_name_table[i]);
> +                seq_putc(sf, '=');
> +                if (value_tbl[i] == S32_MAX)
> +                        seq_puts(sf, RDMACG_MAX_STR);
> +                else
> +                        seq_printf(sf, "%d", value_tbl[i]);
> +                seq_putc(sf, ' ');
> +        }
> +        return 0;
> +}
> +

Thanks,
Haggai
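
P.S. Two small sketches for the suggestions above, in case they help
(names and placement are my guesses, untested).

For the resource-count limit, rdmacg_register_device() could reject
oversized tables, since "enables" is a u64 bitmask:

    if (device->pool_info.table_len > 64)
            return -EINVAL;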
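
And the duplicated release logic in uncharge_cg_resource() and
rdmacg_resource_set_max() could move into a helper that is called with
cg->rpool_list_lock held and drops it:

    static void put_cg_rpool_locked(struct rdma_cgroup *cg,
                                    struct rdmacg_resource_pool *rpool,
                                    struct rdmacg_pool_info *pool_info)
    {
            if (rpool->refcnt == 0 &&
                rpool->num_max_cnt == pool_info->table_len) {
                    /*
                     * No user of the rpool and all entries are set to
                     * max, so safe to delete this rpool.
                     */
                    list_del(&rpool->cg_list);
                    spin_unlock(&cg->rpool_list_lock);
                    free_cg_rpool(rpool);
                    return;
            }
            spin_unlock(&cg->rpool_list_lock);
    }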