From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [PATCH 2/2] pid_ns: Introduce ioctl to set vector of ns_last_pid's on ns hierarhy
From: Kirill Tkhai
Date: Mon, 17 Apr 2017 20:36:17 +0300
Message-ID: <149245057248.17600.1341652606136269734.stgit@localhost.localdomain>
In-Reply-To: <149245014695.17600.12640895883798122726.stgit@localhost.localdomain>
References: <149245014695.17600.12640895883798122726.stgit@localhost.localdomain>
User-Agent: StGit/0.17.1-dirty
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

While implementing support for nested pid namespaces in CRIU (the
Checkpoint/Restore In Userspace tool), we ran into a situation where it
is impossible to create a task with a specific NSpid efficiently.
After commit 49f4d8b93ccf "pidns: Capture the user namespace and filter
ns_last_pid" it is impossible to set ns_last_pid on any pid namespace
except the task's active pid_ns (before that commit it was possible to
write to pid_ns_for_children). Thus, if a restored task in a container
has more than one pid_ns level, the restorer code must keep a helper
task for every pid namespace in the task's pid_ns hierarchy.

This is a big problem, because communicating with a helper for every
pid_ns in the hierarchy is expensive: creating a single task requires
waking up many helpers, regardless of how you communicate with them.

This patch tries to solve the problem. It introduces a new pid_ns
ns_ioctl(PIDNS_REQ_SET_LAST_PID_VEC), which allows writing a vector of
last pids to a pid_ns hierarchy. The vector is passed as a
":"-delimited string of pids, written in reverse order. The first
number corresponds to the ns_last_pid of the opened namespace, the
second to that of its parent, and so on. So, if you have a pid
namespace hierarchy like:

	pid_ns1 (grandfather)
	  |
	  v
	pid_ns2 (father)
	  |
	  v
	pid_ns3 (child)

and pid_ns3 is the opened namespace, then the corresponding vector will
be "last_ns_pid3:last_ns_pid2:last_ns_pid1". The vector may be shorter
and contain fewer levels, for example "last_ns_pid3:last_ns_pid2" or
even "last_ns_pid3", depending on which levels you want to populate.

To write to a pid_ns's ns_last_pid, we check that the writer task has
the CAP_SYS_ADMIN capability in that pid_ns's user_ns.

One note about struct pidns_ioc_req: it is made extensible and may be
expanded in the future. Only the always-present fields exist at the
moment; future code may determine additional fields and their sizes
from pidns_ioc_req::req.
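
As an illustration of the vector format described above, here is a
minimal userspace sketch that formats such a string. The helper name
build_last_pid_vec() is hypothetical and not part of this patch; it
only assumes the ":"-delimited, innermost-namespace-first layout.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format `count` last-pid values as a ":"-delimited string, innermost
 * namespace first. Returns the string length (without the trailing
 * NUL), or -1 if an argument is invalid or the buffer is too small.
 * build_last_pid_vec() is a hypothetical helper, not part of the patch.
 */
static int build_last_pid_vec(char *buf, size_t size,
			      const pid_t *pids, size_t count)
{
	size_t used = 0;
	size_t i;

	if (!buf || !pids || count == 0 || size == 0)
		return -1;

	for (i = 0; i < count; i++) {
		/* Prefix every entry except the first with ":" */
		int n = snprintf(buf + used, size - used, "%s%d",
				 i ? ":" : "", (int)pids[i]);

		/* snprintf() reports the untruncated length */
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

With pids {30, 20, 10} for pid_ns3, pid_ns2 and pid_ns1, the buffer
would hold "30:20:10". The buffer and the returned length are what a
caller would presumably place in pidns_ioc_req::data and
pidns_ioc_req::data_size.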

Signed-off-by: Kirill Tkhai
---
 include/uapi/linux/nsfs.h |    9 +++++
 kernel/pid_namespace.c    |   88 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/include/uapi/linux/nsfs.h b/include/uapi/linux/nsfs.h
index 544bbb661475..37bb4af917b5 100644
--- a/include/uapi/linux/nsfs.h
+++ b/include/uapi/linux/nsfs.h
@@ -17,4 +17,13 @@
 /* Execute namespace-specific ioctl */
 #define NS_SPECIFIC_IOC _IO(NSIO, 0x5)
 
+struct pidns_ioc_req {
+/* Set vector of last pids in namespace hierarchy */
+#define PIDNS_REQ_SET_LAST_PID_VEC	0x1
+	unsigned int req;
+	void __user *data;
+	unsigned int data_size;
+	char std_fields[0];
+};
+
 #endif /* __LINUX_NSFS_H */
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index de461aa0bf9a..0e86fa15cd92 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -21,6 +21,8 @@
 #include
 #include
 #include
+#include
+#include
 
 struct pid_cache {
 	int nr_ids;
@@ -428,6 +430,91 @@ static struct ns_common *pidns_get_parent(struct ns_common *ns)
 	return &get_pid_ns(pid_ns)->ns;
 }
 
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static long set_last_pid_vec(struct pid_namespace *pid_ns,
+			     struct pidns_ioc_req *req)
+{
+	char *str, *p;
+	int ret = 0;
+	pid_t pid;
+
+	read_lock(&tasklist_lock);
+	if (!pid_ns->child_reaper)
+		ret = -EINVAL;
+	read_unlock(&tasklist_lock);
+	if (ret)
+		return ret;
+
+	if (req->data_size >= PAGE_SIZE)
+		return -EINVAL;
+	str = vmalloc(req->data_size + 1);
+	if (!str)
+		return -ENOMEM;
+	if (copy_from_user(str, req->data, req->data_size)) {
+		ret = -EFAULT;
+		goto out_vfree;
+	}
+	str[req->data_size] = '\0';
+
+	p = str;
+	while (p && *p != '\0') {
+		if (!ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN)) {
+			ret = -EPERM;
+			goto out_vfree;
+		}
+
+		if (sscanf(p, "%d", &pid) != 1 || pid < 0 || pid > pid_max) {
+			ret = -EINVAL;
+			goto out_vfree;
+		}
+
+		/* Write directly: see the comment in pid_ns_ctl_handler() */
+		pid_ns->last_pid = pid;
+
+		p = strchr(p, ':');
+		pid_ns = pid_ns->parent;
+		if (p) {
+			if (!pid_ns) {
+				ret = -EINVAL;
+				goto out_vfree;
+			}
+			p++;
+		}
+	}
+
+	ret = 0;
+out_vfree:
+	vfree(str);
+	return ret;
+}
+#else /* CONFIG_CHECKPOINT_RESTORE */
+static long set_last_pid_vec(struct pid_namespace *pid_ns,
+			     struct pidns_ioc_req *req)
+{
+	return -ENOTTY;
+}
+#endif /* CONFIG_CHECKPOINT_RESTORE */
+
+static long pidns_ioctl(struct ns_common *ns, unsigned long arg)
+{
+	struct pid_namespace *pid_ns = to_pid_ns(ns);
+	struct pidns_ioc_req user_req;
+	int ret;
+
+	ret = copy_from_user(&user_req, (void *)arg,
+			     offsetof(struct pidns_ioc_req, std_fields));
+	if (ret)
+		return ret;
+
+	switch (user_req.req) {
+	case PIDNS_REQ_SET_LAST_PID_VEC:
+		return set_last_pid_vec(pid_ns, &user_req);
+	default:
+		return -ENOTTY;
+	}
+	return 0;
+}
+
 static struct user_namespace *pidns_owner(struct ns_common *ns)
 {
 	return to_pid_ns(ns)->user_ns;
@@ -441,6 +528,7 @@ const struct proc_ns_operations pidns_operations = {
 	.install	= pidns_install,
 	.owner		= pidns_owner,
 	.get_parent	= pidns_get_parent,
+	.ns_ioctl	= pidns_ioctl,
 };
 
 static __init int pid_namespaces_init(void)
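
To make the parsing walk in set_last_pid_vec() easier to follow, here is
a rough userspace model of the loop. struct model_ns and
model_set_last_pid_vec() are hypothetical stand-ins, not kernel code,
and the ns_capable() and child_reaper checks are left out on purpose.
As far as the posted loop reads, each value is stored before the next
entry is validated, so a bad entry later in the string leaves the
earlier levels already updated; the model reproduces that behaviour.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct pid_namespace. */
struct model_ns {
	int last_pid;
	struct model_ns *parent;
};

/*
 * Model of the walk in set_last_pid_vec(): one number per level,
 * starting at `ns` and moving to its parent after each ":".
 * Capability and child_reaper checks are intentionally omitted.
 */
static int model_set_last_pid_vec(struct model_ns *ns, const char *str,
				  int pid_max)
{
	const char *p = str;
	int pid;

	while (p && *p != '\0') {
		if (sscanf(p, "%d", &pid) != 1 || pid < 0 || pid > pid_max)
			return -EINVAL;

		/* The value is stored before later entries are validated */
		ns->last_pid = pid;

		p = strchr(p, ':');
		ns = ns->parent;
		if (p) {
			if (!ns)
				return -EINVAL;	/* more entries than levels */
			p++;
		}
	}
	return 0;
}
```

For a three-level chain, "30:20:10" updates all three levels, "7"
touches only the innermost one, and "5:x" returns -EINVAL after the
innermost level has already been set to 5.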