From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 17 Jun 2016 12:06:55 +0300
From: Vladimir Davydov
To: Johannes Weiner
CC: Andrew Morton, Tejun Heo, Michal Hocko, Li Zefan
Subject: Re: [PATCH] mm: memcontrol: fix cgroup creation failure after many small jobs
Message-ID: <20160617090655.GE13143@esperanza>
References: <20160616034244.14839-1-hannes@cmpxchg.org>
In-Reply-To: <20160616034244.14839-1-hannes@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 15, 2016 at 11:42:44PM -0400, Johannes Weiner wrote:
> The memory controller has quite a bit of state that usually outlives
> the cgroup and pins its CSS until said state disappears. At the same
> time it imposes a 16-bit limit on the CSS ID space to economically
> store IDs in the wild. Consequently, when we use cgroups to contain
> frequent but small and short-lived jobs that leave behind some page
> cache, we quickly run into the 64k limitations of outstanding CSSs.
> Creating a new cgroup fails with -ENOSPC while there are only a few,
> or even no user-visible cgroups in existence.
>
> Although pinning CSSs past cgroup removal is common, there are only
> two instances that actually need a CSS ID after a cgroup is deleted:
> cache shadow entries and swapout records.
>
> Cache shadow entries reference the ID weakly and can deal with the CSS
> having disappeared when it's looked up later. They pose no hurdle.
>
> Swap-out records do need to pin the css to hierarchically attribute
> swapins after the cgroup has been deleted; though the only pages that
> remain swapped out after a process exits are tmpfs/shmem pages. Those
> references are under the user's control and thus manageable.
>
> This patch introduces a private 16bit memcg ID and switches swap and
> cache shadow entries over to using that. It then decouples the CSS
> lifetime from the CSS ID lifetime, such that a CSS ID can be recycled
> when the CSS is only pinned by common objects that don't need an ID.

There's already an ID that is only used for online memory cgroups:
kmemcg_id. Maybe, instead of introducing one more idr, we could name it
generically and reuse it for shadow entries?

Regarding swap entries, would it really make much difference if we used
4 bytes per swap page instead of 2? For a 100 GB swap it would increase
the overhead from 50 MB to 100 MB, which still doesn't seem like too
much IMO, so maybe just use the plain unrestricted css->id for swap
entries?