From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933368AbcFQQnT (ORCPT ); Fri, 17 Jun 2016 12:43:19 -0400 Received: from gum.cmpxchg.org ([85.214.110.215]:60192 "EHLO gum.cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753029AbcFQQnQ (ORCPT ); Fri, 17 Jun 2016 12:43:16 -0400 Date: Fri, 17 Jun 2016 12:40:43 -0400 From: Johannes Weiner To: Vladimir Davydov Cc: Andrew Morton , Tejun Heo , Michal Hocko , Li Zefan , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: Re: [PATCH] mm: memcontrol: fix cgroup creation failure after many small jobs Message-ID: <20160617164043.GA10485@cmpxchg.org> References: <20160616034244.14839-1-hannes@cmpxchg.org> <20160617090655.GE13143@esperanza> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160617090655.GE13143@esperanza> User-Agent: Mutt/1.6.1 (2016-04-27) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 17, 2016 at 12:06:55PM +0300, Vladimir Davydov wrote: > On Wed, Jun 15, 2016 at 11:42:44PM -0400, Johannes Weiner wrote: > > The memory controller has quite a bit of state that usually outlives > > the cgroup and pins its CSS until said state disappears. At the same > > time it imposes a 16-bit limit on the CSS ID space to economically > > store IDs in the wild. Consequently, when we use cgroups to contain > > frequent but small and short-lived jobs that leave behind some page > > cache, we quickly run into the 64k limitations of outstanding CSSs. > > Creating a new cgroup fails with -ENOSPC while there are only a few, > > or even no user-visible cgroups in existence. > > > > Although pinning CSSs past cgroup removal is common, there are only > > two instances that actually need a CSS ID after a cgroup is deleted: > > cache shadow entries and swapout records. > > > > Cache shadow entries reference the ID weakly and can deal with the CSS > > having disappeared when it's looked up later. They pose no hurdle. > > > > Swap-out records do need to pin the css to hierarchically attribute > > swapins after the cgroup has been deleted; though the only pages that > > remain swapped out after a process exits are tmpfs/shmem pages. Those > > references are under the user's control and thus manageable. > > > > This patch introduces a private 16bit memcg ID and switches swap and > > cache shadow entries over to using that. It then decouples the CSS > > lifetime from the CSS ID lifetime, such that a CSS ID can be recycled > > when the CSS is only pinned by common objects that don't need an ID. > > There's already id which is only used for online memory cgroups - it's > kmemcg_id. May be, instead of introducing one more idr, we could name it > generically and reuse it for shadow entries? Good point. But it seems mem_cgroup_idr is more generic, it makes sense to switch slab accounting over to that. I'll look into that, but as a refactoring patch on top of this fix. > Regarding swap entries, would it really make much difference if we used > 4 bytes per swap page instead of 2? For a 100 GB swap it'd increase > overhead from 50 MB up to 100 MB, which still doesn't seem too much IMO, > so may be just use plain unrestricted css->id for swap entries? Yes and no. I agree that the increased consumption wouldn't be too crazy, but if we have to maintain a 16-bit ID anyway, we might as well use it for swap too to save that space. I don't think tmpfs and shmem pins past offlining will be common enough to significantly eat into the ID space of online cgroups.