Date: Mon, 3 Aug 2020 11:00:33 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: Andrew Morton, Christoph Lameter, Johannes Weiner, Shakeel Butt,
    linux-mm@kvack.org, Vlastimil Babka, kernel-team@fb.com,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 05/19] mm: memcontrol: decouple reference counting from page accounting
Message-ID: <20200803090033.GE5174@dhcp22.suse.cz>
References: <20200623174037.3951353-1-guro@fb.com> <20200623174037.3951353-6-guro@fb.com>
In-Reply-To: <20200623174037.3951353-6-guro@fb.com>

I am sorry for coming late here.

On Tue 23-06-20 10:40:23, Roman Gushchin wrote:
> From: Johannes Weiner
>
> The reference counting of a memcg is currently coupled directly to how
> many 4k pages are charged to it. This doesn't work well with Roman's new
> slab controller, which maintains pools of objects and doesn't want to keep
> an extra balance sheet for the pages backing those objects.
>
> This unusual refcounting design (reference counts usually track pointers
> to an object) is only for historical reasons: memcg used to not take any
> css references and simply stalled offlining until all charges had been
> reparented and the page counters had dropped to zero. When we got rid of
> the reparenting requirement, the simple mechanical translation was to take
> a reference for every charge.
>
> More historical context can be found in commit e8ea14cc6ead ("mm:
> memcontrol: take a css reference for each charged page"), commit
> 64f219938941 ("mm: memcontrol: remove obsolete kmemcg pinning tricks") and
> commit b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined
> groups").
>
> The new slab controller exposes the limitations in this scheme, so let's
> switch it to a more idiomatic reference counting model based on actual
> kernel pointers to the memcg:
>
> - The per-cpu stock holds a reference to the memcg its caching
>
> - User pages hold a reference for their page->mem_cgroup. Transparent
>   huge pages will no longer acquire tail references in advance, we'll
>   get them if needed during the split.
>
> - Kernel pages hold a reference for their page->mem_cgroup
>
> - Pages allocated in the root cgroup will acquire and release css
>   references for simplicity. css_get() and css_put() optimize that.
>
> - The current memcg_charge_slab() already hacked around the per-charge
>   references; this change gets rid of that as well.

just for completeness
- tcp accounting will handle reference in mem_cgroup_sk_{alloc,free}

As all those paths are handling the reference count differently it is
probably good to remind that in a comment:
/* Caller is responsible to hold reference for the existence of the charged object */
for try_charge function.

We will need to be more careful (e.g.
http://lkml.kernel.org/r/alpine.LSU.2.11.2007302011450.2347@eggly.anvils)
but considering that the old model doesn't fit with the new slab
accounting as mentioned above this is not really something terrible to
live with.

[...]
> @@ -5456,7 +5460,10 @@ static int mem_cgroup_move_account(struct page *page,
>  	 */
>  	smp_mb();
>
> -	page->mem_cgroup = to; 	/* caller should have done css_get */
> +	css_get(&to->css);
> +	css_put(&from->css);
> +
> +	page->mem_cgroup = to;
>
>  	__unlock_page_memcg(from);

What prevents from memcg to be released here?
-- 
Michal Hocko
SUSE Labs
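
To make the closing question about mem_cgroup_move_account() concrete, here
is a toy userspace model of the ordering in the quoted hunk. It is only a
sketch: struct toy_css, struct toy_page and the toy_* helpers are
hypothetical stand-ins, not the kernel's css_get()/css_put() or its data
structures, and the "released" flag merely marks where a real object could
be freed. It shows that if the page's reference were the last one on
"from", the put would release it before the step standing in for
__unlock_page_memcg(from) runs. Whether that can actually happen in the
kernel depends on whether the caller holds its own reference on "from",
which this mail does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a css/memcg; NOT the kernel's structure. */
struct toy_css {
	int refcnt;
	bool released;	/* set once the last reference is dropped */
};

/* Toy stand-in for a page; memcg plays the role of page->mem_cgroup. */
struct toy_page {
	struct toy_css *memcg;
};

/* Take a reference; the object must still be alive. */
void toy_css_get(struct toy_css *css)
{
	assert(!css->released);
	css->refcnt++;
}

/* Drop a reference; mark the object released when the count hits zero. */
void toy_css_put(struct toy_css *css)
{
	assert(css->refcnt > 0);
	if (--css->refcnt == 0)
		css->released = true;
}

/*
 * Same ordering as the quoted hunk: get "to", put "from", switch the
 * page over. The return value reports whether "from" was still alive at
 * the point where the unlock step would touch it.
 */
bool toy_move_account(struct toy_page *page, struct toy_css *from,
		      struct toy_css *to)
{
	toy_css_get(to);
	toy_css_put(from);
	page->memcg = to;
	return !from->released;	/* stands in for __unlock_page_memcg(from) */
}
```

With a "from" whose only reference is the page's, toy_move_account()
returns false; with one extra reference held by the caller it returns true,
which is the kind of guarantee the question is asking about.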