From: Roman Gushchin <guro@fb.com>
To: Andrew Morton, Christoph Lameter
Cc: Johannes Weiner, Michal Hocko, Shakeel Butt, Vlastimil Babka,
	Roman Gushchin, linux-mm@kvack.org
Subject: [PATCH v7 07/19] mm: memcg/slab: allocate obj_cgroups for non-root slab pages
Date: Mon, 22 Jun 2020 18:58:34 -0700
Message-ID: <20200623015846.1141975-8-guro@fb.com>
In-Reply-To: <20200623015846.1141975-1-guro@fb.com>
References: <20200623015846.1141975-1-guro@fb.com>

Allocate and release memory to store obj_cgroup pointers for each
non-root slab page. Reuse the page->mem_cgroup pointer to store a
pointer to the allocated space.

This commit temporarily increases the memory footprint of the kernel
memory accounting: storing the obj_cgroup pointers requires one pointer
slot for each allocated object. However, the following patches in the
series enable sharing of slab pages between memory cgroups, which
dramatically increases total slab utilization, so the final memory
footprint ends up significantly smaller than before.

To distinguish between obj_cgroups and memcg pointers in cases where it
is not obvious which one is used (as in page_cgroup_ino()), always set
the lowest bit of the pointer in the obj_cgroups case.

The allocated obj_cgroups vector is marked to be ignored by kmemleak,
which would otherwise report a memory leak for each allocated vector.
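For illustration only, here is a minimal standalone userspace sketch of
the lowest-bit tagging idea described above (not part of the patch; names
such as fake_page, set_memcg() and get_obj_cgroups() are made up for this
example). It relies on the fact that struct and heap pointers are at
least 2-byte aligned, so bit 0 is always free to mark which kind of
pointer the shared field currently holds:

/*
 * Standalone userspace illustration (NOT kernel code): one pointer-sized
 * field either holds a plain mem_cgroup pointer (bit 0 clear) or a tagged
 * pointer to an obj_cgroup vector (bit 0 set).
 */
#include <stdint.h>
#include <stdio.h>

struct mem_cgroup { int id; };
struct obj_cgroup { int id; };

struct fake_page {
	uintptr_t memcg_data;	/* models the page->mem_cgroup/obj_cgroups union */
};

static void set_memcg(struct fake_page *page, struct mem_cgroup *memcg)
{
	page->memcg_data = (uintptr_t)memcg;		/* bit 0 stays clear */
}

static void set_obj_cgroups(struct fake_page *page, struct obj_cgroup **vec)
{
	page->memcg_data = (uintptr_t)vec | 0x1UL;	/* set the tag bit */
}

static struct mem_cgroup *get_memcg(struct fake_page *page)
{
	/* a set low bit means "obj_cgroup vector, not a memcg" */
	if (page->memcg_data & 0x1UL)
		return NULL;
	return (struct mem_cgroup *)page->memcg_data;
}

static struct obj_cgroup **get_obj_cgroups(struct fake_page *page)
{
	/* mask the tag bit off before using the pointer */
	return (struct obj_cgroup **)(page->memcg_data & ~0x1UL);
}

int main(void)
{
	struct mem_cgroup memcg = { .id = 1 };
	struct obj_cgroup *vec[4] = { NULL };
	struct fake_page page = { 0 };

	set_memcg(&page, &memcg);
	printf("memcg id: %d\n", get_memcg(&page)->id);

	set_obj_cgroups(&page, vec);
	printf("memcg now: %p, vector: %p\n",
	       (void *)get_memcg(&page), (void *)get_obj_cgroups(&page));
	return 0;
}

This mirrors what page_cgroup_ino() and page_obj_cgroups() do in the
patch below: readers that may see either kind of pointer check bit 0,
and readers that know they have a vector mask it off.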
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Vlastimil Babka
Reviewed-by: Shakeel Butt
---
 include/linux/mm_types.h |  5 +++-
 include/linux/slab_def.h |  6 +++++
 include/linux/slub_def.h |  5 ++++
 mm/memcontrol.c          | 17 ++++++++++---
 mm/slab.h                | 52 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..0277fbab7c93 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -198,7 +198,10 @@ struct page {
 	atomic_t _refcount;
 
 #ifdef CONFIG_MEMCG
-	struct mem_cgroup *mem_cgroup;
+	union {
+		struct mem_cgroup *mem_cgroup;
+		struct obj_cgroup **obj_cgroups;
+	};
 #endif
 
 	/*
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index abc7de77b988..ccda7b9669a5 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -114,4 +114,10 @@ static inline unsigned int obj_to_index(const struct kmem_cache *cache,
 	return reciprocal_divide(offset, cache->reciprocal_buffer_size);
 }
 
+static inline int objs_per_slab_page(const struct kmem_cache *cache,
+				     const struct page *page)
+{
+	return cache->num;
+}
+
 #endif /* _LINUX_SLAB_DEF_H */
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 30e91c83d401..f87302dcfe8c 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -198,4 +198,9 @@ static inline unsigned int obj_to_index(const struct kmem_cache *cache,
 	return __obj_to_index(cache, page_address(page), obj);
 }
 
+static inline int objs_per_slab_page(const struct kmem_cache *cache,
+				     const struct page *page)
+{
+	return page->objects;
+}
 #endif /* _LINUX_SLUB_DEF_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 407f90f7a2f7..60e3f3ca75ca 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -569,10 +569,21 @@ ino_t page_cgroup_ino(struct page *page)
 	unsigned long ino = 0;
 
 	rcu_read_lock();
-	if (PageSlab(page) && !PageTail(page))
+	if (PageSlab(page) && !PageTail(page)) {
 		memcg = memcg_from_slab_page(page);
-	else
-		memcg = READ_ONCE(page->mem_cgroup);
+	} else {
+		memcg = page->mem_cgroup;
+
+		/*
+		 * The lowest bit set means that memcg isn't a valid
+		 * memcg pointer, but a obj_cgroups pointer.
+		 * In this case the page is shared and doesn't belong
+		 * to any specific memory cgroup.
+		 */
+		if ((unsigned long) memcg & 0x1UL)
+			memcg = NULL;
+	}
+
 	while (memcg && !(memcg->css.flags & CSS_ONLINE))
 		memcg = parent_mem_cgroup(memcg);
 	if (memcg)
diff --git a/mm/slab.h b/mm/slab.h
index 1e2d80991904..7d175c2f1a61 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -109,6 +109,7 @@ struct memcg_cache_params {
 #include <linux/memcontrol.h>
 #include <linux/fault-inject.h>
 #include <linux/kasan.h>
+#include <linux/kmemleak.h>
 
 /*
  * State of the slab allocator.
@@ -347,6 +348,18 @@ static inline struct kmem_cache *memcg_root_cache(struct kmem_cache *s)
 	return s->memcg_params.root_cache;
 }
 
+static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
+{
+	/*
+	 * page->mem_cgroup and page->obj_cgroups are sharing the same
+	 * space. To distinguish between them in case we don't know for sure
+	 * that the page is a slab page (e.g. page_cgroup_ino()), let's
+	 * always set the lowest bit of obj_cgroups.
+	 */
+	return (struct obj_cgroup **)
+		((unsigned long)page->obj_cgroups & ~0x1UL);
+}
+
 /*
  * Expects a pointer to a slab page. Please note, that PageSlab() check
  * isn't sufficient, as it returns true also for tail compound slab pages,
@@ -434,6 +447,28 @@ static __always_inline void memcg_uncharge_slab(struct page *page, int order,
 	percpu_ref_put_many(&s->memcg_params.refcnt, nr_pages);
 }
 
+static inline int memcg_alloc_page_obj_cgroups(struct page *page,
+					struct kmem_cache *s, gfp_t gfp)
+{
+	unsigned int objects = objs_per_slab_page(s, page);
+	void *vec;
+
+	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
+			   page_to_nid(page));
+	if (!vec)
+		return -ENOMEM;
+
+	kmemleak_not_leak(vec);
+	page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
+	return 0;
+}
+
+static inline void memcg_free_page_obj_cgroups(struct page *page)
+{
+	kfree(page_obj_cgroups(page));
+	page->obj_cgroups = NULL;
+}
+
 extern void slab_init_memcg_params(struct kmem_cache *);
 extern void memcg_link_cache(struct kmem_cache *s, struct mem_cgroup *memcg);
 
@@ -483,6 +518,16 @@ static inline void memcg_uncharge_slab(struct page *page, int order,
 {
 }
 
+static inline int memcg_alloc_page_obj_cgroups(struct page *page,
+					struct kmem_cache *s, gfp_t gfp)
+{
+	return 0;
+}
+
+static inline void memcg_free_page_obj_cgroups(struct page *page)
+{
+}
+
 static inline void slab_init_memcg_params(struct kmem_cache *s)
 {
 }
@@ -509,12 +554,18 @@ static __always_inline int charge_slab_page(struct page *page,
 					    gfp_t gfp, int order,
 					    struct kmem_cache *s)
 {
+	int ret;
+
 	if (is_root_cache(s)) {
 		mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
 				    PAGE_SIZE << order);
 		return 0;
 	}
 
+	ret = memcg_alloc_page_obj_cgroups(page, s, gfp);
+	if (ret)
+		return ret;
+
 	return memcg_charge_slab(page, gfp, order, s);
 }
 
@@ -527,6 +578,7 @@ static __always_inline void uncharge_slab_page(struct page *page, int order,
 		return;
 	}
 
+	memcg_free_page_obj_cgroups(page);
 	memcg_uncharge_slab(page, order, s);
 }
 
-- 
2.26.2
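As a side note, the charge/uncharge flow added above can be modeled in
plain userspace C. The sketch below is an assumption-laden illustration,
not the kernel implementation: fake_page, alloc_page_obj_cgroups() and
free_page_obj_cgroups() are hypothetical names, and calloc()/free() stand
in for kcalloc_node()/kfree(). It shows the per-object vector being
allocated, tagged with the low bit, and released, mirroring what
memcg_alloc_page_obj_cgroups() and memcg_free_page_obj_cgroups() do on
slab page charge and uncharge.

/*
 * Userspace model of the allocation and release flow added by the patch:
 * charge allocates one obj_cgroup slot per object and stores a tagged
 * pointer, uncharge untags and frees the vector.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct obj_cgroup { int id; };

struct fake_page {
	unsigned int objects;	/* like page->objects on a SLUB slab page */
	uintptr_t memcg_data;	/* models the mem_cgroup/obj_cgroups union */
};

static int alloc_page_obj_cgroups(struct fake_page *page)
{
	/* one zero-initialized pointer slot per object on the slab page */
	struct obj_cgroup **vec = calloc(page->objects, sizeof(*vec));

	if (!vec)
		return -ENOMEM;

	page->memcg_data = (uintptr_t)vec | 0x1UL;	/* set the tag bit */
	return 0;
}

static void free_page_obj_cgroups(struct fake_page *page)
{
	free((void *)(page->memcg_data & ~0x1UL));	/* untag, then free */
	page->memcg_data = 0;
}

int main(void)
{
	struct fake_page page = { .objects = 32, .memcg_data = 0 };
	struct obj_cgroup **vec;

	if (alloc_page_obj_cgroups(&page))
		return 1;

	vec = (struct obj_cgroup **)(page.memcg_data & ~0x1UL);
	printf("%u slots, slot 0 = %p\n", page.objects, (void *)vec[0]);

	free_page_obj_cgroups(&page);
	return 0;
}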