MIME-Version: 1.0
References: <20210309100717.253-1-songmuchun@bytedance.com> <20210309100717.253-4-songmuchun@bytedance.com>
In-Reply-To:
From: Muchun Song
Date: Thu, 11 Mar 2021 14:50:11 +0800
Message-ID:
Subject: Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
To: Roman Gushchin
Cc: Johannes Weiner, Michal Hocko, Andrew Morton, Shakeel Butt, Vladimir Davydov, LKML, Linux Memory Management List, Xiongchun duan
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 11, 2021 at 3:53 AM Roman Gushchin wrote:
>
> On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> > Since Roman series "The new cgroup slab memory controller" applied. All
> > slab objects are charged via the new APIs of obj_cgroup. The new APIs
> > introduce a struct obj_cgroup to charge slab objects. It prevents
> > long-living objects from pinning the original memory cgroup in the memory.
> > But there are still some corner objects (e.g. allocations larger than
> > order-1 page on SLUB) which are not charged via the new APIs. Those
> > objects (include the pages which are allocated from buddy allocator
> > directly) are charged as kmem pages which still hold a reference to
> > the memory cgroup.
> >
> > This patch aims to charge the kmem pages by using the new APIs of
> > obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> > an object cgroup. We can use the page_objcg() to get the object
> > cgroup associated with a kmem page. Or we can use page_memcg_check()
> > to get the memory cgroup associated with a kmem page, but caller must
> > ensure that the returned memcg won't be released (e.g. acquire the
> > rcu_read_lock or css_set_lock).
> >
> > Signed-off-by: Muchun Song
> > ---
> >  include/linux/memcontrol.h |  63 ++++++++++++++++++------
> >  mm/memcontrol.c            | 119 ++++++++++++++++++++++++++++++---------------
> >  2 files changed, 128 insertions(+), 54 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 83cbcdcfcc92..07c449af9c0f 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page *page)
> >  }
> >
> >  /*
> > + * After the initialization objcg->memcg is always pointing at
> > + * a valid memcg, but can be atomically swapped to the parent memcg.
> > + *
> > + * The caller must ensure that the returned memcg won't be released:
> > + * e.g. acquire the rcu_read_lock or css_set_lock.
> > + */
> > +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> > +{
> > +	return READ_ONCE(objcg->memcg);
> > +}
> > +
> > +/*
> >   * page_memcg - get the memory cgroup associated with a non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> > @@ -422,15 +434,19 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> > - * or NULL. This function unlike page_memcg() can take any page
> > + * or NULL. This function unlike page_memcg() can take any page
> >   * as an argument. It has to be used in cases when it's not known if a page
> > - * has an associated memory cgroup pointer or an object cgroups vector.
> > + * has an associated memory cgroup pointer or an object cgroups vector or
> > + * an object cgroup.
> >   *
> >   * Any of the following ensures page and memcg binding stability:
> >   * - the page lock
> >   * - LRU isolation
> >   * - lock_page_memcg()
> >   * - exclusive reference
> > + *
> > + * Should be called under rcu lock which can protect memcg associated with a
> > + * kmem page from being released.
>
> How about this:
>
> For a non-kmem page any of the following ensures page and memcg binding stability:
> - the page lock
> - LRU isolation
> - lock_page_memcg()
> - exclusive reference
>
> For a kmem page a caller should hold an rcu read lock to protect memcg associated
> with a kmem page from being released.

OK. I will use this. Thanks Roman.
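
To illustrate why the kmem case needs its own rule, here is a rough
user-space model of the lookup this patch adds to page_memcg_check().
It is only a sketch, not kernel code: struct page_model, the MODEL_DATA_*
flags and the model_* helpers are made-up stand-ins for struct page,
the MEMCG_DATA_* flags and the real helpers, and the flag bit positions
are an assumption. It shows that for a kmem page the memcg is reached
through the objcg, so it can change underneath the caller when the objcg
is reparented. The model does not take any lock; in the kernel the caller
would hold the rcu read lock as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct mem_cgroup { int id; };
struct obj_cgroup { struct mem_cgroup *memcg; };
struct page_model { uintptr_t memcg_data; };

/* Assumed flag layout: two low pointer bits carry the type of memcg_data. */
#define MODEL_DATA_OBJCGS     ((uintptr_t)1 << 0)
#define MODEL_DATA_KMEM       ((uintptr_t)1 << 1)
#define MODEL_DATA_FLAGS_MASK (MODEL_DATA_OBJCGS | MODEL_DATA_KMEM)

/* Models page_objcg(): only valid for kmem pages. */
static struct obj_cgroup *model_page_objcg(const struct page_model *page)
{
	uintptr_t data = page->memcg_data;

	assert(!(data & MODEL_DATA_OBJCGS));	/* stands in for VM_BUG_ON_PAGE */
	assert(data & MODEL_DATA_KMEM);

	return (struct obj_cgroup *)(data & ~MODEL_DATA_FLAGS_MASK);
}

/* Models page_memcg_check(): accepts any page. */
static struct mem_cgroup *model_page_memcg_check(const struct page_model *page)
{
	uintptr_t data = page->memcg_data;

	if (data & MODEL_DATA_OBJCGS)
		return NULL;			/* slab page: objcg vector */

	if (data & MODEL_DATA_KMEM) {
		struct obj_cgroup *objcg;

		objcg = (struct obj_cgroup *)(data & ~MODEL_DATA_FLAGS_MASK);
		return objcg->memcg;		/* may be swapped to the parent */
	}

	return (struct mem_cgroup *)(data & ~MODEL_DATA_FLAGS_MASK);
}
```

For a non-kmem page the returned pointer comes straight from memcg_data,
which is why the page lock, LRU isolation and so on are enough there. For
a kmem page only the objcg is pinned by the page, hence the rcu read lock.
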
>
> >   */
> >  static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >  {
> > @@ -443,6 +459,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >  	if (memcg_data & MEMCG_DATA_OBJCGS)
> >  		return NULL;
> >
> > +	if (memcg_data & MEMCG_DATA_KMEM) {
> > +		struct obj_cgroup *objcg;
> > +
> > +		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> > +		return obj_cgroup_memcg(objcg);
> > +	}
> > +
> >  	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > @@ -501,6 +524,25 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
> >  	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > +/*
> > + * page_objcg - get the object cgroup associated with a kmem page
> > + * @page: a pointer to the page struct
> > + *
> > + * Returns a pointer to the object cgroup associated with the kmem page,
> > + * or NULL. This function assumes that the page is known to have an
> > + * associated object cgroup. It's only safe to call this function
> > + * against kmem pages (PageMemcgKmem() returns true).
> > + */
> > +static inline struct obj_cgroup *page_objcg(struct page *page)
> > +{
> > +	unsigned long memcg_data = page->memcg_data;
> > +
> > +	VM_BUG_ON_PAGE(PageSlab(page), page);
> > +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > +	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
> > +
> > +	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> > +}
> >  #else
> >  static inline struct obj_cgroup **page_objcgs(struct page *page)
> >  {
> > @@ -511,6 +553,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
> >  {
> >  	return NULL;
> >  }
> > +
> > +static inline struct obj_cgroup *page_objcg(struct page *page)
> > +{
> > +	return NULL;
> > +}
> >  #endif
> >
> >  static __always_inline bool memcg_stat_item_in_bytes(int idx)
> > @@ -729,18 +776,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
> >  	percpu_ref_put(&objcg->refcnt);
> >  }
> >
> > -/*
> > - * After the initialization objcg->memcg is always pointing at
> > - * a valid memcg, but can be atomically swapped to the parent memcg.
> > - *
> > - * The caller must ensure that the returned memcg won't be released:
> > - * e.g. acquire the rcu_read_lock or css_set_lock.
> > - */
> > -static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> > -{
> > -	return READ_ONCE(objcg->memcg);
> > -}
> > -
> >  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
> >  {
> >  	if (memcg)
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index e1dc73ceb98a..38376f9d6659 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -859,15 +859,26 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
> >  	pg_data_t *pgdat = page_pgdat(page);
> >  	struct lruvec *lruvec;
> >
> > -	memcg = page_memcg_check(head);
> > -	/* Untracked pages have no memcg, no lruvec. Update only the node */
> > -	if (!memcg) {
> > -		__mod_node_page_state(pgdat, idx, val);
> > -		return;
> > +	if (PageMemcgKmem(head)) {
> > +		rcu_read_lock();
> > +		memcg = obj_cgroup_memcg(page_objcg(page));
> > +	} else {
> > +		memcg = page_memcg(head);
> > +		/*
> > +		 * Untracked pages have no memcg, no lruvec. Update only the
> > +		 * node.
> > +		 */
> > +		if (!memcg) {
> > +			__mod_node_page_state(pgdat, idx, val);
> > +			return;
> > +		}
> >  	}
> >
> >  	lruvec = mem_cgroup_lruvec(memcg, pgdat);
> >  	__mod_lruvec_state(lruvec, idx, val);
> > +
> > +	if (PageMemcgKmem(head))
> > +		rcu_read_unlock();
> >  }
> >  EXPORT_SYMBOL(__mod_lruvec_page_state);
> >
> > @@ -2906,6 +2917,20 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
> >  	page->memcg_data = (unsigned long)memcg;
> >  }
> >
> > +static inline struct mem_cgroup *obj_cgroup_memcg_get(struct obj_cgroup *objcg)
>
> I'd prefer get_obj_cgroup_memcg(), if you don't mind.

LGTM, will do.

>
> > +{
> > +	struct mem_cgroup *memcg;
> > +
> > +	rcu_read_lock();
> > +retry:
> > +	memcg = obj_cgroup_memcg(objcg);
> > +	if (unlikely(!css_tryget(&memcg->css)))
> > +		goto retry;
> > +	rcu_read_unlock();
> > +
> > +	return memcg;
> > +}
> > +
> >  #ifdef CONFIG_MEMCG_KMEM
> >  int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
> >  				 gfp_t gfp, bool new_page)
> > @@ -3071,15 +3096,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> >  	struct mem_cgroup *memcg;
> >  	int ret;
> >
> > -	rcu_read_lock();
> > -retry:
> > -	memcg = obj_cgroup_memcg(objcg);
> > -	if (unlikely(!css_tryget(&memcg->css)))
> > -		goto retry;
> > -	rcu_read_unlock();
> > -
> > +	memcg = obj_cgroup_memcg_get(objcg);
> >  	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> > -
> >  	css_put(&memcg->css);
> >
> >  	return ret;
> > @@ -3144,18 +3162,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
> >   */
> >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >  {
> > -	struct mem_cgroup *memcg;
> > +	struct obj_cgroup *objcg;
> >  	int ret = 0;
> >
> > -	memcg = get_mem_cgroup_from_current();
> > -	if (memcg && !mem_cgroup_is_root(memcg)) {
> > -		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
> > +	objcg = get_obj_cgroup_from_current();
> > +	if (objcg) {
> > +		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
> >  		if (!ret) {
> > -			page->memcg_data = (unsigned long)memcg |
> > +			page->memcg_data = (unsigned long)objcg |
> >  				MEMCG_DATA_KMEM;
> >  			return 0;
> >  		}
> > -		css_put(&memcg->css);
> > +		obj_cgroup_put(objcg);
> >  	}
> >  	return ret;
> >  }
> > @@ -3167,17 +3185,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >   */
> >  void __memcg_kmem_uncharge_page(struct page *page, int order)
> >  {
> > -	struct mem_cgroup *memcg;
> > +	struct obj_cgroup *objcg;
> >  	unsigned int nr_pages = 1 << order;
> >
> >  	if (!page_memcg_charged(page))
> >  		return;
> >
> > -	memcg = page_memcg_check(page);
> > -	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
> > -	__memcg_kmem_uncharge(memcg, nr_pages);
> > +	objcg = page_objcg(page);
> > +	obj_cgroup_uncharge_pages(objcg, nr_pages);
> >  	page->memcg_data = 0;
> > -	css_put(&memcg->css);
> > +	obj_cgroup_put(objcg);
> >  }
> >
> >  static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
> > @@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
> >  static void uncharge_batch(const struct uncharge_gather *ug)
> >  {
> >  	unsigned long flags;
> > +	unsigned long nr_pages;
> >
> > -	if (!mem_cgroup_is_root(ug->memcg)) {
> > -		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> > +	/*
> > +	 * The kmem pages can be reparented to the root memcg, in
> > +	 * order to prevent the memory counter of root memcg from
> > +	 * increasing indefinitely. We should decrease the memory
> > +	 * counter when unchange.
>
> I guess the correct syntax is
> "The kmem pages can be reparented to the root memcg. In
> order to prevent the memory counter of root memcg from
> increasing indefinitely, we should decrease the memory
> counter when unchange."

Right. I will combine your and Johannes's suggestions about how to rework
the code here.

>
> > +	 */
> > +	if (mem_cgroup_is_root(ug->memcg))
> > +		nr_pages = ug->nr_kmem;
> > +	else
> > +		nr_pages = ug->nr_pages;
> > +
> > +	if (nr_pages) {
> > +		page_counter_uncharge(&ug->memcg->memory, nr_pages);
> >  		if (do_memsw_account())
> > -			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
> > +			page_counter_uncharge(&ug->memcg->memsw, nr_pages);
> >  		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
> >  			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
> >  		memcg_oom_recover(ug->memcg);
> > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> >
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> > -	unsigned long nr_pages;
> > +	unsigned long nr_pages, nr_kmem;
> >  	struct mem_cgroup *memcg;
> >
> >  	VM_BUG_ON_PAGE(PageLRU(page), page);
> > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  	if (!page_memcg_charged(page))
> >  		return;
> >
> > +	nr_pages = compound_nr(page);
> >  	/*
> >  	 * Nobody should be changing or seriously looking at
> > -	 * page memcg at this point, we have fully exclusive
> > -	 * access to the page.
> > +	 * page memcg or objcg at this point, we have fully
> > +	 * exclusive access to the page.
> >  	 */
> > -	memcg = page_memcg_check(page);
> > +	if (PageMemcgKmem(page)) {
> > +		struct obj_cgroup *objcg;
> > +
> > +		objcg = page_objcg(page);
> > +		memcg = obj_cgroup_memcg_get(objcg);
> > +
> > +		page->memcg_data = 0;
> > +		obj_cgroup_put(objcg);
> > +		nr_kmem = nr_pages;
> > +	} else {
> > +		memcg = page_memcg(page);
> > +		page->memcg_data = 0;
> > +		nr_kmem = 0;
> > +	}
> > +
> >  	if (ug->memcg != memcg) {
> >  		if (ug->memcg) {
> >  			uncharge_batch(ug);
> >  			uncharge_gather_clear(ug);
> >  		}
> >  		ug->memcg = memcg;
> > +		ug->dummy_page = page;
> >
> >  		/* pairs with css_put in uncharge_batch */
> >  		css_get(&ug->memcg->css);
> >  	}
> >
> > -	nr_pages = compound_nr(page);
> >  	ug->nr_pages += nr_pages;
> > +	ug->nr_kmem += nr_kmem;
> > +	ug->pgpgout += !nr_kmem;
> >
> > -	if (PageMemcgKmem(page))
> > -		ug->nr_kmem += nr_pages;
> > -	else
> > -		ug->pgpgout++;
> > -
> > -	ug->dummy_page = page;
> > -	page->memcg_data = 0;
> > -	css_put(&ug->memcg->css);
> > +	css_put(&memcg->css);
> >  }
> >
> >  /**
> > --
> > 2.11.0
> >
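
As a footnote to the uncharge_batch() discussion, the accounting in the
v3 hunks quoted above can be modelled in user space roughly as follows.
This is a sketch of the posted version only, and the rework mentioned
above may well change it. struct gather_model and the model_* functions
are hypothetical names that mirror the nr_pages, nr_kmem and pgpgout
fields of struct uncharge_gather. The point it shows is that for the root
memcg only the reparented kmem pages are taken off the memory counter,
while other memcgs are uncharged by the full page count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the counters kept in struct uncharge_gather. */
struct gather_model {
	bool memcg_is_root;
	unsigned long nr_pages;
	unsigned long nr_kmem;
	unsigned long pgpgout;
};

/* Models the bookkeeping at the end of uncharge_page() in the v3 patch. */
static void model_gather_page(struct gather_model *ug,
			      unsigned long nr_pages, bool is_kmem)
{
	unsigned long nr_kmem = is_kmem ? nr_pages : 0;

	assert(nr_pages > 0);		/* a page covers at least one base page */

	ug->nr_pages += nr_pages;
	ug->nr_kmem += nr_kmem;
	ug->pgpgout += !nr_kmem;	/* only non-kmem pages count as pgpgout */
}

/* Models how uncharge_batch() picks the amount to take off ->memory. */
static unsigned long model_memory_uncharge(const struct gather_model *ug)
{
	return ug->memcg_is_root ? ug->nr_kmem : ug->nr_pages;
}
```

With one order-2 kmem page and one ordinary page gathered, the model
uncharges 4 pages from the root memcg and 5 pages from any other memcg.
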