Date: Wed, 22 Apr 2020 15:40:41 +0900
From: Joonsoo Kim
To: Johannes Weiner
Cc: Alex Shi, Shakeel Butt, Hugh Dickins, Michal Hocko,
	"Kirill A. Shutemov", Roman Gushchin, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 05/18] mm: memcontrol: convert page cache to a new mem_cgroup_charge() API
Message-ID: <20200422064041.GE6780@js1304-desktop>
References: <20200420221126.341272-1-hannes@cmpxchg.org>
 <20200420221126.341272-6-hannes@cmpxchg.org>
In-Reply-To: <20200420221126.341272-6-hannes@cmpxchg.org>

On Mon, Apr 20, 2020 at 06:11:13PM -0400, Johannes Weiner wrote:
> The try/commit/cancel protocol that memcg uses dates back to when
> pages used to be uncharged upon removal from the page cache, and thus
> couldn't be committed before the insertion had succeeded. Nowadays,
> pages are uncharged when they are physically freed; it doesn't matter
> whether the insertion was successful or not. For the page cache, the
> transaction dance has become unnecessary.
>
> Introduce a mem_cgroup_charge() function that simply charges a newly
> allocated page to a cgroup and sets up page->mem_cgroup in one single
> step. If the insertion fails, the caller doesn't have to do anything
> but free/put the page.
>
> Then switch the page cache over to this new API.
>
> Subsequent patches will also convert anon pages, but it needs a bit
> more prep work. Right now, memcg depends on page->mapping being
> already set up at the time of charging, so that it can maintain its
> own MEMCG_CACHE and MEMCG_RSS counters. For anon, page->mapping is set
> under the same pte lock under which the page is published, so a single
> charge point that can block doesn't work there just yet.
>
> The following prep patches will replace the private memcg counters
> with the generic vmstat counters, thus removing the page->mapping
> dependency, then complete the transition to the new single-point
> charge API and delete the old transactional scheme.
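
Restating the conversion for context: at a call site, the change boils
down to the following (a minimal sketch, using a hypothetical
insert_page() helper in place of the real insertion -- not code from
this patch):

	/* Old three-step protocol: charge, publish, then commit/cancel. */
	struct mem_cgroup *memcg;
	int error;

	error = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
	if (error)
		return error;
	error = insert_page(page);
	if (error) {
		mem_cgroup_cancel_charge(page, memcg);	/* roll back */
		return error;
	}
	mem_cgroup_commit_charge(page, memcg, false);	/* make it stick */

	/*
	 * New single-step API: on a publish failure the caller only
	 * puts the page; the charge is released when the page is freed.
	 */
	error = mem_cgroup_charge(page, mm, gfp_mask, false);
	if (error)
		return error;
	error = insert_page(page);
	if (error) {
		put_page(page);
		return error;
	}

That is, the commit/cancel bookkeeping disappears because an
uncommitted charge no longer exists as a separate state.
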
>
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/memcontrol.h | 10 ++++
>  mm/filemap.c               | 24 ++++------
>  mm/memcontrol.c            | 27 +++++++++++
>  mm/shmem.c                 | 97 +++++++++++++++++---------------------
>  4 files changed, 89 insertions(+), 69 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index c7875a48c8c1..5e8b0e38f145 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -367,6 +367,10 @@ int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
>  void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
>  			      bool lrucare);
>  void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
> +
> +int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
> +		      bool lrucare);
> +
>  void mem_cgroup_uncharge(struct page *page);
>  void mem_cgroup_uncharge_list(struct list_head *page_list);
>
> @@ -872,6 +876,12 @@ static inline void mem_cgroup_cancel_charge(struct page *page,
>  {
>  }
>
> +static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
> +				    gfp_t gfp_mask, bool lrucare)
> +{
> +	return 0;
> +}
> +
>  static inline void mem_cgroup_uncharge(struct page *page)
>  {
>  }
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5b31af9d5b1b..5bdbda965177 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -832,7 +832,6 @@ static int __add_to_page_cache_locked(struct page *page,
>  {
>  	XA_STATE(xas, &mapping->i_pages, offset);
>  	int huge = PageHuge(page);
> -	struct mem_cgroup *memcg;
>  	int error;
>  	void *old;
>
> @@ -840,17 +839,16 @@ static int __add_to_page_cache_locked(struct page *page,
>  	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
>  	mapping_set_update(&xas, mapping);
>
> -	if (!huge) {
> -		error = mem_cgroup_try_charge(page, current->mm,
> -					      gfp_mask, &memcg);
> -		if (error)
> -			return error;
> -	}
> -
>  	get_page(page);
>  	page->mapping = mapping;
>  	page->index = offset;
>
> +	if (!huge) {
> +		error = mem_cgroup_charge(page, current->mm, gfp_mask, false);
> +		if (error)
> +			goto error;
> +	}
> +
>  	do {
>  		xas_lock_irq(&xas);
>  		old = xas_load(&xas);
> @@ -874,20 +872,18 @@ static int __add_to_page_cache_locked(struct page *page,
>  		xas_unlock_irq(&xas);
>  	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
>
> -	if (xas_error(&xas))
> +	if (xas_error(&xas)) {
> +		error = xas_error(&xas);
>  		goto error;
> +	}
>
> -	if (!huge)
> -		mem_cgroup_commit_charge(page, memcg, false);
>  	trace_mm_filemap_add_to_page_cache(page);
>  	return 0;
>  error:
>  	page->mapping = NULL;
>  	/* Leave page->index set: truncation relies upon it */
> -	if (!huge)
> -		mem_cgroup_cancel_charge(page, memcg);
>  	put_page(page);
> -	return xas_error(&xas);
> +	return error;
>  }
>  ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 711d6dd5cbb1..b38c0a672d26 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6577,6 +6577,33 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
>  	cancel_charge(memcg, nr_pages);
>  }
>
> +/**
> + * mem_cgroup_charge - charge a newly allocated page to a cgroup
> + * @page: page to charge
> + * @mm: mm context of the victim
> + * @gfp_mask: reclaim mode
> + * @lrucare: page might be on the LRU already
> + *
> + * Try to charge @page to the memcg that @mm belongs to, reclaiming
> + * pages according to @gfp_mask if necessary.
> + *
> + * Returns 0 on success. Otherwise, an error code is returned.
> + */
> +int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
> +		      bool lrucare)
> +{
> +	struct mem_cgroup *memcg;
> +	int ret;
> +
> +	VM_BUG_ON_PAGE(!page->mapping, page);
> +
> +	ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
> +	if (ret)
> +		return ret;
> +	mem_cgroup_commit_charge(page, memcg, lrucare);
> +	return 0;
> +}
> +
>  struct uncharge_gather {
>  	struct mem_cgroup *memcg;
>  	unsigned long pgpgout;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 52c66801321e..2384f6c7ef71 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -605,11 +605,13 @@ static inline bool is_huge_enabled(struct shmem_sb_info *sbinfo)
>   */
>  static int shmem_add_to_page_cache(struct page *page,
>  				   struct address_space *mapping,
> -				   pgoff_t index, void *expected, gfp_t gfp)
> +				   pgoff_t index, void *expected, gfp_t gfp,
> +				   struct mm_struct *charge_mm)
>  {
>  	XA_STATE_ORDER(xas, &mapping->i_pages, index, compound_order(page));
>  	unsigned long i = 0;
>  	unsigned long nr = compound_nr(page);
> +	int error;
>
>  	VM_BUG_ON_PAGE(PageTail(page), page);
>  	VM_BUG_ON_PAGE(index != round_down(index, nr), page);
> @@ -621,6 +623,16 @@ static int shmem_add_to_page_cache(struct page *page,
>  	page->mapping = mapping;
>  	page->index = index;
>
> +	error = mem_cgroup_charge(page, charge_mm, gfp, PageSwapCache(page));
> +	if (error) {
> +		if (!PageSwapCache(page) && PageTransHuge(page)) {
> +			count_vm_event(THP_FILE_FALLBACK);
> +			count_vm_event(THP_FILE_FALLBACK_CHARGE);
> +		}
> +		goto error;
> +	}
> +	cgroup_throttle_swaprate(page, gfp);
> +
>  	do {
>  		void *entry;
>  		xas_lock_irq(&xas);
> @@ -648,12 +660,15 @@ static int shmem_add_to_page_cache(struct page *page,
>  	} while (xas_nomem(&xas, gfp));
>
>  	if (xas_error(&xas)) {
> -		page->mapping = NULL;
> -		page_ref_sub(page, nr);
> -		return xas_error(&xas);
> +		error = xas_error(&xas);
> +		goto error;
>  	}
>
>  	return 0;
> +error:
> +	page->mapping = NULL;
> +	page_ref_sub(page, nr);
> +	return error;
>  }
>
>  /*
> @@ -1619,7 +1634,6 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
>  	struct address_space *mapping = inode->i_mapping;
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct mm_struct *charge_mm = vma ? vma->vm_mm : current->mm;
> -	struct mem_cgroup *memcg;
>  	struct page *page;
>  	swp_entry_t swap;
>  	int error;
>
> @@ -1664,29 +1678,22 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
>  		goto failed;
>  	}
>
> -	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
> -	if (!error) {
> -		error = shmem_add_to_page_cache(page, mapping, index,
> -						swp_to_radix_entry(swap), gfp);
> -		/*
> -		 * We already confirmed swap under page lock, and make
> -		 * no memory allocation here, so usually no possibility
> -		 * of error; but free_swap_and_cache() only trylocks a
> -		 * page, so it is just possible that the entry has been
> -		 * truncated or holepunched since swap was confirmed.
> -		 * shmem_undo_range() will have done some of the
> -		 * unaccounting, now delete_from_swap_cache() will do
> -		 * the rest.
> -		 */
> -		if (error) {
> -			mem_cgroup_cancel_charge(page, memcg);
> -			delete_from_swap_cache(page);
> -		}
> -	}
> -	if (error)
> +	error = shmem_add_to_page_cache(page, mapping, index,
> +					swp_to_radix_entry(swap), gfp,
> +					charge_mm);
> +	/*
> +	 * We already confirmed swap under page lock, and make no
> +	 * memory allocation here, so usually no possibility of error;
> +	 * but free_swap_and_cache() only trylocks a page, so it is
> +	 * just possible that the entry has been truncated or
> +	 * holepunched since swap was confirmed. shmem_undo_range()
> +	 * will have done some of the unaccounting, now
> +	 * delete_from_swap_cache() will do the rest.
> +	 */
> +	if (error) {
> +		delete_from_swap_cache(page);
> +		goto failed;

-EEXIST (from the swap cache) and -ENOMEM (from memcg) should be
handled differently here: delete_from_swap_cache() is only correct for
the -EEXIST case.
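
Something like the following is what I have in mind (sketch only,
untested):

	error = shmem_add_to_page_cache(page, mapping, index,
					swp_to_radix_entry(swap), gfp,
					charge_mm);
	if (error) {
		/*
		 * Only the truncate/holepunch race described in the
		 * comment above ends in -EEXIST, and only then is the
		 * swap cache entry stale. A memcg charge failure
		 * (-ENOMEM) leaves the entry valid, so the page must
		 * not be deleted from the swap cache in that case.
		 */
		if (error == -EEXIST)
			delete_from_swap_cache(page);
		goto failed;
	}

Thanks.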