Date: Wed, 22 Apr 2020 15:40:41 +0900
From: Joonsoo Kim
To: Johannes Weiner
Cc: Alex Shi, Shakeel Butt, Hugh Dickins, Michal Hocko,
	"Kirill A. Shutemov", Roman Gushchin, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 05/18] mm: memcontrol: convert page cache to a new mem_cgroup_charge() API
Message-ID: <20200422064041.GE6780@js1304-desktop>
References: <20200420221126.341272-1-hannes@cmpxchg.org>
 <20200420221126.341272-6-hannes@cmpxchg.org>
In-Reply-To: <20200420221126.341272-6-hannes@cmpxchg.org>

On Mon, Apr 20, 2020 at 06:11:13PM -0400, Johannes Weiner wrote:
> The try/commit/cancel protocol that memcg uses dates back to when
> pages used to be uncharged upon removal from the page cache, and thus
> couldn't be committed before the insertion had succeeded. Nowadays,
> pages are uncharged when they are physically freed; it doesn't matter
> whether the insertion was successful or not. For the page cache, the
> transaction dance has become unnecessary.
>
> Introduce a mem_cgroup_charge() function that simply charges a newly
> allocated page to a cgroup and sets up page->mem_cgroup in one single
> step. If the insertion fails, the caller doesn't have to do anything
> but free/put the page.
>
> Then switch the page cache over to this new API.
>
> Subsequent patches will also convert anon pages, but it needs a bit
> more prep work. Right now, memcg depends on page->mapping being
> already set up at the time of charging, so that it can maintain its
> own MEMCG_CACHE and MEMCG_RSS counters. For anon, page->mapping is set
> under the same pte lock under which the page is published, so a single
> charge point that can block doesn't work there just yet.
>
> The following prep patches will replace the private memcg counters
> with the generic vmstat counters, thus removing the page->mapping
> dependency, then complete the transition to the new single-point
> charge API and delete the old transactional scheme.
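
Restating the conversion for context: at a call site, the change boils
down to the following (a minimal sketch, using a hypothetical
insert_page() helper in place of the real insertion -- not code from
this patch):

	/* Old three-step protocol: charge, publish, then commit/cancel. */
	struct mem_cgroup *memcg;
	int error;

	error = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
	if (error)
		return error;
	error = insert_page(page);
	if (error) {
		mem_cgroup_cancel_charge(page, memcg);	/* roll back */
		return error;
	}
	mem_cgroup_commit_charge(page, memcg, false);	/* make it stick */

	/*
	 * New single-step API: on a publish failure the caller only
	 * puts the page; the charge is released when the page is freed.
	 */
	error = mem_cgroup_charge(page, mm, gfp_mask, false);
	if (error)
		return error;
	error = insert_page(page);
	if (error) {
		put_page(page);
		return error;
	}

That is, the commit/cancel bookkeeping disappears because an
uncommitted charge no longer exists as a separate state.
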
>
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/memcontrol.h | 10 ++++
>  mm/filemap.c               | 24 ++++------
>  mm/memcontrol.c            | 27 +++++++++++
>  mm/shmem.c                 | 97 +++++++++++++++++---------------------
>  4 files changed, 89 insertions(+), 69 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index c7875a48c8c1..5e8b0e38f145 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -367,6 +367,10 @@ int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
>  void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
>  			      bool lrucare);
>  void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
> +
> +int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
> +		      bool lrucare);
> +
>  void mem_cgroup_uncharge(struct page *page);
>  void mem_cgroup_uncharge_list(struct list_head *page_list);
>
> @@ -872,6 +876,12 @@ static inline void mem_cgroup_cancel_charge(struct page *page,
>  {
>  }
>
> +static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
> +				    gfp_t gfp_mask, bool lrucare)
> +{
> +	return 0;
> +}
> +
>  static inline void mem_cgroup_uncharge(struct page *page)
>  {
>  }
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5b31af9d5b1b..5bdbda965177 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -832,7 +832,6 @@ static int __add_to_page_cache_locked(struct page *page,
>  {
>  	XA_STATE(xas, &mapping->i_pages, offset);
>  	int huge = PageHuge(page);
> -	struct mem_cgroup *memcg;
>  	int error;
>  	void *old;
>
> @@ -840,17 +839,16 @@ static int __add_to_page_cache_locked(struct page *page,
>  	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
>  	mapping_set_update(&xas, mapping);
>
> -	if (!huge) {
> -		error = mem_cgroup_try_charge(page, current->mm,
> -					      gfp_mask, &memcg);
> -		if (error)
> -			return error;
> -	}
> -
>  	get_page(page);
>  	page->mapping = mapping;
>  	page->index = offset;
>
> +	if (!huge) {
> +		error = mem_cgroup_charge(page, current->mm, gfp_mask, false);
> +		if (error)
> +			goto error;
> +	}
> +
>  	do {
>  		xas_lock_irq(&xas);
>  		old = xas_load(&xas);
> @@ -874,20 +872,18 @@ static int __add_to_page_cache_locked(struct page *page,
>  		xas_unlock_irq(&xas);
>  	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
>
> -	if (xas_error(&xas))
> +	if (xas_error(&xas)) {
> +		error = xas_error(&xas);
>  		goto error;
> +	}
>
> -	if (!huge)
> -		mem_cgroup_commit_charge(page, memcg, false);
>  	trace_mm_filemap_add_to_page_cache(page);
>  	return 0;
>  error:
>  	page->mapping = NULL;
>  	/* Leave page->index set: truncation relies upon it */
> -	if (!huge)
> -		mem_cgroup_cancel_charge(page, memcg);
>  	put_page(page);
> -	return xas_error(&xas);
> +	return error;
>  }
>  ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 711d6dd5cbb1..b38c0a672d26 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6577,6 +6577,33 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
>  	cancel_charge(memcg, nr_pages);
>  }
>
> +/**
> + * mem_cgroup_charge - charge a newly allocated page to a cgroup
> + * @page: page to charge
> + * @mm: mm context of the victim
> + * @gfp_mask: reclaim mode
> + * @lrucare: page might be on the LRU already
> + *
> + * Try to charge @page to the memcg that @mm belongs to, reclaiming
> + * pages according to @gfp_mask if necessary.
> + *
> + * Returns 0 on success. Otherwise, an error code is returned.
> + */
> +int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
> +		      bool lrucare)
> +{
> +	struct mem_cgroup *memcg;
> +	int ret;
> +
> +	VM_BUG_ON_PAGE(!page->mapping, page);
> +
> +	ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
> +	if (ret)
> +		return ret;
> +	mem_cgroup_commit_charge(page, memcg, lrucare);
> +	return 0;
> +}
> +
>  struct uncharge_gather {
>  	struct mem_cgroup *memcg;
>  	unsigned long pgpgout;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 52c66801321e..2384f6c7ef71 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -605,11 +605,13 @@ static inline bool is_huge_enabled(struct shmem_sb_info *sbinfo)
>   */
>  static int shmem_add_to_page_cache(struct page *page,
>  				   struct address_space *mapping,
> -				   pgoff_t index, void *expected, gfp_t gfp)
> +				   pgoff_t index, void *expected, gfp_t gfp,
> +				   struct mm_struct *charge_mm)
>  {
>  	XA_STATE_ORDER(xas, &mapping->i_pages, index, compound_order(page));
>  	unsigned long i = 0;
>  	unsigned long nr = compound_nr(page);
> +	int error;
>
>  	VM_BUG_ON_PAGE(PageTail(page), page);
>  	VM_BUG_ON_PAGE(index != round_down(index, nr), page);
> @@ -621,6 +623,16 @@ static int shmem_add_to_page_cache(struct page *page,
>  	page->mapping = mapping;
>  	page->index = index;
>
> +	error = mem_cgroup_charge(page, charge_mm, gfp, PageSwapCache(page));
> +	if (error) {
> +		if (!PageSwapCache(page) && PageTransHuge(page)) {
> +			count_vm_event(THP_FILE_FALLBACK);
> +			count_vm_event(THP_FILE_FALLBACK_CHARGE);
> +		}
> +		goto error;
> +	}
> +	cgroup_throttle_swaprate(page, gfp);
> +
>  	do {
>  		void *entry;
>  		xas_lock_irq(&xas);
> @@ -648,12 +660,15 @@ static int shmem_add_to_page_cache(struct page *page,
>  	} while (xas_nomem(&xas, gfp));
>
>  	if (xas_error(&xas)) {
> -		page->mapping = NULL;
> -		page_ref_sub(page, nr);
> -		return xas_error(&xas);
> +		error = xas_error(&xas);
> +		goto error;
>  	}
>
>  	return 0;
> +error:
> +	page->mapping = NULL;
> +	page_ref_sub(page, nr);
> +	return error;
>  }
>
>  /*
> @@ -1619,7 +1634,6 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
>  	struct address_space *mapping = inode->i_mapping;
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct mm_struct *charge_mm = vma ? vma->vm_mm : current->mm;
> -	struct mem_cgroup *memcg;
>  	struct page *page;
>  	swp_entry_t swap;
>  	int error;
>
> @@ -1664,29 +1678,22 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
>  		goto failed;
>  	}
>
> -	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
> -	if (!error) {
> -		error = shmem_add_to_page_cache(page, mapping, index,
> -						swp_to_radix_entry(swap), gfp);
> -		/*
> -		 * We already confirmed swap under page lock, and make
> -		 * no memory allocation here, so usually no possibility
> -		 * of error; but free_swap_and_cache() only trylocks a
> -		 * page, so it is just possible that the entry has been
> -		 * truncated or holepunched since swap was confirmed.
> -		 * shmem_undo_range() will have done some of the
> -		 * unaccounting, now delete_from_swap_cache() will do
> -		 * the rest.
> -		 */
> -		if (error) {
> -			mem_cgroup_cancel_charge(page, memcg);
> -			delete_from_swap_cache(page);
> -		}
> -	}
> -	if (error)
> +	error = shmem_add_to_page_cache(page, mapping, index,
> +					swp_to_radix_entry(swap), gfp,
> +					charge_mm);
> +	/*
> +	 * We already confirmed swap under page lock, and make no
> +	 * memory allocation here, so usually no possibility of error;
> +	 * but free_swap_and_cache() only trylocks a page, so it is
> +	 * just possible that the entry has been truncated or
> +	 * holepunched since swap was confirmed. shmem_undo_range()
> +	 * will have done some of the unaccounting, now
> +	 * delete_from_swap_cache() will do the rest.
> +	 */
> +	if (error) {
> +		delete_from_swap_cache(page);
> +		goto failed;

-EEXIST (from the swap cache) and -ENOMEM (from memcg) should be
handled differently here: delete_from_swap_cache() is only correct for
the -EEXIST case.
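
Something like the following is what I have in mind (sketch only,
untested):

	error = shmem_add_to_page_cache(page, mapping, index,
					swp_to_radix_entry(swap), gfp,
					charge_mm);
	if (error) {
		/*
		 * Only the truncate/holepunch race described in the
		 * comment above ends in -EEXIST, and only then is the
		 * swap cache entry stale. A memcg charge failure
		 * (-ENOMEM) leaves the entry valid, so the page must
		 * not be deleted from the swap cache in that case.
		 */
		if (error == -EEXIST)
			delete_from_swap_cache(page);
		goto failed;
	}

Thanks.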