From: js1304@gmail.com
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	kernel-team@lge.com, Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: [PATCH v5 05/10] mm/swap: charge the page when adding to the swap cache
Date: Fri,  3 Apr 2020 14:40:43 +0900	[thread overview]
Message-ID: <1585892447-32059-6-git-send-email-iamjoonsoo.kim@lge.com> (raw)
In-Reply-To: <1585892447-32059-1-git-send-email-iamjoonsoo.kim@lge.com>

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Currently, some swapped-in pages are not charged to the memcg until the
page is actually accessed. This leaves a hole: even with memcg enabled,
a task can consume a large amount of system memory without being charged
for it. For example, a task can let all of its pages be swapped out and
then call madvise(MADV_WILLNEED) to read every swapped-out page back into
memory without putting any pressure on its memcg. An actual access still
triggers the charge, but being able to pull all the swapped-out pages
back into memory without pressing the memcg is already a significant
benefit.
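
As an illustration only (not part of the patch), a minimal userspace
sketch of this scenario could look like the following; the 1 GiB size is
an arbitrary example value:

#include <sys/mman.h>
#include <string.h>

int main(void)
{
	size_t len = 1UL << 30;		/* 1 GiB of anonymous memory */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	/* Populate the pages; assume they later get swapped out. */
	memset(buf, 0x5a, len);

	/*
	 * Once the pages are in swap, this single call reads them all
	 * back into the swap cache.  Before this patch, none of them
	 * were charged to the memcg until actually touched again.
	 */
	madvise(buf, len, MADV_WILLNEED);

	return 0;
}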

In addition, the workingset detection implemented in a following patch
requires that a memcg be committed to the page before the detection
runs. The best way to satisfy both requirements is to charge the page
when it is added to the swap cache. Charging there is not hard: the
callers that add a page to the swap cache already know which memcg
should be charged, so all that is needed is to pass that information
down to the right place.
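
Concretely, the mm/swap_state.c hunk below follows the usual
try/commit/cancel charging pattern; condensed from the diff (error
unwinding abbreviated), the added flow is:

	struct mm_struct *mm = vma ? vma->vm_mm : current->mm;
	struct mem_cgroup *memcg;
	bool compound = !!compound_order(page);

	/* PageSwapCache() prevents the page from being charged again */
	SetPageSwapCache(page);

	error = mem_cgroup_try_charge(page, mm, gfp, &memcg, compound);
	if (error)
		return error;	/* after undoing the flag and refcount */

	/* ... insert the page into the swap cache XArray ... */

	if (!xas_error(&xas))
		mem_cgroup_commit_charge(page, memcg, false, compound);
	else
		mem_cgroup_cancel_charge(page, memcg, compound);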

With this patch, a specific memcg can come under more pressure since
readahead pages are now charged to it as well. This may degrade
performance for that user, but it is fair, because the readahead is done
on that user's behalf.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/swap.h |  4 ++--
 mm/shmem.c           | 18 ++++++++++--------
 mm/swap_state.c      | 25 +++++++++++++++++++++----
 3 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 273de48..eea0700 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -409,7 +409,7 @@ extern unsigned long total_swapcache_pages(void);
 extern void show_swap_cache_info(void);
 extern int add_to_swap(struct page *page);
 extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
-			gfp_t gfp, void **shadowp);
+			struct vm_area_struct *vma, gfp_t gfp, void **shadowp);
 extern int __add_to_swap_cache(struct page *page, swp_entry_t entry);
 extern void __delete_from_swap_cache(struct page *page,
 			swp_entry_t entry, void *shadow);
@@ -567,7 +567,7 @@ static inline int add_to_swap(struct page *page)
 }
 
 static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
-					gfp_t gfp_mask, void **shadowp)
+			struct vm_area_struct *vma, gfp_t gfp, void **shadowp)
 {
 	return -1;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 9e34b4e..8e28c1f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1369,7 +1369,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	if (list_empty(&info->swaplist))
 		list_add(&info->swaplist, &shmem_swaplist);
 
-	if (add_to_swap_cache(page, swap,
+	if (add_to_swap_cache(page, swap, NULL,
 			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
 			NULL) == 0) {
 		spin_lock_irq(&info->lock);
@@ -1434,10 +1434,11 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
 #endif
 
 static void shmem_pseudo_vma_init(struct vm_area_struct *vma,
-		struct shmem_inode_info *info, pgoff_t index)
+			struct mm_struct *mm, struct shmem_inode_info *info,
+			pgoff_t index)
 {
 	/* Create a pseudo vma that just contains the policy */
-	vma_init(vma, NULL);
+	vma_init(vma, mm);
 	/* Bias interleave by inode number to distribute better across nodes */
 	vma->vm_pgoff = index + info->vfs_inode.i_ino;
 	vma->vm_policy = mpol_shared_policy_lookup(&info->policy, index);
@@ -1450,13 +1451,14 @@ static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma)
 }
 
 static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
-			struct shmem_inode_info *info, pgoff_t index)
+			struct mm_struct *mm, struct shmem_inode_info *info,
+			pgoff_t index)
 {
 	struct vm_area_struct pvma;
 	struct page *page;
 	struct vm_fault vmf;
 
-	shmem_pseudo_vma_init(&pvma, info, index);
+	shmem_pseudo_vma_init(&pvma, mm, info, index);
 	vmf.vma = &pvma;
 	vmf.address = 0;
 	page = swap_cluster_readahead(swap, gfp, &vmf);
@@ -1481,7 +1483,7 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
 								XA_PRESENT))
 		return NULL;
 
-	shmem_pseudo_vma_init(&pvma, info, hindex);
+	shmem_pseudo_vma_init(&pvma, NULL, info, hindex);
 	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
 			HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
 	shmem_pseudo_vma_destroy(&pvma);
@@ -1496,7 +1498,7 @@ static struct page *shmem_alloc_page(gfp_t gfp,
 	struct vm_area_struct pvma;
 	struct page *page;
 
-	shmem_pseudo_vma_init(&pvma, info, index);
+	shmem_pseudo_vma_init(&pvma, NULL, info, index);
 	page = alloc_page_vma(gfp, &pvma, 0);
 	shmem_pseudo_vma_destroy(&pvma);
 
@@ -1652,7 +1654,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
 			count_memcg_event_mm(charge_mm, PGMAJFAULT);
 		}
 		/* Here we actually start the io */
-		page = shmem_swapin(swap, gfp, info, index);
+		page = shmem_swapin(swap, gfp, charge_mm, info, index);
 		if (!page) {
 			error = -ENOMEM;
 			goto failed;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index f06af84..1db73a2 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -112,7 +112,7 @@ void show_swap_cache_info(void)
  * but sets SwapCache flag and private instead of mapping and index.
  */
 int add_to_swap_cache(struct page *page, swp_entry_t entry,
-			gfp_t gfp, void **shadowp)
+			struct vm_area_struct *vma, gfp_t gfp, void **shadowp)
 {
 	struct address_space *address_space = swap_address_space(entry);
 	pgoff_t idx = swp_offset(entry);
@@ -120,14 +120,26 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry,
 	unsigned long i, nr = compound_nr(page);
 	unsigned long nrexceptional = 0;
 	void *old;
+	bool compound = !!compound_order(page);
+	int error;
+	struct mm_struct *mm = vma ? vma->vm_mm : current->mm;
+	struct mem_cgroup *memcg;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapCache(page), page);
 	VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 
 	page_ref_add(page, nr);
+	/* PageSwapCache() prevents the page from being re-charged */
 	SetPageSwapCache(page);
 
+	error = mem_cgroup_try_charge(page, mm, gfp, &memcg, compound);
+	if (error) {
+		ClearPageSwapCache(page);
+		page_ref_sub(page, nr);
+		return error;
+	}
+
 	do {
 		xas_lock_irq(&xas);
 		xas_create_range(&xas);
@@ -153,11 +165,16 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry,
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp));
 
-	if (!xas_error(&xas))
+	if (!xas_error(&xas)) {
+		mem_cgroup_commit_charge(page, memcg, false, compound);
 		return 0;
+	}
+
+	mem_cgroup_cancel_charge(page, memcg, compound);
 
 	ClearPageSwapCache(page);
 	page_ref_sub(page, nr);
+
 	return xas_error(&xas);
 }
 
@@ -221,7 +238,7 @@ int add_to_swap(struct page *page)
 	/*
 	 * Add it to the swap cache.
 	 */
-	err = add_to_swap_cache(page, entry,
+	err = add_to_swap_cache(page, entry, NULL,
 			__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN, NULL);
 	if (err)
 		/*
@@ -431,7 +448,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		/* May fail (-ENOMEM) if XArray node allocation failed. */
 		__SetPageLocked(new_page);
 		__SetPageSwapBacked(new_page);
-		err = add_to_swap_cache(new_page, entry,
+		err = add_to_swap_cache(new_page, entry, vma,
 				gfp_mask & GFP_KERNEL, NULL);
 		if (likely(!err)) {
 			/* Initiate read into locked page */
-- 
2.7.4


Thread overview: 26+ messages
2020-04-03  5:40 [PATCH v5 00/10] workingset protection/detection on the anonymous LRU list js1304
2020-04-03  5:40 ` [PATCH v5 01/10] mm/vmscan: make active/inactive ratio as 1:1 for anon lru js1304
2020-04-03  5:40 ` [PATCH v5 02/10] mm/vmscan: protect the workingset on anonymous LRU js1304
2020-04-03  5:40 ` [PATCH v5 03/10] mm/workingset: extend the workingset detection for anon LRU js1304
2020-04-03  5:40 ` [PATCH v5 04/10] mm/swapcache: support to handle the exceptional entries in swapcache js1304
2020-04-03  5:40 ` js1304 [this message]
2020-04-03 18:29   ` [PATCH v5 05/10] mm/swap: charge the page when adding to the swap cache Yang Shi
2020-04-06  1:03     ` Joonsoo Kim
2020-04-07  0:22       ` Yang Shi
2020-04-07  1:27         ` Joonsoo Kim
2020-04-16 16:11   ` Johannes Weiner
2020-04-17  1:38     ` Joonsoo Kim
2020-04-17  3:31       ` Johannes Weiner
2020-04-17  3:57         ` Joonsoo Kim
2020-04-03  5:40 ` [PATCH v5 06/10] mm/swap: implement workingset detection for anonymous LRU js1304
2020-04-03  5:40 ` [PATCH v5 07/10] mm/workingset: support to remember the previous owner of the page js1304
2020-04-03  5:40 ` [PATCH v5 08/10] mm/swap: do not readahead if the previous owner of the swap entry isn't me js1304
2020-04-03  5:40 ` [PATCH v5 09/10] mm/vmscan: restore active/inactive ratio for anonymous LRU js1304
2020-04-03  5:45 ` [PATCH v5 10/10] mm/swap: reinforce the reclaim_stat changed by anon LRU algorithm change js1304
     [not found] ` <20200406091814.17256-1-hdanton@sina.com>
2020-04-07  0:40   ` [PATCH v5 02/10] mm/vmscan: protect the workingset on anonymous LRU Joonsoo Kim
     [not found] ` <20200406115804.4440-1-hdanton@sina.com>
2020-04-07  0:42   ` [PATCH v5 05/10] mm/swap: charge the page when adding to the swap cache Joonsoo Kim
     [not found]   ` <20200407022144.11164-1-hdanton@sina.com>
2020-04-09  0:53     ` Joonsoo Kim
2020-04-08 16:55 ` [PATCH v5 00/10] workingset protection/detection on the anonymous LRU list Vlastimil Babka
2020-04-09  0:50   ` Joonsoo Kim
2020-06-03  3:57     ` Suren Baghdasaryan
2020-06-03  5:46       ` Joonsoo Kim
