linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, kirill.shutemov@linux.intel.com,
	hannes@cmpxchg.org, vbabka@suse.cz, rientjes@google.com,
	akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [v2 PATCH -mm] mm: account deferred split THPs into MemAvailable
Date: Thu, 22 Aug 2019 01:55:25 +0800	[thread overview]
Message-ID: <1566410125-66011-1-git-send-email-yang.shi@linux.alibaba.com> (raw)

Available memory is one of the most important metrics for memory
pressure.  Currently, the deferred split THPs are not accounted into
available memory, but they are reclaimable actually, like reclaimable
slabs.

And, they seems very common with the common workloads when THP is
enabled.  A simple run with MariaDB test of mmtest with THP enabled as
always shows it could generate over fifteen thousand deferred split THPs
(accumulated around 30G in one hour run, 75% of 40G memory for my VM).
It looks worth accounting in MemAvailable.

Record the number of freeable normal pages of deferred split THPs into
the second tail page, and account it into KReclaimable.  Although THP
allocations are not exactly "kernel allocations", once they are unmapped,
they are in fact kernel-only.  KReclaimable has been accounted into
MemAvailable.

When the deferred split THPs get split due to memory pressure or freed,
just decrease by the recorded number.

With this change when running program which populates 1G address space
then madvise(MADV_DONTNEED) 511 pages for every THP, /proc/meminfo would
show the deferred split THPs are accounted properly.

Populated by before calling madvise(MADV_DONTNEED):
MemAvailable:   43531960 kB
AnonPages:       1096660 kB
KReclaimable:      26156 kB
AnonHugePages:   1056768 kB

After calling madvise(MADV_DONTNEED):
MemAvailable:   44411164 kB
AnonPages:         50140 kB
KReclaimable:    1070640 kB
AnonHugePages:     10240 kB

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 Documentation/filesystems/proc.txt |  4 ++--
 include/linux/huge_mm.h            |  7 +++++--
 include/linux/mm_types.h           |  3 ++-
 mm/huge_memory.c                   | 13 ++++++++++++-
 mm/rmap.c                          |  4 ++--
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 99ca040..93fc183 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -968,8 +968,8 @@ ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
               with huge pages
 ShmemPmdMapped: Shared memory mapped into userspace with huge pages
 KReclaimable: Kernel allocations that the kernel will attempt to reclaim
-              under memory pressure. Includes SReclaimable (below), and other
-              direct allocations with a shrinker.
+              under memory pressure. Includes SReclaimable (below), deferred
+              split THPs, and other direct allocations with a shrinker.
         Slab: in-kernel data structures cache
 SReclaimable: Part of Slab, that might be reclaimed, such as caches
   SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 61c9ffd..c194630 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -162,7 +162,7 @@ static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list(page, NULL);
 }
-void deferred_split_huge_page(struct page *page);
+void deferred_split_huge_page(struct page *page, unsigned int nr);
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze, struct page *page);
@@ -324,7 +324,10 @@ static inline int split_huge_page(struct page *page)
 {
 	return 0;
 }
-static inline void deferred_split_huge_page(struct page *page) {}
+static inline void deferred_split_huge_page(struct page *page, unsigned int nr)
+{
+}
+
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 156640c..17e0fc5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -138,7 +138,8 @@ struct page {
 		};
 		struct {	/* Second tail page of compound page */
 			unsigned long _compound_pad_1;	/* compound_head */
-			unsigned long _compound_pad_2;
+			/* Freeable normal pages for deferred split shrinker */
+			unsigned long nr_freeable;
 			/* For both global and memcg */
 			struct list_head deferred_list;
 		};
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c9a596e..e04ac4d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -524,6 +524,7 @@ void prep_transhuge_page(struct page *page)
 
 	INIT_LIST_HEAD(page_deferred_list(page));
 	set_compound_page_dtor(page, TRANSHUGE_PAGE_DTOR);
+	page[2].nr_freeable = 0;
 }
 
 static unsigned long __thp_get_unmapped_area(struct file *filp, unsigned long len,
@@ -2766,6 +2767,10 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
 			list_del(page_deferred_list(head));
+			__mod_node_page_state(page_pgdat(page),
+					NR_KERNEL_MISC_RECLAIMABLE,
+					-head[2].nr_freeable);
+			head[2].nr_freeable = 0;
 		}
 		if (mapping)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
@@ -2816,11 +2821,14 @@ void free_transhuge_page(struct page *page)
 		ds_queue->split_queue_len--;
 		list_del(page_deferred_list(page));
 	}
+	__mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
+			      -page[2].nr_freeable);
+	page[2].nr_freeable = 0;
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);
 }
 
-void deferred_split_huge_page(struct page *page)
+void deferred_split_huge_page(struct page *page, unsigned int nr)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(page);
 #ifdef CONFIG_MEMCG
@@ -2844,6 +2852,9 @@ void deferred_split_huge_page(struct page *page)
 		return;
 
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	page[2].nr_freeable += nr;
+	__mod_node_page_state(page_pgdat(page), NR_KERNEL_MISC_RECLAIMABLE,
+			      nr);
 	if (list_empty(page_deferred_list(page))) {
 		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
 		list_add_tail(page_deferred_list(page), &ds_queue->split_queue);
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2a..6008fab 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1286,7 +1286,7 @@ static void page_remove_anon_compound_rmap(struct page *page)
 
 	if (nr) {
 		__mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr);
-		deferred_split_huge_page(page);
+		deferred_split_huge_page(page, nr);
 	}
 }
 
@@ -1320,7 +1320,7 @@ void page_remove_rmap(struct page *page, bool compound)
 		clear_page_mlock(page);
 
 	if (PageTransCompound(page))
-		deferred_split_huge_page(compound_head(page));
+		deferred_split_huge_page(compound_head(page), 1);
 
 	/*
 	 * It would be tidy to reset the PageAnon mapping here,
-- 
1.8.3.1


             reply	other threads:[~2019-08-21 17:55 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-21 17:55 Yang Shi [this message]
2019-08-22  8:04 ` [v2 PATCH -mm] mm: account deferred split THPs into MemAvailable Michal Hocko
2019-08-22 12:56   ` Vlastimil Babka
2019-08-22 15:29     ` Kirill A. Shutemov
2019-08-26  7:40       ` Michal Hocko
2019-08-26 13:15         ` Kirill A. Shutemov
2019-08-27  6:01           ` Michal Hocko
2019-08-27 11:02             ` Kirill A. Shutemov
2019-08-27 11:48               ` Michal Hocko
2019-08-27 12:01               ` Vlastimil Babka
2019-08-27 12:09                 ` Michal Hocko
2019-08-27 12:17                   ` Kirill A. Shutemov
2019-08-27 12:59                     ` Kirill A. Shutemov
2019-08-27 17:06                       ` Yang Shi
2019-08-28  7:57                         ` Michal Hocko
2019-08-28 14:03                           ` Kirill A. Shutemov
2019-08-28 14:12                             ` Michal Hocko
2019-08-28 14:46                               ` Kirill A. Shutemov
2019-08-28 16:02                                 ` Michal Hocko
2019-08-29 17:03                                   ` Yang Shi
2019-08-30  6:23                                     ` Michal Hocko
2019-08-30 12:53                                   ` Kirill A. Shutemov
2019-08-22 15:49     ` Kirill A. Shutemov
2019-08-22 15:57     ` Yang Shi
2019-08-22 15:33   ` Yang Shi
2019-08-26  7:43     ` Michal Hocko
2019-08-27  4:27       ` Yang Shi
2019-08-27  5:59         ` Michal Hocko
2019-08-27  8:32           ` Kirill A. Shutemov
2019-08-27  9:00             ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1566410125-66011-1-git-send-email-yang.shi@linux.alibaba.com \
    --to=yang.shi@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).