From: Johannes Weiner <hannes@cmpxchg.org>
To: Joonsoo Kim <js1304@gmail.com>, Alex Shi <alex.shi@linux.alibaba.com>
Cc: Shakeel Butt <shakeelb@google.com>,
Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Roman Gushchin <guro@fb.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 15/18] mm: memcontrol: make swap tracking an integral part of memory control
Date: Mon, 20 Apr 2020 18:11:23 -0400
Message-ID: <20200420221126.341272-16-hannes@cmpxchg.org>
In-Reply-To: <20200420221126.341272-1-hannes@cmpxchg.org>
Without swap page tracking, users that are otherwise memory controlled
can easily escape their containment and allocate significant amounts
of memory that they're not being charged for. That's because swap does
readahead, but without the cgroup records of who owned the page at
swapout, readahead pages don't get charged until somebody actually
faults them into their page table and we can identify an owner task.
This can be maliciously exploited with MADV_WILLNEED, which triggers
arbitrary readahead allocations without charging the pages.
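
As a rough illustration (not part of this patch, and assuming an
anonymous mapping that has already been pushed out to swap), a task in
a memory-limited cgroup needs nothing more than something like the
following to pull pages back into the swap cache without ever being
charged for them:

    #include <stddef.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 256UL << 20;   /* 256M of anonymous memory */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED)
            return 1;

        /* Touch the pages so they are swap-backed once reclaimed. */
        for (size_t i = 0; i < len; i += 4096)
            buf[i] = 1;

        /* ... memory pressure swaps the range out ... */

        /*
         * Swap readahead brings the pages back into the swap cache,
         * but no fault occurs, so no cgroup is charged for them.
         */
        madvise(buf, len, MADV_WILLNEED);
        return 0;
    }
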
Make swap page tracking an integral part of memcg and remove the
Kconfig options. In the first place, it was only made configurable to
allow users to save some memory. But the overhead of tracking cgroup
ownership per swap page is minimal - 2 bytes per page, or 512k per 1G
of swap, or about 0.05%. Saving that at the expense of broken containment
semantics is not something we should present as a coequal option.
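
For the record, the arithmetic behind those numbers, assuming the
common 4k page size:

    1G of swap / 4k per page = 262144 swap slots
    262144 slots * 2 bytes   = 512k of tracking data
    512k / 1G                ~ 0.05%
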
The swapaccount=0 boot option will continue to exist, and it will
eliminate the page_counter overhead and hide the swap control files,
but it won't disable swap slot ownership tracking.
This patch makes sure we always have the cgroup records at swapin
time; the next patch will fix the actual bug by charging readahead
swap pages at swapin time rather than at fault time.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
init/Kconfig | 17 +----------------
mm/memcontrol.c | 48 +++++++++++++++++-------------------------------
mm/swap_cgroup.c | 6 ------
3 files changed, 18 insertions(+), 53 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..39cdb13168cf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -835,24 +835,9 @@ config MEMCG
Provides control over the memory footprint of tasks in a cgroup.
config MEMCG_SWAP
- bool "Swap controller"
+ bool
depends on MEMCG && SWAP
- help
- Provides control over the swap space consumed by tasks in a cgroup.
-
-config MEMCG_SWAP_ENABLED
- bool "Swap controller enabled by default"
- depends on MEMCG_SWAP
default y
- help
- Memory Resource Controller Swap Extension comes with its price in
- a bigger memory consumption. General purpose distribution kernels
- which want to enable the feature but keep it disabled by default
- and let the user enable it by swapaccount=1 boot command line
- parameter should have this option unselected.
- For those who want to have the feature enabled by default should
- select this option (if, for some reason, they need to disable it
- then swapaccount=0 does the trick).
config MEMCG_KMEM
bool
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5558777023e7..1d7408a8744a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -83,14 +83,10 @@ static bool cgroup_memory_nokmem;
/* Whether the swap controller is active */
#ifdef CONFIG_MEMCG_SWAP
-#ifdef CONFIG_MEMCG_SWAP_ENABLED
bool cgroup_memory_noswap __read_mostly;
#else
-bool cgroup_memory_noswap __read_mostly = 1;
-#endif /* CONFIG_MEMCG_SWAP_ENABLED */
-#else
#define cgroup_memory_noswap 1
-#endif /* CONFIG_MEMCG_SWAP */
+#endif
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
@@ -5290,8 +5286,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
* we call find_get_page() with swapper_space directly.
*/
page = find_get_page(swap_address_space(ent), swp_offset(ent));
- if (do_memsw_account())
- entry->val = ent.val;
+ entry->val = ent.val;
return page;
}
@@ -5325,8 +5320,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
page = find_get_entry(mapping, pgoff);
if (xa_is_value(page)) {
swp_entry_t swp = radix_to_swp_entry(page);
- if (do_memsw_account())
- *entry = swp;
+ *entry = swp;
page = find_get_page(swap_address_space(swp),
swp_offset(swp));
}
@@ -6459,6 +6453,9 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
goto out;
if (PageSwapCache(page)) {
+ swp_entry_t ent = { .val = page_private(page), };
+ unsigned short id;
+
/*
* Every swap fault against a single page tries to charge the
* page, bail as early as possible. shmem_unuse() encounters
@@ -6470,17 +6467,12 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
if (compound_head(page)->mem_cgroup)
goto out;
- if (!cgroup_memory_noswap) {
- swp_entry_t ent = { .val = page_private(page), };
- unsigned short id;
-
- id = lookup_swap_cgroup_id(ent);
- rcu_read_lock();
- memcg = mem_cgroup_from_id(id);
- if (memcg && !css_tryget_online(&memcg->css))
- memcg = NULL;
- rcu_read_unlock();
- }
+ id = lookup_swap_cgroup_id(ent);
+ rcu_read_lock();
+ memcg = mem_cgroup_from_id(id);
+ if (memcg && !css_tryget_online(&memcg->css))
+ memcg = NULL;
+ rcu_read_unlock();
}
if (!memcg)
@@ -6497,7 +6489,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
memcg_check_events(memcg, page);
local_irq_enable();
- if (do_memsw_account() && PageSwapCache(page)) {
+ if (PageSwapCache(page)) {
swp_entry_t entry = { .val = page_private(page) };
/*
* The swap entry might not get freed for a long time,
@@ -6884,9 +6876,6 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);
- if (!do_memsw_account())
- return;
-
memcg = page->mem_cgroup;
/* Readahead page, never charged */
@@ -6913,7 +6902,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
if (!mem_cgroup_is_root(memcg))
page_counter_uncharge(&memcg->memory, nr_entries);
- if (memcg != swap_memcg) {
+ if (do_memsw_account() && memcg != swap_memcg) {
if (!mem_cgroup_is_root(swap_memcg))
page_counter_charge(&swap_memcg->memsw, nr_entries);
page_counter_uncharge(&memcg->memsw, nr_entries);
@@ -6949,7 +6938,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short oldid;
- if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || cgroup_memory_noswap)
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;
memcg = page->mem_cgroup;
@@ -6965,7 +6954,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
memcg = mem_cgroup_id_get_online(memcg);
- if (!mem_cgroup_is_root(memcg) &&
+ if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg) &&
!page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
memcg_memory_event(memcg, MEMCG_SWAP_MAX);
memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
@@ -6993,14 +6982,11 @@ void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
struct mem_cgroup *memcg;
unsigned short id;
- if (cgroup_memory_noswap)
- return;
-
id = swap_cgroup_record(entry, 0, nr_pages);
rcu_read_lock();
memcg = mem_cgroup_from_id(id);
if (memcg) {
- if (!mem_cgroup_is_root(memcg)) {
+ if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg)) {
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
page_counter_uncharge(&memcg->swap, nr_pages);
else
diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 7aa764f09079..7f34343c075a 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -171,9 +171,6 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
unsigned long length;
struct swap_cgroup_ctrl *ctrl;
- if (cgroup_memory_noswap)
- return 0;
-
length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
array_size = length * sizeof(void *);
@@ -209,9 +206,6 @@ void swap_cgroup_swapoff(int type)
unsigned long i, length;
struct swap_cgroup_ctrl *ctrl;
- if (cgroup_memory_noswap)
- return;
-
mutex_lock(&swap_cgroup_mutex);
ctrl = &swap_cgroup_ctrl[type];
map = ctrl->map;
--
2.26.0
Thread overview: 76+ messages
2020-04-20 22:11 [PATCH 00/18] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
2020-04-20 22:11 ` [PATCH 01/18] mm: fix NUMA node file count error in replace_page_cache() Johannes Weiner
2020-04-21 8:28 ` Alex Shi
2020-04-21 19:13 ` Shakeel Butt
2020-04-22 6:34 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 02/18] mm: memcontrol: fix theoretical race in charge moving Johannes Weiner
2020-04-22 6:36 ` Joonsoo Kim
2020-04-22 16:51 ` Shakeel Butt
2020-04-22 17:42 ` Johannes Weiner
2020-04-22 18:01 ` Shakeel Butt
2020-04-22 18:02 ` Shakeel Butt
2020-04-20 22:11 ` [PATCH 03/18] mm: memcontrol: drop @compound parameter from memcg charging API Johannes Weiner
2020-04-21 9:11 ` Alex Shi
2020-04-22 6:37 ` Joonsoo Kim
2020-04-22 17:30 ` Shakeel Butt
2020-04-20 22:11 ` [PATCH 04/18] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
2020-04-21 9:11 ` Alex Shi
2020-04-22 6:37 ` Joonsoo Kim
2020-04-22 22:20 ` Shakeel Butt
2020-04-20 22:11 ` [PATCH 05/18] mm: memcontrol: convert page cache to a new mem_cgroup_charge() API Johannes Weiner
2020-04-21 9:12 ` Alex Shi
2020-04-22 6:40 ` Joonsoo Kim
2020-04-22 12:09 ` Johannes Weiner
2020-04-23 5:25 ` Joonsoo Kim
2020-05-08 16:01 ` Johannes Weiner
2020-05-11 1:57 ` Joonsoo Kim
2020-05-11 7:38 ` Hugh Dickins
2020-05-11 15:06 ` Johannes Weiner
2020-05-11 16:32 ` Hugh Dickins
2020-05-11 18:10 ` Johannes Weiner
2020-05-11 18:12 ` Johannes Weiner
2020-05-11 18:44 ` Hugh Dickins
2020-04-20 22:11 ` [PATCH 06/18] mm: memcontrol: prepare uncharging for removal of private page type counters Johannes Weiner
2020-04-21 9:12 ` Alex Shi
2020-04-22 6:41 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 07/18] mm: memcontrol: prepare move_account " Johannes Weiner
2020-04-21 9:13 ` Alex Shi
2020-04-22 6:41 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 08/18] mm: memcontrol: prepare cgroup vmstat infrastructure for native anon counters Johannes Weiner
2020-04-22 6:42 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 09/18] mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters Johannes Weiner
2020-04-22 6:42 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 10/18] mm: memcontrol: switch to native NR_ANON_MAPPED counter Johannes Weiner
2020-04-22 6:51 ` Joonsoo Kim
2020-04-22 12:28 ` Johannes Weiner
2020-04-23 5:27 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 11/18] mm: memcontrol: switch to native NR_ANON_THPS counter Johannes Weiner
2020-04-24 0:29 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 12/18] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API Johannes Weiner
2020-04-24 0:29 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 13/18] mm: memcontrol: drop unused try/commit/cancel charge API Johannes Weiner
2020-04-24 0:30 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 14/18] mm: memcontrol: prepare swap controller setup for integration Johannes Weiner
2020-04-24 0:30 ` Joonsoo Kim
2020-04-20 22:11 ` Johannes Weiner [this message]
2020-04-21 9:27 ` [PATCH 15/18] mm: memcontrol: make swap tracking an integral part of memory control Alex Shi
2020-04-21 14:39 ` Johannes Weiner
2020-04-22 3:14 ` Alex Shi
2020-04-22 13:30 ` Johannes Weiner
2020-04-22 13:40 ` Alex Shi
2020-04-22 13:43 ` Alex Shi
2020-04-24 0:30 ` Joonsoo Kim
2020-04-24 3:01 ` Johannes Weiner
2020-04-20 22:11 ` [PATCH 16/18] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
2020-04-21 9:21 ` Alex Shi
2020-04-24 0:44 ` Joonsoo Kim
2020-04-24 2:51 ` Johannes Weiner
2020-04-28 6:49 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 17/18] mm: memcontrol: delete unused lrucare handling Johannes Weiner
2020-04-24 0:46 ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 18/18] mm: memcontrol: update page->mem_cgroup stability rules Johannes Weiner
2020-04-21 9:20 ` Alex Shi
2020-04-24 0:48 ` Joonsoo Kim
2020-04-21 9:10 ` Hillf Danton
2020-04-21 14:34 ` Johannes Weiner
2020-04-21 9:32 ` [PATCH 00/18] mm: memcontrol: charge swapin pages on instantiation Alex Shi