mm-commits.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: alex.shi@linux.alibaba.com, guro@fb.com, hannes@cmpxchg.org,
	hughd@google.com, iamjoonsoo.kim@lge.com, kirill@shutemov.name,
	mhocko@suse.com, mm-commits@vger.kernel.org, shakeelb@google.com
Subject: + mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch added to -mm tree
Date: Fri, 08 May 2020 16:20:10 -0700	[thread overview]
Message-ID: <20200508232010.yvSeG_rZb%akpm@linux-foundation.org> (raw)
In-Reply-To: <20200507183509.c5ef146c5aaeb118a25a39a8@linux-foundation.org>


The patch titled
     Subject: mm: memcontrol: make swap tracking an integral part of memory control
has been added to the -mm tree.  Its filename is
     mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: make swap tracking an integral part of memory control

Without swap page tracking, users that are otherwise memory controlled can
easily escape their containment and allocate significant amounts of memory
that they're not being charged for.  That's because swap does readahead,
but without the cgroup records of who owned the page at swapout, readahead
pages don't get charged until somebody actually faults them into their
page table and we can identify an owner task.  This can be maliciously
exploited with MADV_WILLNEED, which triggers arbitrary readahead
allocations without charging the pages.

Make swap swap page tracking an integral part of memcg and remove the
Kconfig options.  In the first place, it was only made configurable to
allow users to save some memory.  But the overhead of tracking cgroup
ownership per swap page is minimal - 2 byte per page, or 512k per 1G of
swap, or 0.04%.  Saving that at the expense of broken containment
semantics is not something we should present as a coequal option.

The swapaccount=0 boot option will continue to exist, and it will
eliminate the page_counter overhead and hide the swap control files, but
it won't disable swap slot ownership tracking.

This patch makes sure we always have the cgroup records at swapin time;
the next patch will fix the actual bug by charging readahead swap pages at
swapin time rather than at fault time.

v2: fix double swap charge bug in cgroup1/cgroup2 code gating

Link: http://lkml.kernel.org/r/20200508183105.225460-16-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/Kconfig     |   17 ----------------
 mm/memcontrol.c  |   47 +++++++++++++++++----------------------------
 mm/swap_cgroup.c |    6 -----
 3 files changed, 19 insertions(+), 51 deletions(-)

--- a/init/Kconfig~mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control
+++ a/init/Kconfig
@@ -835,24 +835,9 @@ config MEMCG
 	  Provides control over the memory footprint of tasks in a cgroup.
 
 config MEMCG_SWAP
-	bool "Swap controller"
+	bool
 	depends on MEMCG && SWAP
-	help
-	  Provides control over the swap space consumed by tasks in a cgroup.
-
-config MEMCG_SWAP_ENABLED
-	bool "Swap controller enabled by default"
-	depends on MEMCG_SWAP
 	default y
-	help
-	  Memory Resource Controller Swap Extension comes with its price in
-	  a bigger memory consumption. General purpose distribution kernels
-	  which want to enable the feature but keep it disabled by default
-	  and let the user enable it by swapaccount=1 boot command line
-	  parameter should have this option unselected.
-	  For those who want to have the feature enabled by default should
-	  select this option (if, for some reason, they need to disable it
-	  then swapaccount=0 does the trick).
 
 config MEMCG_KMEM
 	bool
--- a/mm/memcontrol.c~mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control
+++ a/mm/memcontrol.c
@@ -83,14 +83,10 @@ static bool cgroup_memory_nokmem;
 
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
-#ifdef CONFIG_MEMCG_SWAP_ENABLED
 bool cgroup_memory_noswap __read_mostly;
 #else
-bool cgroup_memory_noswap __read_mostly = 1;
-#endif /* CONFIG_MEMCG_SWAP_ENABLED */
-#else
 #define cgroup_memory_noswap		1
-#endif /* CONFIG_MEMCG_SWAP */
+#endif
 
 #ifdef CONFIG_CGROUP_WRITEBACK
 static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
@@ -5296,8 +5292,7 @@ static struct page *mc_handle_swap_pte(s
 	 * we call find_get_page() with swapper_space directly.
 	 */
 	page = find_get_page(swap_address_space(ent), swp_offset(ent));
-	if (do_memsw_account())
-		entry->val = ent.val;
+	entry->val = ent.val;
 
 	return page;
 }
@@ -5331,8 +5326,7 @@ static struct page *mc_handle_file_pte(s
 		page = find_get_entry(mapping, pgoff);
 		if (xa_is_value(page)) {
 			swp_entry_t swp = radix_to_swp_entry(page);
-			if (do_memsw_account())
-				*entry = swp;
+			*entry = swp;
 			page = find_get_page(swap_address_space(swp),
 					     swp_offset(swp));
 		}
@@ -6459,6 +6453,9 @@ int mem_cgroup_charge(struct page *page,
 		goto out;
 
 	if (PageSwapCache(page)) {
+		swp_entry_t ent = { .val = page_private(page), };
+		unsigned short id;
+
 		/*
 		 * Every swap fault against a single page tries to charge the
 		 * page, bail as early as possible.  shmem_unuse() encounters
@@ -6470,17 +6467,12 @@ int mem_cgroup_charge(struct page *page,
 		if (compound_head(page)->mem_cgroup)
 			goto out;
 
-		if (!cgroup_memory_noswap) {
-			swp_entry_t ent = { .val = page_private(page), };
-			unsigned short id;
-
-			id = lookup_swap_cgroup_id(ent);
-			rcu_read_lock();
-			memcg = mem_cgroup_from_id(id);
-			if (memcg && !css_tryget_online(&memcg->css))
-				memcg = NULL;
-			rcu_read_unlock();
-		}
+		id = lookup_swap_cgroup_id(ent);
+		rcu_read_lock();
+		memcg = mem_cgroup_from_id(id);
+		if (memcg && !css_tryget_online(&memcg->css))
+			memcg = NULL;
+		rcu_read_unlock();
 	}
 
 	if (!memcg)
@@ -6497,7 +6489,7 @@ int mem_cgroup_charge(struct page *page,
 	memcg_check_events(memcg, page);
 	local_irq_enable();
 
-	if (do_memsw_account() && PageSwapCache(page)) {
+	if (PageSwapCache(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		/*
 		 * The swap entry might not get freed for a long time,
@@ -6882,7 +6874,7 @@ void mem_cgroup_swapout(struct page *pag
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 	VM_BUG_ON_PAGE(page_count(page), page);
 
-	if (!do_memsw_account())
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
 	memcg = page->mem_cgroup;
@@ -6911,7 +6903,7 @@ void mem_cgroup_swapout(struct page *pag
 	if (!mem_cgroup_is_root(memcg))
 		page_counter_uncharge(&memcg->memory, nr_entries);
 
-	if (memcg != swap_memcg) {
+	if (!cgroup_memory_noswap && memcg != swap_memcg) {
 		if (!mem_cgroup_is_root(swap_memcg))
 			page_counter_charge(&swap_memcg->memsw, nr_entries);
 		page_counter_uncharge(&memcg->memsw, nr_entries);
@@ -6947,7 +6939,7 @@ int mem_cgroup_try_charge_swap(struct pa
 	struct mem_cgroup *memcg;
 	unsigned short oldid;
 
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || cgroup_memory_noswap)
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return 0;
 
 	memcg = page->mem_cgroup;
@@ -6963,7 +6955,7 @@ int mem_cgroup_try_charge_swap(struct pa
 
 	memcg = mem_cgroup_id_get_online(memcg);
 
-	if (!mem_cgroup_is_root(memcg) &&
+	if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg) &&
 	    !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
 		memcg_memory_event(memcg, MEMCG_SWAP_MAX);
 		memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
@@ -6991,14 +6983,11 @@ void mem_cgroup_uncharge_swap(swp_entry_
 	struct mem_cgroup *memcg;
 	unsigned short id;
 
-	if (cgroup_memory_noswap)
-		return;
-
 	id = swap_cgroup_record(entry, 0, nr_pages);
 	rcu_read_lock();
 	memcg = mem_cgroup_from_id(id);
 	if (memcg) {
-		if (!mem_cgroup_is_root(memcg)) {
+		if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg)) {
 			if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 				page_counter_uncharge(&memcg->swap, nr_pages);
 			else
--- a/mm/swap_cgroup.c~mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control
+++ a/mm/swap_cgroup.c
@@ -171,9 +171,6 @@ int swap_cgroup_swapon(int type, unsigne
 	unsigned long length;
 	struct swap_cgroup_ctrl *ctrl;
 
-	if (cgroup_memory_noswap)
-		return 0;
-
 	length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
 	array_size = length * sizeof(void *);
 
@@ -209,9 +206,6 @@ void swap_cgroup_swapoff(int type)
 	unsigned long i, length;
 	struct swap_cgroup_ctrl *ctrl;
 
-	if (cgroup_memory_noswap)
-		return;

  parent reply	other threads:[~2020-05-08 23:20 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-08  1:35 incoming Andrew Morton
2020-05-08  1:35 ` [patch 01/15] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Andrew Morton
2020-05-08  1:35 ` [patch 02/15] mm, memcg: fix error return value of mem_cgroup_css_alloc() Andrew Morton
2020-05-08  1:35 ` [patch 03/15] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() Andrew Morton
2020-05-08  1:35 ` [patch 04/15] kernel/kcov.c: fix typos in kcov_remote_start documentation Andrew Morton
2020-05-08  1:35 ` [patch 05/15] scripts/decodecode: fix trapping instruction formatting Andrew Morton
2020-05-08  1:35 ` [patch 06/15] arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() Andrew Morton
2020-05-08  1:35 ` [patch 07/15] eventpoll: fix missing wakeup for ovflist in ep_poll_callback Andrew Morton
2020-05-08  1:36 ` [patch 08/15] scripts/gdb: repair rb_first() and rb_last() Andrew Morton
2020-05-08  1:36 ` [patch 09/15] mm/slub: fix incorrect interpretation of s->offset Andrew Morton
2020-05-08  1:36 ` [patch 10/15] percpu: make pcpu_alloc() aware of current gfp context Andrew Morton
2020-05-08  1:36 ` [patch 11/15] kselftests: introduce new epoll60 testcase for catching lost wakeups Andrew Morton
2020-05-08  1:36 ` [patch 12/15] epoll: atomically remove wait entry on wake up Andrew Morton
2020-05-08  1:36 ` [patch 13/15] mm/vmscan: remove unnecessary argument description of isolate_lru_pages() Andrew Morton
2020-05-08  1:36 ` [patch 14/15] ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST Andrew Morton
2020-05-08  1:36 ` [patch 15/15] mm: limit boost_watermark on small zones Andrew Morton
2020-05-08  2:08 ` + arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch added to -mm tree Andrew Morton
2020-05-08 20:46 ` + ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch " Andrew Morton
2020-05-08 20:50 ` + mm-introduce-external-memory-hinting-api-fix-2.patch " Andrew Morton
2020-05-08 21:32 ` + checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch " Andrew Morton
2020-05-08 23:19 ` + mm-fix-numa-node-file-count-error-in-replace_page_cache.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-move-out-cgroup-swaprate-throttling.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-prepare-swap-controller-setup-for-integration.patch " Andrew Morton
2020-05-08 23:20 ` Andrew Morton [this message]
2020-05-08 23:20 ` + mm-memcontrol-charge-swapin-pages-on-instantiation.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-document-the-new-swap-control-behavior.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-delete-unused-lrucare-handling.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-update-page-mem_cgroup-stability-rules.patch " Andrew Morton
2020-05-08 23:38 ` + selftests-vm-pkeys-introduce-powerpc-support-fix.patch " Andrew Morton
2020-05-08 23:38 ` + selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch " Andrew Morton
2020-05-08 23:40 ` + memcg-expose-root-cgroups-memorystat.patch " Andrew Morton
2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch " Andrew Morton
2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch " Andrew Morton
2020-05-08 23:48 ` + zcomp-use-array_size-for-backends-list.patch " Andrew Morton
2020-05-08 23:53 ` + device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch " Andrew Morton
2020-05-08 23:56 ` + mm-memory_hotplug-introduce-add_memory_driver_managed.patch " Andrew Morton
2020-05-08 23:56 ` + kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch " Andrew Morton
2020-05-08 23:56 ` + device-dax-add-memory-via-add_memory_driver_managed.patch " Andrew Morton
2020-05-09  0:45 ` + mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
2020-05-11 19:08 ` + mm-reset-numa-stats-for-boot-pagesets-v3.patch " Andrew Morton
2020-05-11 19:54 ` + mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] kexec-prevent-removal-of-memory-in-use-by-a-loaded-kexec-image.patch removed from " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names.patch " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] arm64-memory-give-hotplug-memory-a-different-resource-name.patch " Andrew Morton
2020-05-11 20:38 ` + arm-add-support-for-folded-p4d-page-tables-fix.patch added to " Andrew Morton
2020-05-11 20:40 ` + mm-add-debug_wx-support-fix-2.patch " Andrew Morton
2020-05-11 20:41 ` + mm-add-debug_wx-support-fix-3.patch " Andrew Morton
2020-05-11 20:50 ` + arm64-mm-drop-__have_arch_huge_ptep_get.patch " Andrew Morton
2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch " Andrew Morton
2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch " Andrew Morton
2020-05-11 21:00 ` + mm-introduce-external-memory-hinting-api-fix-2-fix.patch " Andrew Morton
2020-05-11 21:02 ` + mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch " Andrew Morton
2020-05-11 21:12 ` [alternative-merged] mm-vmscan-consistent-update-to-pgsteal-and-pgscan.patch removed from " Andrew Morton
2020-05-11 21:21 ` + seq_file-introduce-define_seq_attribute-helper-macro.patch added to " Andrew Morton
2020-05-11 21:21 ` + mm-vmstat-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
2020-05-11 21:21 ` + kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
2020-05-11 21:22 ` + seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch " Andrew Morton
2020-05-11 21:23 ` + lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch " Andrew Morton
2020-05-11 21:31 ` [alternative-merged] lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base.patch removed from " Andrew Morton
2020-05-11 21:44 ` + xtensa-add-loglvl-to-show_trace-fix.patch added to " Andrew Morton
2020-05-11 22:44 ` mmotm 2020-05-11-15-43 uploaded Andrew Morton
2020-05-12 21:03 ` + kasan-consistently-disable-debugging-features.patch added to -mm tree Andrew Morton
2020-05-12 21:03 ` + kasan-add-missing-functions-declarations-to-kasanh.patch " Andrew Morton
2020-05-12 21:04 ` + kasan-move-kasan_report-into-reportc.patch " Andrew Morton
2020-05-13 18:31 ` + mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch " Andrew Morton
2020-05-13 20:56 ` + kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch " Andrew Morton
2020-05-13 21:26 ` + vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch " Andrew Morton
2020-05-13 22:00 ` + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch " Andrew Morton
2020-05-13 22:48 ` + mm-swap-use-prandom_u32_max.patch " Andrew Morton
2020-05-13 22:54 ` + mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch " Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200508232010.yvSeG_rZb%akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=mm-commits@vger.kernel.org \
    --cc=shakeelb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).