mm-commits.vger.kernel.org archive mirror
* incoming
@ 2020-05-08  1:35 Andrew Morton
  2020-05-08  1:35 ` [patch 01/15] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Andrew Morton
                   ` (80 more replies)
  0 siblings, 81 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mm-commits, linux-mm


14 fixes and one selftest to verify the epoll fixes herein.


15 patches, based on a811c1fa0a02c062555b54651065899437bacdbe:


    Oleg Nesterov <oleg@redhat.com>:
      ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()

    Yafang Shao <laoar.shao@gmail.com>:
      mm, memcg: fix error return value of mem_cgroup_css_alloc()

    David Hildenbrand <david@redhat.com>:
      mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()

    Maciej Grochowski <maciej.grochowski@pm.me>:
      kernel/kcov.c: fix typos in kcov_remote_start documentation

    Ivan Delalande <colona@arista.com>:
      scripts/decodecode: fix trapping instruction formatting

    Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>:
      arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()

    Khazhismel Kumykov <khazhy@google.com>:
      eventpoll: fix missing wakeup for ovflist in ep_poll_callback

    Aymeric Agon-Rambosson <aymeric.agon@yandex.com>:
      scripts/gdb: repair rb_first() and rb_last()

    Waiman Long <longman@redhat.com>:
      mm/slub: fix incorrect interpretation of s->offset

    Filipe Manana <fdmanana@suse.com>:
      percpu: make pcpu_alloc() aware of current gfp context

    Roman Penyaev <rpenyaev@suse.de>:
      kselftests: introduce new epoll60 testcase for catching lost wakeups
      epoll: atomically remove wait entry on wake up

    Qiwu Chen <qiwuchen55@gmail.com>:
      mm/vmscan: remove unnecessary argument description of isolate_lru_pages()

    Kees Cook <keescook@chromium.org>:
      ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST

    Henry Willard <henry.willard@oracle.com>:
      mm: limit boost_watermark on small zones

 arch/x86/kvm/svm/sev.c                                        |    2 
 fs/eventpoll.c                                                |   61 ++--
 ipc/mqueue.c                                                  |   34 +-
 kernel/kcov.c                                                 |    4 
 lib/Kconfig.ubsan                                             |   15 -
 mm/memcontrol.c                                               |   15 -
 mm/page_alloc.c                                               |    9 
 mm/percpu.c                                                   |   14 
 mm/slub.c                                                     |   45 ++-
 mm/vmscan.c                                                   |    1 
 scripts/decodecode                                            |    2 
 scripts/gdb/linux/rbtree.py                                   |    4 
 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c |  146 ++++++++++
 tools/testing/selftests/wireguard/qemu/debug.config           |    1 
 14 files changed, 275 insertions(+), 78 deletions(-)


* [patch 01/15] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
  2020-05-08  1:35 incoming Andrew Morton
@ 2020-05-08  1:35 ` Andrew Morton
  2020-05-08  1:35 ` [patch 02/15] mm, memcg: fix error return value of mem_cgroup_css_alloc() Andrew Morton
                   ` (79 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: 1vier1, akpm, dave, ebiederm, elfring, linux-mm, manfred,
	mm-commits, oleg, stable, torvalds, yoji.fujihar.min

From: Oleg Nesterov <oleg@redhat.com>
Subject: ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()

Commit cc731525f26a ("signal: Remove kernel interal si_code magic")
changed the value of SI_FROMUSER(SI_MESGQ), which means that mq_notify()
no longer works if the sender doesn't have the rights to send a signal.

Change __do_notify() to use do_send_sig_info() instead of kill_pid_info()
to avoid check_kill_permission().

This needs the additional notify.sigev_signo != 0 check; shouldn't we
change do_mq_notify() to deny sigev_signo == 0?

Test-case:

	#include <signal.h>
	#include <mqueue.h>
	#include <unistd.h>
	#include <sys/wait.h>
	#include <assert.h>

	static int notified;

	static void sigh(int sig)
	{
		notified = 1;
	}

	int main(void)
	{
		signal(SIGIO, sigh);

		int fd = mq_open("/mq", O_RDWR|O_CREAT, 0666, NULL);
		assert(fd >= 0);

		struct sigevent se = {
			.sigev_notify	= SIGEV_SIGNAL,
			.sigev_signo	= SIGIO,
		};
		assert(mq_notify(fd, &se) == 0);

		if (!fork()) {
			assert(setuid(1) == 0);
			mq_send(fd, "", 1, 0);
			return 0;
		}

		wait(NULL);
		mq_unlink("/mq");
		assert(notified);
		return 0;
	}

[manfred@colorfullife.com: 1) Add self_exec_id evaluation so that the implementation matches do_notify_parent 2) use PIDTYPE_TGID everywhere]
Link: http://lkml.kernel.org/r/e2a782e4-eab9-4f5c-c749-c07a8f7a4e66@colorfullife.com
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Reported-by: Yoji <yoji.fujihar.min@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Markus Elfring <elfring@users.sourceforge.net>
Cc: <1vier1@web.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 ipc/mqueue.c |   34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

--- a/ipc/mqueue.c~ipc-mqueuec-change-__do_notify-to-bypass-check_kill_permission-v2
+++ a/ipc/mqueue.c
@@ -142,6 +142,7 @@ struct mqueue_inode_info {
 
 	struct sigevent notify;
 	struct pid *notify_owner;
+	u32 notify_self_exec_id;
 	struct user_namespace *notify_user_ns;
 	struct user_struct *user;	/* user who created, for accounting */
 	struct sock *notify_sock;
@@ -773,28 +774,44 @@ static void __do_notify(struct mqueue_in
 	 * synchronously. */
 	if (info->notify_owner &&
 	    info->attr.mq_curmsgs == 1) {
-		struct kernel_siginfo sig_i;
 		switch (info->notify.sigev_notify) {
 		case SIGEV_NONE:
 			break;
-		case SIGEV_SIGNAL:
-			/* sends signal */
+		case SIGEV_SIGNAL: {
+			struct kernel_siginfo sig_i;
+			struct task_struct *task;
+
+			/* do_mq_notify() accepts sigev_signo == 0, why?? */
+			if (!info->notify.sigev_signo)
+				break;
 
 			clear_siginfo(&sig_i);
 			sig_i.si_signo = info->notify.sigev_signo;
 			sig_i.si_errno = 0;
 			sig_i.si_code = SI_MESGQ;
 			sig_i.si_value = info->notify.sigev_value;
-			/* map current pid/uid into info->owner's namespaces */
 			rcu_read_lock();
+			/* map current pid/uid into info->owner's namespaces */
 			sig_i.si_pid = task_tgid_nr_ns(current,
 						ns_of_pid(info->notify_owner));
-			sig_i.si_uid = from_kuid_munged(info->notify_user_ns, current_uid());
+			sig_i.si_uid = from_kuid_munged(info->notify_user_ns,
+						current_uid());
+			/*
+			 * We can't use kill_pid_info(), this signal should
+			 * bypass check_kill_permission(). It is from kernel
+			 * but si_fromuser() can't know this.
+			 * We do check the self_exec_id, to avoid sending
+			 * signals to programs that don't expect them.
+			 */
+			task = pid_task(info->notify_owner, PIDTYPE_TGID);
+			if (task && task->self_exec_id ==
+						info->notify_self_exec_id) {
+				do_send_sig_info(info->notify.sigev_signo,
+						&sig_i, task, PIDTYPE_TGID);
+			}
 			rcu_read_unlock();


* [patch 02/15] mm, memcg: fix error return value of mem_cgroup_css_alloc()
  2020-05-08  1:35 incoming Andrew Morton
  2020-05-08  1:35 ` [patch 01/15] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Andrew Morton
@ 2020-05-08  1:35 ` Andrew Morton
  2020-05-08  1:35 ` [patch 03/15] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() Andrew Morton
                   ` (78 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: akpm, hannes, laoar.shao, linux-mm, mhocko, mm-commits, torvalds,
	vdavydov.dev, willy

From: Yafang Shao <laoar.shao@gmail.com>
Subject: mm, memcg: fix error return value of mem_cgroup_css_alloc()

When I run my memcg testcase, which creates lots of memcgs, I see
unexpected out-of-memory logs while there is still plenty of free memory
available.  The error log is: mkdir: cannot create directory
'foo.65533': Cannot allocate memory

The reason is that when we try to create more than MEM_CGROUP_ID_MAX
memcgs, mem_cgroup_css_alloc() returns -ENOMEM, but the right errno is
-ENOSPC "No space left on device", which is an appropriate errno for
userspace's failed mkdir.

As the errno really misled me, we should make it right.  After this patch,
the error log will be "mkdir: cannot create directory 'foo.65533': No
space left on device"
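
For reference, a minimal sketch of the ERR_PTR() error-propagation idiom
the patch switches to (foo, foo_idr and FOO_ID_MAX are hypothetical names,
not the kernel code itself): encode the real errno in the returned pointer
instead of returning NULL, so the caller can forward -ENOSPC from
idr_alloc() rather than assuming -ENOMEM:

	/* Hedged sketch; foo, foo_idr and FOO_ID_MAX are made up. */
	#include <linux/err.h>
	#include <linux/idr.h>
	#include <linux/slab.h>

	#define FOO_ID_MAX	65535
	static DEFINE_IDR(foo_idr);

	struct foo {
		int id;
	};

	static struct foo *foo_alloc(void)
	{
		long error = -ENOMEM;
		struct foo *p = kzalloc(sizeof(*p), GFP_KERNEL);

		if (!p)
			return ERR_PTR(error);	/* allocation failed: -ENOMEM */

		p->id = idr_alloc(&foo_idr, p, 1, FOO_ID_MAX, GFP_KERNEL);
		if (p->id < 0) {
			error = p->id;	/* -ENOSPC when the id range is full */
			kfree(p);
			return ERR_PTR(error);	/* caller sees the real reason */
		}
		return p;
	}

A caller then tests with IS_ERR()/PTR_ERR() (or ERR_CAST() when it must
return a different pointer type), exactly as mem_cgroup_css_alloc() does
below.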

[akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal]
Link: http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1586192163-20099-1-git-send-email-laoar.shao@gmail.com
Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

--- a/mm/memcontrol.c~mm-memcg-fix-error-return-value-of-mem_cgroup_css_alloc
+++ a/mm/memcontrol.c
@@ -4990,19 +4990,22 @@ static struct mem_cgroup *mem_cgroup_all
 	unsigned int size;
 	int node;
 	int __maybe_unused i;
+	long error = -ENOMEM;
 
 	size = sizeof(struct mem_cgroup);
 	size += nr_node_ids * sizeof(struct mem_cgroup_per_node *);
 
 	memcg = kzalloc(size, GFP_KERNEL);
 	if (!memcg)
-		return NULL;
+		return ERR_PTR(error);
 
 	memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
 				 1, MEM_CGROUP_ID_MAX,
 				 GFP_KERNEL);
-	if (memcg->id.id < 0)
+	if (memcg->id.id < 0) {
+		error = memcg->id.id;
 		goto fail;
+	}
 
 	memcg->vmstats_local = alloc_percpu(struct memcg_vmstats_percpu);
 	if (!memcg->vmstats_local)
@@ -5046,7 +5049,7 @@ static struct mem_cgroup *mem_cgroup_all
 fail:
 	mem_cgroup_id_remove(memcg);
 	__mem_cgroup_free(memcg);
-	return NULL;
+	return ERR_PTR(error);
 }
 
 static struct cgroup_subsys_state * __ref
@@ -5057,8 +5060,8 @@ mem_cgroup_css_alloc(struct cgroup_subsy
 	long error = -ENOMEM;
 
 	memcg = mem_cgroup_alloc();
-	if (!memcg)
-		return ERR_PTR(error);
+	if (IS_ERR(memcg))
+		return ERR_CAST(memcg);
 
 	WRITE_ONCE(memcg->high, PAGE_COUNTER_MAX);
 	memcg->soft_limit = PAGE_COUNTER_MAX;
@@ -5108,7 +5111,7 @@ mem_cgroup_css_alloc(struct cgroup_subsy
 fail:
 	mem_cgroup_id_remove(memcg);
 	mem_cgroup_free(memcg);
-	return ERR_PTR(-ENOMEM);
+	return ERR_PTR(error);
 }
 
 static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
_


* [patch 03/15] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  2020-05-08  1:35 incoming Andrew Morton
  2020-05-08  1:35 ` [patch 01/15] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Andrew Morton
  2020-05-08  1:35 ` [patch 02/15] mm, memcg: fix error return value of mem_cgroup_css_alloc() Andrew Morton
@ 2020-05-08  1:35 ` Andrew Morton
  2020-05-08  1:35 ` [patch 04/15] kernel/kcov.c: fix typos in kcov_remote_start documentation Andrew Morton
                   ` (77 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: akpm, alexander.duyck, bhe, daniel.m.jordan, david, ktkhai,
	linux-mm, mhocko, mhocko, mm-commits, osalvador,
	pankaj.gupta.linux, pasha.tatashin, shile.zhang, stable,
	torvalds

From: David Hildenbrand <david@redhat.com>
Subject: mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()

Without CONFIG_PREEMPT, soft lockups can be detected, e.g., while booting
up.

[  105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[  105.608933] Modules linked in:
[  105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
[  105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
[  105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
[  105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
[  105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
[  105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
[  105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
[  105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
[  105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
[  105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  105.608933] FS:  0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
[  105.608933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
[  105.608933] Call Trace:
[  105.608933]  set_zone_contiguous+0x56/0x70
[  105.608933]  page_alloc_init_late+0x166/0x176
[  105.608933]  kernel_init_freeable+0xfa/0x255
[  105.608933]  ? rest_init+0xaa/0xaa
[  105.608933]  kernel_init+0xa/0x106
[  105.608933]  ret_from_fork+0x35/0x40

The issue becomes visible when a lot of memory (e.g., 4TB) is assigned to
a single NUMA node - a system that can easily be created using QEMU.
Inside VMs on a hypervisor with significant memory overcommit, this is
fairly easy to trigger.
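
As a reference, here is a minimal sketch of the pattern the fix applies
(process_pageblock() is a hypothetical stand-in for the real
__pageblock_pfn_to_page() check): add a cond_resched() to the long pfn
loop so a !CONFIG_PREEMPT kernel can voluntarily reschedule before the
soft-lockup watchdog fires:

	/* Sketch only; process_pageblock() is a made-up helper that
	 * returns false when the walk must stop. */
	static bool process_pageblock(unsigned long pfn);

	static void walk_zone_pageblocks(unsigned long start_pfn,
					 unsigned long end_pfn)
	{
		unsigned long pfn;

		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
			if (!process_pageblock(pfn))
				return;
			/* Yield if a reschedule is pending; on huge zones
			 * this loop can otherwise run for tens of seconds. */
			cond_resched();
		}
	}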

Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/page_alloc.c~mm-page_alloc-fix-watchdog-soft-lockups-during-set_zone_contiguous
+++ a/mm/page_alloc.c
@@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zo
 		if (!__pageblock_pfn_to_page(block_start_pfn,
 					     block_end_pfn, zone))
 			return;
+		cond_resched();
 	}
 
 	/* We confirm that there is no hole */
_


* [patch 04/15] kernel/kcov.c: fix typos in kcov_remote_start documentation
  2020-05-08  1:35 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2020-05-08  1:35 ` [patch 03/15] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() Andrew Morton
@ 2020-05-08  1:35 ` Andrew Morton
  2020-05-08  1:35 ` [patch 05/15] scripts/decodecode: fix trapping instruction formatting Andrew Morton
                   ` (76 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: akpm, andreyknvl, linux-mm, maciej.grochowski, mm-commits, torvalds

From: Maciej Grochowski <maciej.grochowski@pm.me>
Subject: kernel/kcov.c: fix typos in kcov_remote_start documentation

Link: http://lkml.kernel.org/r/20200420030259.31674-1-maciek.grochowski@gmail.com
Signed-off-by: Maciej Grochowski <maciej.grochowski@pm.me>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/kcov.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/kcov.c~kernel-kcovc-fix-typos-in-kcov_remote_start-documentation
+++ a/kernel/kcov.c
@@ -740,8 +740,8 @@ static const struct file_operations kcov
  * kcov_remote_handle() with KCOV_SUBSYSTEM_COMMON as the subsystem id and an
  * arbitrary 4-byte non-zero number as the instance id). This common handle
  * then gets saved into the task_struct of the process that issued the
- * KCOV_REMOTE_ENABLE ioctl. When this proccess issues system calls that spawn
- * kernel threads, the common handle must be retrived via kcov_common_handle()
+ * KCOV_REMOTE_ENABLE ioctl. When this process issues system calls that spawn
+ * kernel threads, the common handle must be retrieved via kcov_common_handle()
  * and passed to the spawned threads via custom annotations. Those kernel
  * threads must in turn be annotated with kcov_remote_start(common_handle) and
  * kcov_remote_stop(). All of the threads that are spawned by the same process
_


* [patch 05/15] scripts/decodecode: fix trapping instruction formatting
  2020-05-08  1:35 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2020-05-08  1:35 ` [patch 04/15] kernel/kcov.c: fix typos in kcov_remote_start documentation Andrew Morton
@ 2020-05-08  1:35 ` Andrew Morton
  2020-05-08  1:35 ` [patch 06/15] arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() Andrew Morton
                   ` (75 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: akpm, bp, colona, linux-mm, mm-commits, torvalds

From: Ivan Delalande <colona@arista.com>
Subject: scripts/decodecode: fix trapping instruction formatting

If the trapping instruction contains a ':', for example a memory access
through segment registers, the sed substitution will insert the '*' marker
in the middle of the instruction instead of after the line address:

	2b:   65 48 0f c7 0f          cmpxchg16b %gs:*(%rdi)          <-- trapping instruction

I started to think I had forgotten some quirk of the assembly syntax
before noticing that it was actually coming from the script.  Fix it to
add the address marker at the right place for these instructions:

	28:   49 8b 06                mov    (%r14),%rax
	2b:*  65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)           <-- trapping instruction
	30:   0f 94 c0                sete   %al

Link: http://lkml.kernel.org/r/20200419223653.GA31248@visor
Fixes: 18ff44b189e2 ("scripts/decodecode: make faulting insn ptr more robust")
Signed-off-by: Ivan Delalande <colona@arista.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/decodecode |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/scripts/decodecode~scripts-decodecode-fix-trapping-instruction-formatting
+++ a/scripts/decodecode
@@ -126,7 +126,7 @@ faultlinenum=$(( $(wc -l $T.oo  | cut -d
 faultline=`cat $T.dis | head -1 | cut -d":" -f2-`
 faultline=`echo "$faultline" | sed -e 's/\[/\\\[/g; s/\]/\\\]/g'`
 
-cat $T.oo | sed -e "${faultlinenum}s/^\(.*:\)\(.*\)/\1\*\2\t\t<-- trapping instruction/"
+cat $T.oo | sed -e "${faultlinenum}s/^\([^:]*:\)\(.*\)/\1\*\2\t\t<-- trapping instruction/"
 echo
 cat $T.aa
 cleanup
_


* [patch 06/15] arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
  2020-05-08  1:35 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2020-05-08  1:35 ` [patch 05/15] scripts/decodecode: fix trapping instruction formatting Andrew Morton
@ 2020-05-08  1:35 ` Andrew Morton
  2020-05-08  1:35 ` [patch 07/15] eventpoll: fix missing wakeup for ovflist in ep_poll_callback Andrew Morton
                   ` (74 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: akpm, bp, brijesh.singh, hpa, hubcap, ira.weiny,
	Janakarajan.Natarajan, jmattson, joro, linux-mm, mingo,
	mm-commits, pbonzini, sean.j.christopherson, tglx, torvalds,
	vkuznets, wanpengli

From: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Subject: arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()

When trying to lock read-only pages, sev_pin_memory() fails because
FOLL_WRITE is used as the flag for get_user_pages_fast().

Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a
write 'bool'") updated the get_user_pages_fast() call sites to use flags,
but incorrectly updated the call in sev_pin_memory().  As the original
coding of this call was correct, revert the change made by that commit.
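
For context, a sketch of the two get_user_pages_fast() calling
conventions involved (the 'write' variable mirrors sev_pin_memory()'s
parameter; this is not a complete function):

	/* Before 73b0140bf0fe: the third argument was a 'write' bool */
	npinned = get_user_pages_fast(uaddr, npages, write, pages);

	/* After: the third argument is gup_flags; 0 permits read-only
	 * pinning, so the conversion must stay conditional rather than
	 * becoming a bare FOLL_WRITE */
	npinned = get_user_pages_fast(uaddr, npages,
				      write ? FOLL_WRITE : 0, pages);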

Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/kvm/svm/sev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kvm/svm/sev.c~kvm-svm-change-flag-passed-to-gup-fast-in-sev_pin_memory
+++ a/arch/x86/kvm/svm/sev.c
@@ -345,7 +345,7 @@ static struct page **sev_pin_memory(stru
 		return NULL;
 
 	/* Pin the user virtual address. */
-	npinned = get_user_pages_fast(uaddr, npages, FOLL_WRITE, pages);
+	npinned = get_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0, pages);
 	if (npinned != npages) {
 		pr_err("SEV: Failure locking %lu pages.\n", npages);
 		goto err;
_


* [patch 07/15] eventpoll: fix missing wakeup for ovflist in ep_poll_callback
  2020-05-08  1:35 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2020-05-08  1:35 ` [patch 06/15] arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() Andrew Morton
@ 2020-05-08  1:35 ` Andrew Morton
  2020-05-08  1:36 ` [patch 08/15] scripts/gdb: repair rb_first() and rb_last() Andrew Morton
                   ` (73 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:35 UTC (permalink / raw)
  To: akpm, jbaron, khazhy, linux-mm, mm-commits, r, rpenyaev, stable,
	torvalds, viro

From: Khazhismel Kumykov <khazhy@google.com>
Subject: eventpoll: fix missing wakeup for ovflist in ep_poll_callback

Before 339ddb53d373, in the event that we added to the ovflist we would
be woken up by ep_scan_ready_list(), so ep_poll_callback() did no wakeup
of its own.  With that wakeup removed, if we add to the ovflist here, we
may never wake up.  Rather than adding back the ep_scan_ready_list()
wakeup - which was resulting in unnecessary wakeups - trigger a wake-up
in ep_poll_callback().

We noticed that one of our workloads was missing wakeups starting with
339ddb53d373, and upon manual inspection this wakeup seemed missing to me.
With this patch added, we no longer see missing wakeups.  I haven't yet
tried to make a small reproducer, but the existing kselftests in
filesystems/epoll passed for me with this patch.

[khazhy@google.com: use if/elif instead of goto + cleanup suggested by Roman]
  Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com
Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com
Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/eventpoll.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- a/fs/eventpoll.c~eventpoll-fix-missing-wakeup-for-ovflist-in-ep_poll_callback
+++ a/fs/eventpoll.c
@@ -1171,6 +1171,10 @@ static inline bool chain_epi_lockless(st
 {
 	struct eventpoll *ep = epi->ep;
 
+	/* Fast preliminary check */
+	if (epi->next != EP_UNACTIVE_PTR)
+		return false;
+
 	/* Check that the same epi has not been just chained from another CPU */
 	if (cmpxchg(&epi->next, EP_UNACTIVE_PTR, NULL) != EP_UNACTIVE_PTR)
 		return false;
@@ -1237,16 +1241,12 @@ static int ep_poll_callback(wait_queue_e
 	 * chained in ep->ovflist and requeued later on.
 	 */
 	if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
-		if (epi->next == EP_UNACTIVE_PTR &&
-		    chain_epi_lockless(epi))
+		if (chain_epi_lockless(epi))
+			ep_pm_stay_awake_rcu(epi);
+	} else if (!ep_is_linked(epi)) {
+		/* In the usual case, add event to ready list. */
+		if (list_add_tail_lockless(&epi->rdllink, &ep->rdllist))
 			ep_pm_stay_awake_rcu(epi);
-		goto out_unlock;
-	}


* [patch 08/15] scripts/gdb: repair rb_first() and rb_last()
  2020-05-08  1:35 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2020-05-08  1:35 ` [patch 07/15] eventpoll: fix missing wakeup for ovflist in ep_poll_callback Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  1:36 ` [patch 09/15] mm/slub: fix incorrect interpretation of s->offset Andrew Morton
                   ` (72 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, aymeric.agon, dianders, jan.kiszka, jason.wessel, kbingham,
	linux-mm, liuyun01, mm-commits, n.borisov.lkml, swboyd, torvalds

From: Aymeric Agon-Rambosson <aymeric.agon@yandex.com>
Subject: scripts/gdb: repair rb_first() and rb_last()

The current implementations of the rb_first() and rb_last() gdb functions
have a variable that references itself in its own instantiation, which
causes the functions to throw an error if a specific condition on the
argument is met.  The original author intended to reference the argument
but made a typo.  Referring to the argument instead makes the functions
work as intended.

Link: http://lkml.kernel.org/r/20200427051029.354840-1-aymeric.agon@yandex.com
Signed-off-by: Aymeric Agon-Rambosson <aymeric.agon@yandex.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Jackie Liu <liuyun01@kylinos.cn>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/gdb/linux/rbtree.py |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/scripts/gdb/linux/rbtree.py~scripts-gdb-repair-rb_first-and-rb_last
+++ a/scripts/gdb/linux/rbtree.py
@@ -12,7 +12,7 @@ rb_node_type = utils.CachedType("struct
 
 def rb_first(root):
     if root.type == rb_root_type.get_type():
-        node = node.address.cast(rb_root_type.get_type().pointer())
+        node = root.address.cast(rb_root_type.get_type().pointer())
     elif root.type != rb_root_type.get_type().pointer():
         raise gdb.GdbError("Must be struct rb_root not {}".format(root.type))
 
@@ -28,7 +28,7 @@ def rb_first(root):
 
 def rb_last(root):
     if root.type == rb_root_type.get_type():
-        node = node.address.cast(rb_root_type.get_type().pointer())
+        node = root.address.cast(rb_root_type.get_type().pointer())
     elif root.type != rb_root_type.get_type().pointer():
         raise gdb.GdbError("Must be struct rb_root not {}".format(root.type))
 
_


* [patch 09/15] mm/slub: fix incorrect interpretation of s->offset
  2020-05-08  1:35 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2020-05-08  1:36 ` [patch 08/15] scripts/gdb: repair rb_first() and rb_last() Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  1:36 ` [patch 10/15] percpu: make pcpu_alloc() aware of current gfp context Andrew Morton
                   ` (71 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, aquini, changbin.du, cl, iamjoonsoo.kim, keescook,
	linux-mm, longman, Markus.Elfring, mm-commits, penberg, rientjes,
	silvio.cesare, torvalds, vnik, willy

From: Waiman Long <longman@redhat.com>
Subject: mm/slub: fix incorrect interpretation of s->offset

In a couple of places in the slub memory allocator, the code checks
"s->offset" to see if the free pointer is put right after the object.
That assumption no longer holds since commit 3202fa62fb43 ("slub:
relocate freelist pointer to middle of object").

As a result, echoing "1" into the validate sysfs file, e.g.  of dentry,
may cause a bunch of "Freepointer corrupt" error reports like the
following to appear, with the system panicking afterwards.

[   38.579769] =============================================================================
[   38.580845] BUG dentry(666:pmcd.service) (Tainted: G    B): Freepointer corrupt
[   38.581948] -----------------------------------------------------------------------------

To fix it, use the check "s->offset >= s->inuse" in the new helper
function freeptr_outside_object() instead.  Also add another helper
function get_info_end() to return the end of the info block (inuse + free
pointer if not overlapping with the object).

Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.com
Fixes: 3202fa62fb43 ("slub: relocate freelist pointer to middle of object")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vitaly Nikolenko <vnik@duasynt.com>
Cc: Silvio Cesare <silvio.cesare@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |   45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

--- a/mm/slub.c~mm-slub-fix-incorrect-interpretation-of-s-offset
+++ a/mm/slub.c
@@ -551,15 +551,32 @@ static void print_section(char *level, c
 	metadata_access_disable();
 }
 
+/*
+ * See comment in calculate_sizes().
+ */
+static inline bool freeptr_outside_object(struct kmem_cache *s)
+{
+	return s->offset >= s->inuse;
+}
+
+/*
+ * Return offset of the end of info block which is inuse + free pointer if
+ * not overlapping with object.
+ */
+static inline unsigned int get_info_end(struct kmem_cache *s)
+{
+	if (freeptr_outside_object(s))
+		return s->inuse + sizeof(void *);
+	else
+		return s->inuse;
+}
+
 static struct track *get_track(struct kmem_cache *s, void *object,
 	enum track_item alloc)
 {
 	struct track *p;
 
-	if (s->offset)
-		p = object + s->offset + sizeof(void *);
-	else
-		p = object + s->inuse;
+	p = object + get_info_end(s);
 
 	return p + alloc;
 }
@@ -686,10 +703,7 @@ static void print_trailer(struct kmem_ca
 		print_section(KERN_ERR, "Redzone ", p + s->object_size,
 			s->inuse - s->object_size);
 
-	if (s->offset)
-		off = s->offset + sizeof(void *);
-	else
-		off = s->inuse;
+	off = get_info_end(s);
 
 	if (s->flags & SLAB_STORE_USER)
 		off += 2 * sizeof(struct track);
@@ -782,7 +796,7 @@ static int check_bytes_and_report(struct
  * object address
  * 	Bytes of the object to be managed.
  * 	If the freepointer may overlay the object then the free
- * 	pointer is the first word of the object.
+ *	pointer is at the middle of the object.
  *
  * 	Poisoning uses 0x6b (POISON_FREE) and the last byte is
  * 	0xa5 (POISON_END)
@@ -816,11 +830,7 @@ static int check_bytes_and_report(struct
 
 static int check_pad_bytes(struct kmem_cache *s, struct page *page, u8 *p)
 {
-	unsigned long off = s->inuse;	/* The end of info */
-
-	if (s->offset)
-		/* Freepointer is placed after the object. */
-		off += sizeof(void *);
+	unsigned long off = get_info_end(s);	/* The end of info */
 
 	if (s->flags & SLAB_STORE_USER)
 		/* We also have user information there */
@@ -907,7 +917,7 @@ static int check_object(struct kmem_cach
 		check_pad_bytes(s, page, p);
 	}
 
-	if (!s->offset && val == SLUB_RED_ACTIVE)
+	if (!freeptr_outside_object(s) && val == SLUB_RED_ACTIVE)
 		/*
 		 * Object and freepointer overlap. Cannot check
 		 * freepointer while object is allocated.
@@ -3587,6 +3597,11 @@ static int calculate_sizes(struct kmem_c
 		 *
 		 * This is the case if we do RCU, have a constructor or
 		 * destructor or are poisoning the objects.
+		 *
+		 * The assumption that s->offset >= s->inuse means free
+		 * pointer is outside of the object is used in the
+		 * freeptr_outside_object() function. If that is no
+		 * longer true, the function needs to be modified.
 		 */
 		s->offset = size;
 		size += sizeof(void *);
_


* [patch 10/15] percpu: make pcpu_alloc() aware of current gfp context
  2020-05-08  1:35 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2020-05-08  1:36 ` [patch 09/15] mm/slub: fix incorrect interpretation of s->offset Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  1:36 ` [patch 11/15] kselftests: introduce new epoll60 testcase for catching lost wakeups Andrew Morton
                   ` (70 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, cl, dennis, fdmanana, linux-mm, mm-commits, tj, torvalds

From: Filipe Manana <fdmanana@suse.com>
Subject: percpu: make pcpu_alloc() aware of current gfp context

Since 5.7-rc1, on btrfs we have a percpu counter initialization for which
we always pass a GFP_KERNEL gfp_t argument (this happens since commit
2992df73268f78 ("btrfs: Implement DREW lock")).  That is safe in some
contexts but not in others where allowing fs reclaim could lead to a
deadlock because we are either holding some btrfs lock needed for a
transaction commit or holding a btrfs transaction handle open.  Because of
that we surround the call to the function that initializes the percpu
counter with a NOFS context using memalloc_nofs_save() (this is done in
btrfs_init_fs_root()).

However it turns out that this is not enough to prevent a possible
deadlock, because percpu_alloc() determines whether it is in an atomic
context by looking exclusively at the gfp flags passed to it (GFP_KERNEL
in this case) and is not aware that a NOFS context is set.  Because it
thinks it is in a non-atomic context, it locks the pcpu_alloc_mutex.
This can result in a btrfs deadlock when pcpu_balance_workfn() is
running, has acquired that mutex and is waiting for reclaim, while the
btrfs task that called percpu_counter_init() (and therefore
percpu_alloc()) is holding either the btrfs commit_root semaphore or a
transaction handle (done at fs/btrfs/backref.c:iterate_extent_inodes()),
which prevents reclaim from finishing as an attempt to commit the current
btrfs transaction will deadlock.

Lockdep reports this issue with the following trace:

[ 1402.960494] ======================================================
[ 1402.961610] WARNING: possible circular locking dependency detected
[ 1402.962711] 5.6.0-rc7-btrfs-next-77 #1 Not tainted
[ 1402.963639] ------------------------------------------------------
[ 1402.964686] kswapd0/91 is trying to acquire lock:
[ 1402.965505] ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
[ 1402.966904]
               but task is already holding lock:
[ 1402.967641] ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
[ 1402.968580]
               which lock already depends on the new lock.

[ 1402.969581]
               the existing dependency chain (in reverse order) is:
[ 1402.970542]
               -> #4 (fs_reclaim){+.+.}:
[ 1402.971339]        fs_reclaim_acquire.part.0+0x25/0x30
[ 1402.972055]        __kmalloc+0x5f/0x3a0
[ 1402.972691]        pcpu_create_chunk+0x19/0x230
[ 1402.973493]        pcpu_balance_workfn+0x56a/0x680
[ 1402.974346]        process_one_work+0x235/0x5f0
[ 1402.975150]        worker_thread+0x50/0x3b0
[ 1402.975862]        kthread+0x120/0x140
[ 1402.976518]        ret_from_fork+0x3a/0x50
[ 1402.977228]
               -> #3 (pcpu_alloc_mutex){+.+.}:
[ 1402.978203]        __mutex_lock+0xa9/0xaf0
[ 1402.978909]        pcpu_alloc+0x480/0x7c0
[ 1402.979561]        __percpu_counter_init+0x50/0xd0
[ 1402.980176]        btrfs_drew_lock_init+0x22/0x70 [btrfs]
[ 1402.980855]        btrfs_get_fs_root+0x29c/0x5c0 [btrfs]
[ 1402.981610]        resolve_indirect_refs+0x120/0xa30 [btrfs]
[ 1402.982342]        find_parent_nodes+0x50b/0xf30 [btrfs]
[ 1402.983021]        btrfs_find_all_leafs+0x60/0xb0 [btrfs]
[ 1402.983702]        iterate_extent_inodes+0x139/0x2f0 [btrfs]
[ 1402.984410]        iterate_inodes_from_logical+0xa1/0xe0 [btrfs]
[ 1402.985158]        btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs]
[ 1402.985906]        btrfs_ioctl+0x165a/0x3130 [btrfs]
[ 1402.986521]        ksys_ioctl+0x87/0xc0
[ 1402.987001]        __x64_sys_ioctl+0x16/0x20
[ 1402.987599]        do_syscall_64+0x5c/0x260
[ 1402.988157]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1402.988957]
               -> #2 (&fs_info->commit_root_sem){++++}:
[ 1402.989990]        down_write+0x38/0x70
[ 1402.990516]        btrfs_cache_block_group+0x2ec/0x500 [btrfs]
[ 1402.991262]        find_free_extent+0xc6a/0x1600 [btrfs]
[ 1402.991934]        btrfs_reserve_extent+0x9b/0x180 [btrfs]
[ 1402.992623]        btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
[ 1402.993335]        alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
[ 1402.994092]        __btrfs_cow_block+0x122/0x5a0 [btrfs]
[ 1402.994755]        btrfs_cow_block+0x106/0x240 [btrfs]
[ 1402.995398]        commit_cowonly_roots+0x55/0x310 [btrfs]
[ 1402.996081]        btrfs_commit_transaction+0x509/0xb20 [btrfs]
[ 1402.996813]        sync_filesystem+0x74/0x90
[ 1402.997339]        generic_shutdown_super+0x22/0x100
[ 1402.997962]        kill_anon_super+0x14/0x30
[ 1402.998536]        btrfs_kill_super+0x12/0x20 [btrfs]
[ 1402.999265]        deactivate_locked_super+0x31/0x70
[ 1402.999877]        cleanup_mnt+0x100/0x160
[ 1403.000386]        task_work_run+0x93/0xc0
[ 1403.000896]        exit_to_usermode_loop+0xf9/0x100
[ 1403.001499]        do_syscall_64+0x20d/0x260
[ 1403.002028]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1403.002714]
               -> #1 (&space_info->groups_sem){++++}:
[ 1403.003484]        down_read+0x3c/0x140
[ 1403.004021]        find_free_extent+0xef6/0x1600 [btrfs]
[ 1403.004742]        btrfs_reserve_extent+0x9b/0x180 [btrfs]
[ 1403.005473]        btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
[ 1403.006207]        alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
[ 1403.006991]        __btrfs_cow_block+0x122/0x5a0 [btrfs]
[ 1403.007705]        btrfs_cow_block+0x106/0x240 [btrfs]
[ 1403.008454]        btrfs_search_slot+0x50c/0xd60 [btrfs]
[ 1403.009154]        btrfs_lookup_inode+0x3a/0xc0 [btrfs]
[ 1403.009924]        __btrfs_update_delayed_inode+0x90/0x280 [btrfs]
[ 1403.010797]        __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs]
[ 1403.011649]        __btrfs_run_delayed_items+0x8e/0x180 [btrfs]
[ 1403.012391]        btrfs_commit_transaction+0x31b/0xb20 [btrfs]
[ 1403.013126]        iterate_supers+0x87/0xf0
[ 1403.013713]        ksys_sync+0x60/0xb0
[ 1403.014263]        __ia32_sys_sync+0xa/0x10
[ 1403.014909]        do_syscall_64+0x5c/0x260
[ 1403.015622]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1403.016733]
               -> #0 (&delayed_node->mutex){+.+.}:
[ 1403.017644]        __lock_acquire+0xef0/0x1c80
[ 1403.018194]        lock_acquire+0xa2/0x1d0
[ 1403.018744]        __mutex_lock+0xa9/0xaf0
[ 1403.019274]        __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
[ 1403.020211]        btrfs_evict_inode+0x40d/0x560 [btrfs]
[ 1403.021071]        evict+0xd9/0x1c0
[ 1403.021627]        dispose_list+0x48/0x70
[ 1403.022209]        prune_icache_sb+0x54/0x80
[ 1403.022781]        super_cache_scan+0x124/0x1a0
[ 1403.023469]        do_shrink_slab+0x176/0x440
[ 1403.024245]        shrink_slab+0x23a/0x2c0
[ 1403.024984]        shrink_node+0x188/0x6e0
[ 1403.025702]        balance_pgdat+0x31d/0x7f0
[ 1403.026445]        kswapd+0x238/0x550
[ 1403.027066]        kthread+0x120/0x140
[ 1403.027533]        ret_from_fork+0x3a/0x50
[ 1403.028043]
               other info that might help us debug this:

[ 1403.029024] Chain exists of:
                 &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim

[ 1403.030359]  Possible unsafe locking scenario:

[ 1403.031335]        CPU0                    CPU1
[ 1403.031866]        ----                    ----
[ 1403.032459]   lock(fs_reclaim);
[ 1403.032840]                                lock(pcpu_alloc_mutex);
[ 1403.033462]                                lock(fs_reclaim);
[ 1403.034030]   lock(&delayed_node->mutex);
[ 1403.034466]
                *** DEADLOCK ***

[ 1403.035053] 3 locks held by kswapd0/91:
[ 1403.035505]  #0: ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
[ 1403.036964]  #1: ffffffffb4ef6ce8 (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0
[ 1403.038120]  #2: ffff8938ae0490d8 (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50
[ 1403.039284]
               stack backtrace:
[ 1403.040092] CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1
[ 1403.041024] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ 1403.042441] Call Trace:
[ 1403.042784]  dump_stack+0x8f/0xd0
[ 1403.043224]  check_noncircular+0x170/0x190
[ 1403.043968]  __lock_acquire+0xef0/0x1c80
[ 1403.044762]  lock_acquire+0xa2/0x1d0
[ 1403.045476]  ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
[ 1403.046685]  __mutex_lock+0xa9/0xaf0
[ 1403.047402]  ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
[ 1403.048639]  ? __lock_acquire+0x126e/0x1c80
[ 1403.049472]  ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
[ 1403.050672]  ? find_held_lock+0x2b/0x80
[ 1403.051419]  ? __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
[ 1403.052655]  __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
[ 1403.053872]  btrfs_evict_inode+0x40d/0x560 [btrfs]
[ 1403.054876]  evict+0xd9/0x1c0
[ 1403.055452]  dispose_list+0x48/0x70
[ 1403.056118]  prune_icache_sb+0x54/0x80
[ 1403.056842]  super_cache_scan+0x124/0x1a0
[ 1403.057576]  do_shrink_slab+0x176/0x440
[ 1403.058227]  shrink_slab+0x23a/0x2c0
[ 1403.058912]  shrink_node+0x188/0x6e0
[ 1403.059592]  balance_pgdat+0x31d/0x7f0
[ 1403.060372]  kswapd+0x238/0x550
[ 1403.060972]  ? finish_wait+0x90/0x90
[ 1403.061649]  ? balance_pgdat+0x7f0/0x7f0
[ 1403.062391]  kthread+0x120/0x140
[ 1403.062725]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1403.063226]  ret_from_fork+0x3a/0x50

This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL
to percpu_counter_init() in contexts where it is not reclaim safe;
however, that type of approach has been discouraged since
memalloc_[nofs|noio]_save() were introduced.  Therefore this change makes
pcpu_alloc() check for an existing nofs/noio context before deciding
whether it is in an atomic context or not.
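
For reference, a minimal sketch of the caller-side NOFS pattern this
change makes pcpu_alloc() honor (init_counter_nofs() is a hypothetical
function, standing in for what btrfs_init_fs_root() does):

	#include <linux/sched/mm.h>
	#include <linux/percpu_counter.h>

	/* Sketch only; error handling trimmed for brevity. */
	static int init_counter_nofs(struct percpu_counter *c)
	{
		unsigned int nofs_flags;
		int ret;

		/* Mark this task as "fs reclaim is not allowed" */
		nofs_flags = memalloc_nofs_save();
		/*
		 * GFP_KERNEL is still passed, but with this patch
		 * pcpu_alloc() calls current_gfp_context() and sees the
		 * NOFS context, so it treats the allocation as atomic
		 * and does not take pcpu_alloc_mutex.
		 */
		ret = percpu_counter_init(c, 0, GFP_KERNEL);
		memalloc_nofs_restore(nofs_flags);
		return ret;
	}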

Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/percpu.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- a/mm/percpu.c~percpu-make-pcpu_alloc-aware-of-current-gfp-context
+++ a/mm/percpu.c
@@ -80,6 +80,7 @@
 #include <linux/workqueue.h>
 #include <linux/kmemleak.h>
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 
 #include <asm/cacheflush.h>
 #include <asm/sections.h>
@@ -1557,10 +1558,9 @@ static struct pcpu_chunk *pcpu_chunk_add
 static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 				 gfp_t gfp)
 {
-	/* whitelisted flags that can be passed to the backing allocators */
-	gfp_t pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-	bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
-	bool do_warn = !(gfp & __GFP_NOWARN);
+	gfp_t pcpu_gfp;
+	bool is_atomic;
+	bool do_warn;
 	static int warn_limit = 10;
 	struct pcpu_chunk *chunk, *next;
 	const char *err;
@@ -1569,6 +1569,12 @@ static void __percpu *pcpu_alloc(size_t
 	void __percpu *ptr;
 	size_t bits, bit_align;
 
+	gfp = current_gfp_context(gfp);
+	/* whitelisted flags that can be passed to the backing allocators */
+	pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
+	do_warn = !(gfp & __GFP_NOWARN);
+
 	/*
 	 * There is now a minimum allocation size of PCPU_MIN_ALLOC_SIZE,
 	 * therefore alignment must be a minimum of that many bytes.
_


* [patch 11/15] kselftests: introduce new epoll60 testcase for catching lost wakeups
  2020-05-08  1:35 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2020-05-08  1:36 ` [patch 10/15] percpu: make pcpu_alloc() aware of current gfp context Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  1:36 ` [patch 12/15] epoll: atomically remove wait entry on wake up Andrew Morton
                   ` (69 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, jbaron, khazhy, linux-mm, mm-commits, r, rpenyaev, torvalds, viro

From: Roman Penyaev <rpenyaev@suse.de>
Subject: kselftests: introduce new epoll60 testcase for catching lost wakeups

This test case catches the lost wakeup introduced by:
  339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")

The test is simple: we have 10 threads and 10 event fds.  Each thread can
harvest only 1 event.  One producer fires all 10 events at once and waits
until all 10 events have been observed by the 10 threads.

In case of a lost wakeup, epoll_wait() will time out and 0 will be
returned.

The test case catches two sorts of problems.  The first is a forgotten
wakeup for an event which hits the ->ovflist list; this problem was fixed
by:

  5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback")

The other problem is that several sequential events can hit the same
waiting thread, so the other waiters get no wakeups; that problem is
fixed in the following patch.

Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c |  146 ++++++++++
 1 file changed, 146 insertions(+)

--- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c~kselftests-introduce-new-epoll60-testcase-for-catching-lost-wakeups
+++ a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
@@ -3,6 +3,7 @@
 #define _GNU_SOURCE
 #include <poll.h>
 #include <unistd.h>
+#include <assert.h>
 #include <signal.h>
 #include <pthread.h>
 #include <sys/epoll.h>
@@ -3136,4 +3137,149 @@ TEST(epoll59)
 	close(ctx.sfd[0]);
 }
 
+enum {
+	EPOLL60_EVENTS_NR = 10,
+};
+
+struct epoll60_ctx {
+	volatile int stopped;
+	int ready;
+	int waiters;
+	int epfd;
+	int evfd[EPOLL60_EVENTS_NR];
+};
+
+static void *epoll60_wait_thread(void *ctx_)
+{
+	struct epoll60_ctx *ctx = ctx_;
+	struct epoll_event e;
+	sigset_t sigmask;
+	uint64_t v;
+	int ret;
+
+	/* Block SIGUSR1 */
+	sigemptyset(&sigmask);
+	sigaddset(&sigmask, SIGUSR1);
+	sigprocmask(SIG_SETMASK, &sigmask, NULL);
+
+	/* Prepare empty mask for epoll_pwait() */
+	sigemptyset(&sigmask);
+
+	while (!ctx->stopped) {
+		/* Mark we are ready */
+		__atomic_fetch_add(&ctx->ready, 1, __ATOMIC_ACQUIRE);
+
+		/* Start when all are ready */
+		while (__atomic_load_n(&ctx->ready, __ATOMIC_ACQUIRE) &&
+		       !ctx->stopped);
+
+		/* Account this waiter */
+		__atomic_fetch_add(&ctx->waiters, 1, __ATOMIC_ACQUIRE);
+
+		ret = epoll_pwait(ctx->epfd, &e, 1, 2000, &sigmask);
+		if (ret != 1) {
+			/* We expect only signal delivery on stop */
+			assert(ret < 0 && errno == EINTR && "Lost wakeup!\n");
+			assert(ctx->stopped);
+			break;
+		}
+
+		ret = read(e.data.fd, &v, sizeof(v));
+		/* Since we are on ET mode, thus each thread gets its own fd. */
+		assert(ret == sizeof(v));
+
+		__atomic_fetch_sub(&ctx->waiters, 1, __ATOMIC_RELEASE);
+	}
+
+	return NULL;
+}
+
+static inline unsigned long long msecs(void)
+{
+	struct timespec ts;
+	unsigned long long msecs;
+
+	clock_gettime(CLOCK_REALTIME, &ts);
+	msecs = ts.tv_sec * 1000ull;
+	msecs += ts.tv_nsec / 1000000ull;
+
+	return msecs;
+}
+
+static inline int count_waiters(struct epoll60_ctx *ctx)
+{
+	return __atomic_load_n(&ctx->waiters, __ATOMIC_ACQUIRE);
+}
+
+TEST(epoll60)
+{
+	struct epoll60_ctx ctx = { 0 };
+	pthread_t waiters[ARRAY_SIZE(ctx.evfd)];
+	struct epoll_event e;
+	int i, n, ret;
+
+	signal(SIGUSR1, signal_handler);
+
+	ctx.epfd = epoll_create1(0);
+	ASSERT_GE(ctx.epfd, 0);
+
+	/* Create event fds */
+	for (i = 0; i < ARRAY_SIZE(ctx.evfd); i++) {
+		ctx.evfd[i] = eventfd(0, EFD_NONBLOCK);
+		ASSERT_GE(ctx.evfd[i], 0);
+
+		e.events = EPOLLIN | EPOLLET;
+		e.data.fd = ctx.evfd[i];
+		ASSERT_EQ(epoll_ctl(ctx.epfd, EPOLL_CTL_ADD, ctx.evfd[i], &e), 0);
+	}
+
+	/* Create waiter threads */
+	for (i = 0; i < ARRAY_SIZE(waiters); i++)
+		ASSERT_EQ(pthread_create(&waiters[i], NULL,
+					 epoll60_wait_thread, &ctx), 0);
+
+	for (i = 0; i < 300; i++) {
+		uint64_t v = 1, ms;
+
+		/* Wait for all to be ready */
+		while (__atomic_load_n(&ctx.ready, __ATOMIC_ACQUIRE) !=
+		       ARRAY_SIZE(ctx.evfd))
+			;
+
+		/* Steady, go */
+		__atomic_fetch_sub(&ctx.ready, ARRAY_SIZE(ctx.evfd),
+				   __ATOMIC_ACQUIRE);
+
+		/* Wait all have gone to kernel */
+		while (count_waiters(&ctx) != ARRAY_SIZE(ctx.evfd))
+			;
+
+		/* 1ms should be enough to schedule away */
+		usleep(1000);
+
+		/* Quickly signal all handles at once */
+		for (n = 0; n < ARRAY_SIZE(ctx.evfd); n++) {
+			ret = write(ctx.evfd[n], &v, sizeof(v));
+			ASSERT_EQ(ret, sizeof(v));
+		}
+
+		/* Busy loop for 1s and wait for all waiters to wake up */
+		ms = msecs();
+		while (count_waiters(&ctx) && msecs() < ms + 1000)
+			;
+
+		ASSERT_EQ(count_waiters(&ctx), 0);
+	}
+	ctx.stopped = 1;
+	/* Stop waiters */
+	for (i = 0; i < ARRAY_SIZE(waiters); i++)
+		ret = pthread_kill(waiters[i], SIGUSR1);
+	for (i = 0; i < ARRAY_SIZE(waiters); i++)
+		pthread_join(waiters[i], NULL);
+
+	for (i = 0; i < ARRAY_SIZE(waiters); i++)
+		close(ctx.evfd[i]);
+	close(ctx.epfd);
+}
+
 TEST_HARNESS_MAIN
_


* [patch 12/15] epoll: atomically remove wait entry on wake up
  2020-05-08  1:35 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2020-05-08  1:36 ` [patch 11/15] kselftests: introduce new epoll60 testcase for catching lost wakeups Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  1:36 ` [patch 13/15] mm/vmscan: remove unnecessary argument description of isolate_lru_pages() Andrew Morton
                   ` (68 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, jbaron, khazhy, linux-mm, mm-commits, r, rpenyaev, stable,
	torvalds, viro

From: Roman Penyaev <rpenyaev@suse.de>
Subject: epoll: atomically remove wait entry on wake up

This patch does two things:

1. fixes the lost wakeup introduced by:
  339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")

2. improves performance of event delivery.

The problem is the following: if N (>1) threads are waiting on ep->wq for
new events and M (>1) events come in, it is quite likely that more than
one wakeup hits the same wait queue entry, because there is quite a big
window between the __add_wait_queue_exclusive() and the following
__remove_wait_queue() calls in the ep_poll() function.  This can lead to
lost wakeups, because the thread that was woken up may not handle all the
events in ->rdllist.  (The problem is described in more detail here:
https://lkml.org/lkml/2019/10/7/905)

The idea of the current patch is to use init_wait() instead of
init_waitqueue_entry().  Internally init_wait() sets
autoremove_wake_function as the callback, which removes the wait entry
from the list atomically (under the wq lock), so the next incoming wakeup
hits the next wait entry in the wait queue, thus preventing lost wakeups.
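
As a minimal sketch of the idiom (not the actual ep_poll() code), the
DEFINE_WAIT() entry below uses autoremove_wake_function(), so a wakeup
unlinks it under the waitqueue lock; wait_for_cond() and its cond
callback are made-up names:

	#include <linux/wait.h>
	#include <linux/sched/signal.h>

	static int wait_for_cond(wait_queue_head_t *wq, bool (*cond)(void))
	{
		DEFINE_WAIT(wait);	/* callback: autoremove_wake_function() */
		int ret = 0;

		for (;;) {
			prepare_to_wait_exclusive(wq, &wait, TASK_INTERRUPTIBLE);
			if (cond())
				break;
			if (signal_pending(current)) {
				ret = -EINTR;
				break;
			}
			schedule();
			/* A wakeup has already removed 'wait' from the queue
			 * here, so the next wakeup goes to the next waiter. */
		}
		finish_wait(wq, &wait);	/* unlinks only if still queued */
		return ret;
	}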

The problem is reliably reproduced by the epoll60 test case [1].

Removing the wait entry on wakeup also has performance benefits, because
there is no need to take ep->lock and remove the wait entry from the
queue after a successful wakeup.  Here is the timing output of the epoll60
test case:

  With explicit wakeup from ep_scan_ready_list() (the state of the
  code prior 339ddb53d373):

    real    0m6.970s
    user    0m49.786s
    sys     0m0.113s

 After this patch:

   real    0m5.220s
   user    0m36.879s
   sys     0m0.019s

The other testcase is the stress-epoll [2], where one thread consumes
all the events and other threads produce many events:

  With explicit wakeup from ep_scan_ready_list() (the state of the
  code prior 339ddb53d373):

    threads  events/ms  run-time ms
          8       5427         1474
         16       6163         2596
         32       6824         4689
         64       7060         9064
        128       6991        18309

 After this patch:

    threads  events/ms  run-time ms
          8       5598         1429
         16       7073         2262
         32       7502         4265
         64       7640         8376
        128       7634        16767

 (number of "events/ms" represents event bandwidth, thus higher is
  better; number of "run-time ms" represents overall time spent
  doing the benchmark, thus lower is better)

[1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
[2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c

Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/eventpoll.c |   43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

--- a/fs/eventpoll.c~epoll-atomically-remove-wait-entry-on-wake-up
+++ a/fs/eventpoll.c
@@ -1822,7 +1822,6 @@ static int ep_poll(struct eventpoll *ep,
 {
 	int res = 0, eavail, timed_out = 0;
 	u64 slack = 0;
-	bool waiter = false;
 	wait_queue_entry_t wait;
 	ktime_t expires, *to = NULL;
 
@@ -1867,21 +1866,23 @@ fetch_events:
 	 */
 	ep_reset_busy_poll_napi_id(ep);
 
-	/*
-	 * We don't have any available event to return to the caller.  We need
-	 * to sleep here, and we will be woken by ep_poll_callback() when events
-	 * become available.
-	 */
-	if (!waiter) {
-		waiter = true;
-		init_waitqueue_entry(&wait, current);
-
+	do {
+		/*
+		 * Internally init_wait() uses autoremove_wake_function(),
+		 * thus the wait entry is removed from the wait queue on
+		 * each wakeup. Why is that important? With several
+		 * waiters, each new wakeup hits the next waiter, giving
+		 * it the chance to harvest new events; otherwise a
+		 * wakeup can be lost. This is also good performance-wise,
+		 * because on the normal wakeup path there is no need to
+		 * call __remove_wait_queue() explicitly, so ep->lock is
+		 * not taken, which would stall event delivery.
+		 */
+		init_wait(&wait);
 		write_lock_irq(&ep->lock);
 		__add_wait_queue_exclusive(&ep->wq, &wait);
 		write_unlock_irq(&ep->lock);
-	}
 
-	for (;;) {
 		/*
 		 * We don't want to sleep if the ep_poll_callback() sends us
 		 * a wakeup in between. That's why we set the task state
@@ -1911,10 +1912,20 @@ fetch_events:
 			timed_out = 1;
 			break;
 		}
-	}
+
+		/* We were woken up, thus go and try to harvest some events */
+		eavail = 1;
+
+	} while (0);
 
 	__set_current_state(TASK_RUNNING);
 
+	if (!list_empty_careful(&wait.entry)) {
+		write_lock_irq(&ep->lock);
+		__remove_wait_queue(&ep->wq, &wait);
+		write_unlock_irq(&ep->lock);
+	}
+
 send_events:
 	/*
 	 * Try to transfer events to user space. In case we get 0 events and
@@ -1925,12 +1936,6 @@ send_events:
 	    !(res = ep_send_events(ep, events, maxevents)) && !timed_out)
 		goto fetch_events;
 
-	if (waiter) {
-		write_lock_irq(&ep->lock);
-		__remove_wait_queue(&ep->wq, &wait);
-		write_unlock_irq(&ep->lock);
-	}

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [patch 13/15] mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
  2020-05-08  1:35 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2020-05-08  1:36 ` [patch 12/15] epoll: atomically remove wait entry on wake up Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  1:36 ` [patch 14/15] ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST Andrew Morton
                   ` (67 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, chenqiwu, linux-mm, mm-commits, qiwuchen55, torvalds

From: Qiwu Chen <qiwuchen55@gmail.com>
Subject: mm/vmscan: remove unnecessary argument description of isolate_lru_pages()

Since commit a9e7c39fa9fd9 ("mm/vmscan.c: remove 7th argument of
isolate_lru_pages()"), the explanation of 'mode' argument has been
unnecessary.  Let's remove it.

Link: http://lkml.kernel.org/r/20200501090346.2894-1-chenqiwu@xiaomi.com
Signed-off-by: Qiwu Chen <chenqiwu@xiaomi.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    1 -
 1 file changed, 1 deletion(-)

--- a/mm/vmscan.c~mm-vmscan-remove-unnecessary-argument-description-of-isolate_lru_pages
+++ a/mm/vmscan.c
@@ -1625,7 +1625,6 @@ static __always_inline void update_lru_s
  * @dst:	The temp list to put pages on to.
  * @nr_scanned:	The number of pages that were scanned.
  * @sc:		The scan_control struct for this reclaim session
- * @mode:	One of the LRU isolation modes
  * @lru:	LRU list id for isolating
  *
  * returns how many pages were moved onto *@dst.
_

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [patch 14/15] ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
  2020-05-08  1:35 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2020-05-08  1:36 ` [patch 13/15] mm/vmscan: remove unnecessary argument description of isolate_lru_pages() Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  1:36 ` [patch 15/15] mm: limit boost_watermark on small zones Andrew Morton
                   ` (66 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, aryabinin, jpoimboe, keescook, linux-mm, mm-commits,
	peterz, rdunlap, sfr, torvalds

From: Kees Cook <keescook@chromium.org>
Subject: ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST

The documentation for UBSAN_ALIGNMENT already mentions that it should not
be used on all*config builds (nor on efficient-unaligned-access
architectures), so just refactor the Kconfig to implement this correctly;
randconfigs will then stop creating insane images that freak out objtool
under CONFIG_UBSAN_TRAP (due to the false positives producing functions
that never return, etc.).
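
For illustration, this is the kind of construct the alignment check
trips over (a hypothetical snippet using kernel u32/u8 types, not code
from this patch):

    /* buf + 1 is not 4-byte aligned, so UBSAN_ALIGNMENT flags the
     * u32 load below, even though it is legal and cheap on x86 and
     * other HAVE_EFFICIENT_UNALIGNED_ACCESS architectures. */
    static u32 read_u32_unaligned(const u8 *buf)
    {
            return *(const u32 *)(buf + 1);
    }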

Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook
Fixes: 0887a7ebc977 ("ubsan: add trap instrumentation option")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
  Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/Kconfig.ubsan                                   |   15 ++++------
 tools/testing/selftests/wireguard/qemu/debug.config |    1 
 2 files changed, 6 insertions(+), 10 deletions(-)

--- a/lib/Kconfig.ubsan~ubsan-disable-ubsan_alignment-under-compile_test
+++ a/lib/Kconfig.ubsan
@@ -60,18 +60,15 @@ config UBSAN_SANITIZE_ALL
 	  Enabling this option will get kernel image size increased
 	  significantly.
 
-config UBSAN_NO_ALIGNMENT
-	bool "Disable checking of pointers alignment"
-	default y if HAVE_EFFICIENT_UNALIGNED_ACCESS
+config UBSAN_ALIGNMENT
+	bool "Enable checks for pointers alignment"
+	default !HAVE_EFFICIENT_UNALIGNED_ACCESS
+	depends on !X86 || !COMPILE_TEST
 	help
-	  This option disables the check of unaligned memory accesses.
-	  This option should be used when building allmodconfig.
-	  Disabling this option on architectures that support unaligned
+	  This option enables the check of unaligned memory accesses.
+	  Enabling this option on architectures that support unaligned
 	  accesses may produce a lot of false positives.
 
-config UBSAN_ALIGNMENT
-	def_bool !UBSAN_NO_ALIGNMENT

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [patch 15/15] mm: limit boost_watermark on small zones
  2020-05-08  1:35 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2020-05-08  1:36 ` [patch 14/15] ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST Andrew Morton
@ 2020-05-08  1:36 ` Andrew Morton
  2020-05-08  2:08 ` + arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch added to -mm tree Andrew Morton
                   ` (65 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  1:36 UTC (permalink / raw)
  To: akpm, david, henry.willard, linux-mm, mgorman, mm-commits,
	stable, torvalds, vbabka

From: Henry Willard <henry.willard@oracle.com>
Subject: mm: limit boost_watermark on small zones

Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external
fragmentation event occurs") adds a boost_watermark() function which
increases the min watermark in a zone by at least pageblock_nr_pages or
the number of pages in a page block. On Arm64, with 64K pages and 512M
huge pages, this is 8192 pages or 512M. It does this regardless of the
number of pages managed in the zone or the likelihood of success.
This can put the zone immediately under water in terms of allocating pages
from the zone, and can cause a small machine to fail immediately due to
OOM. Unlike set_recommended_min_free_kbytes(), which substantially
increases min_free_kbytes and is tied to THP, boost_watermark() can be
called even if THP is not active. The problem is most likely to appear
on architectures such as Arm64 where pageblock_nr_pages is very large.

It is desirable to run the kdump capture kernel in as small a space as
possible to avoid wasting memory. In some architectures, such as Arm64,
there are restrictions on where the capture kernel can run, and therefore,
the space available. A capture kernel running in 768M can fail due to OOM
immediately after boost_watermark() sets the min in zone DMA32, where
most of the memory is, to 512M. It fails even though there is over 500M of
free memory. With boost_watermark() suppressed, the capture kernel can run
successfully in 448M.

This patch limits boost_watermark() to boosting a zone's min watermark only
when there are enough pages that the boost will produce positive results.
In this case that is estimated to be four times as many pages as
pageblock_nr_pages.
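
To make the numbers concrete (using the arm64 configuration cited above;
illustrative arithmetic, not text from the patch):

    pageblock_nr_pages = 512M / 64K      = 8192 pages
    boost threshold    = 4 * 8192 pages  = 32768 pages = 2G

so zones smaller than 2G, such as those of a 768M capture kernel, are no
longer boosted.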

Mel said:

: There is no harm in marking it stable.  Clearly it does not happen very
: often but it's not impossible.  32-bit x86 is a lot less common now
: which would previously have been vulnerable to triggering this easily. 
: ppc64 has a larger base page size but typically only has one zone. 
: arm64 is likely the most vulnerable, particularly when CMA is
: configured with a small movable zone.

Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@oracle.com
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Henry Willard <henry.willard@oracle.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/page_alloc.c~mm-limit-boost_watermark-on-small-zones
+++ a/mm/page_alloc.c
@@ -2401,6 +2401,14 @@ static inline void boost_watermark(struc
 
 	if (!watermark_boost_factor)
 		return;
+	/*
+	 * Don't bother in zones that are unlikely to produce results.
+	 * On small machines, including kdump capture kernels running
+	 * in a small area, boosting the watermark can cause an out of
+	 * memory situation immediately.
+	 */
+	if ((pageblock_nr_pages * 4) > zone_managed_pages(zone))
+		return;
 
 	max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
 			watermark_boost_factor, 10000);
_

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (14 preceding siblings ...)
  2020-05-08  1:36 ` [patch 15/15] mm: limit boost_watermark on small zones Andrew Morton
@ 2020-05-08  2:08 ` Andrew Morton
  2020-05-08 20:46 ` + ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch " Andrew Morton
                   ` (64 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08  2:08 UTC (permalink / raw)
  To: akpm, ira.weiny, mm-commits, sfr


The patch titled
     Subject: arch-kunmap-remove-duplicate-kunmap-implementations-fix
has been added to the -mm tree.  Its filename is
     arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: arch-kunmap-remove-duplicate-kunmap-implementations-fix

fix CONFIG_HIGHMEM=n build on various architectures

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/highmem.h |    5 +++++
 1 file changed, 5 insertions(+)

--- a/include/linux/highmem.h~arch-kunmap-remove-duplicate-kunmap-implementations-fix
+++ a/include/linux/highmem.h
@@ -53,6 +53,7 @@ static inline void *kmap(struct page *pa
 }
 
 void kunmap_high(struct page *page);
+
 static inline void kunmap(struct page *page)
 {
 	might_sleep();
@@ -111,6 +112,10 @@ static inline void *kmap(struct page *pa
 	return page_address(page);
 }
 
+static inline void kunmap_high(struct page *page)
+{
+}
+
 static inline void kunmap(struct page *page)
 {
 }
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fixpatch-added-to-mm-tree.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
linux-next-rejects.patch
linux-next-git-rejects.patch
arch-x86-makefile-use-config_shell.patch
kernel-forkc-export-kernel_thread-to-modules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (15 preceding siblings ...)
  2020-05-08  2:08 ` + arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch added to -mm tree Andrew Morton
@ 2020-05-08 20:46 ` Andrew Morton
  2020-05-08 20:50 ` + mm-introduce-external-memory-hinting-api-fix-2.patch " Andrew Morton
                   ` (63 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 20:46 UTC (permalink / raw)
  To: akpm, dave, longman, manfred, mingo, mm-commits, neilb, oberpar,
	rostedt, schwab, vvs


The patch titled
     Subject: ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix
has been added to the -mm tree.  Its filename is
     ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix

fix conflict resolution fix

Cc: Andreas Schwab <schwab@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 ipc/util.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/ipc/util.c~ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix
+++ a/ipc/util.c
@@ -777,7 +777,7 @@ static struct kern_ipc_perm *sysvipc_fin
 		}
 	}
 out:
-	*new_pos = pos + 1;
+	*new_pos = index + 1;
 	return ipc;
 }
 
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fixpatch-added-to-mm-tree.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
linux-next-rejects.patch
linux-next-git-rejects.patch
arch-x86-makefile-use-config_shell.patch
kernel-forkc-export-kernel_thread-to-modules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-introduce-external-memory-hinting-api-fix-2.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (16 preceding siblings ...)
  2020-05-08 20:46 ` + ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch " Andrew Morton
@ 2020-05-08 20:50 ` Andrew Morton
  2020-05-08 21:32 ` + checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch " Andrew Morton
                   ` (62 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 20:50 UTC (permalink / raw)
  To: lkp, minchan, mm-commits, sfr


The patch titled
     Subject: mm: fix build error for mips of process_madvise
has been added to the -mm tree.  Its filename is
     mm-introduce-external-memory-hinting-api-fix-2.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-introduce-external-memory-hinting-api-fix-2.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-introduce-external-memory-hinting-api-fix-2.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Minchan Kim <minchan@kernel.org>
Subject: mm: fix build error for mips of process_madvise

The kbuild test robot reported a build break of process_madvise on mips [1].
This patch should fix it.

[1] https://lore.kernel.org/linux-mm/202005080716.cUcbCQ3i%25lkp@intel.com/
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/mips/kernel/syscalls/syscall_o32.tbl |    1 +
 1 file changed, 1 insertion(+)

--- a/arch/mips/kernel/syscalls/syscall_o32.tbl~mm-introduce-external-memory-hinting-api-fix-2
+++ a/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -425,3 +425,4 @@
 435	o32	clone3				__sys_clone3
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
+439	o32	process_madvise			sys_process_madvise		compat_sys_process_madvise
_

Patches currently in -mm which might be from minchan@kernel.org are

mm-pass-task-and-mm-to-do_madvise.patch
mm-introduce-external-memory-hinting-api.patch
mm-introduce-external-memory-hinting-api-fix.patch
mm-introduce-external-memory-hinting-api-fix-2.patch
mm-check-fatal-signal-pending-of-target-process.patch
pid-move-pidfd_get_pid-function-to-pidc.patch
mm-support-both-pid-and-pidfd-for-process_madvise.patch
mm-support-vector-address-ranges-for-process_madvise.patch
mm-support-vector-address-ranges-for-process_madvise-fix.patch
mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (17 preceding siblings ...)
  2020-05-08 20:50 ` + mm-introduce-external-memory-hinting-api-fix-2.patch " Andrew Morton
@ 2020-05-08 21:32 ` Andrew Morton
  2020-05-08 23:19 ` + mm-fix-numa-node-file-count-error-in-replace_page_cache.patch " Andrew Morton
                   ` (61 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 21:32 UTC (permalink / raw)
  To: geert+renesas, joe, konstantin, mm-commits


The patch titled
     Subject: checkpatch-use-patch-subject-when-reading-from-stdin-fix
has been added to the -mm tree.  Its filename is
     checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Joe Perches <joe@perches.com>
Subject: checkpatch-use-patch-subject-when-reading-from-stdin-fix

reduce cpu usage

Link: http://lkml.kernel.org/r/c9d89bb24c7414142414c60371e210fdcf4617d2.camel@perches.com
Cc: Joe Perches <joe@perches.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/checkpatch.pl |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/scripts/checkpatch.pl~checkpatch-use-patch-subject-when-reading-from-stdin-fix
+++ a/scripts/checkpatch.pl
@@ -1061,11 +1061,8 @@ for my $filename (@ARGV) {
 	}
 	while (<$FILE>) {
 		chomp;
-		if ($vname eq 'Your patch') {
-			my ($subject) = $_ =~ /^Subject:\s*(.*)/;
-			$vname = '"' . $subject . '"' if $subject;
-		}
 		push(@rawlines, $_);
+		$vname = qq("$1") if ($filename eq '-' && $_ =~ m/^Subject:\s+(.+)/i);
 	}
 	close($FILE);
 
_

Patches currently in -mm which might be from joe@perches.com are

checkpatch-test-git_dir-changes.patch
get_maintainer-add-email-addresses-from-yaml-files.patch
percpu_ref-use-a-more-common-logging-style.patch
checkpatch-additional-maintainer-section-entry-ordering-checks.patch
checkpatch-look-for-c99-comments-in-ctx_locate_comment.patch
checkpatch-disallow-git-and-file-fix.patch
checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch
fs-seq_filec-seq_read-update-pr_info_ratelimited.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-fix-numa-node-file-count-error-in-replace_page_cache.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (18 preceding siblings ...)
  2020-05-08 21:32 ` + checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch " Andrew Morton
                   ` (60 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: fix NUMA node file count error in replace_page_cache()
has been added to the -mm tree.  Its filename is
     mm-fix-numa-node-file-count-error-in-replace_page_cache.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-numa-node-file-count-error-in-replace_page_cache.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: fix NUMA node file count error in replace_page_cache()

Patch series "mm: memcontrol: charge swapin pages on instantiation", v2.

This patch series reworks memcg to charge swapin pages directly at
swapin time, rather than at fault time, which may be much later, or
not happen at all.

Changes in version 2:
- prevent double charges on pre-allocated hugepages in khugepaged
- leave shmem swapcache when charging fails to avoid double IO (Joonsoo)
- fix temporary accounting bug by switching rmap<->commit (Joonsoo)
- fix double swap charge bug in cgroup1/cgroup2 code gating
- simplify swapin error checking (Joonsoo)
- mm: memcontrol: document the new swap control behavior (Alex)
- review tags

The delayed swapin charging scheme we have right now causes problems:

- Alex's per-cgroup lru_lock patches rely on pages that have been
  isolated from the LRU to have a stable page->mem_cgroup; otherwise
  the lock may change underneath him. Swapcache pages are charged only
  after they are added to the LRU, and charging doesn't follow the LRU
  isolation protocol.

- Joonsoo's anon workingset patches need a suitable LRU at the time
  the page enters the swap cache and displaces the non-resident
  info. But the correct LRU is only available after charging.

- It's a containment hole / DoS vector. Users can trigger arbitrarily
  large swap readahead using MADV_WILLNEED. The memory is never
  charged unless somebody actually touches it.

- It complicates the page->mem_cgroup stabilization rules

In order to charge pages directly at swapin time, the memcg code base
needs to be prepared, and several overdue cleanups become a necessity:

To charge pages at swapin time, we need to always have cgroup
ownership tracking of swap records. We also cannot rely on
page->mapping to tell apart page types at charge time, because that's
only set up during a page fault.

To eliminate the page->mapping dependency, memcg needs to ditch its
private page type counters (MEMCG_CACHE, MEMCG_RSS, NR_SHMEM) in favor
of the generic vmstat counters and accounting sites, such as
NR_FILE_PAGES, NR_ANON_MAPPED etc.

To switch to generic vmstat counters, the charge sequence must be
adjusted such that page->mem_cgroup is set up by the time these
counters are modified.

The series is structured as follows:

1. Bug fixes
2. Decoupling charging from rmap
3. Swap controller integration into memcg
4. Direct swapin charging


This patch (of 19):

When replacing one page with another one in the cache, we have to decrease
the file count of the old page's NUMA node and increase the one of the new
NUMA node, otherwise the old node leaks the count and the new node
eventually underflows its counter.
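
Schematically, the fixed accounting path (a simplified sketch of
replace_page_cache_page(), ignoring the hugetlb and shmem cases):

    /* Move the file count from the old page's node to the new
     * page's node.  Decrementing against "new" (the bug) leaked a
     * count on the old node and eventually underflowed the new
     * node's counter whenever the two pages sat on different nodes. */
    __dec_node_page_state(old, NR_FILE_PAGES);
    __inc_node_page_state(new, NR_FILE_PAGES);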

Link: http://lkml.kernel.org/r/20200508183105.225460-1-hannes@cmpxchg.org
Link: http://lkml.kernel.org/r/20200508183105.225460-2-hannes@cmpxchg.org
Fixes: 74d609585d8b ("page cache: Add and replace pages using the XArray")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/filemap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/filemap.c~mm-fix-numa-node-file-count-error-in-replace_page_cache
+++ a/mm/filemap.c
@@ -808,11 +808,11 @@ int replace_page_cache_page(struct page
 	old->mapping = NULL;
 	/* hugetlb pages do not participate in page cache accounting. */
 	if (!PageHuge(old))
-		__dec_node_page_state(new, NR_FILE_PAGES);
+		__dec_node_page_state(old, NR_FILE_PAGES);
 	if (!PageHuge(new))
 		__inc_node_page_state(new, NR_FILE_PAGES);
 	if (PageSwapBacked(old))
-		__dec_node_page_state(new, NR_SHMEM);
+		__dec_node_page_state(old, NR_SHMEM);
 	if (PageSwapBacked(new))
 		__inc_node_page_state(new, NR_SHMEM);
 	xas_unlock_irqrestore(&xas, flags);
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (19 preceding siblings ...)
  2020-05-08 23:19 ` + mm-fix-numa-node-file-count-error-in-replace_page_cache.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch " Andrew Morton
                   ` (59 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: fix stat-corrupting race in charge moving
has been added to the -mm tree.  Its filename is
     mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: fix stat-corrupting race in charge moving

The move_lock is a per-memcg lock, but the VM accounting code that needs
to acquire it comes from the page and follows page->mem_cgroup under RCU
protection.  That means that the page becomes unlocked not when we drop
the move_lock, but when we update page->mem_cgroup.  And that assignment
doesn't imply any memory ordering.  If that pointer write gets reordered
against the reads of the page state (page_mapped, PageDirty, etc.), the
state may change while we rely on it being stable, and we can end up
corrupting the counters.

Place an SMP memory barrier to make sure we're done with all page state by
the time the new page->mem_cgroup becomes visible.

Also replace the open-coded move_lock with a lock_page_memcg() to make it
more obvious what we're serializing against.
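
Schematically, the reordering hazard being closed (a simplified sketch,
not code from this patch):

    /*
     * mover                             concurrent stats updater
     * -----                             ------------------------
     * lock_page_memcg(page)  (= from)   memcg = page->mem_cgroup
     * read page state                   lock memcg (the new "to"!)
     * smp_mb()                          modify page state + counters
     * page->mem_cgroup = to
     *
     * Without the barrier, the page->mem_cgroup store could become
     * visible before the mover's page-state reads have completed,
     * letting an updater change the state out from under them.
     */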

Link: http://lkml.kernel.org/r/20200508183105.225460-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-fix-stat-corrupting-race-in-charge-moving
+++ a/mm/memcontrol.c
@@ -5378,7 +5378,6 @@ static int mem_cgroup_move_account(struc
 {
 	struct lruvec *from_vec, *to_vec;
 	struct pglist_data *pgdat;
-	unsigned long flags;
 	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 	int ret;
 	bool anon;
@@ -5405,18 +5404,13 @@ static int mem_cgroup_move_account(struc
 	from_vec = mem_cgroup_lruvec(from, pgdat);
 	to_vec = mem_cgroup_lruvec(to, pgdat);
 
-	spin_lock_irqsave(&from->move_lock, flags);
+	lock_page_memcg(page);
 
 	if (!anon && page_mapped(page)) {
 		__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
 		__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
 	}
 
-	/*
-	 * move_lock grabbed above and caller set from->moving_account, so
-	 * mod_memcg_page_state will serialize updates to PageDirty.
-	 * So mapping should be stable for dirty pages.
-	 */
 	if (!anon && PageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
 
@@ -5432,15 +5426,23 @@ static int mem_cgroup_move_account(struc
 	}
 
 	/*
+	 * All state has been migrated, let's switch to the new memcg.
+	 *
 	 * It is safe to change page->mem_cgroup here because the page
-	 * is referenced, charged, and isolated - we can't race with
-	 * uncharging, charging, migration, or LRU putback.
+	 * is referenced, charged, isolated, and locked: we can't race
+	 * with (un)charging, migration, LRU putback, or anything else
+	 * that would rely on a stable page->mem_cgroup.
+	 *
+	 * Note that lock_page_memcg is a memcg lock, not a page lock,
+	 * to save space. As soon as we switch page->mem_cgroup to a
+	 * new memcg that isn't locked, the above state can change
+	 * concurrently again. Make sure we're truly done with it.
 	 */
+	smp_mb();
 
-	/* caller should have done css_get */
-	page->mem_cgroup = to;
+	page->mem_cgroup = to; 	/* caller should have done css_get */
 
-	spin_unlock_irqrestore(&from->move_lock, flags);
+	__unlock_page_memcg(from);
 
 	ret = 0;
 
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (20 preceding siblings ...)
  2020-05-08 23:19 ` + mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-move-out-cgroup-swaprate-throttling.patch " Andrew Morton
                   ` (58 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: drop @compound parameter from memcg charging API
has been added to the -mm tree.  Its filename is
     mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: drop @compound parameter from memcg charging API

The memcg charging API carries a boolean @compound parameter that tells
whether the page we're dealing with is a hugepage. 
mem_cgroup_commit_charge() has another boolean @lrucare that indicates
whether the page needs LRU locking or not while charging.  The majority of
callsites know those parameters at compile time, which results in a lot of
naked "false, false" argument lists.  This makes for cryptic code and is a
breeding ground for subtle mistakes.

Thankfully, the huge page state can be inferred from the page itself and
doesn't need to be passed along.  This is safe because charging completes
before the page is published, i.e. before anybody could split it.

Simplify the callsites by removing @compound, and let memcg infer the
state by using hpage_nr_pages() unconditionally.  That function does
PageTransHuge() to identify huge pages, which also helpfully asserts that
nobody passes in tail pages by accident.
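
For reference, a rough sketch of hpage_nr_pages() as it looks around
this series (include/linux/huge_mm.h); the compound case costs a single
page-flag test:

    static inline int hpage_nr_pages(struct page *page)
    {
            if (unlikely(PageTransHuge(page)))
                    return HPAGE_PMD_NR;
            return 1;
    }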

The following patches will introduce a new charging API, best not to carry
over unnecessary weight.

Link: http://lkml.kernel.org/r/20200508183105.225460-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |   22 +++++++-------------
 kernel/events/uprobes.c    |    6 ++---
 mm/filemap.c               |    6 ++---
 mm/huge_memory.c           |    8 +++----
 mm/khugepaged.c            |   20 +++++++++---------
 mm/memcontrol.c            |   38 +++++++++++++----------------------
 mm/memory.c                |   32 +++++++++++++----------------
 mm/migrate.c               |    6 ++---
 mm/shmem.c                 |   22 ++++++++------------
 mm/swapfile.c              |    9 +++-----
 mm/userfaultfd.c           |    6 ++---
 11 files changed, 77 insertions(+), 98 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/include/linux/memcontrol.h
@@ -410,15 +410,12 @@ static inline bool mem_cgroup_below_min(
 }
 
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp,
-			  bool compound);
+			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
 int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp,
-			  bool compound);
+			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
 void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
-			      bool lrucare, bool compound);
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
-		bool compound);
+			      bool lrucare);
+void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
 void mem_cgroup_uncharge(struct page *page);
 void mem_cgroup_uncharge_list(struct list_head *page_list);
 
@@ -910,8 +907,7 @@ static inline bool mem_cgroup_below_min(
 
 static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask,
-					struct mem_cgroup **memcgp,
-					bool compound)
+					struct mem_cgroup **memcgp)
 {
 	*memcgp = NULL;
 	return 0;
@@ -920,8 +916,7 @@ static inline int mem_cgroup_try_charge(
 static inline int mem_cgroup_try_charge_delay(struct page *page,
 					      struct mm_struct *mm,
 					      gfp_t gfp_mask,
-					      struct mem_cgroup **memcgp,
-					      bool compound)
+					      struct mem_cgroup **memcgp)
 {
 	*memcgp = NULL;
 	return 0;
@@ -929,13 +924,12 @@ static inline int mem_cgroup_try_charge_
 
 static inline void mem_cgroup_commit_charge(struct page *page,
 					    struct mem_cgroup *memcg,
-					    bool lrucare, bool compound)
+					    bool lrucare)
 {
 }
 
 static inline void mem_cgroup_cancel_charge(struct page *page,
-					    struct mem_cgroup *memcg,
-					    bool compound)
+					    struct mem_cgroup *memcg)
 {
 }
 
--- a/kernel/events/uprobes.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/kernel/events/uprobes.c
@@ -169,7 +169,7 @@ static int __replace_page(struct vm_area
 
 	if (new_page) {
 		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
-					    &memcg, false);
+					    &memcg);
 		if (err)
 			return err;
 	}
@@ -181,7 +181,7 @@ static int __replace_page(struct vm_area
 	err = -EAGAIN;
 	if (!page_vma_mapped_walk(&pvmw)) {
 		if (new_page)
-			mem_cgroup_cancel_charge(new_page, memcg, false);
+			mem_cgroup_cancel_charge(new_page, memcg);
 		goto unlock;
 	}
 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
@@ -189,7 +189,7 @@ static int __replace_page(struct vm_area
 	if (new_page) {
 		get_page(new_page);
 		page_add_new_anon_rmap(new_page, vma, addr, false);
-		mem_cgroup_commit_charge(new_page, memcg, false, false);
+		mem_cgroup_commit_charge(new_page, memcg, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
 	} else
 		/* no new page, just dec_mm_counter for old_page */
--- a/mm/filemap.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/filemap.c
@@ -842,7 +842,7 @@ static int __add_to_page_cache_locked(st
 
 	if (!huge) {
 		error = mem_cgroup_try_charge(page, current->mm,
-					      gfp_mask, &memcg, false);
+					      gfp_mask, &memcg);
 		if (error)
 			return error;
 	}
@@ -878,14 +878,14 @@ unlock:
 		goto error;
 
 	if (!huge)
-		mem_cgroup_commit_charge(page, memcg, false, false);
+		mem_cgroup_commit_charge(page, memcg, false);
 	trace_mm_filemap_add_to_page_cache(page);
 	return 0;
 error:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
 	if (!huge)
-		mem_cgroup_cancel_charge(page, memcg, false);
+		mem_cgroup_cancel_charge(page, memcg);
 	put_page(page);
 	return xas_error(&xas);
 }
--- a/mm/huge_memory.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/huge_memory.c
@@ -594,7 +594,7 @@ static vm_fault_t __do_huge_pmd_anonymou
 
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 
-	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg, true)) {
+	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg)) {
 		put_page(page);
 		count_vm_event(THP_FAULT_FALLBACK);
 		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
@@ -630,7 +630,7 @@ static vm_fault_t __do_huge_pmd_anonymou
 			vm_fault_t ret2;
 
 			spin_unlock(vmf->ptl);
-			mem_cgroup_cancel_charge(page, memcg, true);
+			mem_cgroup_cancel_charge(page, memcg);
 			put_page(page);
 			pte_free(vma->vm_mm, pgtable);
 			ret2 = handle_userfault(vmf, VM_UFFD_MISSING);
@@ -641,7 +641,7 @@ static vm_fault_t __do_huge_pmd_anonymou
 		entry = mk_huge_pmd(page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 		page_add_new_anon_rmap(page, vma, haddr, true);
-		mem_cgroup_commit_charge(page, memcg, false, true);
+		mem_cgroup_commit_charge(page, memcg, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
 		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
@@ -658,7 +658,7 @@ unlock_release:
 release:
 	if (pgtable)
 		pte_free(vma->vm_mm, pgtable);
-	mem_cgroup_cancel_charge(page, memcg, true);
+	mem_cgroup_cancel_charge(page, memcg);
 	put_page(page);
 	return ret;
 
--- a/mm/khugepaged.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/khugepaged.c
@@ -974,7 +974,7 @@ static void collapse_huge_page(struct mm
 		goto out_nolock;
 	}
 
-	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
+	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out_nolock;
 	}
@@ -982,7 +982,7 @@ static void collapse_huge_page(struct mm
 	down_read(&mm->mmap_sem);
 	result = hugepage_vma_revalidate(mm, address, &vma);
 	if (result) {
-		mem_cgroup_cancel_charge(new_page, memcg, true);
+		mem_cgroup_cancel_charge(new_page, memcg);
 		up_read(&mm->mmap_sem);
 		goto out_nolock;
 	}
@@ -990,7 +990,7 @@ static void collapse_huge_page(struct mm
 	pmd = mm_find_pmd(mm, address);
 	if (!pmd) {
 		result = SCAN_PMD_NULL;
-		mem_cgroup_cancel_charge(new_page, memcg, true);
+		mem_cgroup_cancel_charge(new_page, memcg);
 		up_read(&mm->mmap_sem);
 		goto out_nolock;
 	}
@@ -1001,7 +1001,7 @@ static void collapse_huge_page(struct mm
 	 * Continuing to collapse causes inconsistency.
 	 */
 	if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) {
-		mem_cgroup_cancel_charge(new_page, memcg, true);
+		mem_cgroup_cancel_charge(new_page, memcg);
 		up_read(&mm->mmap_sem);
 		goto out_nolock;
 	}
@@ -1087,7 +1087,7 @@ static void collapse_huge_page(struct mm
 	spin_lock(pmd_ptl);
 	BUG_ON(!pmd_none(*pmd));
 	page_add_new_anon_rmap(new_page, vma, address, true);
-	mem_cgroup_commit_charge(new_page, memcg, false, true);
+	mem_cgroup_commit_charge(new_page, memcg, false);
 	count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
 	lru_cache_add_active_or_unevictable(new_page, vma);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
@@ -1105,7 +1105,7 @@ out_nolock:
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 out:
-	mem_cgroup_cancel_charge(new_page, memcg, true);
+	mem_cgroup_cancel_charge(new_page, memcg);
 	goto out_up_write;
 }
 
@@ -1534,7 +1534,7 @@ static void collapse_file(struct mm_stru
 		goto out;
 	}
 
-	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
+	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out;
 	}
@@ -1547,7 +1547,7 @@ static void collapse_file(struct mm_stru
 			break;
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
-			mem_cgroup_cancel_charge(new_page, memcg, true);
+			mem_cgroup_cancel_charge(new_page, memcg);
 			result = SCAN_FAIL;
 			goto out;
 		}
@@ -1783,7 +1783,7 @@ xa_unlocked:
 
 		SetPageUptodate(new_page);
 		page_ref_add(new_page, HPAGE_PMD_NR - 1);
-		mem_cgroup_commit_charge(new_page, memcg, false, true);
+		mem_cgroup_commit_charge(new_page, memcg, false);
 
 		if (is_shmem) {
 			set_page_dirty(new_page);
@@ -1838,7 +1838,7 @@ xa_unlocked:
 		VM_BUG_ON(nr_none);
 		xas_unlock_irq(&xas);
 
-		mem_cgroup_cancel_charge(new_page, memcg, true);
+		mem_cgroup_cancel_charge(new_page, memcg);
 		new_page->mapping = NULL;
 	}
 
--- a/mm/memcontrol.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/memcontrol.c
@@ -834,7 +834,7 @@ static unsigned long memcg_events_local(
 
 static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 					 struct page *page,
-					 bool compound, int nr_pages)
+					 int nr_pages)
 {
 	/*
 	 * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
@@ -848,7 +848,7 @@ static void mem_cgroup_charge_statistics
 			__mod_memcg_state(memcg, NR_SHMEM, nr_pages);
 	}
 
-	if (compound) {
+	if (abs(nr_pages) > 1) {
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
 		__mod_memcg_state(memcg, MEMCG_RSS_HUGE, nr_pages);
 	}
@@ -5447,9 +5447,9 @@ static int mem_cgroup_move_account(struc
 	ret = 0;
 
 	local_irq_disable();
-	mem_cgroup_charge_statistics(to, page, compound, nr_pages);
+	mem_cgroup_charge_statistics(to, page, nr_pages);
 	memcg_check_events(to, page);
-	mem_cgroup_charge_statistics(from, page, compound, -nr_pages);
+	mem_cgroup_charge_statistics(from, page, -nr_pages);
 	memcg_check_events(from, page);
 	local_irq_enable();
 out_unlock:
@@ -6434,7 +6434,6 @@ void mem_cgroup_calculate_protection(str
  * @mm: mm context of the victim
  * @gfp_mask: reclaim mode
  * @memcgp: charged memcg return
- * @compound: charge the page as compound or small page
  *
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
  * pages according to @gfp_mask if necessary.
@@ -6447,11 +6446,10 @@ void mem_cgroup_calculate_protection(str
  * with mem_cgroup_cancel_charge() in case page instantiation fails.
  */
 int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp,
-			  bool compound)
+			  gfp_t gfp_mask, struct mem_cgroup **memcgp)
 {
+	unsigned int nr_pages = hpage_nr_pages(page);
 	struct mem_cgroup *memcg = NULL;
-	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 	int ret = 0;
 
 	if (mem_cgroup_disabled())
@@ -6493,13 +6491,12 @@ out:
 }
 
 int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp,
-			  bool compound)
+			  gfp_t gfp_mask, struct mem_cgroup **memcgp)
 {
 	struct mem_cgroup *memcg;
 	int ret;
 
-	ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp, compound);
+	ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp);
 	memcg = *memcgp;
 	mem_cgroup_throttle_swaprate(memcg, page_to_nid(page), gfp_mask);
 	return ret;
@@ -6510,7 +6507,6 @@ int mem_cgroup_try_charge_delay(struct p
  * @page: page to charge
  * @memcg: memcg to charge the page to
  * @lrucare: page might be on LRU already
- * @compound: charge the page as compound or small page
  *
  * Finalize a charge transaction started by mem_cgroup_try_charge(),
  * after page->mapping has been set up.  This must happen atomically
@@ -6523,9 +6519,9 @@ int mem_cgroup_try_charge_delay(struct p
  * Use mem_cgroup_cancel_charge() to cancel the transaction instead.
  */
 void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
-			      bool lrucare, bool compound)
+			      bool lrucare)
 {
-	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+	unsigned int nr_pages = hpage_nr_pages(page);
 
 	VM_BUG_ON_PAGE(!page->mapping, page);
 	VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
@@ -6543,7 +6539,7 @@ void mem_cgroup_commit_charge(struct pag
 	commit_charge(page, memcg, lrucare);
 
 	local_irq_disable();
-	mem_cgroup_charge_statistics(memcg, page, compound, nr_pages);
+	mem_cgroup_charge_statistics(memcg, page, nr_pages);
 	memcg_check_events(memcg, page);
 	local_irq_enable();
 
@@ -6562,14 +6558,12 @@ void mem_cgroup_commit_charge(struct pag
  * mem_cgroup_cancel_charge - cancel a page charge
  * @page: page to charge
  * @memcg: memcg to charge the page to
- * @compound: charge the page as compound or small page
  *
  * Cancel a charge transaction started by mem_cgroup_try_charge().
  */
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
-		bool compound)
+void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
 {
-	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+	unsigned int nr_pages = hpage_nr_pages(page);
 
 	if (mem_cgroup_disabled())
 		return;
@@ -6784,8 +6778,7 @@ void mem_cgroup_migrate(struct page *old
 	commit_charge(newpage, memcg, false);
 
 	local_irq_save(flags);
-	mem_cgroup_charge_statistics(memcg, newpage, PageTransHuge(newpage),
-			nr_pages);
+	mem_cgroup_charge_statistics(memcg, newpage, nr_pages);
 	memcg_check_events(memcg, newpage);
 	local_irq_restore(flags);
 }
@@ -7015,8 +7008,7 @@ void mem_cgroup_swapout(struct page *pag
 	 * only synchronisation we have for updating the per-CPU variables.
 	 */
 	VM_BUG_ON(!irqs_disabled());
-	mem_cgroup_charge_statistics(memcg, page, PageTransHuge(page),
-				     -nr_entries);
+	mem_cgroup_charge_statistics(memcg, page, -nr_entries);
 	memcg_check_events(memcg, page);
 
 	if (!mem_cgroup_is_root(memcg))
--- a/mm/memory.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/memory.c
@@ -2678,7 +2678,7 @@ static vm_fault_t wp_page_copy(struct vm
 		}
 	}
 
-	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
+	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg))
 		goto oom_free_new;
 
 	__SetPageUptodate(new_page);
@@ -2713,7 +2713,7 @@ static vm_fault_t wp_page_copy(struct vm
 		 */
 		ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
 		page_add_new_anon_rmap(new_page, vma, vmf->address, false);
-		mem_cgroup_commit_charge(new_page, memcg, false, false);
+		mem_cgroup_commit_charge(new_page, memcg, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
 		/*
 		 * We call the notify macro here because, when using secondary
@@ -2752,7 +2752,7 @@ static vm_fault_t wp_page_copy(struct vm
 		new_page = old_page;
 		page_copied = 1;
 	} else {
-		mem_cgroup_cancel_charge(new_page, memcg, false);
+		mem_cgroup_cancel_charge(new_page, memcg);
 	}
 
 	if (new_page)
@@ -3195,8 +3195,7 @@ vm_fault_t do_swap_page(struct vm_fault
 		goto out_page;
 	}
 
-	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
-					&memcg, false)) {
+	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
 		ret = VM_FAULT_OOM;
 		goto out_page;
 	}
@@ -3247,11 +3246,11 @@ vm_fault_t do_swap_page(struct vm_fault
 	/* ksm created a completely new copy */
 	if (unlikely(page != swapcache && swapcache)) {
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
-		mem_cgroup_commit_charge(page, memcg, false, false);
+		mem_cgroup_commit_charge(page, memcg, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
 		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
-		mem_cgroup_commit_charge(page, memcg, true, false);
+		mem_cgroup_commit_charge(page, memcg, true);
 		activate_page(page);
 	}
 
@@ -3287,7 +3286,7 @@ unlock:
 out:
 	return ret;
 out_nomap:
-	mem_cgroup_cancel_charge(page, memcg, false);
+	mem_cgroup_cancel_charge(page, memcg);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out_page:
 	unlock_page(page);
@@ -3361,8 +3360,7 @@ static vm_fault_t do_anonymous_page(stru
 	if (!page)
 		goto oom;
 
-	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg,
-					false))
+	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg))
 		goto oom_free_page;
 
 	/*
@@ -3388,14 +3386,14 @@ static vm_fault_t do_anonymous_page(stru
 	/* Deliver the page fault to userland, check inside PT lock */
 	if (userfaultfd_missing(vma)) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		mem_cgroup_cancel_charge(page, memcg, false);
+		mem_cgroup_cancel_charge(page, memcg);
 		put_page(page);
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 	page_add_new_anon_rmap(page, vma, vmf->address, false);
-	mem_cgroup_commit_charge(page, memcg, false, false);
+	mem_cgroup_commit_charge(page, memcg, false);
 	lru_cache_add_active_or_unevictable(page, vma);
 setpte:
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
@@ -3406,7 +3404,7 @@ unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
 release:
-	mem_cgroup_cancel_charge(page, memcg, false);
+	mem_cgroup_cancel_charge(page, memcg);
 	put_page(page);
 	goto unlock;
 oom_free_page:
@@ -3657,7 +3655,7 @@ vm_fault_t alloc_set_pte(struct vm_fault
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
-		mem_cgroup_commit_charge(page, memcg, false, false);
+		mem_cgroup_commit_charge(page, memcg, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
 		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
@@ -3866,8 +3864,8 @@ static vm_fault_t do_cow_fault(struct vm
 	if (!vmf->cow_page)
 		return VM_FAULT_OOM;
 
-	if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm, GFP_KERNEL,
-				&vmf->memcg, false)) {
+	if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm,
+					GFP_KERNEL, &vmf->memcg)) {
 		put_page(vmf->cow_page);
 		return VM_FAULT_OOM;
 	}
@@ -3888,7 +3886,7 @@ static vm_fault_t do_cow_fault(struct vm
 		goto uncharge_out;
 	return ret;
 uncharge_out:
-	mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg, false);
+	mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg);
 	put_page(vmf->cow_page);
 	return ret;
 }
--- a/mm/migrate.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/migrate.c
@@ -2786,7 +2786,7 @@ static void migrate_vma_insert_page(stru
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto abort;
-	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg, false))
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg))
 		goto abort;
 
 	/*
@@ -2832,7 +2832,7 @@ static void migrate_vma_insert_page(stru
 
 	inc_mm_counter(mm, MM_ANONPAGES);
 	page_add_new_anon_rmap(page, vma, addr, false);
-	mem_cgroup_commit_charge(page, memcg, false, false);
+	mem_cgroup_commit_charge(page, memcg, false);
 	if (!is_zone_device_page(page))
 		lru_cache_add_active_or_unevictable(page, vma);
 	get_page(page);
@@ -2854,7 +2854,7 @@ static void migrate_vma_insert_page(stru
 
 unlock_abort:
 	pte_unmap_unlock(ptep, ptl);
-	mem_cgroup_cancel_charge(page, memcg, false);
+	mem_cgroup_cancel_charge(page, memcg);
 abort:
 	*src &= ~MIGRATE_PFN_MIGRATE;
 }
--- a/mm/shmem.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/shmem.c
@@ -1664,8 +1664,7 @@ static int shmem_swapin_page(struct inod
 			goto failed;
 	}
 
-	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
-					    false);
+	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
 	if (!error) {
 		error = shmem_add_to_page_cache(page, mapping, index,
 						swp_to_radix_entry(swap), gfp);
@@ -1680,14 +1679,14 @@ static int shmem_swapin_page(struct inod
 		 * the rest.
 		 */
 		if (error) {
-			mem_cgroup_cancel_charge(page, memcg, false);
+			mem_cgroup_cancel_charge(page, memcg);
 			delete_from_swap_cache(page);
 		}
 	}
 	if (error)
 		goto failed;
 
-	mem_cgroup_commit_charge(page, memcg, true, false);
+	mem_cgroup_commit_charge(page, memcg, true);
 
 	spin_lock_irq(&info->lock);
 	info->swapped--;
@@ -1859,8 +1858,7 @@ alloc_nohuge:
 	if (sgp == SGP_WRITE)
 		__SetPageReferenced(page);
 
-	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
-					    PageTransHuge(page));
+	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
 	if (error) {
 		if (PageTransHuge(page)) {
 			count_vm_event(THP_FILE_FALLBACK);
@@ -1871,12 +1869,10 @@ alloc_nohuge:
 	error = shmem_add_to_page_cache(page, mapping, hindex,
 					NULL, gfp & GFP_RECLAIM_MASK);
 	if (error) {
-		mem_cgroup_cancel_charge(page, memcg,
-					 PageTransHuge(page));
+		mem_cgroup_cancel_charge(page, memcg);
 		goto unacct;
 	}
-	mem_cgroup_commit_charge(page, memcg, false,
-				 PageTransHuge(page));
+	mem_cgroup_commit_charge(page, memcg, false);
 	lru_cache_add_anon(page);
 
 	spin_lock_irq(&info->lock);
@@ -2364,7 +2360,7 @@ static int shmem_mfill_atomic_pte(struct
 	if (unlikely(offset >= max_off))
 		goto out_release;
 
-	ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg, false);
+	ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg);
 	if (ret)
 		goto out_release;
 
@@ -2373,7 +2369,7 @@ static int shmem_mfill_atomic_pte(struct
 	if (ret)
 		goto out_release_uncharge;
 
-	mem_cgroup_commit_charge(page, memcg, false, false);
+	mem_cgroup_commit_charge(page, memcg, false);
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
 	if (dst_vma->vm_flags & VM_WRITE)
@@ -2424,7 +2420,7 @@ out_release_uncharge_unlock:
 	ClearPageDirty(page);
 	delete_from_page_cache(page);
 out_release_uncharge:
-	mem_cgroup_cancel_charge(page, memcg, false);
+	mem_cgroup_cancel_charge(page, memcg);
 out_release:
 	unlock_page(page);
 	put_page(page);
--- a/mm/swapfile.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/swapfile.c
@@ -1863,15 +1863,14 @@ static int unuse_pte(struct vm_area_stru
 	if (unlikely(!page))
 		return -ENOMEM;
 
-	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
-				&memcg, false)) {
+	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
 		ret = -ENOMEM;
 		goto out_nolock;
 	}
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
-		mem_cgroup_cancel_charge(page, memcg, false);
+		mem_cgroup_cancel_charge(page, memcg);
 		ret = 0;
 		goto out;
 	}
@@ -1883,10 +1882,10 @@ static int unuse_pte(struct vm_area_stru
 		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	if (page == swapcache) {
 		page_add_anon_rmap(page, vma, addr, false);
-		mem_cgroup_commit_charge(page, memcg, true, false);
+		mem_cgroup_commit_charge(page, memcg, true);
 	} else { /* ksm created a completely new copy */
 		page_add_new_anon_rmap(page, vma, addr, false);
-		mem_cgroup_commit_charge(page, memcg, false, false);
+		mem_cgroup_commit_charge(page, memcg, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	}
 	swap_free(entry);
--- a/mm/userfaultfd.c~mm-memcontrol-drop-compound-parameter-from-memcg-charging-api
+++ a/mm/userfaultfd.c
@@ -97,7 +97,7 @@ static int mcopy_atomic_pte(struct mm_st
 	__SetPageUptodate(page);
 
 	ret = -ENOMEM;
-	if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg, false))
+	if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg))
 		goto out_release;
 
 	_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
@@ -124,7 +124,7 @@ static int mcopy_atomic_pte(struct mm_st
 
 	inc_mm_counter(dst_mm, MM_ANONPAGES);
 	page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
-	mem_cgroup_commit_charge(page, memcg, false, false);
+	mem_cgroup_commit_charge(page, memcg, false);
 	lru_cache_add_active_or_unevictable(page, dst_vma);
 
 	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
@@ -138,7 +138,7 @@ out:
 	return ret;
 out_release_uncharge_unlock:
 	pte_unmap_unlock(dst_pte, ptl);
-	mem_cgroup_cancel_charge(page, memcg, false);
+	mem_cgroup_cancel_charge(page, memcg);
 out_release:
 	put_page(page);
 	goto out;
_
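
For illustration, a minimal userspace sketch of what this interface
change buys (hypothetical names, not kernel code): the charge size is
derived from the page itself, so callers can no longer pass a compound
flag that disagrees with the page they hand in.

#include <stdbool.h>
#include <stdio.h>

#define HPAGE_NR 512	/* pages per 2MB THP on x86-64, for illustration */

struct page { bool transhuge; };

/* Stand-in for hpage_nr_pages(): the page knows its own size. */
static unsigned int hpage_nr_pages(const struct page *page)
{
	return page->transhuge ? HPAGE_NR : 1;
}

/* After this patch: no 'bool compound' parameter left to get wrong. */
static void charge(const struct page *page)
{
	printf("charging %u page(s)\n", hpage_nr_pages(page));
}

int main(void)
{
	struct page base = { .transhuge = false };
	struct page thp  = { .transhuge = true };

	charge(&base);	/* charging 1 page(s) */
	charge(&thp);	/* charging 512 page(s) */
	return 0;
}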

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-move-out-cgroup-swaprate-throttling.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (21 preceding siblings ...)
  2020-05-08 23:19 ` + mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch " Andrew Morton
                   ` (57 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: move out cgroup swaprate throttling
has been added to the -mm tree.  Its filename is
     mm-memcontrol-move-out-cgroup-swaprate-throttling.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-move-out-cgroup-swaprate-throttling.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: move out cgroup swaprate throttling

The cgroup swaprate throttling is about matching new anon allocations to
the rate of available IO when that is being throttled.  It's the io
controller hooking into the VM, rather than a memory controller thing.

Rename mem_cgroup_throttle_swaprate() to cgroup_throttle_swaprate(), and
drop the @memcg argument which is only used to check whether the preceding
page charge has succeeded and the fault is proceeding.

We could decouple the call from mem_cgroup_try_charge() here as well, but
that would cause unnecessary churn: the following patches convert all
callsites to a new charge API and we'll decouple as we go along.
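
To illustrate the gating that remains after the rename, a standalone
sketch with a stubbed io-controller check and an illustrative flag
value (not the kernel's definitions):

#include <stdbool.h>
#include <stdio.h>

#define __GFP_IO 0x40u	/* illustrative bit, not the real gfp flag */

/* Stub: pretend the block io controller is currently throttling. */
static bool blk_cgroup_congested(void)
{
	return true;
}

static void cgroup_throttle_swaprate(unsigned int gfp_mask)
{
	if (!(gfp_mask & __GFP_IO))
		return;		/* caller may not block on IO: skip */
	if (!blk_cgroup_congested())
		return;		/* io controller idle: nothing to match */
	printf("throttling allocation to match swap IO rate\n");
}

int main(void)
{
	cgroup_throttle_swaprate(0);		/* no-op */
	cgroup_throttle_swaprate(__GFP_IO);	/* throttled */
	return 0;
}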

Link: http://lkml.kernel.org/r/20200508183105.225460-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/swap.h |    6 ++----
 mm/memcontrol.c      |    5 ++---
 mm/swapfile.c        |   14 +++++++-------
 3 files changed, 11 insertions(+), 14 deletions(-)

--- a/include/linux/swap.h~mm-memcontrol-move-out-cgroup-swaprate-throttling
+++ a/include/linux/swap.h
@@ -650,11 +650,9 @@ static inline int mem_cgroup_swappiness(
 #endif
 
 #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
-extern void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
-					 gfp_t gfp_mask);
+extern void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
 #else
-static inline void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg,
-						int node, gfp_t gfp_mask)
+static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask)
 {
 }
 #endif
--- a/mm/memcontrol.c~mm-memcontrol-move-out-cgroup-swaprate-throttling
+++ a/mm/memcontrol.c
@@ -6493,12 +6493,11 @@ out:
 int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
 			  gfp_t gfp_mask, struct mem_cgroup **memcgp)
 {
-	struct mem_cgroup *memcg;
 	int ret;
 
 	ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp);
-	memcg = *memcgp;
-	mem_cgroup_throttle_swaprate(memcg, page_to_nid(page), gfp_mask);
+	if (*memcgp)
+		cgroup_throttle_swaprate(page, gfp_mask);
 	return ret;
 }
 
--- a/mm/swapfile.c~mm-memcontrol-move-out-cgroup-swaprate-throttling
+++ a/mm/swapfile.c
@@ -3743,11 +3743,12 @@ static void free_swap_count_continuation
 }
 
 #if defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
-void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
-				  gfp_t gfp_mask)
+void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask)
 {
 	struct swap_info_struct *si, *next;
-	if (!(gfp_mask & __GFP_IO) || !memcg)
+	int nid = page_to_nid(page);
+
+	if (!(gfp_mask & __GFP_IO))
 		return;
 
 	if (!blk_cgroup_congested())
@@ -3761,11 +3762,10 @@ void mem_cgroup_throttle_swaprate(struct
 		return;
 
 	spin_lock(&swap_avail_lock);
-	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
-				  avail_lists[node]) {
+	plist_for_each_entry_safe(si, next, &swap_avail_heads[nid],
+				  avail_lists[nid]) {
 		if (si->bdev) {
-			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
-						true);
+			blkcg_schedule_throttle(bdev_get_queue(si->bdev), true);
 			break;
 		}
 	}
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (22 preceding siblings ...)
  2020-05-08 23:19 ` + mm-memcontrol-move-out-cgroup-swaprate-throttling.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch " Andrew Morton
                   ` (56 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: convert page cache to a new mem_cgroup_charge() API
has been added to the -mm tree.  Its filename is
     mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: convert page cache to a new mem_cgroup_charge() API

The try/commit/cancel protocol that memcg uses dates back to when pages
used to be uncharged upon removal from the page cache, and thus couldn't
be committed before the insertion had succeeded.  Nowadays, pages are
uncharged when they are physically freed; it doesn't matter whether the
insertion was successful or not.  For the page cache, the transaction
dance has become unnecessary.

Introduce a mem_cgroup_charge() function that simply charges a newly
allocated page to a cgroup and sets up page->mem_cgroup in a single
step.  If the insertion fails, the caller doesn't have to do anything but
free/put the page.

Then switch the page cache over to this new API.

Subsequent patches will also convert anon pages, but that needs a bit more
prep work.  Right now, memcg depends on page->mapping being already set up
at the time of charging, so that it can maintain its own MEMCG_CACHE and
MEMCG_RSS counters.  For anon, page->mapping is set under the same pte
lock under which the page is published, so a single charge point that can
block doesn't work there just yet.

The following prep patches will replace the private memcg counters with
the generic vmstat counters, thus removing the page->mapping dependency,
then complete the transition to the new single-point charge API and delete
the old transactional scheme.

v2: leave shmem swapcache when charging fails to avoid double IO (Joonsoo)
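
As a rough control-flow sketch of the simplification (stand-in
functions, not the real API): the error path no longer needs an
explicit cancel step, because the uncharge happens when the page is
freed.

#include <stdio.h>

static int insert_into_cache(void)
{
	return -1;	/* pretend the insertion failed */
}

/* Old protocol: every caller pairs try with commit or cancel. */
static void old_way(void)
{
	/* try_charge() */
	if (insert_into_cache() < 0) {
		/* cancel_charge() -- easy to miss on error paths */
		puts("old: cancel charge, then free page");
		return;
	}
	/* commit_charge() */
}

/* New protocol: charge once; a failed insert just frees the page. */
static void new_way(void)
{
	/* charge() */
	if (insert_into_cache() < 0) {
		puts("new: free page; uncharge happens at free time");
		return;
	}
}

int main(void)
{
	old_way();
	new_way();
	return 0;
}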

Link: http://lkml.kernel.org/r/20200508183105.225460-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |   10 +++
 mm/filemap.c               |   24 +++------
 mm/memcontrol.c            |   29 ++++++++++-
 mm/shmem.c                 |   88 +++++++++++++++--------------------
 4 files changed, 85 insertions(+), 66 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api
+++ a/include/linux/memcontrol.h
@@ -416,6 +416,10 @@ int mem_cgroup_try_charge_delay(struct p
 void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
 			      bool lrucare);
 void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
+
+int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
+		      bool lrucare);
+
 void mem_cgroup_uncharge(struct page *page);
 void mem_cgroup_uncharge_list(struct list_head *page_list);
 
@@ -933,6 +937,12 @@ static inline void mem_cgroup_cancel_cha
 {
 }
 
+static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
+				    gfp_t gfp_mask, bool lrucare)
+{
+	return 0;
+}
+
 static inline void mem_cgroup_uncharge(struct page *page)
 {
 }
--- a/mm/filemap.c~mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api
+++ a/mm/filemap.c
@@ -832,7 +832,6 @@ static int __add_to_page_cache_locked(st
 {
 	XA_STATE(xas, &mapping->i_pages, offset);
 	int huge = PageHuge(page);
-	struct mem_cgroup *memcg;
 	int error;
 	void *old;
 
@@ -840,17 +839,16 @@ static int __add_to_page_cache_locked(st
 	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
 	mapping_set_update(&xas, mapping);
 
-	if (!huge) {
-		error = mem_cgroup_try_charge(page, current->mm,
-					      gfp_mask, &memcg);
-		if (error)
-			return error;
-	}
-
 	get_page(page);
 	page->mapping = mapping;
 	page->index = offset;
 
+	if (!huge) {
+		error = mem_cgroup_charge(page, current->mm, gfp_mask, false);
+		if (error)
+			goto error;
+	}
+
 	do {
 		xas_lock_irq(&xas);
 		old = xas_load(&xas);
@@ -874,20 +872,18 @@ unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
 
-	if (xas_error(&xas))
+	if (xas_error(&xas)) {
+		error = xas_error(&xas);
 		goto error;
+	}
 
-	if (!huge)
-		mem_cgroup_commit_charge(page, memcg, false);
 	trace_mm_filemap_add_to_page_cache(page);
 	return 0;
 error:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
-	if (!huge)
-		mem_cgroup_cancel_charge(page, memcg);
 	put_page(page);
-	return xas_error(&xas);
+	return error;
 }
 ALLOW_ERROR_INJECTION(__add_to_page_cache_locked, ERRNO);
 
--- a/mm/memcontrol.c~mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api
+++ a/mm/memcontrol.c
@@ -6577,6 +6577,33 @@ void mem_cgroup_cancel_charge(struct pag
 	cancel_charge(memcg, nr_pages);
 }
 
+/**
+ * mem_cgroup_charge - charge a newly allocated page to a cgroup
+ * @page: page to charge
+ * @mm: mm context of the victim
+ * @gfp_mask: reclaim mode
+ * @lrucare: page might be on the LRU already
+ *
+ * Try to charge @page to the memcg that @mm belongs to, reclaiming
+ * pages according to @gfp_mask if necessary.
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
+		      bool lrucare)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	VM_BUG_ON_PAGE(!page->mapping, page);
+
+	ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
+	if (ret)
+		return ret;
+	mem_cgroup_commit_charge(page, memcg, lrucare);
+	return 0;
+}
+
 struct uncharge_gather {
 	struct mem_cgroup *memcg;
 	unsigned long pgpgout;
@@ -6624,8 +6651,6 @@ static void uncharge_batch(const struct
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	VM_BUG_ON_PAGE(PageLRU(page), page);
-	VM_BUG_ON_PAGE(page_count(page) && !is_zone_device_page(page) &&
-			!PageHWPoison(page) , page);
 
 	if (!page->mem_cgroup)
 		return;
--- a/mm/shmem.c~mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api
+++ a/mm/shmem.c
@@ -605,11 +605,13 @@ static inline bool is_huge_enabled(struc
  */
 static int shmem_add_to_page_cache(struct page *page,
 				   struct address_space *mapping,
-				   pgoff_t index, void *expected, gfp_t gfp)
+				   pgoff_t index, void *expected, gfp_t gfp,
+				   struct mm_struct *charge_mm)
 {
 	XA_STATE_ORDER(xas, &mapping->i_pages, index, compound_order(page));
 	unsigned long i = 0;
 	unsigned long nr = compound_nr(page);
+	int error;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 	VM_BUG_ON_PAGE(index != round_down(index, nr), page);
@@ -621,12 +623,22 @@ static int shmem_add_to_page_cache(struc
 	page->mapping = mapping;
 	page->index = index;
 
+	error = mem_cgroup_charge(page, charge_mm, gfp, PageSwapCache(page));
+	if (error) {
+		if (!PageSwapCache(page) && PageTransHuge(page)) {
+			count_vm_event(THP_FILE_FALLBACK);
+			count_vm_event(THP_FILE_FALLBACK_CHARGE);
+		}
+		goto error;
+	}
+	cgroup_throttle_swaprate(page, gfp);
+
 	do {
 		void *entry;
 		xas_lock_irq(&xas);
 		entry = xas_find_conflict(&xas);
 		if (entry != expected)
-			xas_set_err(&xas, -EEXIST);
+			xas_set_err(&xas, expected ? -ENOENT : -EEXIST);
 		xas_create_range(&xas);
 		if (xas_error(&xas))
 			goto unlock;
@@ -648,12 +660,15 @@ unlock:
 	} while (xas_nomem(&xas, gfp));
 
 	if (xas_error(&xas)) {
-		page->mapping = NULL;
-		page_ref_sub(page, nr);
-		return xas_error(&xas);
+		error = xas_error(&xas);
+		goto error;
 	}
 
 	return 0;
+error:
+	page->mapping = NULL;
+	page_ref_sub(page, nr);
+	return error;
 }
 
 /*
@@ -1619,7 +1634,6 @@ static int shmem_swapin_page(struct inod
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct mm_struct *charge_mm = vma ? vma->vm_mm : current->mm;
-	struct mem_cgroup *memcg;
 	struct page *page;
 	swp_entry_t swap;
 	int error;
@@ -1664,29 +1678,23 @@ static int shmem_swapin_page(struct inod
 			goto failed;
 	}
 
-	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
-	if (!error) {
-		error = shmem_add_to_page_cache(page, mapping, index,
-						swp_to_radix_entry(swap), gfp);
+	error = shmem_add_to_page_cache(page, mapping, index,
+					swp_to_radix_entry(swap), gfp,
+					charge_mm);
+	if (error) {
 		/*
-		 * We already confirmed swap under page lock, and make
-		 * no memory allocation here, so usually no possibility
-		 * of error; but free_swap_and_cache() only trylocks a
-		 * page, so it is just possible that the entry has been
-		 * truncated or holepunched since swap was confirmed.
+		 * We already confirmed swap under page lock, but
+		 * free_swap_and_cache() only trylocks a page, so it
+		 * is just possible that the entry has been truncated
+		 * or holepunched since swap was confirmed.
 		 * shmem_undo_range() will have done some of the
 		 * unaccounting, now delete_from_swap_cache() will do
 		 * the rest.
 		 */
-		if (error) {
-			mem_cgroup_cancel_charge(page, memcg);
+		if (error == -ENOENT)
 			delete_from_swap_cache(page);
-		}
-	}
-	if (error)
 		goto failed;
-
-	mem_cgroup_commit_charge(page, memcg, true);
+	}
 
 	spin_lock_irq(&info->lock);
 	info->swapped--;
@@ -1733,7 +1741,6 @@ static int shmem_getpage_gfp(struct inod
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo;
 	struct mm_struct *charge_mm;
-	struct mem_cgroup *memcg;
 	struct page *page;
 	enum sgp_type sgp_huge = sgp;
 	pgoff_t hindex = index;
@@ -1858,21 +1865,11 @@ alloc_nohuge:
 	if (sgp == SGP_WRITE)
 		__SetPageReferenced(page);
 
-	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
-	if (error) {
-		if (PageTransHuge(page)) {
-			count_vm_event(THP_FILE_FALLBACK);
-			count_vm_event(THP_FILE_FALLBACK_CHARGE);
-		}
-		goto unacct;
-	}
 	error = shmem_add_to_page_cache(page, mapping, hindex,
-					NULL, gfp & GFP_RECLAIM_MASK);
-	if (error) {
-		mem_cgroup_cancel_charge(page, memcg);
+					NULL, gfp & GFP_RECLAIM_MASK,
+					charge_mm);
+	if (error)
 		goto unacct;
-	}
-	mem_cgroup_commit_charge(page, memcg, false);
 	lru_cache_add_anon(page);
 
 	spin_lock_irq(&info->lock);
@@ -2310,7 +2307,6 @@ static int shmem_mfill_atomic_pte(struct
 	struct address_space *mapping = inode->i_mapping;
 	gfp_t gfp = mapping_gfp_mask(mapping);
 	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
-	struct mem_cgroup *memcg;
 	spinlock_t *ptl;
 	void *page_kaddr;
 	struct page *page;
@@ -2360,16 +2356,10 @@ static int shmem_mfill_atomic_pte(struct
 	if (unlikely(offset >= max_off))
 		goto out_release;
 
-	ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg);
-	if (ret)
-		goto out_release;
-
 	ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
-						gfp & GFP_RECLAIM_MASK);
+				      gfp & GFP_RECLAIM_MASK, dst_mm);
 	if (ret)
-		goto out_release_uncharge;
-
-	mem_cgroup_commit_charge(page, memcg, false);
+		goto out_release;
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
 	if (dst_vma->vm_flags & VM_WRITE)
@@ -2390,11 +2380,11 @@ static int shmem_mfill_atomic_pte(struct
 	ret = -EFAULT;
 	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
 	if (unlikely(offset >= max_off))
-		goto out_release_uncharge_unlock;
+		goto out_release_unlock;
 
 	ret = -EEXIST;
 	if (!pte_none(*dst_pte))
-		goto out_release_uncharge_unlock;
+		goto out_release_unlock;
 
 	lru_cache_add_anon(page);
 
@@ -2415,12 +2405,10 @@ static int shmem_mfill_atomic_pte(struct
 	ret = 0;
 out:
 	return ret;
-out_release_uncharge_unlock:
+out_release_unlock:
 	pte_unmap_unlock(dst_pte, ptl);
 	ClearPageDirty(page);
 	delete_from_page_cache(page);
-out_release_uncharge:
-	mem_cgroup_cancel_charge(page, memcg);
 out_release:
 	unlock_page(page);
 	put_page(page);
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (23 preceding siblings ...)
  2020-05-08 23:19 ` + mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch " Andrew Morton
                   ` (55 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: prepare uncharging for removal of private page type counters
has been added to the -mm tree.  Its filename is
     mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: prepare uncharging for removal of private page type counters

The uncharge batching code adds up the anon, file, kmem counts to
determine the total number of pages to uncharge and references to drop. 
But the next patches will remove the anon and file counters.

Maintain an aggregate nr_pages in the uncharge_gather struct.
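
A minimal standalone model of the change (field names loosely mirror
the patch; this is not kernel code): the aggregate is maintained as
pages are gathered, instead of being summed from per-type counters at
flush time.

#include <stdio.h>

struct uncharge_gather {
	unsigned long nr_pages;	/* new: one aggregate for everything */
	unsigned long nr_kmem;	/* per-type detail kept only where needed */
};

static void uncharge_page(struct uncharge_gather *ug, unsigned long nr,
			  int kmem)
{
	ug->nr_pages += nr;
	if (kmem)
		ug->nr_kmem += nr;
}

static void uncharge_batch(const struct uncharge_gather *ug)
{
	/* was: nr_pages = ug->nr_anon + ug->nr_file + ug->nr_kmem; */
	printf("uncharge %lu pages (%lu kmem)\n",
	       ug->nr_pages, ug->nr_kmem);
}

int main(void)
{
	struct uncharge_gather ug = { 0, 0 };

	uncharge_page(&ug, 1, 0);	/* small anon page */
	uncharge_page(&ug, 512, 0);	/* THP */
	uncharge_page(&ug, 4, 1);	/* kmem allocation */
	uncharge_batch(&ug);		/* uncharge 517 pages (4 kmem) */
	return 0;
}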

Link: http://lkml.kernel.org/r/20200508183105.225460-7-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters
+++ a/mm/memcontrol.c
@@ -6606,6 +6606,7 @@ int mem_cgroup_charge(struct page *page,
 
 struct uncharge_gather {
 	struct mem_cgroup *memcg;
+	unsigned long nr_pages;
 	unsigned long pgpgout;
 	unsigned long nr_anon;
 	unsigned long nr_file;
@@ -6622,13 +6623,12 @@ static inline void uncharge_gather_clear
 
 static void uncharge_batch(const struct uncharge_gather *ug)
 {
-	unsigned long nr_pages = ug->nr_anon + ug->nr_file + ug->nr_kmem;
 	unsigned long flags;
 
 	if (!mem_cgroup_is_root(ug->memcg)) {
-		page_counter_uncharge(&ug->memcg->memory, nr_pages);
+		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
 		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, nr_pages);
+			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
@@ -6640,16 +6640,18 @@ static void uncharge_batch(const struct
 	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
 	__mod_memcg_state(ug->memcg, NR_SHMEM, -ug->nr_shmem);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, nr_pages);
+	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
 	memcg_check_events(ug->memcg, ug->dummy_page);
 	local_irq_restore(flags);
 
 	if (!mem_cgroup_is_root(ug->memcg))
-		css_put_many(&ug->memcg->css, nr_pages);
+		css_put_many(&ug->memcg->css, ug->nr_pages);
 }
 
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
+	unsigned long nr_pages;
+
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
 	if (!page->mem_cgroup)
@@ -6669,13 +6671,12 @@ static void uncharge_page(struct page *p
 		ug->memcg = page->mem_cgroup;
 	}
 
-	if (!PageKmemcg(page)) {
-		unsigned int nr_pages = 1;
+	nr_pages = compound_nr(page);
+	ug->nr_pages += nr_pages;
 
-		if (PageTransHuge(page)) {
-			nr_pages = compound_nr(page);
+	if (!PageKmemcg(page)) {
+		if (PageTransHuge(page))
 			ug->nr_huge += nr_pages;
-		}
 		if (PageAnon(page))
 			ug->nr_anon += nr_pages;
 		else {
@@ -6685,7 +6686,7 @@ static void uncharge_page(struct page *p
 		}
 		ug->pgpgout++;
 	} else {
-		ug->nr_kmem += compound_nr(page);
+		ug->nr_kmem += nr_pages;
 		__ClearPageKmemcg(page);
 	}
 
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (24 preceding siblings ...)
  2020-05-08 23:19 ` + mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch " Andrew Morton
                   ` (54 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: prepare move_account for removal of private page type counters
has been added to the -mm tree.  Its filename is
     mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: prepare move_account for removal of private page type counters

When memcg uses the generic vmstat counters, it doesn't need to do
anything at charging and uncharging time.  It does, however, need to
migrate counts when pages move to a different cgroup in move_account.

Prepare the move_account function for the arrival of NR_FILE_PAGES,
NR_ANON_MAPPED, NR_ANON_THPS etc.  by having a branch for files and a
branch for anon, which can then be divided into sub-branches.
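
The shape of the restructured function, as a standalone sketch with
stub predicates and hypothetical counter placement:

#include <stdbool.h>
#include <stdio.h>

static bool page_is_anon(void)   { return false; }
static bool page_is_mapped(void) { return true; }
static bool page_is_dirty(void)  { return true; }

static void move_account(void)
{
	if (!page_is_anon()) {		/* file branch */
		if (page_is_mapped())
			puts("move NR_FILE_MAPPED");
		if (page_is_dirty())
			puts("move NR_FILE_DIRTY");
		/* NR_FILE_PAGES, NR_SHMEM slot in here later */
	} else {			/* anon branch */
		/* NR_ANON_MAPPED, NR_ANON_THPS slot in here later */
	}
}

int main(void)
{
	move_account();
	return 0;
}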

Link: http://lkml.kernel.org/r/20200508183105.225460-8-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters
+++ a/mm/memcontrol.c
@@ -5380,7 +5380,6 @@ static int mem_cgroup_move_account(struc
 	struct pglist_data *pgdat;
 	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 	int ret;
-	bool anon;
 
 	VM_BUG_ON(from == to);
 	VM_BUG_ON_PAGE(PageLRU(page), page);
@@ -5398,25 +5397,27 @@ static int mem_cgroup_move_account(struc
 	if (page->mem_cgroup != from)
 		goto out_unlock;
 
-	anon = PageAnon(page);
-
 	pgdat = page_pgdat(page);
 	from_vec = mem_cgroup_lruvec(from, pgdat);
 	to_vec = mem_cgroup_lruvec(to, pgdat);
 
 	lock_page_memcg(page);
 
-	if (!anon && page_mapped(page)) {
-		__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
-		__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
-	}
+	if (!PageAnon(page)) {
+		if (page_mapped(page)) {
+			__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
+			__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
+		}
 
-	if (!anon && PageDirty(page)) {
-		struct address_space *mapping = page_mapping(page);
+		if (PageDirty(page)) {
+			struct address_space *mapping = page_mapping(page);
 
-		if (mapping_cap_account_dirty(mapping)) {
-			__mod_lruvec_state(from_vec, NR_FILE_DIRTY, -nr_pages);
-			__mod_lruvec_state(to_vec, NR_FILE_DIRTY, nr_pages);
+			if (mapping_cap_account_dirty(mapping)) {
+				__mod_lruvec_state(from_vec, NR_FILE_DIRTY,
+						   -nr_pages);
+				__mod_lruvec_state(to_vec, NR_FILE_DIRTY,
+						   nr_pages);
+			}
 		}
 	}
 
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (25 preceding siblings ...)
  2020-05-08 23:19 ` + mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch " Andrew Morton
                   ` (53 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: prepare cgroup vmstat infrastructure for native anon counters
has been added to the -mm tree.  Its filename is
     mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: prepare cgroup vmstat infrastructure for native anon counters

Anonymous compound pages can be mapped by ptes, which means that if we
want to track NR_ANON_MAPPED, NR_ANON_THPS on a per-cgroup basis, we have
to be prepared to see tail pages in our accounting functions.

Make mod_lruvec_page_state() and lock_page_memcg() deal with tail pages
correctly, namely by redirecting to the head page which has the
page->mem_cgroup set up.
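
A userspace model of the redirection (hypothetical types, not the
kernel's struct page): tail pages carry no mem_cgroup of their own,
so accounting keys off the head page.

#include <stdio.h>

struct memcg { const char *name; };

struct page {
	struct page *head;	/* points to self for head/base pages */
	struct memcg *memcg;	/* set up on the head page only */
};

static struct page *compound_head(struct page *page)
{
	return page->head;
}

static void mod_stat(struct page *page, int val)
{
	struct page *head = compound_head(page); /* rmap on tail pages */

	printf("account %+d to %s\n", val, head->memcg->name);
}

int main(void)
{
	struct memcg cg = { "cgroup-A" };
	struct page head = { &head, &cg };
	struct page tail = { &head, NULL };	/* tail: no own memcg */

	mod_stat(&tail, 1);	/* account +1 to cgroup-A */
	return 0;
}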

Link: http://lkml.kernel.org/r/20200508183105.225460-9-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |    5 +++--
 mm/memcontrol.c            |    9 ++++++---
 2 files changed, 9 insertions(+), 5 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters
+++ a/include/linux/memcontrol.h
@@ -760,16 +760,17 @@ static inline void mod_lruvec_state(stru
 static inline void __mod_lruvec_page_state(struct page *page,
 					   enum node_stat_item idx, int val)
 {
+	struct page *head = compound_head(page); /* rmap on tail pages */
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!page->mem_cgroup) {
+	if (!head->mem_cgroup) {
 		__mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
-	lruvec = mem_cgroup_lruvec(page->mem_cgroup, pgdat);
+	lruvec = mem_cgroup_lruvec(head->mem_cgroup, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
 }
 
--- a/mm/memcontrol.c~mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters
+++ a/mm/memcontrol.c
@@ -1981,6 +1981,7 @@ void mem_cgroup_print_oom_group(struct m
  */
 struct mem_cgroup *lock_page_memcg(struct page *page)
 {
+	struct page *head = compound_head(page); /* rmap on tail pages */
 	struct mem_cgroup *memcg;
 	unsigned long flags;
 
@@ -2000,7 +2001,7 @@ struct mem_cgroup *lock_page_memcg(struc
 	if (mem_cgroup_disabled())
 		return NULL;
 again:
-	memcg = page->mem_cgroup;
+	memcg = head->mem_cgroup;
 	if (unlikely(!memcg))
 		return NULL;
 
@@ -2008,7 +2009,7 @@ again:
 		return memcg;
 
 	spin_lock_irqsave(&memcg->move_lock, flags);
-	if (memcg != page->mem_cgroup) {
+	if (memcg != head->mem_cgroup) {
 		spin_unlock_irqrestore(&memcg->move_lock, flags);
 		goto again;
 	}
@@ -2051,7 +2052,9 @@ void __unlock_page_memcg(struct mem_cgro
  */
 void unlock_page_memcg(struct page *page)
 {
-	__unlock_page_memcg(page->mem_cgroup);
+	struct page *head = compound_head(page);
+
+	__unlock_page_memcg(head->mem_cgroup);
 }
 EXPORT_SYMBOL(unlock_page_memcg);
 
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (26 preceding siblings ...)
  2020-05-08 23:19 ` + mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch " Andrew Morton
@ 2020-05-08 23:19 ` Andrew Morton
  2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch " Andrew Morton
                   ` (52 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters
has been added to the -mm tree.  Its filename is
     mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters

Memcg maintains private MEMCG_CACHE and NR_SHMEM counters.  This
divergence from the generic VM accounting means unnecessary code overhead,
and creates a dependency for memcg that page->mapping is set up at the
time of charging, so that page types can be told apart.

Convert the generic accounting sites to mod_lruvec_page_state and friends
to maintain the per-cgroup vmstat counters of NR_FILE_PAGES and NR_SHMEM. 
The page is already locked in these places, so page->mem_cgroup is stable;
we only need minimal tweaks of two mem_cgroup_migrate() calls to ensure
it's set up in time.

Then replace MEMCG_CACHE with NR_FILE_PAGES and delete the private
NR_SHMEM accounting sites.
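
What "native" accounting buys, as an illustrative sketch (invented
structures, not the kernel's): one update through the lruvec keeps the
node-level and cgroup-level counters in step, so no private
MEMCG_CACHE copy is needed.

#include <stdio.h>

struct counters { long nr_file_pages; };

struct lruvec {
	struct counters *node;	/* generic VM accounting */
	struct counters *memcg;	/* per-cgroup accounting */
};

static void mod_lruvec_state(struct lruvec *lruvec, int val)
{
	lruvec->node->nr_file_pages += val;	/* previously the only update */
	lruvec->memcg->nr_file_pages += val;	/* now maintained together */
}

int main(void)
{
	struct counters node = { 0 }, memcg = { 0 };
	struct lruvec lruvec = { &node, &memcg };

	mod_lruvec_state(&lruvec, 1);
	printf("node=%ld memcg=%ld\n",
	       node.nr_file_pages, memcg.nr_file_pages);	/* node=1 memcg=1 */
	return 0;
}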

Link: http://lkml.kernel.org/r/20200508183105.225460-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |    3 +--
 mm/filemap.c               |   17 +++++++++--------
 mm/khugepaged.c            |   16 +++++++++++-----
 mm/memcontrol.c            |   28 +++++++++++-----------------
 mm/migrate.c               |   15 +++++++++++----
 mm/shmem.c                 |   14 +++++++-------
 6 files changed, 50 insertions(+), 43 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters
+++ a/include/linux/memcontrol.h
@@ -29,8 +29,7 @@ struct kmem_cache;
 
 /* Cgroup-specific page state, on top of universal node page state */
 enum memcg_stat_item {
-	MEMCG_CACHE = NR_VM_NODE_STAT_ITEMS,
-	MEMCG_RSS,
+	MEMCG_RSS = NR_VM_NODE_STAT_ITEMS,
 	MEMCG_RSS_HUGE,
 	MEMCG_SWAP,
 	MEMCG_SOCK,
--- a/mm/filemap.c~mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters
+++ a/mm/filemap.c
@@ -199,9 +199,9 @@ static void unaccount_page_cache_page(st
 
 	nr = hpage_nr_pages(page);
 
-	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
+	__mod_lruvec_page_state(page, NR_FILE_PAGES, -nr);
 	if (PageSwapBacked(page)) {
-		__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
+		__mod_lruvec_page_state(page, NR_SHMEM, -nr);
 		if (PageTransHuge(page))
 			__dec_node_page_state(page, NR_SHMEM_THPS);
 	} else if (PageTransHuge(page)) {
@@ -802,21 +802,22 @@ int replace_page_cache_page(struct page
 	new->mapping = mapping;
 	new->index = offset;
 
+	mem_cgroup_migrate(old, new);
+
 	xas_lock_irqsave(&xas, flags);
 	xas_store(&xas, new);
 
 	old->mapping = NULL;
 	/* hugetlb pages do not participate in page cache accounting. */
 	if (!PageHuge(old))
-		__dec_node_page_state(old, NR_FILE_PAGES);
+		__dec_lruvec_page_state(old, NR_FILE_PAGES);
 	if (!PageHuge(new))
-		__inc_node_page_state(new, NR_FILE_PAGES);
+		__inc_lruvec_page_state(new, NR_FILE_PAGES);
 	if (PageSwapBacked(old))
-		__dec_node_page_state(old, NR_SHMEM);
+		__dec_lruvec_page_state(old, NR_SHMEM);
 	if (PageSwapBacked(new))
-		__inc_node_page_state(new, NR_SHMEM);
+		__inc_lruvec_page_state(new, NR_SHMEM);
 	xas_unlock_irqrestore(&xas, flags);
-	mem_cgroup_migrate(old, new);
 	if (freepage)
 		freepage(old);
 	put_page(old);
@@ -867,7 +868,7 @@ static int __add_to_page_cache_locked(st
 
 		/* hugetlb pages do not participate in page cache accounting */
 		if (!huge)
-			__inc_node_page_state(page, NR_FILE_PAGES);
+			__inc_lruvec_page_state(page, NR_FILE_PAGES);
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp_mask & GFP_RECLAIM_MASK));
--- a/mm/khugepaged.c~mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters
+++ a/mm/khugepaged.c
@@ -1740,12 +1740,18 @@ out_unlock:
 	}
 
 	if (nr_none) {
-		struct zone *zone = page_zone(new_page);
-
-		__mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none);
+		struct lruvec *lruvec;
+		/*
+		 * XXX: We have started try_charge and pinned the
+		 * memcg, but the page isn't committed yet so we
+		 * cannot use mod_lruvec_page_state(). This hackery
+		 * will be cleaned up when remove the page->mapping
+		 * dependency from memcg and fully charge above.
+		 */
+		lruvec = mem_cgroup_lruvec(memcg, page_pgdat(new_page));
+		__mod_lruvec_state(lruvec, NR_FILE_PAGES, nr_none);
 		if (is_shmem)
-			__mod_node_page_state(zone->zone_pgdat,
-					      NR_SHMEM, nr_none);
+			__mod_lruvec_state(lruvec, NR_SHMEM, nr_none);
 	}
 
 xa_locked:
--- a/mm/memcontrol.c~mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters
+++ a/mm/memcontrol.c
@@ -842,11 +842,6 @@ static void mem_cgroup_charge_statistics
 	 */
 	if (PageAnon(page))
 		__mod_memcg_state(memcg, MEMCG_RSS, nr_pages);
-	else {
-		__mod_memcg_state(memcg, MEMCG_CACHE, nr_pages);
-		if (PageSwapBacked(page))
-			__mod_memcg_state(memcg, NR_SHMEM, nr_pages);
-	}
 
 	if (abs(nr_pages) > 1) {
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
@@ -1392,7 +1387,7 @@ static char *memory_stat_format(struct m
 		       (u64)memcg_page_state(memcg, MEMCG_RSS) *
 		       PAGE_SIZE);
 	seq_buf_printf(&s, "file %llu\n",
-		       (u64)memcg_page_state(memcg, MEMCG_CACHE) *
+		       (u64)memcg_page_state(memcg, NR_FILE_PAGES) *
 		       PAGE_SIZE);
 	seq_buf_printf(&s, "kernel_stack %llu\n",
 		       (u64)memcg_page_state(memcg, MEMCG_KERNEL_STACK_KB) *
@@ -3304,7 +3299,7 @@ static unsigned long mem_cgroup_usage(st
 	unsigned long val;
 
 	if (mem_cgroup_is_root(memcg)) {
-		val = memcg_page_state(memcg, MEMCG_CACHE) +
+		val = memcg_page_state(memcg, NR_FILE_PAGES) +
 			memcg_page_state(memcg, MEMCG_RSS);
 		if (swap)
 			val += memcg_page_state(memcg, MEMCG_SWAP);
@@ -3775,7 +3770,7 @@ static int memcg_numa_stat_show(struct s
 #endif /* CONFIG_NUMA */
 
 static const unsigned int memcg1_stats[] = {
-	MEMCG_CACHE,
+	NR_FILE_PAGES,
 	MEMCG_RSS,
 	MEMCG_RSS_HUGE,
 	NR_SHMEM,
@@ -5407,6 +5402,14 @@ static int mem_cgroup_move_account(struc
 	lock_page_memcg(page);
 
 	if (!PageAnon(page)) {
+		__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
+		__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
+
+		if (PageSwapBacked(page)) {
+			__mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages);
+			__mod_lruvec_state(to_vec, NR_SHMEM, nr_pages);
+		}
+
 		if (page_mapped(page)) {
 			__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
 			__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
@@ -6613,10 +6616,8 @@ struct uncharge_gather {
 	unsigned long nr_pages;
 	unsigned long pgpgout;
 	unsigned long nr_anon;
-	unsigned long nr_file;
 	unsigned long nr_kmem;
 	unsigned long nr_huge;
-	unsigned long nr_shmem;
 	struct page *dummy_page;
 };
 
@@ -6640,9 +6641,7 @@ static void uncharge_batch(const struct
 
 	local_irq_save(flags);
 	__mod_memcg_state(ug->memcg, MEMCG_RSS, -ug->nr_anon);
-	__mod_memcg_state(ug->memcg, MEMCG_CACHE, -ug->nr_file);
 	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
-	__mod_memcg_state(ug->memcg, NR_SHMEM, -ug->nr_shmem);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
 	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
 	memcg_check_events(ug->memcg, ug->dummy_page);
@@ -6683,11 +6682,6 @@ static void uncharge_page(struct page *p
 			ug->nr_huge += nr_pages;
 		if (PageAnon(page))
 			ug->nr_anon += nr_pages;
-		else {
-			ug->nr_file += nr_pages;
-			if (PageSwapBacked(page))
-				ug->nr_shmem += nr_pages;
-		}
 		ug->pgpgout++;
 	} else {
 		ug->nr_kmem += nr_pages;
--- a/mm/migrate.c~mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters
+++ a/mm/migrate.c
@@ -490,11 +490,18 @@ int migrate_page_move_mapping(struct add
 	 * are mapped to swap space.
 	 */
 	if (newzone != oldzone) {
-		__dec_node_state(oldzone->zone_pgdat, NR_FILE_PAGES);
-		__inc_node_state(newzone->zone_pgdat, NR_FILE_PAGES);
+		struct lruvec *old_lruvec, *new_lruvec;
+		struct mem_cgroup *memcg;
+
+		memcg = page_memcg(page);
+		old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
+		new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
+
+		__dec_lruvec_state(old_lruvec, NR_FILE_PAGES);
+		__inc_lruvec_state(new_lruvec, NR_FILE_PAGES);
 		if (PageSwapBacked(page) && !PageSwapCache(page)) {
-			__dec_node_state(oldzone->zone_pgdat, NR_SHMEM);
-			__inc_node_state(newzone->zone_pgdat, NR_SHMEM);
+			__dec_lruvec_state(old_lruvec, NR_SHMEM);
+			__inc_lruvec_state(new_lruvec, NR_SHMEM);
 		}
 		if (dirty && mapping_cap_account_dirty(mapping)) {
 			__dec_node_state(oldzone->zone_pgdat, NR_FILE_DIRTY);
--- a/mm/shmem.c~mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters
+++ a/mm/shmem.c
@@ -653,8 +653,8 @@ next:
 			__inc_node_page_state(page, NR_SHMEM_THPS);
 		}
 		mapping->nrpages += nr;
-		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
-		__mod_node_page_state(page_pgdat(page), NR_SHMEM, nr);
+		__mod_lruvec_page_state(page, NR_FILE_PAGES, nr);
+		__mod_lruvec_page_state(page, NR_SHMEM, nr);
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp));
@@ -685,8 +685,8 @@ static void shmem_delete_from_page_cache
 	error = shmem_replace_entry(mapping, page->index, page, radswap);
 	page->mapping = NULL;
 	mapping->nrpages--;
-	__dec_node_page_state(page, NR_FILE_PAGES);
-	__dec_node_page_state(page, NR_SHMEM);
+	__dec_lruvec_page_state(page, NR_FILE_PAGES);
+	__dec_lruvec_page_state(page, NR_SHMEM);
 	xa_unlock_irq(&mapping->i_pages);
 	put_page(page);
 	BUG_ON(error);
@@ -1593,8 +1593,9 @@ static int shmem_replace_page(struct pag
 	xa_lock_irq(&swap_mapping->i_pages);
 	error = shmem_replace_entry(swap_mapping, swap_index, oldpage, newpage);
 	if (!error) {
-		__inc_node_page_state(newpage, NR_FILE_PAGES);
-		__dec_node_page_state(oldpage, NR_FILE_PAGES);
+		mem_cgroup_migrate(oldpage, newpage);
+		__inc_lruvec_page_state(newpage, NR_FILE_PAGES);
+		__dec_lruvec_page_state(oldpage, NR_FILE_PAGES);
 	}
 	xa_unlock_irq(&swap_mapping->i_pages);
 
@@ -1606,7 +1607,6 @@ static int shmem_replace_page(struct pag
 		 */
 		oldpage = newpage;
 	} else {
-		mem_cgroup_migrate(oldpage, newpage);
 		lru_cache_add_anon(newpage);
 		*pagep = newpage;
 	}
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch


* + mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch added to -mm tree
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: switch to native NR_ANON_MAPPED counter
has been added to the -mm tree.  Its filename is
     mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: switch to native NR_ANON_MAPPED counter

Memcg maintains a private MEMCG_RSS counter.  This divergence from the
generic VM accounting means unnecessary code overhead, and creates a
dependency for memcg that page->mapping is set up at the time of charging,
so that page types can be told apart.

Convert the generic accounting sites to mod_lruvec_page_state and friends
to maintain the per-cgroup vmstat counter of NR_ANON_MAPPED.  We use
lock_page_memcg() to stabilize page->mem_cgroup during rmap changes, the
same way we do for NR_FILE_MAPPED.

With the previous patch removing MEMCG_CACHE and the private NR_SHMEM
counter, this patch finally eliminates the need to have page->mapping set
up at charge time.  However, we need to have page->mem_cgroup set up by
the time rmap runs and does the accounting, so switch the commit and the
rmap callbacks around.

v2: fix temporary accounting bug by switching rmap<->commit (Joonsoo)
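
As a sketch, the resulting ordering at a typical anonymous fault site
(names exactly as in the mm/memory.c hunks below) is:

	/*
	 * Commit the charge first, so page->mem_cgroup is set up by the
	 * time the rmap call accounts NR_ANON_MAPPED against the lruvec.
	 */
	mem_cgroup_commit_charge(page, memcg, false);
	page_add_new_anon_rmap(page, vma, vmf->address, false);
	lru_cache_add_active_or_unevictable(page, vma);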

Link: http://lkml.kernel.org/r/20200508183105.225460-11-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |    3 --
 kernel/events/uprobes.c    |    2 -
 mm/huge_memory.c           |    2 -
 mm/khugepaged.c            |    2 -
 mm/memcontrol.c            |   27 ++++++--------------
 mm/memory.c                |   10 +++----
 mm/migrate.c               |    2 -
 mm/rmap.c                  |   47 +++++++++++++++++++++--------------
 mm/swapfile.c              |    4 +-
 mm/userfaultfd.c           |    2 -
 10 files changed, 51 insertions(+), 50 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/include/linux/memcontrol.h
@@ -29,8 +29,7 @@ struct kmem_cache;
 
 /* Cgroup-specific page state, on top of universal node page state */
 enum memcg_stat_item {
-	MEMCG_RSS = NR_VM_NODE_STAT_ITEMS,
-	MEMCG_RSS_HUGE,
+	MEMCG_RSS_HUGE = NR_VM_NODE_STAT_ITEMS,
 	MEMCG_SWAP,
 	MEMCG_SOCK,
 	/* XXX: why are these zone and not node counters? */
--- a/kernel/events/uprobes.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/kernel/events/uprobes.c
@@ -188,8 +188,8 @@ static int __replace_page(struct vm_area
 
 	if (new_page) {
 		get_page(new_page);
-		page_add_new_anon_rmap(new_page, vma, addr, false);
 		mem_cgroup_commit_charge(new_page, memcg, false);
+		page_add_new_anon_rmap(new_page, vma, addr, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
 	} else
 		/* no new page, just dec_mm_counter for old_page */
--- a/mm/huge_memory.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/huge_memory.c
@@ -640,8 +640,8 @@ static vm_fault_t __do_huge_pmd_anonymou
 
 		entry = mk_huge_pmd(page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-		page_add_new_anon_rmap(page, vma, haddr, true);
 		mem_cgroup_commit_charge(page, memcg, false);
+		page_add_new_anon_rmap(page, vma, haddr, true);
 		lru_cache_add_active_or_unevictable(page, vma);
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
 		set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
--- a/mm/khugepaged.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/khugepaged.c
@@ -1086,8 +1086,8 @@ static void collapse_huge_page(struct mm
 
 	spin_lock(pmd_ptl);
 	BUG_ON(!pmd_none(*pmd));
-	page_add_new_anon_rmap(new_page, vma, address, true);
 	mem_cgroup_commit_charge(new_page, memcg, false);
+	page_add_new_anon_rmap(new_page, vma, address, true);
 	count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
 	lru_cache_add_active_or_unevictable(new_page, vma);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
--- a/mm/memcontrol.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/memcontrol.c
@@ -836,13 +836,6 @@ static void mem_cgroup_charge_statistics
 					 struct page *page,
 					 int nr_pages)
 {
-	/*
-	 * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
-	 * counted as CACHE even if it's on ANON LRU.
-	 */
-	if (PageAnon(page))
-		__mod_memcg_state(memcg, MEMCG_RSS, nr_pages);
-
 	if (abs(nr_pages) > 1) {
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
 		__mod_memcg_state(memcg, MEMCG_RSS_HUGE, nr_pages);
@@ -1384,7 +1377,7 @@ static char *memory_stat_format(struct m
 	 */
 
 	seq_buf_printf(&s, "anon %llu\n",
-		       (u64)memcg_page_state(memcg, MEMCG_RSS) *
+		       (u64)memcg_page_state(memcg, NR_ANON_MAPPED) *
 		       PAGE_SIZE);
 	seq_buf_printf(&s, "file %llu\n",
 		       (u64)memcg_page_state(memcg, NR_FILE_PAGES) *
@@ -3300,7 +3293,7 @@ static unsigned long mem_cgroup_usage(st
 
 	if (mem_cgroup_is_root(memcg)) {
 		val = memcg_page_state(memcg, NR_FILE_PAGES) +
-			memcg_page_state(memcg, MEMCG_RSS);
+			memcg_page_state(memcg, NR_ANON_MAPPED);
 		if (swap)
 			val += memcg_page_state(memcg, MEMCG_SWAP);
 	} else {
@@ -3771,7 +3764,7 @@ static int memcg_numa_stat_show(struct s
 
 static const unsigned int memcg1_stats[] = {
 	NR_FILE_PAGES,
-	MEMCG_RSS,
+	NR_ANON_MAPPED,
 	MEMCG_RSS_HUGE,
 	NR_SHMEM,
 	NR_FILE_MAPPED,
@@ -5401,7 +5394,12 @@ static int mem_cgroup_move_account(struc
 
 	lock_page_memcg(page);
 
-	if (!PageAnon(page)) {
+	if (PageAnon(page)) {
+		if (page_mapped(page)) {
+			__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
+			__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
+		}
+	} else {
 		__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
 		__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
 
@@ -6529,7 +6527,6 @@ void mem_cgroup_commit_charge(struct pag
 {
 	unsigned int nr_pages = hpage_nr_pages(page);
 
-	VM_BUG_ON_PAGE(!page->mapping, page);
 	VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
 
 	if (mem_cgroup_disabled())
@@ -6602,8 +6599,6 @@ int mem_cgroup_charge(struct page *page,
 	struct mem_cgroup *memcg;
 	int ret;
 
-	VM_BUG_ON_PAGE(!page->mapping, page);
-
 	ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
 	if (ret)
 		return ret;
@@ -6615,7 +6610,6 @@ struct uncharge_gather {
 	struct mem_cgroup *memcg;
 	unsigned long nr_pages;
 	unsigned long pgpgout;
-	unsigned long nr_anon;
 	unsigned long nr_kmem;
 	unsigned long nr_huge;
 	struct page *dummy_page;
@@ -6640,7 +6634,6 @@ static void uncharge_batch(const struct
 	}
 
 	local_irq_save(flags);
-	__mod_memcg_state(ug->memcg, MEMCG_RSS, -ug->nr_anon);
 	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
 	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
@@ -6680,8 +6673,6 @@ static void uncharge_page(struct page *p
 	if (!PageKmemcg(page)) {
 		if (PageTransHuge(page))
 			ug->nr_huge += nr_pages;
-		if (PageAnon(page))
-			ug->nr_anon += nr_pages;
 		ug->pgpgout++;
 	} else {
 		ug->nr_kmem += nr_pages;
--- a/mm/memory.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/memory.c
@@ -2712,8 +2712,8 @@ static vm_fault_t wp_page_copy(struct vm
 		 * thread doing COW.
 		 */
 		ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
-		page_add_new_anon_rmap(new_page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(new_page, memcg, false);
+		page_add_new_anon_rmap(new_page, vma, vmf->address, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
 		/*
 		 * We call the notify macro here because, when using secondary
@@ -3245,12 +3245,12 @@ vm_fault_t do_swap_page(struct vm_fault
 
 	/* ksm created a completely new copy */
 	if (unlikely(page != swapcache && swapcache)) {
-		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false);
+		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
-		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
 		mem_cgroup_commit_charge(page, memcg, true);
+		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
 		activate_page(page);
 	}
 
@@ -3392,8 +3392,8 @@ static vm_fault_t do_anonymous_page(stru
 	}
 
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, vma, vmf->address, false);
 	mem_cgroup_commit_charge(page, memcg, false);
+	page_add_new_anon_rmap(page, vma, vmf->address, false);
 	lru_cache_add_active_or_unevictable(page, vma);
 setpte:
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
@@ -3654,8 +3654,8 @@ vm_fault_t alloc_set_pte(struct vm_fault
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false);
+		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
 		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
--- a/mm/migrate.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/migrate.c
@@ -2838,8 +2838,8 @@ static void migrate_vma_insert_page(stru
 		goto unlock_abort;
 
 	inc_mm_counter(mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, vma, addr, false);
 	mem_cgroup_commit_charge(page, memcg, false);
+	page_add_new_anon_rmap(page, vma, addr, false);
 	if (!is_zone_device_page(page))
 		lru_cache_add_active_or_unevictable(page, vma);
 	get_page(page);
--- a/mm/rmap.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/rmap.c
@@ -1114,6 +1114,11 @@ void do_page_add_anon_rmap(struct page *
 	bool compound = flags & RMAP_COMPOUND;
 	bool first;
 
+	if (unlikely(PageKsm(page)))
+		lock_page_memcg(page);
+	else
+		VM_BUG_ON_PAGE(!PageLocked(page), page);
+
 	if (compound) {
 		atomic_t *mapcount;
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -1134,12 +1139,13 @@ void do_page_add_anon_rmap(struct page *
 		 */
 		if (compound)
 			__inc_node_page_state(page, NR_ANON_THPS);
-		__mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);
+		__mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
 	}
-	if (unlikely(PageKsm(page)))
-		return;
 
-	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	if (unlikely(PageKsm(page))) {
+		unlock_page_memcg(page);
+		return;
+	}
 
 	/* address might be in next vma when migration races vma_adjust */
 	if (first)
@@ -1181,7 +1187,7 @@ void page_add_new_anon_rmap(struct page
 		/* increment count (starts at -1) */
 		atomic_set(&page->_mapcount, 0);
 	}
-	__mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);
+	__mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
 	__page_set_anon_rmap(page, vma, address, 1);
 }
 
@@ -1230,13 +1236,12 @@ static void page_remove_file_rmap(struct
 	int i, nr = 1;
 
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
-	lock_page_memcg(page);
 
 	/* Hugepages are not counted in NR_FILE_MAPPED for now. */
 	if (unlikely(PageHuge(page))) {
 		/* hugetlb pages are always mapped with pmds */
 		atomic_dec(compound_mapcount_ptr(page));
-		goto out;
+		return;
 	}
 
 	/* page still mapped by someone else? */
@@ -1246,14 +1251,14 @@ static void page_remove_file_rmap(struct
 				nr++;
 		}
 		if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
-			goto out;
+			return;
 		if (PageSwapBacked(page))
 			__dec_node_page_state(page, NR_SHMEM_PMDMAPPED);
 		else
 			__dec_node_page_state(page, NR_FILE_PMDMAPPED);
 	} else {
 		if (!atomic_add_negative(-1, &page->_mapcount))
-			goto out;
+			return;
 	}
 
 	/*
@@ -1265,8 +1270,6 @@ static void page_remove_file_rmap(struct
 
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);
-out:
-	unlock_page_memcg(page);
 }
 
 static void page_remove_anon_compound_rmap(struct page *page)
@@ -1310,7 +1313,7 @@ static void page_remove_anon_compound_rm
 		clear_page_mlock(page);
 
 	if (nr)
-		__mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr);
+		__mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
 }
 
 /**
@@ -1322,22 +1325,28 @@ static void page_remove_anon_compound_rm
  */
 void page_remove_rmap(struct page *page, bool compound)
 {
-	if (!PageAnon(page))
-		return page_remove_file_rmap(page, compound);
+	lock_page_memcg(page);
 
-	if (compound)
-		return page_remove_anon_compound_rmap(page);
+	if (!PageAnon(page)) {
+		page_remove_file_rmap(page, compound);
+		goto out;
+	}
+
+	if (compound) {
+		page_remove_anon_compound_rmap(page);
+		goto out;
+	}
 
 	/* page still mapped by someone else? */
 	if (!atomic_add_negative(-1, &page->_mapcount))
-		return;
+		goto out;
 
 	/*
 	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
 	 * these counters are not modified in interrupt context, and
 	 * pte lock(a spinlock) is held, which implies preemption disabled.
 	 */
-	__dec_node_page_state(page, NR_ANON_MAPPED);
+	__dec_lruvec_page_state(page, NR_ANON_MAPPED);
 
 	if (unlikely(PageMlocked(page)))
 		clear_page_mlock(page);
@@ -1354,6 +1363,8 @@ void page_remove_rmap(struct page *page,
 	 * Leaving it set also helps swapoff to reinstate ptes
 	 * faster for those pages still in swapcache.
 	 */
+out:
+	unlock_page_memcg(page);
 }
 
 /*
--- a/mm/swapfile.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/swapfile.c
@@ -1881,11 +1881,11 @@ static int unuse_pte(struct vm_area_stru
 	set_pte_at(vma->vm_mm, addr, pte,
 		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	if (page == swapcache) {
-		page_add_anon_rmap(page, vma, addr, false);
 		mem_cgroup_commit_charge(page, memcg, true);
+		page_add_anon_rmap(page, vma, addr, false);
 	} else { /* ksm created a completely new copy */
-		page_add_new_anon_rmap(page, vma, addr, false);
 		mem_cgroup_commit_charge(page, memcg, false);
+		page_add_new_anon_rmap(page, vma, addr, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	}
 	swap_free(entry);
--- a/mm/userfaultfd.c~mm-memcontrol-switch-to-native-nr_anon_mapped-counter
+++ a/mm/userfaultfd.c
@@ -123,8 +123,8 @@ static int mcopy_atomic_pte(struct mm_st
 		goto out_release_uncharge_unlock;
 
 	inc_mm_counter(dst_mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
 	mem_cgroup_commit_charge(page, memcg, false);
+	page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
 	lru_cache_add_active_or_unevictable(page, dst_vma);
 
 	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch


* + mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch added to -mm tree
From: Andrew Morton @ 2020-05-08 23:19 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: switch to native NR_ANON_THPS counter
has been added to the -mm tree.  Its filename is
     mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: switch to native NR_ANON_THPS counter

With rmap memcg locking already in place for NR_ANON_MAPPED, it's just a
small step to remove the MEMCG_RSS_HUGE wart and switch memcg to the
native NR_ANON_THPS accounting sites.
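
As a minimal sketch, taken from the mm/huge_memory.c hunk below, the
PMD split path now adjusts the native counter directly, with
page->mem_cgroup stabilized by the memcg lock:

	lock_page_memcg(page);
	if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
		/* Last compound_mapcount is gone. */
		__dec_lruvec_page_state(page, NR_ANON_THPS);
	}
	unlock_page_memcg(page);

Note that NR_ANON_THPS counts huge pages rather than base pages, so
the cgroup reporting sites below multiply by HPAGE_PMD_NR before
converting to bytes.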

Link: http://lkml.kernel.org/r/20200508183105.225460-12-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |    3 --
 mm/huge_memory.c           |    4 ++-
 mm/memcontrol.c            |   39 ++++++++++++++---------------------
 mm/rmap.c                  |    6 ++---
 4 files changed, 23 insertions(+), 29 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-switch-to-native-nr_anon_thps-counter
+++ a/include/linux/memcontrol.h
@@ -29,8 +29,7 @@ struct kmem_cache;
 
 /* Cgroup-specific page state, on top of universal node page state */
 enum memcg_stat_item {
-	MEMCG_RSS_HUGE = NR_VM_NODE_STAT_ITEMS,
-	MEMCG_SWAP,
+	MEMCG_SWAP = NR_VM_NODE_STAT_ITEMS,
 	MEMCG_SOCK,
 	/* XXX: why are these zone and not node counters? */
 	MEMCG_KERNEL_STACK_KB,
--- a/mm/huge_memory.c~mm-memcontrol-switch-to-native-nr_anon_thps-counter
+++ a/mm/huge_memory.c
@@ -2360,15 +2360,17 @@ static void __split_huge_pmd_locked(stru
 			atomic_inc(&page[i]._mapcount);
 	}
 
+	lock_page_memcg(page);
 	if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
 		/* Last compound_mapcount is gone. */
-		__dec_node_page_state(page, NR_ANON_THPS);
+		__dec_lruvec_page_state(page, NR_ANON_THPS);
 		if (TestClearPageDoubleMap(page)) {
 			/* No need in mapcount reference anymore */
 			for (i = 0; i < HPAGE_PMD_NR; i++)
 				atomic_dec(&page[i]._mapcount);
 		}
 	}
+	unlock_page_memcg(page);
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
--- a/mm/memcontrol.c~mm-memcontrol-switch-to-native-nr_anon_thps-counter
+++ a/mm/memcontrol.c
@@ -836,11 +836,6 @@ static void mem_cgroup_charge_statistics
 					 struct page *page,
 					 int nr_pages)
 {
-	if (abs(nr_pages) > 1) {
-		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-		__mod_memcg_state(memcg, MEMCG_RSS_HUGE, nr_pages);
-	}
-
 	/* pagein of a big page is an event. So, ignore page size */
 	if (nr_pages > 0)
 		__count_memcg_events(memcg, PGPGIN, 1);
@@ -1406,15 +1401,9 @@ static char *memory_stat_format(struct m
 		       (u64)memcg_page_state(memcg, NR_WRITEBACK) *
 		       PAGE_SIZE);
 
-	/*
-	 * TODO: We should eventually replace our own MEMCG_RSS_HUGE counter
-	 * with the NR_ANON_THP vm counter, but right now it's a pain in the
-	 * arse because it requires migrating the work out of rmap to a place
-	 * where the page->mem_cgroup is set up and stable.
-	 */
 	seq_buf_printf(&s, "anon_thp %llu\n",
-		       (u64)memcg_page_state(memcg, MEMCG_RSS_HUGE) *
-		       PAGE_SIZE);
+		       (u64)memcg_page_state(memcg, NR_ANON_THPS) *
+		       HPAGE_PMD_NR * PAGE_SIZE);
 
 	for (i = 0; i < NR_LRU_LISTS; i++)
 		seq_buf_printf(&s, "%s %llu\n", lru_list_name(i),
@@ -3008,8 +2997,6 @@ void mem_cgroup_split_huge_fixup(struct
 
 	for (i = 1; i < HPAGE_PMD_NR; i++)
 		head[i].mem_cgroup = head->mem_cgroup;
-
-	__mod_memcg_state(head->mem_cgroup, MEMCG_RSS_HUGE, -HPAGE_PMD_NR);
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -3765,7 +3752,7 @@ static int memcg_numa_stat_show(struct s
 static const unsigned int memcg1_stats[] = {
 	NR_FILE_PAGES,
 	NR_ANON_MAPPED,
-	MEMCG_RSS_HUGE,
+	NR_ANON_THPS,
 	NR_SHMEM,
 	NR_FILE_MAPPED,
 	NR_FILE_DIRTY,
@@ -3802,11 +3789,14 @@ static int memcg_stat_show(struct seq_fi
 	BUILD_BUG_ON(ARRAY_SIZE(memcg1_stat_names) != ARRAY_SIZE(memcg1_stats));
 
 	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
+		unsigned long nr;
+
 		if (memcg1_stats[i] == MEMCG_SWAP && !do_memsw_account())
 			continue;
-		seq_printf(m, "%s %lu\n", memcg1_stat_names[i],
-			   memcg_page_state_local(memcg, memcg1_stats[i]) *
-			   PAGE_SIZE);
+		nr = memcg_page_state_local(memcg, memcg1_stats[i]);
+		if (memcg1_stats[i] == NR_ANON_THPS)
+			nr *= HPAGE_PMD_NR;
+		seq_printf(m, "%s %lu\n", memcg1_stat_names[i], nr * PAGE_SIZE);
 	}
 
 	for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
@@ -5398,6 +5388,13 @@ static int mem_cgroup_move_account(struc
 		if (page_mapped(page)) {
 			__mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
 			__mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
+			if (PageTransHuge(page)) {
+				__mod_lruvec_state(from_vec, NR_ANON_THPS,
+						   -nr_pages);
+				__mod_lruvec_state(to_vec, NR_ANON_THPS,
+						   nr_pages);
+			}
+
 		}
 	} else {
 		__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
@@ -6611,7 +6608,6 @@ struct uncharge_gather {
 	unsigned long nr_pages;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
-	unsigned long nr_huge;
 	struct page *dummy_page;
 };
 
@@ -6634,7 +6630,6 @@ static void uncharge_batch(const struct
 	}
 
 	local_irq_save(flags);
-	__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
 	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
 	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
 	memcg_check_events(ug->memcg, ug->dummy_page);
@@ -6671,8 +6666,6 @@ static void uncharge_page(struct page *p
 	ug->nr_pages += nr_pages;
 
 	if (!PageKmemcg(page)) {
-		if (PageTransHuge(page))
-			ug->nr_huge += nr_pages;
 		ug->pgpgout++;
 	} else {
 		ug->nr_kmem += nr_pages;
--- a/mm/rmap.c~mm-memcontrol-switch-to-native-nr_anon_thps-counter
+++ a/mm/rmap.c
@@ -1138,7 +1138,7 @@ void do_page_add_anon_rmap(struct page *
 		 * disabled.
 		 */
 		if (compound)
-			__inc_node_page_state(page, NR_ANON_THPS);
+			__inc_lruvec_page_state(page, NR_ANON_THPS);
 		__mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
 	}
 
@@ -1180,7 +1180,7 @@ void page_add_new_anon_rmap(struct page
 		if (hpage_pincount_available(page))
 			atomic_set(compound_pincount_ptr(page), 0);
 
-		__inc_node_page_state(page, NR_ANON_THPS);
+		__inc_lruvec_page_state(page, NR_ANON_THPS);
 	} else {
 		/* Anon THP always mapped first with PMD */
 		VM_BUG_ON_PAGE(PageTransCompound(page), page);
@@ -1286,7 +1286,7 @@ static void page_remove_anon_compound_rm
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		return;
 
-	__dec_node_page_state(page, NR_ANON_THPS);
+	__dec_lruvec_page_state(page, NR_ANON_THPS);
 
 	if (TestClearPageDoubleMap(page)) {
 		/*
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch


* + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch added to -mm tree
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
has been added to the -mm tree.  Its filename is
     mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API

With the page->mapping requirement gone from memcg, we can charge anon and
file-thp pages in a single step, right after they're allocated.

This removes two out of three API calls - especially the tricky commit
step that needed to happen at just the right time between when the page is
"set up" and when it's "published" - somewhat vague and fluid concepts
that varied by page type.  All we need is a freshly allocated page and a
memcg context to charge.

v2: prevent double charges on pre-allocated hugepages in khugepaged
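
As a sketch, a charging site (names as in the mm/memory.c hunks below)
goes from the three-step protocol to one call made right after
allocation:

	/* old: try, then commit or cancel at exactly the right moment */
	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg))
		goto oom_free_page;
	/* ... set the page up and publish it ... */
	mem_cgroup_commit_charge(page, memcg, false);

	/* new: a single call against the freshly allocated page */
	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
		goto oom_free_page;
	cgroup_throttle_swaprate(page, GFP_KERNEL);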

Link: http://lkml.kernel.org/r/20200508183105.225460-13-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mm.h      |    4 +---
 kernel/events/uprobes.c |   11 +++--------
 mm/filemap.c            |    2 +-
 mm/huge_memory.c        |    9 +++------
 mm/khugepaged.c         |   35 ++++++++++-------------------------
 mm/memory.c             |   36 ++++++++++--------------------------
 mm/migrate.c            |    5 +----
 mm/swapfile.c           |    6 +-----
 mm/userfaultfd.c        |    5 +----
 9 files changed, 31 insertions(+), 82 deletions(-)

--- a/include/linux/mm.h~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/include/linux/mm.h
@@ -505,7 +505,6 @@ struct vm_fault {
 	pte_t orig_pte;			/* Value of PTE at the time of fault */
 
 	struct page *cow_page;		/* Page handler may use for COW fault */
-	struct mem_cgroup *memcg;	/* Cgroup cow_page belongs to */
 	struct page *page;		/* ->fault handlers should return a
 					 * page here, unless VM_FAULT_NOPAGE
 					 * is set (which is also implied by
@@ -939,8 +938,7 @@ static inline pte_t maybe_mkwrite(pte_t
 	return pte;
 }
 
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
-		struct page *page);
+vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page);
 vm_fault_t finish_fault(struct vm_fault *vmf);
 vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
 #endif
--- a/kernel/events/uprobes.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/kernel/events/uprobes.c
@@ -162,14 +162,13 @@ static int __replace_page(struct vm_area
 	};
 	int err;
 	struct mmu_notifier_range range;
-	struct mem_cgroup *memcg;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
 				addr + PAGE_SIZE);
 
 	if (new_page) {
-		err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
-					    &memcg);
+		err = mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL,
+					false);
 		if (err)
 			return err;
 	}
@@ -179,16 +178,12 @@ static int __replace_page(struct vm_area
 
 	mmu_notifier_invalidate_range_start(&range);
 	err = -EAGAIN;
-	if (!page_vma_mapped_walk(&pvmw)) {
-		if (new_page)
-			mem_cgroup_cancel_charge(new_page, memcg);
+	if (!page_vma_mapped_walk(&pvmw))
 		goto unlock;
-	}
 	VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
 
 	if (new_page) {
 		get_page(new_page);
-		mem_cgroup_commit_charge(new_page, memcg, false);
 		page_add_new_anon_rmap(new_page, vma, addr, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
 	} else
--- a/mm/filemap.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/mm/filemap.c
@@ -2633,7 +2633,7 @@ void filemap_map_pages(struct vm_fault *
 		if (vmf->pte)
 			vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
-		if (alloc_set_pte(vmf, NULL, page))
+		if (alloc_set_pte(vmf, page))
 			goto unlock;
 		unlock_page(page);
 		goto next;
--- a/mm/huge_memory.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/mm/huge_memory.c
@@ -587,19 +587,19 @@ static vm_fault_t __do_huge_pmd_anonymou
 			struct page *page, gfp_t gfp)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct mem_cgroup *memcg;
 	pgtable_t pgtable;
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	vm_fault_t ret = 0;
 
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 
-	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg)) {
+	if (mem_cgroup_charge(page, vma->vm_mm, gfp, false)) {
 		put_page(page);
 		count_vm_event(THP_FAULT_FALLBACK);
 		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
 		return VM_FAULT_FALLBACK;
 	}
+	cgroup_throttle_swaprate(page, gfp);
 
 	pgtable = pte_alloc_one(vma->vm_mm);
 	if (unlikely(!pgtable)) {
@@ -630,7 +630,6 @@ static vm_fault_t __do_huge_pmd_anonymou
 			vm_fault_t ret2;
 
 			spin_unlock(vmf->ptl);
-			mem_cgroup_cancel_charge(page, memcg);
 			put_page(page);
 			pte_free(vma->vm_mm, pgtable);
 			ret2 = handle_userfault(vmf, VM_UFFD_MISSING);
@@ -640,7 +639,6 @@ static vm_fault_t __do_huge_pmd_anonymou
 
 		entry = mk_huge_pmd(page, vma->vm_page_prot);
 		entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
-		mem_cgroup_commit_charge(page, memcg, false);
 		page_add_new_anon_rmap(page, vma, haddr, true);
 		lru_cache_add_active_or_unevictable(page, vma);
 		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
@@ -649,7 +647,7 @@ static vm_fault_t __do_huge_pmd_anonymou
 		mm_inc_nr_ptes(vma->vm_mm);
 		spin_unlock(vmf->ptl);
 		count_vm_event(THP_FAULT_ALLOC);
-		count_memcg_events(memcg, THP_FAULT_ALLOC, 1);
+		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
 	}
 
 	return 0;
@@ -658,7 +656,6 @@ unlock_release:
 release:
 	if (pgtable)
 		pte_free(vma->vm_mm, pgtable);
-	mem_cgroup_cancel_charge(page, memcg);
 	put_page(page);
 	return ret;
 
--- a/mm/khugepaged.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/mm/khugepaged.c
@@ -951,7 +951,6 @@ static void collapse_huge_page(struct mm
 	struct page *new_page;
 	spinlock_t *pmd_ptl, *pte_ptl;
 	int isolated = 0, result = 0;
-	struct mem_cgroup *memcg;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
 	gfp_t gfp;
@@ -974,15 +973,15 @@ static void collapse_huge_page(struct mm
 		goto out_nolock;
 	}
 
-	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
+	if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out_nolock;
 	}
+	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
 
 	down_read(&mm->mmap_sem);
 	result = hugepage_vma_revalidate(mm, address, &vma);
 	if (result) {
-		mem_cgroup_cancel_charge(new_page, memcg);
 		up_read(&mm->mmap_sem);
 		goto out_nolock;
 	}
@@ -990,7 +989,6 @@ static void collapse_huge_page(struct mm
 	pmd = mm_find_pmd(mm, address);
 	if (!pmd) {
 		result = SCAN_PMD_NULL;
-		mem_cgroup_cancel_charge(new_page, memcg);
 		up_read(&mm->mmap_sem);
 		goto out_nolock;
 	}
@@ -1001,7 +999,6 @@ static void collapse_huge_page(struct mm
 	 * Continuing to collapse causes inconsistency.
 	 */
 	if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) {
-		mem_cgroup_cancel_charge(new_page, memcg);
 		up_read(&mm->mmap_sem);
 		goto out_nolock;
 	}
@@ -1086,9 +1083,7 @@ static void collapse_huge_page(struct mm
 
 	spin_lock(pmd_ptl);
 	BUG_ON(!pmd_none(*pmd));
-	mem_cgroup_commit_charge(new_page, memcg, false);
 	page_add_new_anon_rmap(new_page, vma, address, true);
-	count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
 	lru_cache_add_active_or_unevictable(new_page, vma);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
@@ -1102,10 +1097,11 @@ static void collapse_huge_page(struct mm
 out_up_write:
 	up_write(&mm->mmap_sem);
 out_nolock:
+	if (*hpage)
+		mem_cgroup_uncharge(*hpage);
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 out:
-	mem_cgroup_cancel_charge(new_page, memcg);
 	goto out_up_write;
 }
 
@@ -1515,7 +1511,6 @@ static void collapse_file(struct mm_stru
 	struct address_space *mapping = file->f_mapping;
 	gfp_t gfp;
 	struct page *new_page;
-	struct mem_cgroup *memcg;
 	pgoff_t index, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
@@ -1534,10 +1529,11 @@ static void collapse_file(struct mm_stru
 		goto out;
 	}
 
-	if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
+	if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out;
 	}
+	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
 
 	/* This will be less messy when we use multi-index entries */
 	do {
@@ -1547,7 +1543,6 @@ static void collapse_file(struct mm_stru
 			break;
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
-			mem_cgroup_cancel_charge(new_page, memcg);
 			result = SCAN_FAIL;
 			goto out;
 		}
@@ -1740,18 +1735,9 @@ out_unlock:
 	}
 
 	if (nr_none) {
-		struct lruvec *lruvec;
-		/*
-		 * XXX: We have started try_charge and pinned the
-		 * memcg, but the page isn't committed yet so we
-		 * cannot use mod_lruvec_page_state(). This hackery
-		 * will be cleaned up when remove the page->mapping
-		 * dependency from memcg and fully charge above.
-		 */
-		lruvec = mem_cgroup_lruvec(memcg, page_pgdat(new_page));
-		__mod_lruvec_state(lruvec, NR_FILE_PAGES, nr_none);
+		__mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
 		if (is_shmem)
-			__mod_lruvec_state(lruvec, NR_SHMEM, nr_none);
+			__mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
 	}
 
 xa_locked:
@@ -1789,7 +1775,6 @@ xa_unlocked:
 
 		SetPageUptodate(new_page);
 		page_ref_add(new_page, HPAGE_PMD_NR - 1);
-		mem_cgroup_commit_charge(new_page, memcg, false);
 
 		if (is_shmem) {
 			set_page_dirty(new_page);
@@ -1797,7 +1782,6 @@ xa_unlocked:
 		} else {
 			lru_cache_add_file(new_page);
 		}
-		count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
 
 		/*
 		 * Remove pte page tables, so we can re-fault the page as huge.
@@ -1844,13 +1828,14 @@ xa_unlocked:
 		VM_BUG_ON(nr_none);
 		xas_unlock_irq(&xas);
 
-		mem_cgroup_cancel_charge(new_page, memcg);
 		new_page->mapping = NULL;
 	}
 
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
+	if (*hpage)
+		mem_cgroup_uncharge(*hpage);
 	/* TODO: tracepoints */
 }
 
--- a/mm/memory.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/mm/memory.c
@@ -2647,7 +2647,6 @@ static vm_fault_t wp_page_copy(struct vm
 	struct page *new_page = NULL;
 	pte_t entry;
 	int page_copied = 0;
-	struct mem_cgroup *memcg;
 	struct mmu_notifier_range range;
 
 	if (unlikely(anon_vma_prepare(vma)))
@@ -2678,8 +2677,9 @@ static vm_fault_t wp_page_copy(struct vm
 		}
 	}
 
-	if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_charge(new_page, mm, GFP_KERNEL, false))
 		goto oom_free_new;
+	cgroup_throttle_swaprate(new_page, GFP_KERNEL);
 
 	__SetPageUptodate(new_page);
 
@@ -2712,7 +2712,6 @@ static vm_fault_t wp_page_copy(struct vm
 		 * thread doing COW.
 		 */
 		ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
-		mem_cgroup_commit_charge(new_page, memcg, false);
 		page_add_new_anon_rmap(new_page, vma, vmf->address, false);
 		lru_cache_add_active_or_unevictable(new_page, vma);
 		/*
@@ -2751,8 +2750,6 @@ static vm_fault_t wp_page_copy(struct vm
 		/* Free the old page.. */
 		new_page = old_page;
 		page_copied = 1;
-	} else {
-		mem_cgroup_cancel_charge(new_page, memcg);
 	}
 
 	if (new_page)
@@ -3090,7 +3087,6 @@ vm_fault_t do_swap_page(struct vm_fault
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page = NULL, *swapcache;
-	struct mem_cgroup *memcg;
 	swp_entry_t entry;
 	pte_t pte;
 	int locked;
@@ -3195,10 +3191,11 @@ vm_fault_t do_swap_page(struct vm_fault
 		goto out_page;
 	}
 
-	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
+	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
 		ret = VM_FAULT_OOM;
 		goto out_page;
 	}
+	cgroup_throttle_swaprate(page, GFP_KERNEL);
 
 	/*
 	 * Back out if somebody else already faulted in this pte.
@@ -3245,11 +3242,9 @@ vm_fault_t do_swap_page(struct vm_fault
 
 	/* ksm created a completely new copy */
 	if (unlikely(page != swapcache && swapcache)) {
-		mem_cgroup_commit_charge(page, memcg, false);
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
-		mem_cgroup_commit_charge(page, memcg, true);
 		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
 		activate_page(page);
 	}
@@ -3286,7 +3281,6 @@ unlock:
 out:
 	return ret;
 out_nomap:
-	mem_cgroup_cancel_charge(page, memcg);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out_page:
 	unlock_page(page);
@@ -3307,7 +3301,6 @@ out_release:
 static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct mem_cgroup *memcg;
 	struct page *page;
 	vm_fault_t ret = 0;
 	pte_t entry;
@@ -3360,8 +3353,9 @@ static vm_fault_t do_anonymous_page(stru
 	if (!page)
 		goto oom;
 
-	if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
 		goto oom_free_page;
+	cgroup_throttle_swaprate(page, GFP_KERNEL);
 
 	/*
 	 * The memory barrier inside __SetPageUptodate makes sure that
@@ -3386,13 +3380,11 @@ static vm_fault_t do_anonymous_page(stru
 	/* Deliver the page fault to userland, check inside PT lock */
 	if (userfaultfd_missing(vma)) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		mem_cgroup_cancel_charge(page, memcg);
 		put_page(page);
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-	mem_cgroup_commit_charge(page, memcg, false);
 	page_add_new_anon_rmap(page, vma, vmf->address, false);
 	lru_cache_add_active_or_unevictable(page, vma);
 setpte:
@@ -3404,7 +3396,6 @@ unlock:
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
 release:
-	mem_cgroup_cancel_charge(page, memcg);
 	put_page(page);
 	goto unlock;
 oom_free_page:
@@ -3609,7 +3600,6 @@ static vm_fault_t do_set_pmd(struct vm_f
  * mapping. If needed, the fucntion allocates page table or use pre-allocated.
  *
  * @vmf: fault environment
- * @memcg: memcg to charge page (only for private mappings)
  * @page: page to map
  *
  * Caller must take care of unlocking vmf->ptl, if vmf->pte is non-NULL on
@@ -3620,8 +3610,7 @@ static vm_fault_t do_set_pmd(struct vm_f
  *
  * Return: %0 on success, %VM_FAULT_ code in case of error.
  */
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
-		struct page *page)
+vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -3629,9 +3618,6 @@ vm_fault_t alloc_set_pte(struct vm_fault
 	vm_fault_t ret;
 
 	if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
-		/* THP on COW? */
-		VM_BUG_ON_PAGE(memcg, page);
-
 		ret = do_set_pmd(vmf, page);
 		if (ret != VM_FAULT_FALLBACK)
 			return ret;
@@ -3654,7 +3640,6 @@ vm_fault_t alloc_set_pte(struct vm_fault
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-		mem_cgroup_commit_charge(page, memcg, false);
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	} else {
@@ -3704,7 +3689,7 @@ vm_fault_t finish_fault(struct vm_fault
 	if (!(vmf->vma->vm_flags & VM_SHARED))
 		ret = check_stable_address_space(vmf->vma->vm_mm);
 	if (!ret)
-		ret = alloc_set_pte(vmf, vmf->memcg, page);
+		ret = alloc_set_pte(vmf, page);
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	return ret;
@@ -3864,11 +3849,11 @@ static vm_fault_t do_cow_fault(struct vm
 	if (!vmf->cow_page)
 		return VM_FAULT_OOM;
 
-	if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm,
-					GFP_KERNEL, &vmf->memcg)) {
+	if (mem_cgroup_charge(vmf->cow_page, vma->vm_mm, GFP_KERNEL, false)) {
 		put_page(vmf->cow_page);
 		return VM_FAULT_OOM;
 	}
+	cgroup_throttle_swaprate(vmf->cow_page, GFP_KERNEL);
 
 	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
@@ -3886,7 +3871,6 @@ static vm_fault_t do_cow_fault(struct vm
 		goto uncharge_out;
 	return ret;
 uncharge_out:
-	mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg);
 	put_page(vmf->cow_page);
 	return ret;
 }
--- a/mm/migrate.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/mm/migrate.c
@@ -2746,7 +2746,6 @@ static void migrate_vma_insert_page(stru
 {
 	struct vm_area_struct *vma = migrate->vma;
 	struct mm_struct *mm = vma->vm_mm;
-	struct mem_cgroup *memcg;
 	bool flush = false;
 	spinlock_t *ptl;
 	pte_t entry;
@@ -2793,7 +2792,7 @@ static void migrate_vma_insert_page(stru
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto abort;
-	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
 		goto abort;
 
 	/*
@@ -2838,7 +2837,6 @@ static void migrate_vma_insert_page(stru
 		goto unlock_abort;
 
 	inc_mm_counter(mm, MM_ANONPAGES);
-	mem_cgroup_commit_charge(page, memcg, false);
 	page_add_new_anon_rmap(page, vma, addr, false);
 	if (!is_zone_device_page(page))
 		lru_cache_add_active_or_unevictable(page, vma);
@@ -2861,7 +2859,6 @@ static void migrate_vma_insert_page(stru
 
 unlock_abort:
 	pte_unmap_unlock(ptep, ptl);
-	mem_cgroup_cancel_charge(page, memcg);
 abort:
 	*src &= ~MIGRATE_PFN_MIGRATE;
 }
--- a/mm/swapfile.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/mm/swapfile.c
@@ -1853,7 +1853,6 @@ static int unuse_pte(struct vm_area_stru
 		unsigned long addr, swp_entry_t entry, struct page *page)
 {
 	struct page *swapcache;
-	struct mem_cgroup *memcg;
 	spinlock_t *ptl;
 	pte_t *pte;
 	int ret = 1;
@@ -1863,14 +1862,13 @@ static int unuse_pte(struct vm_area_stru
 	if (unlikely(!page))
 		return -ENOMEM;
 
-	if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
+	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
 		ret = -ENOMEM;
 		goto out_nolock;
 	}
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
-		mem_cgroup_cancel_charge(page, memcg);
 		ret = 0;
 		goto out;
 	}
@@ -1881,10 +1879,8 @@ static int unuse_pte(struct vm_area_stru
 	set_pte_at(vma->vm_mm, addr, pte,
 		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	if (page == swapcache) {
-		mem_cgroup_commit_charge(page, memcg, true);
 		page_add_anon_rmap(page, vma, addr, false);
 	} else { /* ksm created a completely new copy */
-		mem_cgroup_commit_charge(page, memcg, false);
 		page_add_new_anon_rmap(page, vma, addr, false);
 		lru_cache_add_active_or_unevictable(page, vma);
 	}
--- a/mm/userfaultfd.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api
+++ a/mm/userfaultfd.c
@@ -56,7 +56,6 @@ static int mcopy_atomic_pte(struct mm_st
 			    struct page **pagep,
 			    bool wp_copy)
 {
-	struct mem_cgroup *memcg;
 	pte_t _dst_pte, *dst_pte;
 	spinlock_t *ptl;
 	void *page_kaddr;
@@ -97,7 +96,7 @@ static int mcopy_atomic_pte(struct mm_st
 	__SetPageUptodate(page);
 
 	ret = -ENOMEM;
-	if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_charge(page, dst_mm, GFP_KERNEL, false))
 		goto out_release;
 
 	_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
@@ -123,7 +122,6 @@ static int mcopy_atomic_pte(struct mm_st
 		goto out_release_uncharge_unlock;
 
 	inc_mm_counter(dst_mm, MM_ANONPAGES);
-	mem_cgroup_commit_charge(page, memcg, false);
 	page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
 	lru_cache_add_active_or_unevictable(page, dst_vma);
 
@@ -138,7 +136,6 @@ out:
 	return ret;
 out_release_uncharge_unlock:
 	pte_unmap_unlock(dst_pte, ptl);
-	mem_cgroup_cancel_charge(page, memcg);
 out_release:
 	put_page(page);
 	goto out;
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch


* + mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch added to -mm tree
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: drop unused try/commit/cancel charge API
has been added to the -mm tree.  Its filename is
     mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: drop unused try/commit/cancel charge API

There are no more users. RIP in peace.
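
For reference, a minimal caller-side sketch of the protocol being
retired next to its replacement (error handling elided;
instantiate_page() is a hypothetical placeholder for whatever step
maps the page):

	/* Old: three calls, with the memcg carried across them. */
	struct mem_cgroup *memcg;

	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
		goto fail;
	if (instantiate_page(page)) {	/* hypothetical middle step */
		mem_cgroup_cancel_charge(page, memcg);
		goto fail;
	}
	mem_cgroup_commit_charge(page, memcg, false);

	/* New: one call, no transaction state for the caller to hold. */
	if (mem_cgroup_charge(page, mm, GFP_KERNEL, false))
		goto fail;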

Link: http://lkml.kernel.org/r/20200508183105.225460-14-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |   36 ----------
 mm/memcontrol.c            |  126 ++++-------------------------------
 2 files changed, 15 insertions(+), 147 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-drop-unused-try-commit-cancel-charge-api
+++ a/include/linux/memcontrol.h
@@ -406,14 +406,6 @@ static inline bool mem_cgroup_below_min(
 		page_counter_read(&memcg->memory);
 }
 
-int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
-int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
-void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
-			      bool lrucare);
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
-
 int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
 		      bool lrucare);
 
@@ -907,34 +899,6 @@ static inline bool mem_cgroup_below_min(
 	return false;
 }
 
-static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
-					gfp_t gfp_mask,
-					struct mem_cgroup **memcgp)
-{
-	*memcgp = NULL;
-	return 0;
-}
-
-static inline int mem_cgroup_try_charge_delay(struct page *page,
-					      struct mm_struct *mm,
-					      gfp_t gfp_mask,
-					      struct mem_cgroup **memcgp)
-{
-	*memcgp = NULL;
-	return 0;
-}
-
-static inline void mem_cgroup_commit_charge(struct page *page,
-					    struct mem_cgroup *memcg,
-					    bool lrucare)
-{
-}
-
-static inline void mem_cgroup_cancel_charge(struct page *page,
-					    struct mem_cgroup *memcg)
-{
-}
-
 static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
 				    gfp_t gfp_mask, bool lrucare)
 {
--- a/mm/memcontrol.c~mm-memcontrol-drop-unused-try-commit-cancel-charge-api
+++ a/mm/memcontrol.c
@@ -6431,29 +6431,26 @@ void mem_cgroup_calculate_protection(str
 }
 
 /**
- * mem_cgroup_try_charge - try charging a page
+ * mem_cgroup_charge - charge a newly allocated page to a cgroup
  * @page: page to charge
  * @mm: mm context of the victim
  * @gfp_mask: reclaim mode
- * @memcgp: charged memcg return
+ * @lrucare: page might be on the LRU already
  *
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
  * pages according to @gfp_mask if necessary.
  *
- * Returns 0 on success, with *@memcgp pointing to the charged memcg.
- * Otherwise, an error code is returned.
- *
- * After page->mapping has been set up, the caller must finalize the
- * charge with mem_cgroup_commit_charge().  Or abort the transaction
- * with mem_cgroup_cancel_charge() in case page instantiation fails.
+ * Returns 0 on success. Otherwise, an error code is returned.
  */
-int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp)
+int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
+		      bool lrucare)
 {
 	unsigned int nr_pages = hpage_nr_pages(page);
 	struct mem_cgroup *memcg = NULL;
 	int ret = 0;
 
+	VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
+
 	if (mem_cgroup_disabled())
 		goto out;
 
@@ -6485,56 +6482,8 @@ int mem_cgroup_try_charge(struct page *p
 		memcg = get_mem_cgroup_from_mm(mm);
 
 	ret = try_charge(memcg, gfp_mask, nr_pages);
-
-	css_put(&memcg->css);
-out:
-	*memcgp = memcg;
-	return ret;
-}
-
-int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
-			  gfp_t gfp_mask, struct mem_cgroup **memcgp)
-{
-	int ret;
-
-	ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp);
-	if (*memcgp)
-		cgroup_throttle_swaprate(page, gfp_mask);
-	return ret;
-}
-
-/**
- * mem_cgroup_commit_charge - commit a page charge
- * @page: page to charge
- * @memcg: memcg to charge the page to
- * @lrucare: page might be on LRU already
- *
- * Finalize a charge transaction started by mem_cgroup_try_charge(),
- * after page->mapping has been set up.  This must happen atomically
- * as part of the page instantiation, i.e. under the page table lock
- * for anonymous pages, under the page lock for page and swap cache.
- *
- * In addition, the page must not be on the LRU during the commit, to
- * prevent racing with task migration.  If it might be, use @lrucare.
- *
- * Use mem_cgroup_cancel_charge() to cancel the transaction instead.
- */
-void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
-			      bool lrucare)
-{
-	unsigned int nr_pages = hpage_nr_pages(page);
-
-	VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
-
-	if (mem_cgroup_disabled())
-		return;
-	/*
-	 * Swap faults will attempt to charge the same page multiple
-	 * times.  But reuse_swap_page() might have removed the page
-	 * from swapcache already, so we can't check PageSwapCache().
-	 */
-	if (!memcg)
-		return;
+	if (ret)
+		goto out_put;
 
 	commit_charge(page, memcg, lrucare);
 
@@ -6552,55 +6501,11 @@ void mem_cgroup_commit_charge(struct pag
 		 */
 		mem_cgroup_uncharge_swap(entry, nr_pages);
 	}
-}
 
-/**
- * mem_cgroup_cancel_charge - cancel a page charge
- * @page: page to charge
- * @memcg: memcg to charge the page to
- *
- * Cancel a charge transaction started by mem_cgroup_try_charge().
- */
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
-{
-	unsigned int nr_pages = hpage_nr_pages(page);
-
-	if (mem_cgroup_disabled())
-		return;
-	/*
-	 * Swap faults will attempt to charge the same page multiple
-	 * times.  But reuse_swap_page() might have removed the page
-	 * from swapcache already, so we can't check PageSwapCache().
-	 */
-	if (!memcg)
-		return;
-
-	cancel_charge(memcg, nr_pages);
-}
-
-/**
- * mem_cgroup_charge - charge a newly allocated page to a cgroup
- * @page: page to charge
- * @mm: mm context of the victim
- * @gfp_mask: reclaim mode
- * @lrucare: page might be on the LRU already
- *
- * Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
- *
- * Returns 0 on success. Otherwise, an error code is returned.
- */
-int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
-		      bool lrucare)
-{
-	struct mem_cgroup *memcg;
-	int ret;
-
-	ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
-	if (ret)
-		return ret;
-	mem_cgroup_commit_charge(page, memcg, lrucare);
-	return 0;
+out_put:
+	css_put(&memcg->css);
+out:
+	return ret;
 }
 
 struct uncharge_gather {
@@ -6705,8 +6610,7 @@ static void uncharge_list(struct list_he
  * mem_cgroup_uncharge - uncharge a page
  * @page: page to uncharge
  *
- * Uncharge a page previously charged with mem_cgroup_try_charge() and
- * mem_cgroup_commit_charge().
+ * Uncharge a page previously charged with mem_cgroup_charge().
  */
 void mem_cgroup_uncharge(struct page *page)
 {
@@ -6729,7 +6633,7 @@ void mem_cgroup_uncharge(struct page *pa
  * @page_list: list of pages to uncharge
  *
  * Uncharge a list of pages previously charged with
- * mem_cgroup_try_charge() and mem_cgroup_commit_charge().
+ * mem_cgroup_charge().
  */
 void mem_cgroup_uncharge_list(struct list_head *page_list)
 {
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-prepare-swap-controller-setup-for-integration.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (31 preceding siblings ...)
  2020-05-08 23:20 ` + mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch " Andrew Morton
@ 2020-05-08 23:20 ` Andrew Morton
  2020-05-08 23:20 ` + mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch " Andrew Morton
                   ` (47 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: prepare swap controller setup for integration
has been added to the -mm tree.  Its filename is
     mm-memcontrol-prepare-swap-controller-setup-for-integration.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-prepare-swap-controller-setup-for-integration.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: prepare swap controller setup for integration

A few cleanups to streamline the swap controller setup:

- Replace the do_swap_account flag with cgroup_memory_noswap. This
  brings it in line with other functionality that is usually available
  unless explicitly opted out of - nosocket, nokmem.

- Remove the really_do_swap_account flag that stores the boot option
  and is later used to switch do_swap_account. It's not clear why this
  indirection was ever necessary; have the boot option set
  cgroup_memory_noswap directly.

- Minor coding style polishing
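
A minimal sketch of the resulting opt-out convention, condensed from
the diff below (booting with swapaccount=0 is what sets the flag):

	/* With CONFIG_MEMCG_SWAP_ENABLED=y the flag starts false, i.e.
	 * swap accounting is on unless swapaccount=0 opts out,
	 * mirroring cgroup_memory_nosocket and cgroup_memory_nokmem. */
	bool cgroup_memory_noswap __read_mostly;

	/* Whether legacy memory+swap accounting is active */
	static bool do_memsw_account(void)
	{
		return !cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
		       !cgroup_memory_noswap;
	}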

Link: http://lkml.kernel.org/r/20200508183105.225460-15-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |    2 -
 mm/memcontrol.c            |   59 ++++++++++++++++-------------------
 mm/swap_cgroup.c           |    4 +-
 3 files changed, 31 insertions(+), 34 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-prepare-swap-controller-setup-for-integration
+++ a/include/linux/memcontrol.h
@@ -609,7 +609,7 @@ struct mem_cgroup *mem_cgroup_get_oom_gr
 void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
 
 #ifdef CONFIG_MEMCG_SWAP
-extern int do_swap_account;
+extern bool cgroup_memory_noswap;
 #endif
 
 struct mem_cgroup *lock_page_memcg(struct page *page);
--- a/mm/memcontrol.c~mm-memcontrol-prepare-swap-controller-setup-for-integration
+++ a/mm/memcontrol.c
@@ -83,10 +83,14 @@ static bool cgroup_memory_nokmem;
 
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
-int do_swap_account __read_mostly;
+#ifdef CONFIG_MEMCG_SWAP_ENABLED
+bool cgroup_memory_noswap __read_mostly;
 #else
-#define do_swap_account		0
-#endif
+bool cgroup_memory_noswap __read_mostly = 1;
+#endif /* CONFIG_MEMCG_SWAP_ENABLED */
+#else
+#define cgroup_memory_noswap		1
+#endif /* CONFIG_MEMCG_SWAP */
 
 #ifdef CONFIG_CGROUP_WRITEBACK
 static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
@@ -95,7 +99,7 @@ static DECLARE_WAIT_QUEUE_HEAD(memcg_cgw
 /* Whether legacy memory+swap accounting is active */
 static bool do_memsw_account(void)
 {
-	return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && do_swap_account;
+	return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_noswap;
 }
 
 #define THRESHOLDS_EVENTS_TARGET 128
@@ -6458,18 +6462,19 @@ int mem_cgroup_charge(struct page *page,
 		/*
 		 * Every swap fault against a single page tries to charge the
 		 * page, bail as early as possible.  shmem_unuse() encounters
-		 * already charged pages, too.  The USED bit is protected by
-		 * the page lock, which serializes swap cache removal, which
+		 * already charged pages, too.  page->mem_cgroup is protected
+		 * by the page lock, which serializes swap cache removal, which
 		 * in turn serializes uncharging.
 		 */
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		if (compound_head(page)->mem_cgroup)
 			goto out;
 
-		if (do_swap_account) {
+		if (!cgroup_memory_noswap) {
 			swp_entry_t ent = { .val = page_private(page), };
-			unsigned short id = lookup_swap_cgroup_id(ent);
+			unsigned short id;
 
+			id = lookup_swap_cgroup_id(ent);
 			rcu_read_lock();
 			memcg = mem_cgroup_from_id(id);
 			if (memcg && !css_tryget_online(&memcg->css))
@@ -6942,7 +6947,7 @@ int mem_cgroup_try_charge_swap(struct pa
 	struct mem_cgroup *memcg;
 	unsigned short oldid;
 
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || !do_swap_account)
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || cgroup_memory_noswap)
 		return 0;
 
 	memcg = page->mem_cgroup;
@@ -6986,7 +6991,7 @@ void mem_cgroup_uncharge_swap(swp_entry_
 	struct mem_cgroup *memcg;
 	unsigned short id;
 
-	if (!do_swap_account)
+	if (cgroup_memory_noswap)
 		return;
 
 	id = swap_cgroup_record(entry, 0, nr_pages);
@@ -7009,7 +7014,7 @@ long mem_cgroup_get_nr_swap_pages(struct
 {
 	long nr_swap_pages = get_nr_swap_pages();
 
-	if (!do_swap_account || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return nr_swap_pages;
 	for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg))
 		nr_swap_pages = min_t(long, nr_swap_pages,
@@ -7026,7 +7031,7 @@ bool mem_cgroup_swap_full(struct page *p
 
 	if (vm_swap_full())
 		return true;
-	if (!do_swap_account || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return false;
 
 	memcg = page->mem_cgroup;
@@ -7041,22 +7046,15 @@ bool mem_cgroup_swap_full(struct page *p
 	return false;
 }
 
-/* for remember boot option*/
-#ifdef CONFIG_MEMCG_SWAP_ENABLED
-static int really_do_swap_account __initdata = 1;
-#else
-static int really_do_swap_account __initdata;
-#endif
-
-static int __init enable_swap_account(char *s)
+static int __init setup_swap_account(char *s)
 {
 	if (!strcmp(s, "1"))
-		really_do_swap_account = 1;
+		cgroup_memory_noswap = 0;
 	else if (!strcmp(s, "0"))
-		really_do_swap_account = 0;
+		cgroup_memory_noswap = 1;
 	return 1;
 }
-__setup("swapaccount=", enable_swap_account);
+__setup("swapaccount=", setup_swap_account);
 
 static u64 swap_current_read(struct cgroup_subsys_state *css,
 			     struct cftype *cft)
@@ -7122,7 +7120,7 @@ static struct cftype swap_files[] = {
 	{ }	/* terminate */
 };
 
-static struct cftype memsw_cgroup_files[] = {
+static struct cftype memsw_files[] = {
 	{
 		.name = "memsw.usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
@@ -7151,13 +7149,12 @@ static struct cftype memsw_cgroup_files[
 
 static int __init mem_cgroup_swap_init(void)
 {
-	if (!mem_cgroup_disabled() && really_do_swap_account) {
-		do_swap_account = 1;
-		WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys,
-					       swap_files));
-		WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys,
-						  memsw_cgroup_files));
-	}
+	if (mem_cgroup_disabled() || cgroup_memory_noswap)
+		return 0;
+
+	WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files));
+	WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, memsw_files));
+
 	return 0;
 }
 subsys_initcall(mem_cgroup_swap_init);
--- a/mm/swap_cgroup.c~mm-memcontrol-prepare-swap-controller-setup-for-integration
+++ a/mm/swap_cgroup.c
@@ -171,7 +171,7 @@ int swap_cgroup_swapon(int type, unsigne
 	unsigned long length;
 	struct swap_cgroup_ctrl *ctrl;
 
-	if (!do_swap_account)
+	if (cgroup_memory_noswap)
 		return 0;
 
 	length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
@@ -209,7 +209,7 @@ void swap_cgroup_swapoff(int type)
 	unsigned long i, length;
 	struct swap_cgroup_ctrl *ctrl;
 
-	if (!do_swap_account)
+	if (cgroup_memory_noswap)
 		return;
 
 	mutex_lock(&swap_cgroup_mutex);
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (32 preceding siblings ...)
  2020-05-08 23:20 ` + mm-memcontrol-prepare-swap-controller-setup-for-integration.patch " Andrew Morton
@ 2020-05-08 23:20 ` Andrew Morton
  2020-05-08 23:20 ` + mm-memcontrol-charge-swapin-pages-on-instantiation.patch " Andrew Morton
                   ` (46 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: make swap tracking an integral part of memory control
has been added to the -mm tree.  Its filename is
     mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: make swap tracking an integral part of memory control

Without swap page tracking, users that are otherwise memory controlled can
easily escape their containment and allocate significant amounts of memory
that they're not being charged for.  That's because swap does readahead,
but without the cgroup records of who owned the page at swapout, readahead
pages don't get charged until somebody actually faults them into their
page table and we can identify an owner task.  This can be maliciously
exploited with MADV_WILLNEED, which triggers arbitrary readahead
allocations without charging the pages.
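
A hedged userspace illustration of that escape (addresses and sizes
are hypothetical; this only shows the readahead trigger):

	#include <sys/mman.h>

	/* After the region has been swapped out, this kicks off swapin
	 * readahead; before this series the readahead pages were not
	 * charged to any cgroup until a task faulted them in. */
	static void trigger_uncharged_readahead(void *addr, size_t len)
	{
		madvise(addr, len, MADV_WILLNEED);
	}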

Make swap page tracking an integral part of memcg and remove the
Kconfig options.  In the first place, it was only made configurable to
allow users to save some memory.  But the overhead of tracking cgroup
ownership per swap page is minimal - 2 bytes per page, or 512k per 1G of
swap, or 0.04%.  Saving that at the expense of broken containment
semantics is not something we should present as a coequal option.

The swapaccount=0 boot option will continue to exist, and it will
eliminate the page_counter overhead and hide the swap control files, but
it won't disable swap slot ownership tracking.

This patch makes sure we always have the cgroup records at swapin time;
the next patch will fix the actual bug by charging readahead swap pages at
swapin time rather than at fault time.

v2: fix double swap charge bug in cgroup1/cgroup2 code gating

Link: http://lkml.kernel.org/r/20200508183105.225460-16-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/Kconfig     |   17 ----------------
 mm/memcontrol.c  |   47 +++++++++++++++++----------------------------
 mm/swap_cgroup.c |    6 -----
 3 files changed, 19 insertions(+), 51 deletions(-)

--- a/init/Kconfig~mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control
+++ a/init/Kconfig
@@ -835,24 +835,9 @@ config MEMCG
 	  Provides control over the memory footprint of tasks in a cgroup.
 
 config MEMCG_SWAP
-	bool "Swap controller"
+	bool
 	depends on MEMCG && SWAP
-	help
-	  Provides control over the swap space consumed by tasks in a cgroup.
-
-config MEMCG_SWAP_ENABLED
-	bool "Swap controller enabled by default"
-	depends on MEMCG_SWAP
 	default y
-	help
-	  Memory Resource Controller Swap Extension comes with its price in
-	  a bigger memory consumption. General purpose distribution kernels
-	  which want to enable the feature but keep it disabled by default
-	  and let the user enable it by swapaccount=1 boot command line
-	  parameter should have this option unselected.
-	  For those who want to have the feature enabled by default should
-	  select this option (if, for some reason, they need to disable it
-	  then swapaccount=0 does the trick).
 
 config MEMCG_KMEM
 	bool
--- a/mm/memcontrol.c~mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control
+++ a/mm/memcontrol.c
@@ -83,14 +83,10 @@ static bool cgroup_memory_nokmem;
 
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
-#ifdef CONFIG_MEMCG_SWAP_ENABLED
 bool cgroup_memory_noswap __read_mostly;
 #else
-bool cgroup_memory_noswap __read_mostly = 1;
-#endif /* CONFIG_MEMCG_SWAP_ENABLED */
-#else
 #define cgroup_memory_noswap		1
-#endif /* CONFIG_MEMCG_SWAP */
+#endif
 
 #ifdef CONFIG_CGROUP_WRITEBACK
 static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
@@ -5296,8 +5292,7 @@ static struct page *mc_handle_swap_pte(s
 	 * we call find_get_page() with swapper_space directly.
 	 */
 	page = find_get_page(swap_address_space(ent), swp_offset(ent));
-	if (do_memsw_account())
-		entry->val = ent.val;
+	entry->val = ent.val;
 
 	return page;
 }
@@ -5331,8 +5326,7 @@ static struct page *mc_handle_file_pte(s
 		page = find_get_entry(mapping, pgoff);
 		if (xa_is_value(page)) {
 			swp_entry_t swp = radix_to_swp_entry(page);
-			if (do_memsw_account())
-				*entry = swp;
+			*entry = swp;
 			page = find_get_page(swap_address_space(swp),
 					     swp_offset(swp));
 		}
@@ -6459,6 +6453,9 @@ int mem_cgroup_charge(struct page *page,
 		goto out;
 
 	if (PageSwapCache(page)) {
+		swp_entry_t ent = { .val = page_private(page), };
+		unsigned short id;
+
 		/*
 		 * Every swap fault against a single page tries to charge the
 		 * page, bail as early as possible.  shmem_unuse() encounters
@@ -6470,17 +6467,12 @@ int mem_cgroup_charge(struct page *page,
 		if (compound_head(page)->mem_cgroup)
 			goto out;
 
-		if (!cgroup_memory_noswap) {
-			swp_entry_t ent = { .val = page_private(page), };
-			unsigned short id;
-
-			id = lookup_swap_cgroup_id(ent);
-			rcu_read_lock();
-			memcg = mem_cgroup_from_id(id);
-			if (memcg && !css_tryget_online(&memcg->css))
-				memcg = NULL;
-			rcu_read_unlock();
-		}
+		id = lookup_swap_cgroup_id(ent);
+		rcu_read_lock();
+		memcg = mem_cgroup_from_id(id);
+		if (memcg && !css_tryget_online(&memcg->css))
+			memcg = NULL;
+		rcu_read_unlock();
 	}
 
 	if (!memcg)
@@ -6497,7 +6489,7 @@ int mem_cgroup_charge(struct page *page,
 	memcg_check_events(memcg, page);
 	local_irq_enable();
 
-	if (do_memsw_account() && PageSwapCache(page)) {
+	if (PageSwapCache(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		/*
 		 * The swap entry might not get freed for a long time,
@@ -6882,7 +6874,7 @@ void mem_cgroup_swapout(struct page *pag
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 	VM_BUG_ON_PAGE(page_count(page), page);
 
-	if (!do_memsw_account())
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return;
 
 	memcg = page->mem_cgroup;
@@ -6911,7 +6903,7 @@ void mem_cgroup_swapout(struct page *pag
 	if (!mem_cgroup_is_root(memcg))
 		page_counter_uncharge(&memcg->memory, nr_entries);
 
-	if (memcg != swap_memcg) {
+	if (!cgroup_memory_noswap && memcg != swap_memcg) {
 		if (!mem_cgroup_is_root(swap_memcg))
 			page_counter_charge(&swap_memcg->memsw, nr_entries);
 		page_counter_uncharge(&memcg->memsw, nr_entries);
@@ -6947,7 +6939,7 @@ int mem_cgroup_try_charge_swap(struct pa
 	struct mem_cgroup *memcg;
 	unsigned short oldid;
 
-	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || cgroup_memory_noswap)
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		return 0;
 
 	memcg = page->mem_cgroup;
@@ -6963,7 +6955,7 @@ int mem_cgroup_try_charge_swap(struct pa
 
 	memcg = mem_cgroup_id_get_online(memcg);
 
-	if (!mem_cgroup_is_root(memcg) &&
+	if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg) &&
 	    !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
 		memcg_memory_event(memcg, MEMCG_SWAP_MAX);
 		memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
@@ -6991,14 +6983,11 @@ void mem_cgroup_uncharge_swap(swp_entry_
 	struct mem_cgroup *memcg;
 	unsigned short id;
 
-	if (cgroup_memory_noswap)
-		return;
-
 	id = swap_cgroup_record(entry, 0, nr_pages);
 	rcu_read_lock();
 	memcg = mem_cgroup_from_id(id);
 	if (memcg) {
-		if (!mem_cgroup_is_root(memcg)) {
+		if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg)) {
 			if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
 				page_counter_uncharge(&memcg->swap, nr_pages);
 			else
--- a/mm/swap_cgroup.c~mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control
+++ a/mm/swap_cgroup.c
@@ -171,9 +171,6 @@ int swap_cgroup_swapon(int type, unsigne
 	unsigned long length;
 	struct swap_cgroup_ctrl *ctrl;
 
-	if (cgroup_memory_noswap)
-		return 0;
-
 	length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
 	array_size = length * sizeof(void *);
 
@@ -209,9 +206,6 @@ void swap_cgroup_swapoff(int type)
 	unsigned long i, length;
 	struct swap_cgroup_ctrl *ctrl;
 
-	if (cgroup_memory_noswap)
-		return;

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-charge-swapin-pages-on-instantiation.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (33 preceding siblings ...)
  2020-05-08 23:20 ` + mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch " Andrew Morton
@ 2020-05-08 23:20 ` Andrew Morton
  2020-05-08 23:20 ` + mm-memcontrol-document-the-new-swap-control-behavior.patch " Andrew Morton
                   ` (45 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: charge swapin pages on instantiation
has been added to the -mm tree.  Its filename is
     mm-memcontrol-charge-swapin-pages-on-instantiation.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-charge-swapin-pages-on-instantiation.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-charge-swapin-pages-on-instantiation.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: charge swapin pages on instantiation

Right now, users that are otherwise memory controlled can easily escape
their containment and allocate significant amounts of memory that they're
not being charged for.  That's because swap readahead pages are not being
charged until somebody actually faults them into their page table.  This
can be exploited with MADV_WILLNEED, which triggers arbitrary readahead
allocations without charging the pages.

There are additional problems with the delayed charging of swap pages:

1. To implement refault/workingset detection for anonymous pages, we
   need to have a target LRU available at swapin time, but the LRU is not
   determinable until the page has been charged.

2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
   stable when the page is isolated from the LRU; otherwise, the locks
   change under us.  But swapcache gets charged after it's already on the
   LRU, and it gets charged even when we cannot isolate it ourselves
   (since charging is not exactly optional).

The previous patch ensured we always maintain cgroup ownership records for
swap pages.  This patch moves the swapcache charging point from the fault
handler to swapin time to fix all of the above problems.
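
The quirkiest case is the synchronous skip-swapcache path, where the
page never enters the swapcache; a condensed sketch of the mm/memory.c
hunk below:

	/* mem_cgroup_charge() consults the swap ownership records only
	 * for PageSwapCache() pages, so fake the flag around the call
	 * to charge against the recorded owner. */
	SetPageSwapCache(page);
	err = mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false);
	ClearPageSwapCache(page);
	if (err)
		goto out_page;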

v2: simplify swapin error checking (Joonsoo)

Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory.c     |   15 +++++--
 mm/shmem.c      |   14 ++++---
 mm/swap_state.c |   89 ++++++++++++++++++++++++----------------------
 mm/swapfile.c   |    6 ---
 4 files changed, 67 insertions(+), 57 deletions(-)

--- a/mm/memory.c~mm-memcontrol-charge-swapin-pages-on-instantiation
+++ a/mm/memory.c
@@ -3127,9 +3127,20 @@ vm_fault_t do_swap_page(struct vm_fault
 			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
 							vmf->address);
 			if (page) {
+				int err;
+
 				__SetPageLocked(page);
 				__SetPageSwapBacked(page);
 				set_page_private(page, entry.val);
+
+				/* Tell memcg to use swap ownership records */
+				SetPageSwapCache(page);
+				err = mem_cgroup_charge(page, vma->vm_mm,
+							GFP_KERNEL, false);
+				ClearPageSwapCache(page);
+				if (err)
+					goto out_page;
+
 				lru_cache_add_anon(page);
 				swap_readpage(page, true);
 			}
@@ -3191,10 +3202,6 @@ vm_fault_t do_swap_page(struct vm_fault
 		goto out_page;
 	}
 
-	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
-		ret = VM_FAULT_OOM;
-		goto out_page;
-	}
 	cgroup_throttle_swaprate(page, GFP_KERNEL);
 
 	/*
--- a/mm/shmem.c~mm-memcontrol-charge-swapin-pages-on-instantiation
+++ a/mm/shmem.c
@@ -623,13 +623,15 @@ static int shmem_add_to_page_cache(struc
 	page->mapping = mapping;
 	page->index = index;
 
-	error = mem_cgroup_charge(page, charge_mm, gfp, PageSwapCache(page));
-	if (error) {
-		if (!PageSwapCache(page) && PageTransHuge(page)) {
-			count_vm_event(THP_FILE_FALLBACK);
-			count_vm_event(THP_FILE_FALLBACK_CHARGE);
+	if (!PageSwapCache(page)) {
+		error = mem_cgroup_charge(page, charge_mm, gfp, false);
+		if (error) {
+			if (PageTransHuge(page)) {
+				count_vm_event(THP_FILE_FALLBACK);
+				count_vm_event(THP_FILE_FALLBACK_CHARGE);
+			}
+			goto error;
 		}
-		goto error;
 	}
 	cgroup_throttle_swaprate(page, gfp);
 
--- a/mm/swapfile.c~mm-memcontrol-charge-swapin-pages-on-instantiation
+++ a/mm/swapfile.c
@@ -1862,11 +1862,6 @@ static int unuse_pte(struct vm_area_stru
 	if (unlikely(!page))
 		return -ENOMEM;
 
-	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
-		ret = -ENOMEM;
-		goto out_nolock;
-	}
-
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
 		ret = 0;
@@ -1892,7 +1887,6 @@ static int unuse_pte(struct vm_area_stru
 	activate_page(page);
 out:
 	pte_unmap_unlock(pte, ptl);
-out_nolock:
 	if (page != swapcache) {
 		unlock_page(page);
 		put_page(page);
--- a/mm/swap_state.c~mm-memcontrol-charge-swapin-pages-on-instantiation
+++ a/mm/swap_state.c
@@ -360,12 +360,13 @@ struct page *__read_swap_cache_async(swp
 			struct vm_area_struct *vma, unsigned long addr,
 			bool *new_page_allocated)
 {
-	struct page *found_page = NULL, *new_page = NULL;
 	struct swap_info_struct *si;
-	int err;
+	struct page *page;
+
 	*new_page_allocated = false;
 
-	do {
+	for (;;) {
+		int err;
 		/*
 		 * First check the swap cache.  Since this is normally
 		 * called after lookup_swap_cache() failed, re-calling
@@ -373,12 +374,12 @@ struct page *__read_swap_cache_async(swp
 		 */
 		si = get_swap_device(entry);
 		if (!si)
-			break;
-		found_page = find_get_page(swap_address_space(entry),
-					   swp_offset(entry));
+			return NULL;
+		page = find_get_page(swap_address_space(entry),
+				     swp_offset(entry));
 		put_swap_device(si);
-		if (found_page)
-			break;
+		if (page)
+			return page;
 
 		/*
 		 * Just skip read ahead for unused swap slot.
@@ -389,21 +390,15 @@ struct page *__read_swap_cache_async(swp
 		 * else swap_off will be aborted if we return NULL.
 		 */
 		if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
-			break;
-
-		/*
-		 * Get a new page to read into from swap.
-		 */
-		if (!new_page) {
-			new_page = alloc_page_vma(gfp_mask, vma, addr);
-			if (!new_page)
-				break;		/* Out of memory */
-		}
+			return NULL;
 
 		/*
 		 * Swap entry may have been freed since our caller observed it.
 		 */
 		err = swapcache_prepare(entry);
+		if (!err)
+			break;
+
 		if (err == -EEXIST) {
 			/*
 			 * We might race against get_swap_page() and stumble
@@ -412,31 +407,43 @@ struct page *__read_swap_cache_async(swp
 			 */
 			cond_resched();
 			continue;
-		} else if (err)		/* swp entry is obsolete ? */
-			break;
-
-		/* May fail (-ENOMEM) if XArray node allocation failed. */
-		__SetPageLocked(new_page);
-		__SetPageSwapBacked(new_page);
-		err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
-		if (likely(!err)) {
-			/* Initiate read into locked page */
-			SetPageWorkingset(new_page);
-			lru_cache_add_anon(new_page);
-			*new_page_allocated = true;
-			return new_page;
 		}
-		__ClearPageLocked(new_page);
-		/*
-		 * add_to_swap_cache() doesn't return -EEXIST, so we can safely
-		 * clear SWAP_HAS_CACHE flag.
-		 */
-		put_swap_page(new_page, entry);
-	} while (err != -ENOMEM);
 
-	if (new_page)
-		put_page(new_page);
-	return found_page;
+		return NULL;
+	}
+
+	/*
+	 * The swap entry is ours to swap in. Prepare a new page.
+	 */
+
+	page = alloc_page_vma(gfp_mask, vma, addr);
+	if (!page)
+		goto fail_free;
+
+	__SetPageLocked(page);
+	__SetPageSwapBacked(page);
+
+	/* May fail (-ENOMEM) if XArray node allocation failed. */
+	if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL))
+		goto fail_unlock;
+
+	if (mem_cgroup_charge(page, NULL, gfp_mask & GFP_KERNEL, false))
+		goto fail_delete;
+
+	/* Initiate read into locked page */
+	SetPageWorkingset(page);
+	lru_cache_add_anon(page);
+	*new_page_allocated = true;
+	return page;
+
+fail_delete:
+	delete_from_swap_cache(page);
+fail_unlock:
+	unlock_page(page);
+	put_page(page);
+fail_free:
+	swap_free(entry);
+	return NULL;
 }
 
 /*
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-document-the-new-swap-control-behavior.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (34 preceding siblings ...)
  2020-05-08 23:20 ` + mm-memcontrol-charge-swapin-pages-on-instantiation.patch " Andrew Morton
@ 2020-05-08 23:20 ` Andrew Morton
  2020-05-08 23:20 ` + mm-memcontrol-delete-unused-lrucare-handling.patch " Andrew Morton
                   ` (44 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: document the new swap control behavior
has been added to the -mm tree.  Its filename is
     mm-memcontrol-document-the-new-swap-control-behavior.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-document-the-new-swap-control-behavior.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-document-the-new-swap-control-behavior.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Alex Shi <alex.shi@linux.alibaba.com>
Subject: mm: memcontrol: document the new swap control behavior

Link: http://lkml.kernel.org/r/20200508183105.225460-18-hannes@cmpxchg.org
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/admin-guide/cgroup-v1/memory.rst |   19 +++++----------
 1 file changed, 7 insertions(+), 12 deletions(-)

--- a/Documentation/admin-guide/cgroup-v1/memory.rst~mm-memcontrol-document-the-new-swap-control-behavior
+++ a/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -199,11 +199,11 @@ An RSS page is unaccounted when it's ful
 unaccounted when it's removed from radix-tree. Even if RSS pages are fully
 unmapped (by kswapd), they may exist as SwapCache in the system until they
 are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted when it is added to the swapcache.
 
 Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since the page's memcg is recorded into swap regardless of memsw, the
+page will be accounted at swapin.
 
 At page migration, accounting information is kept.
 
@@ -222,18 +222,13 @@ the cgroup that brought it in -- this wi
 But see section 8.2: when moving a task to another cgroup, its pages may
 be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
 
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-delete-unused-lrucare-handling.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (35 preceding siblings ...)
  2020-05-08 23:20 ` + mm-memcontrol-document-the-new-swap-control-behavior.patch " Andrew Morton
@ 2020-05-08 23:20 ` Andrew Morton
  2020-05-08 23:20 ` + mm-memcontrol-update-page-mem_cgroup-stability-rules.patch " Andrew Morton
                   ` (43 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: delete unused lrucare handling
has been added to the -mm tree.  Its filename is
     mm-memcontrol-delete-unused-lrucare-handling.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-delete-unused-lrucare-handling.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-delete-unused-lrucare-handling.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: delete unused lrucare handling

Swapin faults were the last event to charge pages after they had already
been put on the LRU list.  Now that we charge directly on swapin, the
lrucare portion of the charge code is unused.
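
With that gone, commit_charge() collapses to a plain pointer store; a
sketch of the post-patch form, matching the mm/memcontrol.c hunk below:

	static void commit_charge(struct page *page, struct mem_cgroup *memcg)
	{
		VM_BUG_ON_PAGE(page->mem_cgroup, page);
		/* Callers now guarantee stability up front; the access
		 * rules are documented in the next patch. */
		page->mem_cgroup = memcg;
	}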

Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memcontrol.h |    5 +--
 kernel/events/uprobes.c    |    3 -
 mm/filemap.c               |    2 -
 mm/huge_memory.c           |    2 -
 mm/khugepaged.c            |    4 +-
 mm/memcontrol.c            |   57 ++---------------------------------
 mm/memory.c                |    8 ++--
 mm/migrate.c               |    2 -
 mm/shmem.c                 |    2 -
 mm/swap_state.c            |    2 -
 mm/userfaultfd.c           |    2 -
 11 files changed, 19 insertions(+), 70 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-delete-unused-lrucare-handling
+++ a/include/linux/memcontrol.h
@@ -406,8 +406,7 @@ static inline bool mem_cgroup_below_min(
 		page_counter_read(&memcg->memory);
 }
 
-int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
-		      bool lrucare);
+int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask);
 
 void mem_cgroup_uncharge(struct page *page);
 void mem_cgroup_uncharge_list(struct list_head *page_list);
@@ -900,7 +899,7 @@ static inline bool mem_cgroup_below_min(
 }
 
 static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
-				    gfp_t gfp_mask, bool lrucare)
+				    gfp_t gfp_mask)
 {
 	return 0;
 }
--- a/kernel/events/uprobes.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/kernel/events/uprobes.c
@@ -167,8 +167,7 @@ static int __replace_page(struct vm_area
 				addr + PAGE_SIZE);
 
 	if (new_page) {
-		err = mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL,
-					false);
+		err = mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL);
 		if (err)
 			return err;
 	}
--- a/mm/filemap.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/filemap.c
@@ -845,7 +845,7 @@ static int __add_to_page_cache_locked(st
 	page->index = offset;
 
 	if (!huge) {
-		error = mem_cgroup_charge(page, current->mm, gfp_mask, false);
+		error = mem_cgroup_charge(page, current->mm, gfp_mask);
 		if (error)
 			goto error;
 	}
--- a/mm/huge_memory.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/huge_memory.c
@@ -593,7 +593,7 @@ static vm_fault_t __do_huge_pmd_anonymou
 
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 
-	if (mem_cgroup_charge(page, vma->vm_mm, gfp, false)) {
+	if (mem_cgroup_charge(page, vma->vm_mm, gfp)) {
 		put_page(page);
 		count_vm_event(THP_FAULT_FALLBACK);
 		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
--- a/mm/khugepaged.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/khugepaged.c
@@ -973,7 +973,7 @@ static void collapse_huge_page(struct mm
 		goto out_nolock;
 	}
 
-	if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
+	if (unlikely(mem_cgroup_charge(new_page, mm, gfp))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out_nolock;
 	}
@@ -1529,7 +1529,7 @@ static void collapse_file(struct mm_stru
 		goto out;
 	}
 
-	if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
+	if (unlikely(mem_cgroup_charge(new_page, mm, gfp))) {
 		result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out;
 	}
--- a/mm/memcontrol.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/memcontrol.c
@@ -2603,51 +2603,9 @@ static void cancel_charge(struct mem_cgr
 	css_put_many(&memcg->css, nr_pages);
 }
 
-static void lock_page_lru(struct page *page, int *isolated)
+static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 {
-	pg_data_t *pgdat = page_pgdat(page);
-
-	spin_lock_irq(&pgdat->lru_lock);
-	if (PageLRU(page)) {
-		struct lruvec *lruvec;
-
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		ClearPageLRU(page);
-		del_page_from_lru_list(page, lruvec, page_lru(page));
-		*isolated = 1;
-	} else
-		*isolated = 0;
-}
-
-static void unlock_page_lru(struct page *page, int isolated)
-{
-	pg_data_t *pgdat = page_pgdat(page);
-
-	if (isolated) {
-		struct lruvec *lruvec;
-
-		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		VM_BUG_ON_PAGE(PageLRU(page), page);
-		SetPageLRU(page);
-		add_page_to_lru_list(page, lruvec, page_lru(page));
-	}
-	spin_unlock_irq(&pgdat->lru_lock);
-}
-
-static void commit_charge(struct page *page, struct mem_cgroup *memcg,
-			  bool lrucare)
-{
-	int isolated;
-
 	VM_BUG_ON_PAGE(page->mem_cgroup, page);
-
-	/*
-	 * In some cases, SwapCache and FUSE(splice_buf->radixtree), the page
-	 * may already be on some other mem_cgroup's LRU.  Take care of it.
-	 */
-	if (lrucare)
-		lock_page_lru(page, &isolated);
-
 	/*
 	 * Nobody should be changing or seriously looking at
 	 * page->mem_cgroup at this point:
@@ -2663,9 +2621,6 @@ static void commit_charge(struct page *p
 	 *   have the page locked
 	 */
 	page->mem_cgroup = memcg;
-
-	if (lrucare)
-		unlock_page_lru(page, isolated);
 }
 
 #ifdef CONFIG_MEMCG_KMEM
@@ -6433,22 +6388,18 @@ void mem_cgroup_calculate_protection(str
  * @page: page to charge
  * @mm: mm context of the victim
  * @gfp_mask: reclaim mode
- * @lrucare: page might be on the LRU already
  *
  * Try to charge @page to the memcg that @mm belongs to, reclaiming
  * pages according to @gfp_mask if necessary.
  *
  * Returns 0 on success. Otherwise, an error code is returned.
  */
-int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
-		      bool lrucare)
+int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 {
 	unsigned int nr_pages = hpage_nr_pages(page);
 	struct mem_cgroup *memcg = NULL;
 	int ret = 0;
 
-	VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
-
 	if (mem_cgroup_disabled())
 		goto out;
 
@@ -6482,7 +6433,7 @@ int mem_cgroup_charge(struct page *page,
 	if (ret)
 		goto out_put;
 
-	commit_charge(page, memcg, lrucare);
+	commit_charge(page, memcg);
 
 	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, page, nr_pages);
@@ -6683,7 +6634,7 @@ void mem_cgroup_migrate(struct page *old
 		page_counter_charge(&memcg->memsw, nr_pages);
 	css_get_many(&memcg->css, nr_pages);
 
-	commit_charge(newpage, memcg, false);
+	commit_charge(newpage, memcg);
 
 	local_irq_save(flags);
 	mem_cgroup_charge_statistics(memcg, newpage, nr_pages);
--- a/mm/memory.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/memory.c
@@ -2677,7 +2677,7 @@ static vm_fault_t wp_page_copy(struct vm
 		}
 	}
 
-	if (mem_cgroup_charge(new_page, mm, GFP_KERNEL, false))
+	if (mem_cgroup_charge(new_page, mm, GFP_KERNEL))
 		goto oom_free_new;
 	cgroup_throttle_swaprate(new_page, GFP_KERNEL);
 
@@ -3136,7 +3136,7 @@ vm_fault_t do_swap_page(struct vm_fault
 				/* Tell memcg to use swap ownership records */
 				SetPageSwapCache(page);
 				err = mem_cgroup_charge(page, vma->vm_mm,
-							GFP_KERNEL, false);
+							GFP_KERNEL);
 				ClearPageSwapCache(page);
 				if (err)
 					goto out_page;
@@ -3360,7 +3360,7 @@ static vm_fault_t do_anonymous_page(stru
 	if (!page)
 		goto oom;
 
-	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
+	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL))
 		goto oom_free_page;
 	cgroup_throttle_swaprate(page, GFP_KERNEL);
 
@@ -3856,7 +3856,7 @@ static vm_fault_t do_cow_fault(struct vm
 	if (!vmf->cow_page)
 		return VM_FAULT_OOM;
 
-	if (mem_cgroup_charge(vmf->cow_page, vma->vm_mm, GFP_KERNEL, false)) {
+	if (mem_cgroup_charge(vmf->cow_page, vma->vm_mm, GFP_KERNEL)) {
 		put_page(vmf->cow_page);
 		return VM_FAULT_OOM;
 	}
--- a/mm/migrate.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/migrate.c
@@ -2792,7 +2792,7 @@ static void migrate_vma_insert_page(stru
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto abort;
-	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
+	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL))
 		goto abort;
 
 	/*
--- a/mm/shmem.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/shmem.c
@@ -624,7 +624,7 @@ static int shmem_add_to_page_cache(struc
 	page->index = index;
 
 	if (!PageSwapCache(page)) {
-		error = mem_cgroup_charge(page, charge_mm, gfp, false);
+		error = mem_cgroup_charge(page, charge_mm, gfp);
 		if (error) {
 			if (PageTransHuge(page)) {
 				count_vm_event(THP_FILE_FALLBACK);
--- a/mm/swap_state.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/swap_state.c
@@ -427,7 +427,7 @@ struct page *__read_swap_cache_async(swp
 	if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL))
 		goto fail_unlock;
 
-	if (mem_cgroup_charge(page, NULL, gfp_mask & GFP_KERNEL, false))
+	if (mem_cgroup_charge(page, NULL, gfp_mask & GFP_KERNEL))
 		goto fail_delete;
 
 	/* Initiate read into locked page */
--- a/mm/userfaultfd.c~mm-memcontrol-delete-unused-lrucare-handling
+++ a/mm/userfaultfd.c
@@ -96,7 +96,7 @@ static int mcopy_atomic_pte(struct mm_st
 	__SetPageUptodate(page);
 
 	ret = -ENOMEM;
-	if (mem_cgroup_charge(page, dst_mm, GFP_KERNEL, false))
+	if (mem_cgroup_charge(page, dst_mm, GFP_KERNEL))
 		goto out_release;
 
 	_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-update-page-mem_cgroup-stability-rules.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (36 preceding siblings ...)
  2020-05-08 23:20 ` + mm-memcontrol-delete-unused-lrucare-handling.patch " Andrew Morton
@ 2020-05-08 23:20 ` Andrew Morton
  2020-05-08 23:38 ` + selftests-vm-pkeys-introduce-powerpc-support-fix.patch " Andrew Morton
                   ` (42 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:20 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: memcontrol: update page->mem_cgroup stability rules
has been added to the -mm tree.  Its filename is
     mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-update-page-mem_cgroup-stability-rules.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: memcontrol: update page->mem_cgroup stability rules

The previous patches have simplified the access rules around
page->mem_cgroup somewhat:

1. We never change page->mem_cgroup while the page is isolated by
   somebody else.  This was by far the biggest exception to our rules and
   it didn't stop at lock_page() or lock_page_memcg().

2. We now charge pages before they get put into page tables, so the
   somewhat fishy rule about "can be in page table as long as it's still
   locked" is gone and has boiled down to having an exclusive reference to
   the page.

Document the new rules.  Any of the following will stabilize the
page->mem_cgroup association:

- the page lock
- LRU isolation
- lock_page_memcg()
- exclusive access to the page
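
As an aside (illustration only, not part of the patch), a minimal
kernel-style sketch of a reader relying on one of these rules;
lock_page_memcg() is the real interface at this point in the series,
while note_page_memcg() is a made-up consumer:

    /* Hypothetical consumer: any one of the listed conditions would do;
     * this sketch uses the lock_page_memcg() rule. */
    static void note_page_memcg(struct page *page)
    {
            struct mem_cgroup *memcg;

            lock_page_memcg(page);          /* pins page->mem_cgroup */
            memcg = page->mem_cgroup;       /* stable until unlock */
            if (memcg)
                    pr_debug("page %px -> memcg %px\n", page, memcg);
            unlock_page_memcg(page);
    }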

Link: http://lkml.kernel.org/r/20200508183105.225460-20-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-update-page-mem_cgroup-stability-rules
+++ a/mm/memcontrol.c
@@ -1201,9 +1201,8 @@ int mem_cgroup_scan_tasks(struct mem_cgr
  * @page: the page
  * @pgdat: pgdat of the page
  *
- * This function is only safe when following the LRU page isolation
- * and putback protocol: the LRU lock must be held, and the page must
- * either be PageLRU() or the caller must have isolated/allocated it.
+ * This function relies on page->mem_cgroup being stable - see the
+ * access rules in commit_charge().
  */
 struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
 {
@@ -2607,18 +2606,12 @@ static void commit_charge(struct page *p
 {
 	VM_BUG_ON_PAGE(page->mem_cgroup, page);
 	/*
-	 * Nobody should be changing or seriously looking at
-	 * page->mem_cgroup at this point:
+	 * Any of the following ensures page->mem_cgroup stability:
 	 *
-	 * - the page is uncharged
-	 *
-	 * - the page is off-LRU
-	 *
-	 * - an anonymous fault has exclusive page access, except for
-	 *   a locked page table
-	 *
-	 * - a page cache insertion, a swapin fault, or a migration
-	 *   have the page locked
+	 * - the page lock
+	 * - LRU isolation
+	 * - lock_page_memcg()
+	 * - exclusive reference
 	 */
 	page->mem_cgroup = memcg;
 }
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + selftests-vm-pkeys-introduce-powerpc-support-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (37 preceding siblings ...)
  2020-05-08 23:20 ` + mm-memcontrol-update-page-mem_cgroup-stability-rules.patch " Andrew Morton
@ 2020-05-08 23:38 ` Andrew Morton
  2020-05-08 23:38 ` + selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch " Andrew Morton
                   ` (41 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:38 UTC (permalink / raw)
  To: aneesh.kumar, bauerman, dave.hansen, desnesn, fweimer, mhocko,
	mingo, mm-commits, mpe, msuchanek, sandipan, shuah


The patch titled
     Subject: selftests: vm: pkeys: fix powerpc access right updates
has been added to the -mm tree.  Its filename is
     selftests-vm-pkeys-introduce-powerpc-support-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/selftests-vm-pkeys-introduce-powerpc-support-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/selftests-vm-pkeys-introduce-powerpc-support-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Sandipan Das <sandipan@linux.ibm.com>
Subject: selftests: vm: pkeys: fix powerpc access right updates

The Power ISA mandates that all writes to the Authority Mask Register
(AMR) must always be preceded as well as succeeded by a
context-synchronizing instruction.  This applies to both the privileged
and unprivileged variants of the Move To AMR instruction.

Link: http://lkml.kernel.org/r/5f65cf37be993760de8112a88da194e3ccbb2bf8.1588959697.git.sandipan@linux.ibm.com
Fixes: 130f573c2a79 ("selftests/vm/pkeys: introduce powerpc support")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/pkey-powerpc.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/tools/testing/selftests/vm/pkey-powerpc.h~selftests-vm-pkeys-introduce-powerpc-support-fix
+++ a/tools/testing/selftests/vm/pkey-powerpc.h
@@ -54,7 +54,8 @@ static inline void __write_pkey_reg(u64
 	dprintf4("%s() changing %016llx to %016llx\n",
 			 __func__, __read_pkey_reg(), pkey_reg);
 
-	asm volatile("mtspr 0xd, %0" : : "r" ((unsigned long)(amr)) : "memory");
+	asm volatile("isync; mtspr 0xd, %0; isync"
+		     : : "r" ((unsigned long)(amr)) : "memory");
 
 	dprintf4("%s() pkey register after changing %016llx to %016llx\n",
 			__func__, __read_pkey_reg(), pkey_reg);
_

Patches currently in -mm which might be from sandipan@linux.ibm.com are

mm-reset-numa-stats-for-boot-pagesets.patch
selftests-vm-pkeys-use-sane-types-for-pkey-register.patch
selftests-vm-pkeys-add-helpers-for-pkey-bits.patch
selftests-vm-pkeys-use-the-correct-huge-page-size.patch
selftests-vm-pkeys-introduce-powerpc-support-fix.patch
selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch
selftests-vm-pkeys-use-the-correct-page-size-on-powerpc.patch
selftests-vm-pkeys-fix-multilib-builds-for-x86.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (38 preceding siblings ...)
  2020-05-08 23:38 ` + selftests-vm-pkeys-introduce-powerpc-support-fix.patch " Andrew Morton
@ 2020-05-08 23:38 ` Andrew Morton
  2020-05-08 23:40 ` + memcg-expose-root-cgroups-memorystat.patch " Andrew Morton
                   ` (40 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:38 UTC (permalink / raw)
  To: aneesh.kumar, bauerman, dave.hansen, desnesn, fweimer, linuxram,
	mhocko, mingo, mm-commits, mpe, msuchanek, sandipan, shuah


The patch titled
     Subject: selftests: vm: pkeys: fix powerpc access right definitions
has been added to the -mm tree.  Its filename is
     selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Sandipan Das <sandipan@linux.ibm.com>
Subject: selftests: vm: pkeys: fix powerpc access right definitions

For powerpc, PKEY_DISABLE_WRITE and PKEY_DISABLE_ACCESS are redefined only
if the system headers already define them.  Otherwise, the test fails to
compile due to their absence.  This change makes sure that they are always
defined, irrespective of whether they are present in the system headers.

Link: http://lkml.kernel.org/r/1ba86fd8a94f38131cfe2d9f277001dd1ad1d34e.1588959697.git.sandipan@linux.ibm.com
Fixes: 130f573c2a79 ("selftests/vm/pkeys: introduce powerpc support")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/pkey-powerpc.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/tools/testing/selftests/vm/pkey-powerpc.h~selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix
+++ a/tools/testing/selftests/vm/pkey-powerpc.h
@@ -16,15 +16,11 @@
 #define fpregs			fp_regs
 #define si_pkey_offset		0x20
 
-#ifdef PKEY_DISABLE_ACCESS
 #undef PKEY_DISABLE_ACCESS
-# define PKEY_DISABLE_ACCESS	0x3  /* disable read and write */
-#endif
+#define PKEY_DISABLE_ACCESS	0x3  /* disable read and write */
 
-#ifdef PKEY_DISABLE_WRITE
 #undef PKEY_DISABLE_WRITE
-# define PKEY_DISABLE_WRITE	0x2
-#endif
+#define PKEY_DISABLE_WRITE	0x2
 
 #define NR_PKEYS		32
 #define NR_RESERVED_PKEYS_4K	27 /* pkey-0, pkey-1, exec-only-pkey
_

Patches currently in -mm which might be from sandipan@linux.ibm.com are

mm-reset-numa-stats-for-boot-pagesets.patch
selftests-vm-pkeys-use-sane-types-for-pkey-register.patch
selftests-vm-pkeys-add-helpers-for-pkey-bits.patch
selftests-vm-pkeys-use-the-correct-huge-page-size.patch
selftests-vm-pkeys-introduce-powerpc-support-fix.patch
selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch
selftests-vm-pkeys-use-the-correct-page-size-on-powerpc.patch
selftests-vm-pkeys-fix-multilib-builds-for-x86.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + memcg-expose-root-cgroups-memorystat.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (39 preceding siblings ...)
  2020-05-08 23:38 ` + selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch " Andrew Morton
@ 2020-05-08 23:40 ` Andrew Morton
  2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch " Andrew Morton
                   ` (39 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:40 UTC (permalink / raw)
  To: guro, hannes, laoar.shao, mgorman, mhocko, mm-commits, shakeelb


The patch titled
     Subject: memcg: expose root cgroup's memory.stat
has been added to the -mm tree.  Its filename is
     memcg-expose-root-cgroups-memorystat.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/memcg-expose-root-cgroups-memorystat.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/memcg-expose-root-cgroups-memorystat.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Shakeel Butt <shakeelb@google.com>
Subject: memcg: expose root cgroup's memory.stat

One way to measure the efficiency of memory reclaim is to look at the
ratio (pgscan+pgrefill)/pgsteal.  However, at the moment these stats are
not updated consistently at the system level, so the ratio is not very
meaningful there: pgsteal and pgscan are updated only for global reclaim,
while pgrefill gets updated for global as well as cgroup reclaim.

Please note that this difference exists only for the system-level
vmstats.  The cgroup stats returned by memory.stat are actually
consistent: a cgroup's pgsteal contains the number of reclaimed pages for
global as well as cgroup reclaim.  So, one way to get consistent
system-level stats is to read them from the root cgroup's memory.stat;
hence, expose memory.stat for the root cgroup.

	from Johannes Weiner:
	There are subtle differences between /proc/vmstat and
	memory.stat, and cgroup-aware code that wants to watch the full
	hierarchy currently has to know about these intricacies and
	translate semantics back and forth.

	Generally having the fully recursive memory.stat at the root
	level could help a broader range of usecases.

Link: http://lkml.kernel.org/r/20200508170630.94406-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    1 -
 1 file changed, 1 deletion(-)

--- a/mm/memcontrol.c~memcg-expose-root-cgroups-memorystat
+++ a/mm/memcontrol.c
@@ -6180,7 +6180,6 @@ static struct cftype memory_files[] = {
 	},
 	{
 		.name = "stat",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = memory_stat_show,
 	},
 	{
_

Patches currently in -mm which might be from shakeelb@google.com are

memcg-optimize-memorynuma_stat-like-memorystat.patch
memcg-expose-root-cgroups-memorystat.patch
mm-vmscan-consistent-update-to-pgsteal-and-pgscan.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (40 preceding siblings ...)
  2020-05-08 23:40 ` + memcg-expose-root-cgroups-memorystat.patch " Andrew Morton
@ 2020-05-08 23:43 ` Andrew Morton
  2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch " Andrew Morton
                   ` (38 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:43 UTC (permalink / raw)
  To: guro, khlebnikov, mhocko, mm-commits, rdunlap


The patch titled
     Subject: doc: cgroup: update note about conditions when oom killer is invoked
has been added to the -mm tree.  Its filename is
     doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Subject: doc: cgroup: update note about conditions when oom killer is invoked

Starting from the v4.19 commit 29ef680ae7c2 ("memcg, oom: move
out_of_memory back to the charge path"), the cgroup OOM killer is no
longer invoked only from page faults.  It now implements the same
semantics as the global OOM killer: the allocating context invokes the
OOM killer and keeps retrying until it succeeds.

Link: http://lkml.kernel.org/r/158894738928.208854.5244393925922074518.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/admin-guide/cgroup-v2.rst |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/Documentation/admin-guide/cgroup-v2.rst~doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -1172,6 +1172,13 @@ PAGE_SIZE multiple when read back.
 	Under certain circumstances, the usage may go over the limit
 	temporarily.
 
+	In default configuration regular 0-order allocation always
+	succeed unless OOM killer choose current task as a victim.
+
+	Some kinds of allocations don't invoke the OOM killer.
+	Caller could retry them differently, return into userspace
+	as -ENOMEM or silently ignore in cases like disk readahead.
+
 	This is the ultimate protection mechanism.  As long as the
 	high limit is used and monitored properly, this limit's
 	utility is limited to providing the final safety net.
@@ -1228,17 +1235,9 @@ PAGE_SIZE multiple when read back.
 		The number of time the cgroup's memory usage was
 		reached the limit and allocation was about to fail.
 
-		Depending on context result could be invocation of OOM
-		killer and retrying allocation or failing allocation.
-
-		Failed allocation in its turn could be returned into
-		userspace as -ENOMEM or silently ignored in cases like
-		disk readahead.  For now OOM in memory cgroup kills
-		tasks iff shortage has happened inside page fault.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (41 preceding siblings ...)
  2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch " Andrew Morton
@ 2020-05-08 23:43 ` Andrew Morton
  2020-05-08 23:48 ` + zcomp-use-array_size-for-backends-list.patch " Andrew Morton
                   ` (37 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:43 UTC (permalink / raw)
  To: akpm, guro, khlebnikov, mhocko, mm-commits, rdunlap


The patch titled
     Subject: doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix
has been added to the -mm tree.  Its filename is
     doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix

fixes per Randy

Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/admin-guide/cgroup-v2.rst |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/Documentation/admin-guide/cgroup-v2.rst~doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -1172,8 +1172,8 @@ PAGE_SIZE multiple when read back.
 	Under certain circumstances, the usage may go over the limit
 	temporarily.
 
-	In default configuration regular 0-order allocation always
-	succeed unless OOM killer choose current task as a victim.
+	In default configuration regular 0-order allocations always
+	succeed unless OOM killer chooses current task as a victim.
 
 	Some kinds of allocations don't invoke the OOM killer.
 	Caller could retry them differently, return into userspace
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fixpatch-added-to-mm-tree.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
linux-next-rejects.patch
linux-next-git-rejects.patch
arch-x86-makefile-use-config_shell.patch
doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
kernel-forkc-export-kernel_thread-to-modules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + zcomp-use-array_size-for-backends-list.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (42 preceding siblings ...)
  2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch " Andrew Morton
@ 2020-05-08 23:48 ` Andrew Morton
  2020-05-08 23:53 ` + device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch " Andrew Morton
                   ` (36 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:48 UTC (permalink / raw)
  To: akpm, andriy.shevchenko, axboe, minchan, mm-commits,
	sergey.senozhatsky.work


The patch titled
     Subject: zcomp: Use ARRAY_SIZE() for backends list
has been added to the -mm tree.  Its filename is
     zcomp-use-array_size-for-backends-list.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/zcomp-use-array_size-for-backends-list.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/zcomp-use-array_size-for-backends-list.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Subject: zcomp: Use ARRAY_SIZE() for backends list

Instead of keeping a NULL-terminated array, switch to using ARRAY_SIZE(),
which helps to further clean up the code.
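
For illustration (a self-contained user-space sketch, not the zram code
itself), the idiom this patch moves to - iterating a fixed array with
ARRAY_SIZE() instead of walking until a NULL sentinel:

    #include <stdio.h>

    #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

    static const char * const backends[] = {
            "lzo", "lz4", "zstd",   /* no NULL terminator needed */
    };

    int main(void)
    {
            for (size_t i = 0; i < ARRAY_SIZE(backends); i++)
                    printf("%s\n", backends[i]);
            return 0;
    }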

Link: http://lkml.kernel.org/r/20200508100758.51644-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/block/zram/zcomp.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/block/zram/zcomp.c~zcomp-use-array_size-for-backends-list
+++ a/drivers/block/zram/zcomp.c
@@ -29,7 +29,6 @@ static const char * const backends[] = {
 #if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
 	"zstd",
 #endif
-	NULL
 };
 
 static void zcomp_strm_free(struct zcomp_strm *zstrm)
@@ -67,7 +66,7 @@ bool zcomp_available_algorithm(const cha
 {
 	int i;
 
-	i = __sysfs_match_string(backends, -1, comp);
+	i = sysfs_match_string(backends, comp);
 	if (i >= 0)
 		return true;
 
@@ -86,9 +85,9 @@ ssize_t zcomp_available_show(const char
 {
 	bool known_algorithm = false;
 	ssize_t sz = 0;
-	int i = 0;
+	int i;
 
-	for (; backends[i]; i++) {
+	for (i = 0; i < ARRAY_SIZE(backends); i++) {
 		if (!strcmp(comp, backends[i])) {
 			known_algorithm = true;
 			sz += scnprintf(buf + sz, PAGE_SIZE - sz - 2,
_

Patches currently in -mm which might be from andriy.shevchenko@linux.intel.com are

zcomp-use-array_size-for-backends-list.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (43 preceding siblings ...)
  2020-05-08 23:48 ` + zcomp-use-array_size-for-backends-list.patch " Andrew Morton
@ 2020-05-08 23:53 ` Andrew Morton
  2020-05-08 23:56 ` + mm-memory_hotplug-introduce-add_memory_driver_managed.patch " Andrew Morton
                   ` (35 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:53 UTC (permalink / raw)
  To: akpm, dan.j.williams, dave.jiang, david, mm-commits,
	pasha.tatashin, stable, vishal.l.verma


The patch titled
     Subject: device-dax: don't leak kernel memory to user space after unloading kmem
has been added to the -mm tree.  Its filename is
     device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: device-dax: don't leak kernel memory to user space after unloading kmem

Assume we have kmem configured and loaded:
  [root@localhost ~]# cat /proc/iomem
  ...
  140000000-33fffffff : Persistent Memory$
    140000000-1481fffff : namespace0.0
    150000000-33fffffff : dax0.0
      150000000-33fffffff : System RAM

Assume we try to unload kmem.  This force-unloading will work, even if
the memory cannot be removed from the system.
  [root@localhost ~]# rmmod kmem
  [   86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
  ...
  [   86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot

Now, we can reconfigure the namespace:
  [root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
  [  131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]dax
  [  131.410147] nd_pmem: probe of namespace0.0 failed with error -16namespace0.0 --mode=devdax
  ...

This fails as expected due to the busy memory resource, and the memory
cannot be used.  However, the dax0.0 device is removed, and its name is
freed along with it.

The name of the memory resource now points at freed memory (the former
name of the device).
  [root@localhost ~]# cat /proc/iomem
  ...
  140000000-33fffffff : Persistent Memory
    140000000-1481fffff : namespace0.0
    150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
    150000000-33fffffff : System RAM

We have to make sure to duplicate the string.  While at it, remove the
superfluous setting of the name and fix up a stale comment.

Link: http://lkml.kernel.org/r/20200508084217.9160-2-david@redhat.com
Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[5.3]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/dax/kmem.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/drivers/dax/kmem.c~device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem
+++ a/drivers/dax/kmem.c
@@ -22,6 +22,7 @@ int dev_dax_kmem_probe(struct device *de
 	resource_size_t kmem_size;
 	resource_size_t kmem_end;
 	struct resource *new_res;
+	const char *new_res_name;
 	int numa_node;
 	int rc;
 
@@ -48,11 +49,16 @@ int dev_dax_kmem_probe(struct device *de
 	kmem_size &= ~(memory_block_size_bytes() - 1);
 	kmem_end = kmem_start + kmem_size;
 
-	/* Region is permanently reserved.  Hot-remove not yet implemented. */
-	new_res = request_mem_region(kmem_start, kmem_size, dev_name(dev));
+	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
+	if (!new_res_name)
+		return -ENOMEM;
+
+	/* Region is permanently reserved if hotremove fails. */
+	new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
 	if (!new_res) {
 		dev_warn(dev, "could not reserve region [%pa-%pa]\n",
 			 &kmem_start, &kmem_end);
+		kfree(new_res_name);
 		return -EBUSY;
 	}
 
@@ -63,12 +69,12 @@ int dev_dax_kmem_probe(struct device *de
 	 * unknown to us that will break add_memory() below.
 	 */
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
-	new_res->name = dev_name(dev);
 
 	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
+		kfree(new_res_name);
 		return rc;
 	}
 	dev_dax->dax_kmem_res = new_res;
@@ -83,6 +89,7 @@ static int dev_dax_kmem_remove(struct de
 	struct resource *res = dev_dax->dax_kmem_res;
 	resource_size_t kmem_start = res->start;
 	resource_size_t kmem_size = resource_size(res);
+	const char *res_name = res->name;
 	int rc;
 
 	/*
@@ -102,6 +109,7 @@ static int dev_dax_kmem_remove(struct de
 	/* Release and free dax resources */
 	release_resource(res);
 	kfree(res);
+	kfree(res_name);
 	dev_dax->dax_kmem_res = NULL;
 
 	return 0;
_

Patches currently in -mm which might be from david@redhat.com are

device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
mm-memory_hotplug-remove-is_mem_section_removable.patch
mm-memory_hotplug-set-node_start_pfn-of-hotadded-pgdat-to-0.patch
mm-memory_hotplug-handle-memblocks-only-with-config_arch_keep_memblock.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memory_hotplug-introduce-add_memory_driver_managed.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (44 preceding siblings ...)
  2020-05-08 23:53 ` + device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch " Andrew Morton
@ 2020-05-08 23:56 ` Andrew Morton
  2020-05-08 23:56 ` + kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch " Andrew Morton
                   ` (34 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:56 UTC (permalink / raw)
  To: bhe, dan.j.williams, dave.hansen, david, ebiederm, mhocko,
	mm-commits, pankaj.gupta.linux, pasha.tatashin, richard.weiyang


The patch titled
     Subject: mm/memory_hotplug: introduce add_memory_driver_managed()
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-introduce-add_memory_driver_managed.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-introduce-add_memory_driver_managed.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-introduce-add_memory_driver_managed.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm/memory_hotplug: introduce add_memory_driver_managed()

Patch series "mm/memory_hotplug: Interface to add driver-managed system
ram", v4.

kexec (via kexec_load()) currently cannot properly handle memory added
via dax/kmem, and will have similar issues with virtio-mem.  kexec-tools
will currently add all memory to the fixed-up initial firmware memmap.  In
case of dax/kmem, this means that - in contrast to a proper reboot - how
that persistent memory will be used can no longer be configured by the
kexec'd kernel.  In case of virtio-mem it will be harmful, because that
memory might contain inaccessible pieces that require coordination with
the hypervisor first.

In both cases, we want to let the driver in the kexec'd kernel handle
detecting and adding the memory, like during an ordinary reboot. 
Introduce add_memory_driver_managed().  More on the semantics can be
found in patch #1.

In the future, we might want to make this behavior configurable for
dax/kmem - either by configuring it in the kernel (which would then also
allow configuring kexec_file_load()) or in kexec-tools by also adding
"System RAM (kmem)" memory from /proc/iomem to the fixed-up initial
firmware memmap.

More on the motivation can be found in [1] and [2].

[1] https://lkml.kernel.org/r/20200429160803.109056-1-david@redhat.com
[2] https://lkml.kernel.org/r/20200430102908.10107-1-david@redhat.com


This patch (of 3):

Some device drivers rely on the memory they manage not being added to the
initial (firmware) memmap as system RAM - so it is not used as initial
system RAM by the kernel and remains under the driver's control.  While
this is the case during cold boot and after a reboot, kexec is not aware
of that and might add such memory to the initial (firmware) memmap of the
kexec kernel.  We need ways to teach the kernel and userspace that this
system RAM is different.

For example, dax/kmem allows one to decide at runtime whether persistent
memory is to be used as system RAM.  Another future user is virtio-mem,
which has to coordinate with its hypervisor to deal with inaccessible
parts within memory resources.

We want to let users in the kernel (esp. kexec) but also user space
(esp. kexec-tools) know that this memory has different semantics and
needs to be handled differently:
1. Don't create entries in /sys/firmware/memmap/
2. Name the memory resource "System RAM ($DRIVER)" (exposed via
   /proc/iomem) ($DRIVER might be "kmem", "virtio_mem").
3. Flag the memory resource IORESOURCE_MEM_DRIVER_MANAGED

/sys/firmware/memmap/ [1] represents the "raw firmware-provided memory
map" because "on most architectures that firmware-provided memory map is
modified afterwards by the kernel itself".  The primary user is kexec on
x86-64.  Since commit d96ae5309165 ("memory-hotplug: create
/sys/firmware/memmap entry for new memory"), we add all hotplugged memory
to that firmware memmap - which makes perfect sense for traditional memory
hotplug on x86-64, where real HW will also add hotplugged DIMMs to the
firmware memmap.  We replicate what the "raw firmware-provided memory map"
looks like after hot(un)plug.

To keep things simple, let the user provide the full resource name instead
of only the driver name - this way, we don't have to manually
allocate/craft strings for memory resources.  Also use the resource name
to make decisions, to avoid passing additional flags.  In case the name
isn't "System RAM", it's special.

We don't have to worry about firmware_map_remove() on the removal path. 
If there is no entry, it will simply return with -EINVAL.

We'll adapt dax/kmem in a follow-up patch.

[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap
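
A hedged sketch of how a driver might call the new interface ("mydrv"
and the wrapper below are made up for the example; the name has to match
the "System RAM ($DRIVER)" format checked by the patch):

    /* Hypothetical caller of the interface added by this patch. */
    static int mydrv_add_ram(int nid, u64 start, u64 size)
    {
            /*
             * No /sys/firmware/memmap/ entry is created, and the resource
             * is flagged IORESOURCE_MEM_DRIVER_MANAGED, so kexec can leave
             * this range alone.
             */
            return add_memory_driver_managed(nid, start, size,
                                             "System RAM (mydrv)");
    }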

Link: http://lkml.kernel.org/r/20200508084217.9160-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200508084217.9160-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/ioport.h         |    1 
 include/linux/memory_hotplug.h |    2 +
 mm/memory_hotplug.c            |   62 +++++++++++++++++++++++++++++--
 3 files changed, 61 insertions(+), 4 deletions(-)

--- a/include/linux/ioport.h~mm-memory_hotplug-introduce-add_memory_driver_managed
+++ a/include/linux/ioport.h
@@ -103,6 +103,7 @@ struct resource {
 #define IORESOURCE_MEM_32BIT		(3<<3)
 #define IORESOURCE_MEM_SHADOWABLE	(1<<5)	/* dup: IORESOURCE_SHADOWABLE */
 #define IORESOURCE_MEM_EXPANSIONROM	(1<<6)
+#define IORESOURCE_MEM_DRIVER_MANAGED	(1<<7)
 
 /* PnP I/O specific bits (IORESOURCE_BITS) */
 #define IORESOURCE_IO_16BIT_ADDR	(1<<0)
--- a/include/linux/memory_hotplug.h~mm-memory_hotplug-introduce-add_memory_driver_managed
+++ a/include/linux/memory_hotplug.h
@@ -342,6 +342,8 @@ extern void __ref free_area_init_core_ho
 extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
+extern int add_memory_driver_managed(int nid, u64 start, u64 size,
+				     const char *resource_name);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern void remove_pfn_range_from_zone(struct zone *zone,
--- a/mm/memory_hotplug.c~mm-memory_hotplug-introduce-add_memory_driver_managed
+++ a/mm/memory_hotplug.c
@@ -98,11 +98,14 @@ void mem_hotplug_done(void)
 u64 max_mem_size = U64_MAX;
 
 /* add this memory to iomem resource */
-static struct resource *register_memory_resource(u64 start, u64 size)
+static struct resource *register_memory_resource(u64 start, u64 size,
+						 const char *resource_name)
 {
 	struct resource *res;
 	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-	char *resource_name = "System RAM";
+
+	if (strcmp(resource_name, "System RAM"))
+		flags |= IORESOURCE_MEM_DRIVER_MANAGED;
 
 	/*
 	 * Make sure value parsed from 'mem=' only restricts memory adding
@@ -1057,7 +1060,8 @@ int __ref add_memory_resource(int nid, s
 	BUG_ON(ret);
 
 	/* create new memmap entry */
-	firmware_map_add_hotplug(start, start + size, "System RAM");
+	if (!strcmp(res->name, "System RAM"))
+		firmware_map_add_hotplug(start, start + size, "System RAM");
 
 	/* device_online() will take the lock when calling online_pages() */
 	mem_hotplug_done();
@@ -1083,7 +1087,7 @@ int __ref __add_memory(int nid, u64 star
 	struct resource *res;
 	int ret;
 
-	res = register_memory_resource(start, size);
+	res = register_memory_resource(start, size, "System RAM");
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
@@ -1105,6 +1109,56 @@ int add_memory(int nid, u64 start, u64 s
 }
 EXPORT_SYMBOL_GPL(add_memory);
 
+/*
+ * Add special, driver-managed memory to the system as system RAM. Such
+ * memory is not exposed via the raw firmware-provided memmap as system
+ * RAM, instead, it is detected and added by a driver - during cold boot,
+ * after a reboot, and after kexec.
+ *
+ * Reasons why this memory should not be used for the initial memmap of a
+ * kexec kernel or for placing kexec images:
+ * - The booting kernel is in charge of determining how this memory will be
+ *   used (e.g., use persistent memory as system RAM)
+ * - Coordination with a hypervisor is required before this memory
+ *   can be used (e.g., inaccessible parts).
+ *
+ * For this memory, no entries in /sys/firmware/memmap ("raw firmware-provided
+ * memory map") are created. Also, the created memory resource is flagged
+ * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-case
+ * this memory as well (esp., not place kexec images onto it).
+ *
+ * The resource_name (visible via /proc/iomem) has to have the format
+ * "System RAM ($DRIVER)".
+ */
+int add_memory_driver_managed(int nid, u64 start, u64 size,
+			      const char *resource_name)
+{
+	struct resource *res;
+	int rc;
+
+	if (!resource_name ||
+	    strstr(resource_name, "System RAM (") != resource_name ||
+	    resource_name[strlen(resource_name) - 1] != ')')
+		return -EINVAL;
+
+	lock_device_hotplug();
+
+	res = register_memory_resource(start, size, resource_name);
+	if (IS_ERR(res)) {
+		rc = PTR_ERR(res);
+		goto out_unlock;
+	}
+
+	rc = add_memory_resource(nid, res);
+	if (rc < 0)
+		release_memory_resource(res);
+
+out_unlock:
+	unlock_device_hotplug();
+	return rc;
+}
+EXPORT_SYMBOL_GPL(add_memory_driver_managed);
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * Confirm all pages in a range [start, end) belong to the same zone (skipping
_

Patches currently in -mm which might be from david@redhat.com are

device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
mm-memory_hotplug-remove-is_mem_section_removable.patch
mm-memory_hotplug-set-node_start_pfn-of-hotadded-pgdat-to-0.patch
mm-memory_hotplug-handle-memblocks-only-with-config_arch_keep_memblock.patch
mm-memory_hotplug-introduce-add_memory_driver_managed.patch
kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch
device-dax-add-memory-via-add_memory_driver_managed.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (45 preceding siblings ...)
  2020-05-08 23:56 ` + mm-memory_hotplug-introduce-add_memory_driver_managed.patch " Andrew Morton
@ 2020-05-08 23:56 ` Andrew Morton
  2020-05-08 23:56 ` + device-dax-add-memory-via-add_memory_driver_managed.patch " Andrew Morton
                   ` (33 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:56 UTC (permalink / raw)
  To: bhe, dan.j.williams, dave.hansen, david, ebiederm, mhocko,
	mm-commits, pankaj.gupta.linux, pasha.tatashin, richard.weiyang


The patch titled
     Subject: kexec_file: don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGED
has been added to the -mm tree.  Its filename is
     kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: kexec_file: don't place kexec images on IORESOURCE_MEM_DRIVER_MANAGED

Memory flagged with IORESOURCE_MEM_DRIVER_MANAGED is special - it won't
be part of the initial memmap of the kexec kernel, and not all of it might
be accessible.  Don't place any kexec images onto it.

Link: http://lkml.kernel.org/r/20200508084217.9160-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/kexec_file.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/kernel/kexec_file.c~kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed
+++ a/kernel/kexec_file.c
@@ -540,6 +540,11 @@ static int locate_mem_hole_callback(stru
 	unsigned long sz = end - start + 1;
 
 	/* Returning 0 will take to next memory range */
+
+	/* Don't use memory that will be detected and handled by a driver. */
+	if (res->flags & IORESOURCE_MEM_DRIVER_MANAGED)
+		return 0;
+
 	if (sz < kbuf->memsz)
 		return 0;
 
_

Patches currently in -mm which might be from david@redhat.com are

device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
mm-memory_hotplug-remove-is_mem_section_removable.patch
mm-memory_hotplug-set-node_start_pfn-of-hotadded-pgdat-to-0.patch
mm-memory_hotplug-handle-memblocks-only-with-config_arch_keep_memblock.patch
mm-memory_hotplug-introduce-add_memory_driver_managed.patch
kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch
device-dax-add-memory-via-add_memory_driver_managed.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + device-dax-add-memory-via-add_memory_driver_managed.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (46 preceding siblings ...)
  2020-05-08 23:56 ` + kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch " Andrew Morton
@ 2020-05-08 23:56 ` Andrew Morton
  2020-05-09  0:45 ` + mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
                   ` (32 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-08 23:56 UTC (permalink / raw)
  To: bhe, dan.j.williams, dave.hansen, david, ebiederm, mhocko,
	mm-commits, pankaj.gupta.linux, pasha.tatashin, richard.weiyang


The patch titled
     Subject: device-dax: add memory via add_memory_driver_managed()
has been added to the -mm tree.  Its filename is
     device-dax-add-memory-via-add_memory_driver_managed.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/device-dax-add-memory-via-add_memory_driver_managed.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/device-dax-add-memory-via-add_memory_driver_managed.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: device-dax: add memory via add_memory_driver_managed()

Currently, when adding memory, we create entries in /sys/firmware/memmap/
as "System RAM".  This will lead kexec-tools to add that memory to the
fixed-up initial memmap for a kexec kernel (loaded via kexec_load()).  The
memory will be considered initial System RAM by the kexec'd kernel and can
no longer be reconfigured.  This is not what happens during a real reboot.

Let's add our memory via add_memory_driver_managed() now, so that no
entries are created in /sys/firmware/memmap/ and the memory is indicated
as "System RAM (kmem)" in /proc/iomem.  This allows everybody (especially
kexec-tools) to identify that this memory is special and has to be treated
differently than ordinary (hotplugged) System RAM.

Before configuring the namespace:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-33fffffff : namespace0.0
	3280000000-32ffffffff : PCI Bus 0000:00

After configuring the namespace:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  148200000-33fffffff : dax0.0
	3280000000-32ffffffff : PCI Bus 0000:00

After loading kmem before this change:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  150000000-33fffffff : dax0.0
	    150000000-33fffffff : System RAM
	3280000000-32ffffffff : PCI Bus 0000:00

After loading kmem after this change:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  150000000-33fffffff : dax0.0
	    150000000-33fffffff : System RAM (kmem)
	3280000000-32ffffffff : PCI Bus 0000:00

After a proper reboot:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  148200000-33fffffff : dax0.0
	3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel before this change:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  150000000-33fffffff : System RAM
	3280000000-32ffffffff : PCI Bus 0000:00

Within the kexec kernel after this change:
	[root@localhost ~]# cat /proc/iomem
	...
	140000000-33fffffff : Persistent Memory
	  140000000-1481fffff : namespace0.0
	  148200000-33fffffff : dax0.0
	3280000000-32ffffffff : PCI Bus 0000:00

/sys/firmware/memmap/ before this change:
	0000000000000000-000000000009fc00 (System RAM)
	000000000009fc00-00000000000a0000 (Reserved)
	00000000000f0000-0000000000100000 (Reserved)
	0000000000100000-00000000bffdf000 (System RAM)
	00000000bffdf000-00000000c0000000 (Reserved)
	00000000feffc000-00000000ff000000 (Reserved)
	00000000fffc0000-0000000100000000 (Reserved)
	0000000100000000-0000000140000000 (System RAM)
	0000000150000000-0000000340000000 (System RAM)

/sys/firmware/memmap/ after a proper reboot:
	0000000000000000-000000000009fc00 (System RAM)
	000000000009fc00-00000000000a0000 (Reserved)
	00000000000f0000-0000000000100000 (Reserved)
	0000000000100000-00000000bffdf000 (System RAM)
	00000000bffdf000-00000000c0000000 (Reserved)
	00000000feffc000-00000000ff000000 (Reserved)
	00000000fffc0000-0000000100000000 (Reserved)
	0000000100000000-0000000140000000 (System RAM)

/sys/firmware/memmap/ after this change:
	0000000000000000-000000000009fc00 (System RAM)
	000000000009fc00-00000000000a0000 (Reserved)
	00000000000f0000-0000000000100000 (Reserved)
	0000000000100000-00000000bffdf000 (System RAM)
	00000000bffdf000-00000000c0000000 (Reserved)
	00000000feffc000-00000000ff000000 (Reserved)
	00000000fffc0000-0000000100000000 (Reserved)
	0000000100000000-0000000140000000 (System RAM)

kexec-tools already seems to basically ignore any System RAM that is not
at the top level when searching for areas to place kexec images - but also
when determining crash areas to dump via kdump.  Changing the resource
name therefore won't have an impact.

Properly handle unloading of the driver after memory hotremove has
failed, by duplicating the string if necessary.

Link: http://lkml.kernel.org/r/20200508084217.9160-5-david@redhat.com
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/dax/dax-private.h |    1 +
 drivers/dax/kmem.c        |   28 ++++++++++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

--- a/drivers/dax/dax-private.h~device-dax-add-memory-via-add_memory_driver_managed
+++ a/drivers/dax/dax-private.h
@@ -44,6 +44,7 @@ struct dax_region {
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @dax_mem_res: physical address range of hotadded DAX memory
+ * @dax_mem_name: name for hotadded DAX memory via add_memory_driver_managed()
  */
 struct dev_dax {
 	struct dax_region *region;
--- a/drivers/dax/kmem.c~device-dax-add-memory-via-add_memory_driver_managed
+++ a/drivers/dax/kmem.c
@@ -14,6 +14,11 @@
 #include "dax-private.h"
 #include "bus.h"
 
+/* Memory resource name used for add_memory_driver_managed(). */
+static const char *kmem_name;
+/* Set if any memory will remain added when the driver will be unloaded. */
+static bool any_hotremove_failed;
+
 int dev_dax_kmem_probe(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
@@ -70,7 +75,12 @@ int dev_dax_kmem_probe(struct device *de
 	 */
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
 
-	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
+	/*
+	 * Ensure that future kexec'd kernels will not treat this as RAM
+	 * automatically.
+	 */
+	rc = add_memory_driver_managed(numa_node, new_res->start,
+				       resource_size(new_res), kmem_name);
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
@@ -100,6 +110,7 @@ static int dev_dax_kmem_remove(struct de
 	 */
 	rc = remove_memory(dev_dax->target_node, kmem_start, kmem_size);
 	if (rc) {
+		any_hotremove_failed = true;
 		dev_err(dev,
 			"DAX region %pR cannot be hotremoved until the next reboot\n",
 			res);
@@ -124,6 +135,7 @@ static int dev_dax_kmem_remove(struct de
 	 * permanently pinned as reserved by the unreleased
 	 * request_mem_region().
 	 */
+	any_hotremove_failed = true;
 	return 0;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
@@ -137,12 +149,24 @@ static struct dax_device_driver device_d
 
 static int __init dax_kmem_init(void)
 {
-	return dax_driver_register(&device_dax_kmem_driver);
+	int rc;
+
+	/* Resource name is permanently allocated if any hotremove fails. */
+	kmem_name = kstrdup_const("System RAM (kmem)", GFP_KERNEL);
+	if (!kmem_name)
+		return -ENOMEM;
+
+	rc = dax_driver_register(&device_dax_kmem_driver);
+	if (rc)
+		kfree_const(kmem_name);
+	return rc;
 }
 
 static void __exit dax_kmem_exit(void)
 {
 	dax_driver_unregister(&device_dax_kmem_driver);
+	if (!any_hotremove_failed)
+		kfree_const(kmem_name);
 }
 
 MODULE_AUTHOR("Intel Corporation");
_

Patches currently in -mm which might be from david@redhat.com are

device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
mm-memory_hotplug-remove-is_mem_section_removable.patch
mm-memory_hotplug-set-node_start_pfn-of-hotadded-pgdat-to-0.patch
mm-memory_hotplug-handle-memblocks-only-with-config_arch_keep_memblock.patch
mm-memory_hotplug-introduce-add_memory_driver_managed.patch
kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch
device-dax-add-memory-via-add_memory_driver_managed.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (47 preceding siblings ...)
  2020-05-08 23:56 ` + device-dax-add-memory-via-add_memory_driver_managed.patch " Andrew Morton
@ 2020-05-09  0:45 ` Andrew Morton
  2020-05-11 19:08 ` + mm-reset-numa-stats-for-boot-pagesets-v3.patch " Andrew Morton
                   ` (31 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-09  0:45 UTC (permalink / raw)
  To: akpm, anshuman.khandual, bhsharma, catalin.marinas, david,
	dyoung, ebiederm, james.morse, mhocko, mm-commits, piliu, will


The patch titled
     Subject: mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
has been added to the -mm tree.  Its filename is
     mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix

use MEMORY_HOTPLUG_RES_NAME in the debug message as well

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: piliu <piliu@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
+++ a/mm/memory_hotplug.c
@@ -129,7 +129,8 @@ static struct resource *register_memory_
 			       resource_name, flags);
 
 	if (!res) {
-		pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n",
+		pr_debug("Unable to reserve " MEMORY_HOTPLUG_RES_NAME
+				" region: %016llx->%016llx\n",
 				start, start + size);
 		return ERR_PTR(-EEXIST);
 	}
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fixpatch-added-to-mm-tree.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch
linux-next-rejects.patch
linux-next-git-rejects.patch
arch-x86-makefile-use-config_shell.patch
doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
kernel-forkc-export-kernel_thread-to-modules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-reset-numa-stats-for-boot-pagesets-v3.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (48 preceding siblings ...)
  2020-05-09  0:45 ` + mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
@ 2020-05-11 19:08 ` Andrew Morton
  2020-05-11 19:54 ` + mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch " Andrew Morton
                   ` (30 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 19:08 UTC (permalink / raw)
  To: aneesh.kumar, khlebnikov, kirill, mhocko, mm-commits, sandipan, vbabka


The patch titled
     Subject: mm-reset-numa-stats-for-boot-pagesets-v3
has been added to the -mm tree.  Its filename is
     mm-reset-numa-stats-for-boot-pagesets-v3.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-reset-numa-stats-for-boot-pagesets-v3.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-reset-numa-stats-for-boot-pagesets-v3.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Sandipan Das <sandipan@linux.ibm.com>
Subject: mm-reset-numa-stats-for-boot-pagesets-v3

Drop the redundant static key-based check to see if numa stats are
enabled, as suggested by Vlastimil.

Link: http://lkml.kernel.org/r/20200511170356.162531-1-sandipan@linux.ibm.com
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

--- a/mm/page_alloc.c~mm-reset-numa-stats-for-boot-pagesets-v3
+++ a/mm/page_alloc.c
@@ -6242,26 +6242,22 @@ void __init setup_per_cpu_pageset(void)
 {
 	struct pglist_data *pgdat;
 	struct zone *zone;
+	int __maybe_unused cpu;
 
 	for_each_populated_zone(zone)
 		setup_zone_pageset(zone);
 
 #ifdef CONFIG_NUMA
-	if (static_branch_likely(&vm_numa_stat_key)) {
-		struct per_cpu_pageset *pcp;
-		int cpu;
-
-		/*
-		 * Unpopulated zones continue using the boot pagesets.
-		 * The numa stats for these pagesets need to be reset.
-		 * Otherwise, they will end up skewing the stats of
-		 * the nodes these zones are associated with.
-		 */
-		for_each_possible_cpu(cpu) {
-			pcp = &per_cpu(boot_pageset, cpu);
-			memset(pcp->vm_numa_stat_diff, 0,
-			       sizeof(pcp->vm_numa_stat_diff));
-		}
+	/*
+	 * Unpopulated zones continue using the boot pagesets.
+	 * The numa stats for these pagesets need to be reset.
+	 * Otherwise, they will end up skewing the stats of
+	 * the nodes these zones are associated with.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct per_cpu_pageset *pcp = &per_cpu(boot_pageset, cpu);
+		memset(pcp->vm_numa_stat_diff, 0,
+		       sizeof(pcp->vm_numa_stat_diff));
 	}
 #endif
 
_

Patches currently in -mm which might be from sandipan@linux.ibm.com are

mm-reset-numa-stats-for-boot-pagesets.patch
mm-reset-numa-stats-for-boot-pagesets-v3.patch
selftests-vm-pkeys-use-sane-types-for-pkey-register.patch
selftests-vm-pkeys-add-helpers-for-pkey-bits.patch
selftests-vm-pkeys-use-the-correct-huge-page-size.patch
selftests-vm-pkeys-introduce-powerpc-support-fix.patch
selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch
selftests-vm-pkeys-use-the-correct-page-size-on-powerpc.patch
selftests-vm-pkeys-fix-multilib-builds-for-x86.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (49 preceding siblings ...)
  2020-05-11 19:08 ` + mm-reset-numa-stats-for-boot-pagesets-v3.patch " Andrew Morton
@ 2020-05-11 19:54 ` Andrew Morton
  2020-05-11 20:31 ` [to-be-updated] kexec-prevent-removal-of-memory-in-use-by-a-loaded-kexec-image.patch removed from " Andrew Morton
                   ` (29 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 19:54 UTC (permalink / raw)
  To: alex.shi, guro, hannes, hughd, iamjoonsoo.kim, kirill, mhocko,
	mm-commits, shakeelb


The patch titled
     Subject: mm: shmem: remove rare optimization when swapin races with hole punching
has been added to the -mm tree.  Its filename is
     mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm: shmem: remove rare optimization when swapin races with hole punching

Commit 215c02bc33bb ("tmpfs: fix shmem_getpage_gfp() VM_BUG_ON")
recognized that hole punching can race with swapin and removed the
BUG_ON() for a truncated entry from the swapin path.

The patch also added a swapcache deletion to optimize this rare case:
Since swapin has the page locked, and free_swap_and_cache() merely
trylocks, this situation can leave the page stranded in swapcache. 
Usually, page reclaim picks up stale swapcache pages, and the race can
happen at any other time when the page is locked.  (The same happens for
non-shmem swapin racing with page table zapping.) The thinking here was:
we already observed the race and we have the page locked, so we may as
well do the cleanup instead of waiting for reclaim.

However, this optimization complicates the next patch which moves the
cgroup charging code around.  As this is just a minor speedup for a race
condition that is so rare that it required a fuzzer to trigger the
original BUG_ON(), it's no longer worth the complications.

Link: http://lkml.kernel.org/r/20200511181056.GA339505@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/shmem.c |   25 +++++++------------------
 1 file changed, 7 insertions(+), 18 deletions(-)

--- a/mm/shmem.c~mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching
+++ a/mm/shmem.c
@@ -1665,27 +1665,16 @@ static int shmem_swapin_page(struct inod
 	}
 
 	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
-	if (!error) {
-		error = shmem_add_to_page_cache(page, mapping, index,
-						swp_to_radix_entry(swap), gfp);
-		/*
-		 * We already confirmed swap under page lock, and make
-		 * no memory allocation here, so usually no possibility
-		 * of error; but free_swap_and_cache() only trylocks a
-		 * page, so it is just possible that the entry has been
-		 * truncated or holepunched since swap was confirmed.
-		 * shmem_undo_range() will have done some of the
-		 * unaccounting, now delete_from_swap_cache() will do
-		 * the rest.
-		 */
-		if (error) {
-			mem_cgroup_cancel_charge(page, memcg);
-			delete_from_swap_cache(page);
-		}
-	}
 	if (error)
 		goto failed;
 
+	error = shmem_add_to_page_cache(page, mapping, index,
+					swp_to_radix_entry(swap), gfp);
+	if (error) {
+		mem_cgroup_cancel_charge(page, memcg);
+		goto failed;
+	}
+
 	mem_cgroup_commit_charge(page, memcg, true);
 
 	spin_lock_irq(&info->lock);
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [to-be-updated] kexec-prevent-removal-of-memory-in-use-by-a-loaded-kexec-image.patch removed from -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (50 preceding siblings ...)
  2020-05-11 19:54 ` + mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch " Andrew Morton
@ 2020-05-11 20:31 ` Andrew Morton
  2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names.patch " Andrew Morton
                   ` (28 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:31 UTC (permalink / raw)
  To: anshuman.khandual, bhsharma, catalin.marinas, david, dyoung,
	ebiederm, james.morse, mhocko, mm-commits, piliu, will


The patch titled
     Subject: kexec: prevent removal of memory in use by a loaded kexec image
has been removed from the -mm tree.  Its filename was
     kexec-prevent-removal-of-memory-in-use-by-a-loaded-kexec-image.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: James Morse <james.morse@arm.com>
Subject: kexec: prevent removal of memory in use by a loaded kexec image

Patch series "kexec/memory_hotplug: Prevent removal and accidental use".

arm64 recently queued support for memory hotremove, which led to some new
corner cases for kexec.

If the kexec segments are loaded for a removable region, that region may
be removed before kexec actually occurs.  This causes the first kernel
to lock up when applying the relocations.  (I've triggered this on x86
too.)

The first patch adds a memory notifier for kexec so that it can refuse to
allow in-use regions to be taken offline.

This doesn't solve the problem for arm64, where the new kernel must
initially rely on the data structures from the first boot to describe
memory.  These don't describe hotpluggable memory.  If kexec places the
kernel in one of these regions, it must also provide a DT that describes
the region in which the kernel was mapped as memory (and somehow ensure
it's always present in the future...).

To prevent this from happening accidentally with unaware user-space,
patches two and three allow arm64 to give these regions a different name.

This is a change in behaviour for arm64 as memory hotadd and hotremove
were added separately.

I haven't tried kdump.  Unaware kdump from user-space probably won't
describe the hotplug regions if the name is different, which saves us from
problems if the memory is no longer present at kdump time, but means the
vmcore is incomplete.


This patch (of 3):

An image loaded for kexec is not stored in place; instead, its segments
are scattered through memory and are re-assembled when needed.  In the
meantime, the target memory may have been removed.

Because mm is not aware that this memory is still in use, it allows it
to be removed.

Add a memory notifier to prevent the removal of memory regions that
overlap with a loaded kexec image segment.  e.g., when triggered from the
Qemu console:

| kexec_core: memory region in use
| memory memory32: Offline failed.

Kexec allows user-space to specify the address that the kexec image
should be loaded to.

The target is described by user space during kexec_load() and that user
space - right now - parses /proc/iomem to find applicable system
memory.
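
For reference, a minimal user-space sketch of that parsing (illustrative
only, not taken from the kexec-tools sources): top-level entries in
/proc/iomem start in column 0 and children are indented, so only
un-indented "System RAM" lines are considered:

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char line[256];
		FILE *f = fopen("/proc/iomem", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f)) {
			unsigned long long start, end;
			char *name;

			if (line[0] == ' ')	/* child resource: skip */
				continue;
			name = strstr(line, " : ");
			if (!name || sscanf(line, "%llx-%llx", &start, &end) != 2)
				continue;
			name += 3;
			name[strcspn(name, "\n")] = '\0';
			if (!strcmp(name, "System RAM"))
				printf("usable: %llx-%llx\n", start, end);
		}
		fclose(f);
		return 0;
	}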

Link: http://lkml.kernel.org/r/20200326180730.4754-1-james.morse@arm.com
Link: http://lkml.kernel.org/r/20200326180730.4754-2-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: piliu <piliu@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/kexec_core.c |   56 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

--- a/kernel/kexec_core.c~kexec-prevent-removal-of-memory-in-use-by-a-loaded-kexec-image
+++ a/kernel/kexec_core.c
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/kexec.h>
+#include <linux/memory.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/highmem.h>
@@ -22,10 +23,12 @@
 #include <linux/elf.h>
 #include <linux/elfcore.h>
 #include <linux/utsname.h>
+#include <linux/notifier.h>
 #include <linux/numa.h>
 #include <linux/suspend.h>
 #include <linux/device.h>
 #include <linux/freezer.h>
+#include <linux/pfn.h>
 #include <linux/pm.h>
 #include <linux/cpu.h>
 #include <linux/uaccess.h>
@@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+/*
+ * If user-space wants to offline memory that is in use by a loaded kexec
+ * image, it should unload the image first.
+ */
+static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
+			 void *data)
+{
+	int rv = NOTIFY_OK, i;
+	struct memory_notify *arg = data;
+	unsigned long pfn = arg->start_pfn;
+	unsigned long nr_segments, sstart, send;
+	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
+
+	might_sleep();
+
+	if (action != MEM_GOING_OFFLINE)
+		return NOTIFY_DONE;
+
+	mutex_lock(&kexec_mutex);
+	if (kexec_image) {
+		nr_segments = kexec_image->nr_segments;
+
+		for (i = 0; i < nr_segments; i++) {
+			sstart = PFN_DOWN(kexec_image->segment[i].mem);
+			send = PFN_UP(kexec_image->segment[i].mem +
+				      kexec_image->segment[i].memsz);
+
+			if ((pfn <= sstart && sstart < end_pfn) ||
+			    (pfn <= send && send < end_pfn)) {
+				pr_warn("Memory region in use\n");
+				rv = NOTIFY_BAD;
+				break;
+			}
+		}
+	}
+	mutex_unlock(&kexec_mutex);
+
+	return rv;
+}
+
+static struct notifier_block mem_remove_nb = {
+	.notifier_call = mem_remove_cb,
+};
+
+static int __init register_mem_remove_cb(void)
+{
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		return register_memory_notifier(&mem_remove_nb);
+
+	return 0;
+}
+device_initcall(register_mem_remove_cb);
_

Patches currently in -mm which might be from james.morse@arm.com are

mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names.patch
arm64-memory-give-hotplug-memory-a-different-resource-name.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names.patch removed from -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (51 preceding siblings ...)
  2020-05-11 20:31 ` [to-be-updated] kexec-prevent-removal-of-memory-in-use-by-a-loaded-kexec-image.patch removed from " Andrew Morton
@ 2020-05-11 20:31 ` Andrew Morton
  2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
                   ` (27 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:31 UTC (permalink / raw)
  To: akpm, anshuman.khandual, bhsharma, catalin.marinas, david,
	dyoung, ebiederm, james.morse, mhocko, mm-commits, piliu, will


The patch titled
     Subject: mm/memory_hotplug: allow arch override of non boot memory resource names
has been removed from the -mm tree.  Its filename was
     mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: James Morse <james.morse@arm.com>
Subject: mm/memory_hotplug: allow arch override of non boot memory resource names

Memory added to the system by hotplug has a 'System RAM' resource created
for it.  This is exposed to user-space via /proc/iomem.

This poses problems for kexec on arm64.  If kexec decides to place the
kernel in one of these newly onlined regions, the new kernel will find
itself booting from a region not described as memory in the firmware
tables.

Arm64 doesn't have a structure like the e820 memory map that can be
rewritten when memory is brought online.  Instead, arm64 uses the UEFI
memory map, or the memory node from the DT, sometimes both.  We never
rewrite these.

Allow an architecture to specify a different name for these hotplug
regions.
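
An architecture then only needs to provide its own definition from its
headers; the arm64 patch later in this series does exactly that:

	/* e.g. in arch/arm64/include/asm/memory.h */
	#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"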

Link: http://lkml.kernel.org/r/20200326180730.4754-3-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: piliu <piliu@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names
+++ a/mm/memory_hotplug.c
@@ -42,6 +42,10 @@
 #include "internal.h"
 #include "shuffle.h"
 
+#ifndef MEMORY_HOTPLUG_RES_NAME
+#define MEMORY_HOTPLUG_RES_NAME "System RAM"
+#endif
+
 /*
  * online_page_callback contains pointer to current page onlining function.
  * Initially it is generic_online_page(). If it is required it could be
@@ -104,7 +108,7 @@ static struct resource *register_memory_
 	struct resource *res;
 	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
-	if (strcmp(resource_name, "System RAM"))
+	if (strcmp(resource_name, MEMORY_HOTPLUG_RES_NAME))
 		flags |= IORESOURCE_MEM_DRIVER_MANAGED;
 
 	/*
_

Patches currently in -mm which might be from james.morse@arm.com are

arm64-memory-give-hotplug-memory-a-different-resource-name.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch removed from -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (52 preceding siblings ...)
  2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names.patch " Andrew Morton
@ 2020-05-11 20:31 ` Andrew Morton
  2020-05-11 20:31 ` [to-be-updated] arm64-memory-give-hotplug-memory-a-different-resource-name.patch " Andrew Morton
                   ` (26 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:31 UTC (permalink / raw)
  To: akpm, anshuman.khandual, bhsharma, catalin.marinas, david,
	dyoung, ebiederm, james.morse, mhocko, mm-commits, piliu, will


The patch titled
     Subject: mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
has been removed from the -mm tree.  Its filename was
     mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix

use MEMORY_HOTPLUG_RES_NAME in the debug message as well

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: piliu <piliu@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memory_hotplug.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
+++ a/mm/memory_hotplug.c
@@ -129,7 +129,8 @@ static struct resource *register_memory_
 			       resource_name, flags);
 
 	if (!res) {
-		pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n",
+		pr_debug("Unable to reserve " MEMORY_HOTPLUG_RES_NAME
+				" region: %016llx->%016llx\n",
 				start, start + size);
 		return ERR_PTR(-EEXIST);
 	}
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fixpatch-added-to-mm-tree.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
linux-next-rejects.patch
doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
kernel-forkc-export-kernel_thread-to-modules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [to-be-updated] arm64-memory-give-hotplug-memory-a-different-resource-name.patch removed from -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (53 preceding siblings ...)
  2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
@ 2020-05-11 20:31 ` Andrew Morton
  2020-05-11 20:38 ` + arm-add-support-for-folded-p4d-page-tables-fix.patch added to " Andrew Morton
                   ` (25 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:31 UTC (permalink / raw)
  To: anshuman.khandual, bhsharma, catalin.marinas, david, dyoung,
	ebiederm, james.morse, mhocko, mm-commits, piliu, will


The patch titled
     Subject: arm64: memory: give hotplug memory a different resource name
has been removed from the -mm tree.  Its filename was
     arm64-memory-give-hotplug-memory-a-different-resource-name.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: James Morse <james.morse@arm.com>
Subject: arm64: memory: give hotplug memory a different resource name

If kexec chooses to place the kernel in a memory region that was added
after boot, we fail to boot as the kernel is running from a location that
is not described as memory by the UEFI memory map or the original DT.

To prevent unaware user-space kexec from doing this accidentally, give
these regions a different name.

Link: http://lkml.kernel.org/r/20200326180730.4754-4-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: piliu <piliu@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/include/asm/memory.h |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/arch/arm64/include/asm/memory.h~arm64-memory-give-hotplug-memory-a-different-resource-name
+++ a/arch/arm64/include/asm/memory.h
@@ -156,6 +156,17 @@
 #define IOREMAP_MAX_ORDER	(PMD_SHIFT)
 #endif
 
+/*
+ * Memory hotplug allows new regions of 'System RAM' to be added to the system.
+ * These aren't described as memory by the UEFI memory map, or DT memory node.
+ * If we kexec from one of these regions, the new kernel boots from a location
+ * that isn't described as RAM.
+ *
+ * Give these resources a different name, so unaware kexec doesn't do this by
+ * accident.
+ */
+#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"
+
 #ifndef __ASSEMBLY__
 extern u64			vabits_actual;
 #define PAGE_END		(_PAGE_END(vabits_actual))
_

Patches currently in -mm which might be from james.morse@arm.com are

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + arm-add-support-for-folded-p4d-page-tables-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (54 preceding siblings ...)
  2020-05-11 20:31 ` [to-be-updated] arm64-memory-give-hotplug-memory-a-different-resource-name.patch " Andrew Morton
@ 2020-05-11 20:38 ` Andrew Morton
  2020-05-11 20:40 ` + mm-add-debug_wx-support-fix-2.patch " Andrew Morton
                   ` (24 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:38 UTC (permalink / raw)
  To: m.szyprowski, mm-commits, rppt


The patch titled
     Subject: arm-add-support-for-folded-p4d-page-tables-fix
has been added to the -mm tree.  Its filename is
     arm-add-support-for-folded-p4d-page-tables-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/arm-add-support-for-folded-p4d-page-tables-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/arm-add-support-for-folded-p4d-page-tables-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@linux.ibm.com>
Subject: arm-add-support-for-folded-p4d-page-tables-fix

fix kexec

Link: http://lkml.kernel.org/r/20200508174232.GA759899@linux.ibm.com
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/mm/init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/mm/init.c~arm-add-support-for-folded-p4d-page-tables-fix
+++ a/arch/arm/mm/init.c
@@ -571,7 +571,7 @@ static inline void section_update(unsign
 {
 	pmd_t *pmd;
 
-	pmd = pmd_off_k(addr);
+	pmd = pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, addr), addr), addr), addr);
 
 #ifdef CONFIG_ARM_LPAE
 	pmd[0] = __pmd((pmd_val(pmd[0]) & mask) | prot);
_

Patches currently in -mm which might be from rppt@linux.ibm.com are

h8300-remove-usage-of-__arch_use_5level_hack.patch
arm-add-support-for-folded-p4d-page-tables.patch
arm-add-support-for-folded-p4d-page-tables-fix.patch
arm64-add-support-for-folded-p4d-page-tables.patch
hexagon-remove-__arch_use_5level_hack.patch
ia64-add-support-for-folded-p4d-page-tables.patch
nios2-add-support-for-folded-p4d-page-tables.patch
openrisc-add-support-for-folded-p4d-page-tables.patch
powerpc-add-support-for-folded-p4d-page-tables.patch
powerpc-add-support-for-folded-p4d-page-tables-fix.patch
sh-drop-__pxd_offset-macros-that-duplicate-pxd_index-ones.patch
sh-add-support-for-folded-p4d-page-tables.patch
unicore32-remove-__arch_use_5level_hack.patch
asm-generic-remove-pgtable-nop4d-hackh.patch
mm-remove-__arch_has_5level_hack-and-include-asm-generic-5level-fixuph.patch
mm-memblock-replace-dereferences-of-memblock_regionnid-with-api-calls.patch
mm-make-early_pfn_to_nid-and-related-defintions-close-to-each-other.patch
mm-remove-config_have_memblock_node_map-option.patch
mm-free_area_init-use-maximal-zone-pfns-rather-than-zone-sizes.patch
mm-use-free_area_init-instead-of-free_area_init_nodes.patch
alpha-simplify-detection-of-memory-zone-boundaries.patch
arm-simplify-detection-of-memory-zone-boundaries.patch
arm64-simplify-detection-of-memory-zone-boundaries-for-uma-configs.patch
csky-simplify-detection-of-memory-zone-boundaries.patch
m68k-mm-simplify-detection-of-memory-zone-boundaries.patch
parisc-simplify-detection-of-memory-zone-boundaries.patch
sparc32-simplify-detection-of-memory-zone-boundaries.patch
unicore32-simplify-detection-of-memory-zone-boundaries.patch
xtensa-simplify-detection-of-memory-zone-boundaries.patch
mm-remove-early_pfn_in_nid-and-config_nodes_span_other_nodes.patch
mm-free_area_init-allow-defining-max_zone_pfn-in-descending-order.patch
mm-free_area_init-allow-defining-max_zone_pfn-in-descending-order-fix-2.patch
mm-rename-free_area_init_node-to-free_area_init_memoryless_node.patch
mm-clean-up-free_area_init_node-and-its-helpers.patch
mm-simplify-find_min_pfn_with_active_regions.patch
docs-vm-update-memory-models-documentation.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-add-debug_wx-support-fix-2.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (55 preceding siblings ...)
  2020-05-11 20:38 ` + arm-add-support-for-folded-p4d-page-tables-fix.patch added to " Andrew Morton
@ 2020-05-11 20:40 ` Andrew Morton
  2020-05-11 20:41 ` + mm-add-debug_wx-support-fix-3.patch " Andrew Morton
                   ` (23 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:40 UTC (permalink / raw)
  To: mm-commits, will, zong.li


The patch titled
     Subject: mm: remove the specific name of arm64
has been added to the -mm tree.  Its filename is
     mm-add-debug_wx-support-fix-2.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-add-debug_wx-support-fix-2.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-add-debug_wx-support-fix-2.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Zong Li <zong.li@sifive.com>
Subject: mm: remove the specific name of arm64

UXN is the name of a permission bit specific to arm64 page-table
descriptors, so drop its mention from the generic DEBUG_WX help text.

Link: http://lkml.kernel.org/r/3a6a92ecedc54e1d0fc941398e63d504c2cd5611.1589178399.git.zong.li@sifive.com
Signed-off-by: Zong Li <zong.li@sifive.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/Kconfig.debug |    2 --
 1 file changed, 2 deletions(-)

--- a/mm/Kconfig.debug~mm-add-debug_wx-support-fix-2
+++ a/mm/Kconfig.debug
@@ -130,8 +130,6 @@ config DEBUG_WX
 
 	  This is useful for discovering cases where the kernel is leaving W+X
 	  mappings after applying NX, as such mappings are a security risk.
-	  This check also includes UXN, which should be set on all kernel
-	  mappings.
 
 	  Look for a message in dmesg output like this:
 
_

Patches currently in -mm which might be from zong.li@sifive.com are

mm-add-debug_wx-support.patch
mm-add-debug_wx-support-fix-2.patch
riscv-support-debug_wx.patch
x86-mm-use-arch_has_debug_wx-instead-of-arch-defined.patch
arm64-mm-use-arch_has_debug_wx-instead-of-arch-defined.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-add-debug_wx-support-fix-3.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (56 preceding siblings ...)
  2020-05-11 20:40 ` + mm-add-debug_wx-support-fix-2.patch " Andrew Morton
@ 2020-05-11 20:41 ` Andrew Morton
  2020-05-11 20:50 ` + arm64-mm-drop-__have_arch_huge_ptep_get.patch " Andrew Morton
                   ` (22 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:41 UTC (permalink / raw)
  To: lkp, mm-commits, zong.li


The patch titled
     Subject: mm: add MMU dependency for DEBUG_WX
has been added to the -mm tree.  Its filename is
     mm-add-debug_wx-support-fix-3.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-add-debug_wx-support-fix-3.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-add-debug_wx-support-fix-3.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Zong Li <zong.li@sifive.com>
Subject: mm: add MMU dependency for DEBUG_WX

DEBUG_WX should only be enabled on systems with an MMU.  Enabling
DEBUG_WX on a NOMMU configuration resulted in a build error.

Link: http://lkml.kernel.org/r/4a674ac7863ff39ca91847b10e51209771f99416.1589178399.git.zong.li@sifive.com
Signed-off-by: Zong Li <zong.li@sifive.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/Kconfig.debug |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/Kconfig.debug~mm-add-debug_wx-support-fix-3
+++ a/mm/Kconfig.debug
@@ -124,6 +124,7 @@ config ARCH_HAS_DEBUG_WX
 config DEBUG_WX
 	bool "Warn on W+X mappings at boot"
 	depends on ARCH_HAS_DEBUG_WX
+	depends on MMU
 	select PTDUMP_CORE
 	help
 	  Generate a warning if any W+X mappings are found at boot.
_

Patches currently in -mm which might be from zong.li@sifive.com are

mm-add-debug_wx-support.patch
mm-add-debug_wx-support-fix-2.patch
mm-add-debug_wx-support-fix-3.patch
riscv-support-debug_wx.patch
x86-mm-use-arch_has_debug_wx-instead-of-arch-defined.patch
arm64-mm-use-arch_has_debug_wx-instead-of-arch-defined.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + arm64-mm-drop-__have_arch_huge_ptep_get.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (57 preceding siblings ...)
  2020-05-11 20:41 ` + mm-add-debug_wx-support-fix-3.patch " Andrew Morton
@ 2020-05-11 20:50 ` Andrew Morton
  2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch " Andrew Morton
                   ` (21 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:50 UTC (permalink / raw)
  To: anshuman.khandual, benh, borntraeger, bp, catalin.marinas,
	dalias, davem, deller, fenghua.yu, gor, heiko.carstens, hpa,
	James.Bottomley, linux, mike.kravetz, mingo, mm-commits, mpe,
	palmer, paul.walmsley, paulus, tglx, tony.luck, tsbogend, will,
	ysato


The patch titled
     Subject: arm64/mm: drop __HAVE_ARCH_HUGE_PTEP_GET
has been added to the -mm tree.  Its filename is
     arm64-mm-drop-__have_arch_huge_ptep_get.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/arm64-mm-drop-__have_arch_huge_ptep_get.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/arm64-mm-drop-__have_arch_huge_ptep_get.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: arm64/mm: drop __HAVE_ARCH_HUGE_PTEP_GET

Patch series "mm/hugetlb: Add some new generic fallbacks", v3.

This series adds the following new generic fallbacks.  Before that it
drops __HAVE_ARCH_HUGE_PTEP_GET from arm64 platform.

1. is_hugepage_only_range()
2. arch_clear_hugepage_flags()

After this, arm (32-bit) remains the sole platform defining its own
huge_ptep_get() via __HAVE_ARCH_HUGE_PTEP_GET.


This patch (of 3):

A platform-specific huge_ptep_get() is required only when fetching the
huge PTE involves more than just dereferencing the page table pointer.
This is not the case on the arm64 platform.  Hence arm64's
huge_ptep_get() can be dropped along with its __HAVE_ARCH_HUGE_PTEP_GET
subscription.  Before that, this patch updates the generic
huge_ptep_get() with READ_ONCE(), which will prevent known page table
issues with THP on arm64.
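
To illustrate why (a sketch with a hypothetical consumer, not code from
this patch): READ_ONCE() forces a single load of the PTE, so a caller's
test and subsequent use operate on one snapshot even if the entry is
concurrently being split or collapsed:

	pte_t pte = huge_ptep_get(ptep);	/* single, tear-free load */

	/*
	 * Without READ_ONCE() the compiler could reload *ptep between
	 * the check and the use, observing two different PTE values.
	 */
	if (pte_present(pte) && pte_dirty(pte))
		consume_pte(pte);		/* hypothetical consumer */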

Link: http://lkml.kernel.org/r/1588907271-11920-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r//1506527369-19535-1-git-send-email-will.deacon@arm.com/
Link: http://lkml.kernel.org/r/1588907271-11920-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm64/include/asm/hugetlb.h |    6 ------
 include/asm-generic/hugetlb.h    |    2 +-
 2 files changed, 1 insertion(+), 7 deletions(-)

--- a/arch/arm64/include/asm/hugetlb.h~arm64-mm-drop-__have_arch_huge_ptep_get
+++ a/arch/arm64/include/asm/hugetlb.h
@@ -17,12 +17,6 @@
 extern bool arch_hugetlb_migration_supported(struct hstate *h);
 #endif
 
-#define __HAVE_ARCH_HUGE_PTEP_GET
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-	return READ_ONCE(*ptep);
-}
-
 static inline int is_hugepage_only_range(struct mm_struct *mm,
 					 unsigned long addr, unsigned long len)
 {
--- a/include/asm-generic/hugetlb.h~arm64-mm-drop-__have_arch_huge_ptep_get
+++ a/include/asm-generic/hugetlb.h
@@ -122,7 +122,7 @@ static inline int huge_ptep_set_access_f
 #ifndef __HAVE_ARCH_HUGE_PTEP_GET
 static inline pte_t huge_ptep_get(pte_t *ptep)
 {
-	return *ptep;
+	return READ_ONCE(*ptep);
 }
 #endif
 
_

Patches currently in -mm which might be from anshuman.khandual@arm.com are

arm64-mm-drop-__have_arch_huge_ptep_get.patch
mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch
mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch
powerpc-mm-drop-platform-defined-pmd_mknotpresent.patch
mm-thp-rename-pmd_mknotpresent-as-pmd_mknotvalid.patch
mm-thp-rename-pmd_mknotpresent-as-pmd_mkinvalid-v2.patch
x86-mm-define-mm_p4d_folded.patch
mm-debug-add-tests-validating-architecture-page-table-helpers.patch
mm-debug-add-tests-validating-architecture-page-table-helpers-v17.patch
mm-debug-add-tests-validating-architecture-page-table-helpers-v18.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (58 preceding siblings ...)
  2020-05-11 20:50 ` + arm64-mm-drop-__have_arch_huge_ptep_get.patch " Andrew Morton
@ 2020-05-11 20:50 ` Andrew Morton
  2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch " Andrew Morton
                   ` (20 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:50 UTC (permalink / raw)
  To: anshuman.khandual, benh, borntraeger, bp, catalin.marinas,
	dalias, davem, deller, fenghua.yu, gor, heiko.carstens, hpa,
	James.Bottomley, linux, mike.kravetz, mingo, mm-commits, mpe,
	palmer, paul.walmsley, paulus, tglx, tony.luck, tsbogend, will,
	ysato


The patch titled
     Subject: mm/hugetlb: define a generic fallback for is_hugepage_only_range()
has been added to the -mm tree.  Its filename is
     mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/hugetlb: define a generic fallback for is_hugepage_only_range()

There are multiple similar definitions of is_hugepage_only_range() on
various platforms.  Let's just add a generic fallback definition for
platforms that do not override it.  This helps reduce code duplication.
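
For reference, the override convention relied upon below (a generic
sketch with hypothetical names, not kernel API): an overriding arch
defines a same-named macro next to its function, and the generic header
supplies the fallback only when that macro is absent:

	/* arch header: override plus marker macro */
	static inline int my_arch_hook(unsigned long addr, unsigned long len)
	{
		return in_special_region(addr, len);	/* hypothetical */
	}
	#define my_arch_hook my_arch_hook

	/* generic header: fallback used only without an arch override */
	#ifndef my_arch_hook
	static inline int my_arch_hook(unsigned long addr, unsigned long len)
	{
		return 0;
	}
	#define my_arch_hook my_arch_hook
	#endif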

Link: http://lkml.kernel.org/r/1588907271-11920-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/include/asm/hugetlb.h     |    6 ------
 arch/arm64/include/asm/hugetlb.h   |    6 ------
 arch/ia64/include/asm/hugetlb.h    |    1 +
 arch/mips/include/asm/hugetlb.h    |    7 -------
 arch/parisc/include/asm/hugetlb.h  |    6 ------
 arch/powerpc/include/asm/hugetlb.h |    1 +
 arch/riscv/include/asm/hugetlb.h   |    6 ------
 arch/s390/include/asm/hugetlb.h    |    7 -------
 arch/sh/include/asm/hugetlb.h      |    6 ------
 arch/sparc/include/asm/hugetlb.h   |    6 ------
 arch/x86/include/asm/hugetlb.h     |    6 ------
 include/linux/hugetlb.h            |    9 +++++++++
 12 files changed, 11 insertions(+), 56 deletions(-)

--- a/arch/arm64/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/arm64/include/asm/hugetlb.h
@@ -17,12 +17,6 @@
 extern bool arch_hugetlb_migration_supported(struct hstate *h);
 #endif
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr, unsigned long len)
-{
-	return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 	clear_bit(PG_dcache_clean, &page->flags);
--- a/arch/arm/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/arm/include/asm/hugetlb.h
@@ -14,12 +14,6 @@
 #include <asm/hugetlb-3level.h>
 #include <asm-generic/hugetlb.h>
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr, unsigned long len)
-{
-	return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 	clear_bit(PG_dcache_clean, &page->flags);
--- a/arch/ia64/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/ia64/include/asm/hugetlb.h
@@ -20,6 +20,7 @@ static inline int is_hugepage_only_range
 	return (REGION_NUMBER(addr) == RGN_HPAGE ||
 		REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
 }
+#define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
--- a/arch/mips/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/mips/include/asm/hugetlb.h
@@ -11,13 +11,6 @@
 
 #include <asm/page.h>
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr,
-					 unsigned long len)
-{
-	return 0;
-}
-
 #define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 static inline int prepare_hugepage_range(struct file *file,
 					 unsigned long addr,
--- a/arch/parisc/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/parisc/include/asm/hugetlb.h
@@ -12,12 +12,6 @@ void set_huge_pte_at(struct mm_struct *m
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep);
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr,
-					 unsigned long len) {
-	return 0;
-}
-
 /*
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
--- a/arch/powerpc/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/powerpc/include/asm/hugetlb.h
@@ -30,6 +30,7 @@ static inline int is_hugepage_only_range
 		return slice_is_hugepage_only_range(mm, addr, len);
 	return 0;
 }
+#define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
 void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
--- a/arch/riscv/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/riscv/include/asm/hugetlb.h
@@ -5,12 +5,6 @@
 #include <asm-generic/hugetlb.h>
 #include <asm/page.h>
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr,
-					 unsigned long len) {
-	return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
--- a/arch/s390/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/s390/include/asm/hugetlb.h
@@ -21,13 +21,6 @@ pte_t huge_ptep_get(pte_t *ptep);
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep);
 
-static inline bool is_hugepage_only_range(struct mm_struct *mm,
-					  unsigned long addr,
-					  unsigned long len)
-{
-	return false;
-}
-
 /*
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
--- a/arch/sh/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/sh/include/asm/hugetlb.h
@@ -5,12 +5,6 @@
 #include <asm/cacheflush.h>
 #include <asm/page.h>
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr,
-					 unsigned long len) {
-	return 0;
-}
-
 /*
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
--- a/arch/sparc/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/sparc/include/asm/hugetlb.h
@@ -20,12 +20,6 @@ void set_huge_pte_at(struct mm_struct *m
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep);
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr,
-					 unsigned long len) {
-	return 0;
-}
-
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 					 unsigned long addr, pte_t *ptep)
--- a/arch/x86/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/arch/x86/include/asm/hugetlb.h
@@ -7,12 +7,6 @@
 
 #define hugepages_supported() boot_cpu_has(X86_FEATURE_PSE)
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-					 unsigned long addr,
-					 unsigned long len) {
-	return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
--- a/include/linux/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range
+++ a/include/linux/hugetlb.h
@@ -591,6 +591,15 @@ static inline unsigned int blocks_per_hu
 
 #include <asm/hugetlb.h>
 
+#ifndef is_hugepage_only_range
+static inline int is_hugepage_only_range(struct mm_struct *mm,
+					unsigned long addr, unsigned long len)
+{
+	return 0;
+}
+#define is_hugepage_only_range is_hugepage_only_range
+#endif
+
 #ifndef arch_make_huge_pte
 static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 				       struct page *page, int writable)
_
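
The include/linux/hugetlb.h hunk above relies on a common kernel idiom:
defining a macro with the same name as the function marks the definition
as present, so the generic header can detect an architecture override
with #ifndef.  A minimal, self-contained sketch of the idiom (arch_foo()
is a purely hypothetical name, not part of this patch):

	/* arch header: an override defines the function and the marker macro */
	static inline int arch_foo(void)
	{
		return 1;	/* arch-specific behaviour */
	}
	#define arch_foo arch_foo

	/* generic header: the fallback is compiled only when no override exists */
	#ifndef arch_foo
	static inline int arch_foo(void)
	{
		return 0;	/* generic fallback */
	}
	#define arch_foo arch_foo
	#endif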

Patches currently in -mm which might be from anshuman.khandual@arm.com are

arm64-mm-drop-__have_arch_huge_ptep_get.patch
mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch
mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch
powerpc-mm-drop-platform-defined-pmd_mknotpresent.patch
mm-thp-rename-pmd_mknotpresent-as-pmd_mknotvalid.patch
mm-thp-rename-pmd_mknotpresent-as-pmd_mkinvalid-v2.patch
x86-mm-define-mm_p4d_folded.patch
mm-debug-add-tests-validating-architecture-page-table-helpers.patch
mm-debug-add-tests-validating-architecture-page-table-helpers-v17.patch
mm-debug-add-tests-validating-architecture-page-table-helpers-v18.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (59 preceding siblings ...)
  2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch " Andrew Morton
@ 2020-05-11 20:50 ` Andrew Morton
  2020-05-11 21:00 ` + mm-introduce-external-memory-hinting-api-fix-2-fix.patch " Andrew Morton
                   ` (19 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 20:50 UTC (permalink / raw)
  To: anshuman.khandual, benh, borntraeger, bp, catalin.marinas,
	dalias, davem, deller, fenghua.yu, gor, heiko.carstens, hpa,
	James.Bottomley, linux, mike.kravetz, mingo, mm-commits, mpe,
	palmer, paul.walmsley, paulus, tglx, tony.luck, tsbogend, will,
	ysato


The patch titled
     Subject: mm/hugetlb: define a generic fallback for arch_clear_hugepage_flags()
has been added to the -mm tree.  Its filename is
     mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Anshuman Khandual <anshuman.khandual@arm.com>
Subject: mm/hugetlb: define a generic fallback for arch_clear_hugepage_flags()

There are multiple similar definitions for arch_clear_hugepage_flags() on
various platforms.  Let's just add its generic fallback definition for
platforms that do not override.  This helps reduce code duplication.

Link: http://lkml.kernel.org/r/1588907271-11920-4-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/arm/include/asm/hugetlb.h     |    1 +
 arch/arm64/include/asm/hugetlb.h   |    1 +
 arch/ia64/include/asm/hugetlb.h    |    4 ----
 arch/mips/include/asm/hugetlb.h    |    4 ----
 arch/parisc/include/asm/hugetlb.h  |    4 ----
 arch/powerpc/include/asm/hugetlb.h |    4 ----
 arch/riscv/include/asm/hugetlb.h   |    4 ----
 arch/s390/include/asm/hugetlb.h    |    1 +
 arch/sh/include/asm/hugetlb.h      |    1 +
 arch/sparc/include/asm/hugetlb.h   |    4 ----
 arch/x86/include/asm/hugetlb.h     |    4 ----
 include/linux/hugetlb.h            |    5 +++++
 12 files changed, 9 insertions(+), 28 deletions(-)

--- a/arch/arm64/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/arm64/include/asm/hugetlb.h
@@ -21,6 +21,7 @@ static inline void arch_clear_hugepage_f
 {
 	clear_bit(PG_dcache_clean, &page->flags);
 }
+#define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
 extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 				struct page *page, int writable);
--- a/arch/arm/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/arm/include/asm/hugetlb.h
@@ -18,5 +18,6 @@ static inline void arch_clear_hugepage_f
 {
 	clear_bit(PG_dcache_clean, &page->flags);
 }
+#define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
 #endif /* _ASM_ARM_HUGETLB_H */
--- a/arch/ia64/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/ia64/include/asm/hugetlb.h
@@ -28,10 +28,6 @@ static inline void huge_ptep_clear_flush
 {
 }
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include <asm-generic/hugetlb.h>
 
 #endif /* _ASM_IA64_HUGETLB_H */
--- a/arch/mips/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/mips/include/asm/hugetlb.h
@@ -75,10 +75,6 @@ static inline int huge_ptep_set_access_f
 	return changed;
 }
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include <asm-generic/hugetlb.h>
 
 #endif /* __ASM_HUGETLB_H */
--- a/arch/parisc/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/parisc/include/asm/hugetlb.h
@@ -42,10 +42,6 @@ int huge_ptep_set_access_flags(struct vm
 					     unsigned long addr, pte_t *ptep,
 					     pte_t pte, int dirty);
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include <asm-generic/hugetlb.h>
 
 #endif /* _ASM_PARISC64_HUGETLB_H */
--- a/arch/powerpc/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/powerpc/include/asm/hugetlb.h
@@ -61,10 +61,6 @@ int huge_ptep_set_access_flags(struct vm
 			       unsigned long addr, pte_t *ptep,
 			       pte_t pte, int dirty);
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include <asm-generic/hugetlb.h>
 
 #else /* ! CONFIG_HUGETLB_PAGE */
--- a/arch/riscv/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/riscv/include/asm/hugetlb.h
@@ -5,8 +5,4 @@
 #include <asm-generic/hugetlb.h>
 #include <asm/page.h>
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #endif /* _ASM_RISCV_HUGETLB_H */
--- a/arch/s390/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/s390/include/asm/hugetlb.h
@@ -39,6 +39,7 @@ static inline void arch_clear_hugepage_f
 {
 	clear_bit(PG_arch_1, &page->flags);
 }
+#define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
 static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 				  pte_t *ptep, unsigned long sz)
--- a/arch/sh/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/sh/include/asm/hugetlb.h
@@ -30,6 +30,7 @@ static inline void arch_clear_hugepage_f
 {
 	clear_bit(PG_dcache_clean, &page->flags);
 }
+#define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
 #include <asm-generic/hugetlb.h>
 
--- a/arch/sparc/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/sparc/include/asm/hugetlb.h
@@ -47,10 +47,6 @@ static inline int huge_ptep_set_access_f
 	return changed;
 }
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
 void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 			    unsigned long end, unsigned long floor,
--- a/arch/x86/include/asm/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/arch/x86/include/asm/hugetlb.h
@@ -7,8 +7,4 @@
 
 #define hugepages_supported() boot_cpu_has(X86_FEATURE_PSE)
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #endif /* _ASM_X86_HUGETLB_H */
--- a/include/linux/hugetlb.h~mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags
+++ a/include/linux/hugetlb.h
@@ -600,6 +600,11 @@ static inline int is_hugepage_only_range
 #define is_hugepage_only_range is_hugepage_only_range
 #endif
 
+#ifndef arch_clear_hugepage_flags
+static inline void arch_clear_hugepage_flags(struct page *page) { }
+#define arch_clear_hugepage_flags arch_clear_hugepage_flags
+#endif
+
 #ifndef arch_make_huge_pte
 static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 				       struct page *page, int writable)
_

Patches currently in -mm which might be from anshuman.khandual@arm.com are

arm64-mm-drop-__have_arch_huge_ptep_get.patch
mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch
mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch
powerpc-mm-drop-platform-defined-pmd_mknotpresent.patch
mm-thp-rename-pmd_mknotpresent-as-pmd_mknotvalid.patch
mm-thp-rename-pmd_mknotpresent-as-pmd_mkinvalid-v2.patch
x86-mm-define-mm_p4d_folded.patch
mm-debug-add-tests-validating-architecture-page-table-helpers.patch
mm-debug-add-tests-validating-architecture-page-table-helpers-v17.patch
mm-debug-add-tests-validating-architecture-page-table-helpers-v18.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-introduce-external-memory-hinting-api-fix-2-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (60 preceding siblings ...)
  2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch " Andrew Morton
@ 2020-05-11 21:00 ` Andrew Morton
  2020-05-11 21:02 ` + mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch " Andrew Morton
                   ` (18 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:00 UTC (permalink / raw)
  To: akpm, minchan, mm-commits


The patch titled
     Subject: mm-introduce-external-memory-hinting-api-fix-2-fix
has been added to the -mm tree.  Its filename is
     mm-introduce-external-memory-hinting-api-fix-2-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-introduce-external-memory-hinting-api-fix-2-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-introduce-external-memory-hinting-api-fix-2-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-introduce-external-memory-hinting-api-fix-2-fix

the compat bit comes later

Cc: Minchan Kim <minchan@kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/mips/kernel/syscalls/syscall_o32.tbl |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/kernel/syscalls/syscall_o32.tbl~mm-introduce-external-memory-hinting-api-fix-2-fix
+++ a/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -425,4 +425,4 @@
 435	o32	clone3				__sys_clone3
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
-439	o32	process_madvise			sys_process_madvise		compat_sys_process_madvise
+439	o32	process_madvise			sys_process_madvise
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fix.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
linux-next-rejects.patch
mm-introduce-external-memory-hinting-api-fix-2-fix.patch
doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
kernel-forkc-export-kernel_thread-to-modules.patch
mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (61 preceding siblings ...)
  2020-05-11 21:00 ` + mm-introduce-external-memory-hinting-api-fix-2-fix.patch " Andrew Morton
@ 2020-05-11 21:02 ` Andrew Morton
  2020-05-11 21:12 ` [alternative-merged] mm-vmscan-consistent-update-to-pgsteal-and-pgscan.patch removed from " Andrew Morton
                   ` (17 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:02 UTC (permalink / raw)
  To: akpm, minchan, mm-commits


The patch titled
     Subject: mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix
has been added to the -mm tree.  Its filename is
     mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix

add compat_sys_process_madvise to mips syscall table

Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/mips/kernel/syscalls/syscall_o32.tbl |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/kernel/syscalls/syscall_o32.tbl~mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix
+++ a/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -425,4 +425,4 @@
 435	o32	clone3				__sys_clone3
 437	o32	openat2				sys_openat2
 438	o32	pidfd_getfd			sys_pidfd_getfd
-439	o32	process_madvise			sys_process_madvise
+439	o32	process_madvise			sys_process_madvise		compat_sys_process_madvise
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fix.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
linux-next-rejects.patch
mm-introduce-external-memory-hinting-api-fix-2-fix.patch
mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch
doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
kernel-forkc-export-kernel_thread-to-modules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [alternative-merged] mm-vmscan-consistent-update-to-pgsteal-and-pgscan.patch removed from -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (62 preceding siblings ...)
  2020-05-11 21:02 ` + mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch " Andrew Morton
@ 2020-05-11 21:12 ` Andrew Morton
  2020-05-11 21:21 ` + seq_file-introduce-define_seq_attribute-helper-macro.patch added to " Andrew Morton
                   ` (16 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:12 UTC (permalink / raw)
  To: guro, hannes, mgorman, mhocko, mm-commits, shakeelb


The patch titled
     Subject: mm: vmscan: consistent update to pgsteal and pgscan
has been removed from the -mm tree.  Its filename was
     mm-vmscan-consistent-update-to-pgsteal-and-pgscan.patch

This patch was dropped because an alternative patch was merged

------------------------------------------------------
From: Shakeel Butt <shakeelb@google.com>
Subject: mm: vmscan: consistent update to pgsteal and pgscan

One way to measure the efficiency of memory reclaim is to look at the
ratio (pgscan+pgrefill)/pgsteal.  However, at the moment these stats are
not updated consistently at the system level, so the ratio of these is
not very meaningful.  The pgsteal and pgscan are updated only for global
reclaim, while pgrefill gets updated for global as well as cgroup reclaim.

Please note that this difference is only for system level vmstats.  The
cgroup stats returned by memory.stat are actually consistent.  The
cgroup's pgsteal contains the number of reclaimed pages for global as well
as cgroup reclaim.  So, one way to get the system level stats is to read
them from root's memory.stat, but root does not expose that interface.
Also, for !CONFIG_MEMCG machines /proc/vmstat is the only way to get these
stats.  So, make these stats consistent.

Link: http://lkml.kernel.org/r/20200507204913.18661-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

--- a/mm/vmscan.c~mm-vmscan-consistent-update-to-pgsteal-and-pgscan
+++ a/mm/vmscan.c
@@ -1943,8 +1943,7 @@ shrink_inactive_list(unsigned long nr_to
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
-	if (!cgroup_reclaim(sc))
-		__count_vm_events(item, nr_scanned);
+	__count_vm_events(item, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
 	spin_unlock_irq(&pgdat->lru_lock);
 
@@ -1957,8 +1956,7 @@ shrink_inactive_list(unsigned long nr_to
 	spin_lock_irq(&pgdat->lru_lock);
 
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
-	if (!cgroup_reclaim(sc))
-		__count_vm_events(item, nr_reclaimed);
+	__count_vm_events(item, nr_reclaimed);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
 	reclaim_stat->recent_rotated[0] += stat.nr_activate[0];
 	reclaim_stat->recent_rotated[1] += stat.nr_activate[1];
_

Patches currently in -mm which might be from shakeelb@google.com are

memcg-optimize-memorynuma_stat-like-memorystat.patch
memcg-expose-root-cgroups-memorystat.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + seq_file-introduce-define_seq_attribute-helper-macro.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (63 preceding siblings ...)
  2020-05-11 21:12 ` [alternative-merged] mm-vmscan-consistent-update-to-pgsteal-and-pgscan.patch removed from " Andrew Morton
@ 2020-05-11 21:21 ` Andrew Morton
  2020-05-11 21:21 ` + mm-vmstat-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
                   ` (15 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:21 UTC (permalink / raw)
  To: anil.s.keshavamurthy, davem, gregkh, mhiramat, mingo, mm-commits,
	viro, wangkefeng.wang


The patch titled
     Subject: include/linux/seq_file.h: introduce DEFINE_SEQ_ATTRIBUTE() helper macro
has been added to the -mm tree.  Its filename is
     seq_file-introduce-define_seq_attribute-helper-macro.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/seq_file-introduce-define_seq_attribute-helper-macro.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/seq_file-introduce-define_seq_attribute-helper-macro.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: include/linux/seq_file.h: introduce DEFINE_SEQ_ATTRIBUTE() helper macro

Patch series "seq_file: Introduce DEFINE_SEQ_ATTRIBUTE() helper macro".

As discussed in
https://lore.kernel.org/lkml/20191129222310.GA3712618@kroah.com/, we could
introduce a new helper macro to reduce lots of boilerplate code; vmstat
and kprobes are the examples converted to use it.  If this is accepted, I
will send out more cleanups.


This patch (of 3):

Introduce DEFINE_SEQ_ATTRIBUTE() helper macro to decrease code duplication.
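
A usage sketch of the macro added below (my_stats and its seq_operations
callbacks are hypothetical names): the caller supplies only a <name>_sops
table, and the macro generates <name>_open() plus <name>_fops:

	static const struct seq_operations my_stats_sops = {
		.start	= my_stats_start,
		.next	= my_stats_next,
		.stop	= my_stats_stop,
		.show	= my_stats_show,
	};

	DEFINE_SEQ_ATTRIBUTE(my_stats);

	static int __init my_stats_init(void)
	{
		/* my_stats_fops was generated by the macro above */
		debugfs_create_file("my_stats", 0400, NULL, NULL,
				    &my_stats_fops);
		return 0;
	}

The conversions in the follow-up vmstat and kprobes patches follow exactly
this shape.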

Link: http://lkml.kernel.org/r/20200509064031.181091-1-wangkefeng.wang@huawei.com
Link: http://lkml.kernel.org/r/20200509064031.181091-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/seq_file.h |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

--- a/include/linux/seq_file.h~seq_file-introduce-define_seq_attribute-helper-macro
+++ a/include/linux/seq_file.h
@@ -145,6 +145,25 @@ void *__seq_open_private(struct file *,
 int seq_open_private(struct file *, const struct seq_operations *, int);
 int seq_release_private(struct inode *, struct file *);
 
+#define DEFINE_SEQ_ATTRIBUTE(__name)					\
+static int __name ## _open(struct inode *inode, struct file *file)	\
+{									\
+	int ret = seq_open(file, &__name ## _sops);			\
+	if (!ret && inode->i_private) {					\
+			struct seq_file *seq_f = file->private_data;	\
+			seq_f->private = inode->i_private;		\
+	}								\
+	return ret;							\
+}									\
+									\
+static const struct file_operations __name ## _fops = {			\
+	.owner		= THIS_MODULE,					\
+	.open		= __name ## _open,				\
+	.read		= seq_read,					\
+	.llseek		= seq_lseek,					\
+	.release	= seq_release,					\
+}
+
 #define DEFINE_SHOW_ATTRIBUTE(__name)					\
 static int __name ## _open(struct inode *inode, struct file *file)	\
 {									\
_

Patches currently in -mm which might be from wangkefeng.wang@huawei.com are

seq_file-introduce-define_seq_attribute-helper-macro.patch
mm-vmstat-convert-to-use-define_seq_attribute-macro.patch
kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-vmstat-convert-to-use-define_seq_attribute-macro.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (64 preceding siblings ...)
  2020-05-11 21:21 ` + seq_file-introduce-define_seq_attribute-helper-macro.patch added to " Andrew Morton
@ 2020-05-11 21:21 ` Andrew Morton
  2020-05-11 21:21 ` + kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
                   ` (14 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:21 UTC (permalink / raw)
  To: anil.s.keshavamurthy, davem, gregkh, mhiramat, mingo, mm-commits,
	viro, wangkefeng.wang


The patch titled
     Subject: mm/vmstat.c: convert to use DEFINE_SEQ_ATTRIBUTE macro
has been added to the -mm tree.  Its filename is
     mm-vmstat-convert-to-use-define_seq_attribute-macro.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-vmstat-convert-to-use-define_seq_attribute-macro.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-vmstat-convert-to-use-define_seq_attribute-macro.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: mm/vmstat.c: convert to use DEFINE_SEQ_ATTRIBUTE macro

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Link: http://lkml.kernel.org/r/20200509064031.181091-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmstat.c |   32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

--- a/mm/vmstat.c~mm-vmstat-convert-to-use-define_seq_attribute-macro
+++ a/mm/vmstat.c
@@ -2055,24 +2055,14 @@ static int unusable_show(struct seq_file
 	return 0;
 }
 
-static const struct seq_operations unusable_op = {
+static const struct seq_operations unusable_sops = {
 	.start	= frag_start,
 	.next	= frag_next,
 	.stop	= frag_stop,
 	.show	= unusable_show,
 };
 
-static int unusable_open(struct inode *inode, struct file *file)
-{
-	return seq_open(file, &unusable_op);
-}
-
-static const struct file_operations unusable_file_ops = {
-	.open		= unusable_open,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= seq_release,
-};
+DEFINE_SEQ_ATTRIBUTE(unusable);
 
 static void extfrag_show_print(struct seq_file *m,
 					pg_data_t *pgdat, struct zone *zone)
@@ -2107,24 +2097,14 @@ static int extfrag_show(struct seq_file
 	return 0;
 }
 
-static const struct seq_operations extfrag_op = {
+static const struct seq_operations extfrag_sops = {
 	.start	= frag_start,
 	.next	= frag_next,
 	.stop	= frag_stop,
 	.show	= extfrag_show,
 };
 
-static int extfrag_open(struct inode *inode, struct file *file)
-{
-	return seq_open(file, &extfrag_op);
-}
-
-static const struct file_operations extfrag_file_ops = {
-	.open		= extfrag_open,
-	.read		= seq_read,
-	.llseek		= seq_lseek,
-	.release	= seq_release,
-};
+DEFINE_SEQ_ATTRIBUTE(extfrag);
 
 static int __init extfrag_debug_init(void)
 {
@@ -2133,10 +2113,10 @@ static int __init extfrag_debug_init(voi
 	extfrag_debug_root = debugfs_create_dir("extfrag", NULL);
 
 	debugfs_create_file("unusable_index", 0444, extfrag_debug_root, NULL,
-			    &unusable_file_ops);
+			    &unusable_fops);
 
 	debugfs_create_file("extfrag_index", 0444, extfrag_debug_root, NULL,
-			    &extfrag_file_ops);
+			    &extfrag_fops);
 
 	return 0;
 }
_

Patches currently in -mm which might be from wangkefeng.wang@huawei.com are

seq_file-introduce-define_seq_attribute-helper-macro.patch
mm-vmstat-convert-to-use-define_seq_attribute-macro.patch
kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (65 preceding siblings ...)
  2020-05-11 21:21 ` + mm-vmstat-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
@ 2020-05-11 21:21 ` Andrew Morton
  2020-05-11 21:22 ` + seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch " Andrew Morton
                   ` (13 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:21 UTC (permalink / raw)
  To: anil.s.keshavamurthy, davem, gregkh, mhiramat, mingo, mm-commits,
	viro, wangkefeng.wang


The patch titled
     Subject: kernel/kprobes.c: convert to use DEFINE_SEQ_ATTRIBUTE macro
has been added to the -mm tree.  Its filename is
     kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: kernel/kprobes.c: convert to use DEFINE_SEQ_ATTRIBUTE macro

Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

Link: http://lkml.kernel.org/r/20200509064031.181091-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/kprobes.c |   33 ++++++---------------------------
 1 file changed, 6 insertions(+), 27 deletions(-)

--- a/kernel/kprobes.c~kernel-kprobes-convert-to-use-define_seq_attribute-macro
+++ a/kernel/kprobes.c
@@ -2398,24 +2398,14 @@ static int show_kprobe_addr(struct seq_f
 	return 0;
 }
 
-static const struct seq_operations kprobes_seq_ops = {
+static const struct seq_operations kprobes_sops = {
 	.start = kprobe_seq_start,
 	.next  = kprobe_seq_next,
 	.stop  = kprobe_seq_stop,
 	.show  = show_kprobe_addr
 };
 
-static int kprobes_open(struct inode *inode, struct file *filp)
-{
-	return seq_open(filp, &kprobes_seq_ops);
-}
-
-static const struct file_operations debugfs_kprobes_operations = {
-	.open           = kprobes_open,
-	.read           = seq_read,
-	.llseek         = seq_lseek,
-	.release        = seq_release,
-};
+DEFINE_SEQ_ATTRIBUTE(kprobes);
 
 /* kprobes/blacklist -- shows which functions can not be probed */
 static void *kprobe_blacklist_seq_start(struct seq_file *m, loff_t *pos)
@@ -2446,24 +2436,14 @@ static int kprobe_blacklist_seq_show(str
 	return 0;
 }
 
-static const struct seq_operations kprobe_blacklist_seq_ops = {
+static const struct seq_operations kprobe_blacklist_sops = {
 	.start = kprobe_blacklist_seq_start,
 	.next  = kprobe_blacklist_seq_next,
 	.stop  = kprobe_seq_stop,	/* Reuse void function */
 	.show  = kprobe_blacklist_seq_show,
 };
 
-static int kprobe_blacklist_open(struct inode *inode, struct file *filp)
-{
-	return seq_open(filp, &kprobe_blacklist_seq_ops);
-}
-
-static const struct file_operations debugfs_kprobe_blacklist_ops = {
-	.open           = kprobe_blacklist_open,
-	.read           = seq_read,
-	.llseek         = seq_lseek,
-	.release        = seq_release,
-};
+DEFINE_SEQ_ATTRIBUTE(kprobe_blacklist);
 
 static int arm_all_kprobes(void)
 {
@@ -2622,13 +2602,12 @@ static int __init debugfs_kprobe_init(vo
 
 	dir = debugfs_create_dir("kprobes", NULL);
 
-	debugfs_create_file("list", 0400, dir, NULL,
-			    &debugfs_kprobes_operations);
+	debugfs_create_file("list", 0400, dir, NULL, &kprobes_fops);
 
 	debugfs_create_file("enabled", 0600, dir, &value, &fops_kp);
 
 	debugfs_create_file("blacklist", 0400, dir, NULL,
-			    &debugfs_kprobe_blacklist_ops);
+			    &kprobe_blacklist_fops);
 
 	return 0;
 }
_

Patches currently in -mm which might be from wangkefeng.wang@huawei.com are

seq_file-introduce-define_seq_attribute-helper-macro.patch
mm-vmstat-convert-to-use-define_seq_attribute-macro.patch
kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (66 preceding siblings ...)
  2020-05-11 21:21 ` + kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
@ 2020-05-11 21:22 ` Andrew Morton
  2020-05-11 21:23 ` + lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch " Andrew Morton
                   ` (12 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:22 UTC (permalink / raw)
  To: akpm, mm-commits, wangkefeng.wang


The patch titled
     Subject: seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes
has been added to the -mm tree.  Its filename is
     seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrew Morton <akpm@linux-foundation.org>
Subject: seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes

WARNING: suspect code indent for conditional statements (8, 24)
#43: FILE: include/linux/seq_file.h:152:
+	if (!ret && inode->i_private) {					\
+			struct seq_file *seq_f = file->private_data;	\

total: 0 errors, 1 warnings, 25 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/seq_file-introduce-define_seq_attribute-helper-macro.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/seq_file.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/seq_file.h~seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes
+++ a/include/linux/seq_file.h
@@ -150,8 +150,8 @@ static int __name ## _open(struct inode
 {									\
 	int ret = seq_open(file, &__name ## _sops);			\
 	if (!ret && inode->i_private) {					\
-			struct seq_file *seq_f = file->private_data;	\
-			seq_f->private = inode->i_private;		\
+		struct seq_file *seq_f = file->private_data;		\
+		seq_f->private = inode->i_private;			\
 	}								\
 	return ret;							\
 }									\
_

Patches currently in -mm which might be from akpm@linux-foundation.org are

ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
mm.patch
mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
mm-gupc-updating-the-documentation-fix.patch
mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
mm-remove-__vmalloc_node_flags_caller-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
mm-remove-vmalloc_user_node_flags-fix.patch
arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
mm-add-debug_wx-support-fix.patch
riscv-support-debug_wx-fix.patch
mm-replace-zero-length-array-with-flexible-array-member-fix.patch
mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch
linux-next-rejects.patch
mm-introduce-external-memory-hinting-api-fix-2-fix.patch
mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch
doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
kernel-forkc-export-kernel_thread-to-modules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (67 preceding siblings ...)
  2020-05-11 21:22 ` + seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch " Andrew Morton
@ 2020-05-11 21:23 ` Andrew Morton
  2020-05-11 21:31 ` [alternative-merged] lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base.patch removed from " Andrew Morton
                   ` (11 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:23 UTC (permalink / raw)
  To: jack, mm-commits, tan.hu, wang.liang82, wang.yi59, xue.zhihong


The patch titled
     Subject: lib/flex_proportions.c: cleanup __fprop_inc_percpu_max
has been added to the -mm tree.  Its filename is
     lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Tan Hu <tan.hu@zte.com.cn>
Subject: lib/flex_proportions.c: cleanup __fprop_inc_percpu_max

If the given type has a fraction smaller than max_frac/FPROP_FRAC_BASE,
the code can be modified to call __fprop_inc_percpu() directly, which is
easier to understand.  After this patch, fprop_reflect_period_percpu()
will be called twice, but it quickly returns on the pl->period ==
p->period test, so this does not result in a significant performance
downside.

Thanks to Jan for the guidance.
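
To illustrate why the double call is cheap, an abridged sketch of the
fast path in fprop_reflect_period_percpu() (not the complete function):

	static void fprop_reflect_period_percpu(struct fprop_global *p,
						struct fprop_local_percpu *pl)
	{
		unsigned int period = p->period;

		/*
		 * Fast path - the period did not change, so the second
		 * invocation introduced by this patch returns immediately.
		 */
		if (pl->period == period)
			return;
		/* ... aging of the local event counter follows ... */
	}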

Link: http://lkml.kernel.org/r/1589004753-27554-1-git-send-email-tan.hu@zte.com.cn
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Cc: Jan Kara <jack@suse.cz>
Cc: <xue.zhihong@zte.com.cn>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: <wang.liang82@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/flex_proportions.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/lib/flex_proportions.c~lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max
+++ a/lib/flex_proportions.c
@@ -266,8 +266,7 @@ void __fprop_inc_percpu_max(struct fprop
 		if (numerator >
 		    (((u64)denominator) * max_frac) >> FPROP_FRAC_SHIFT)
 			return;
-	} else
-		fprop_reflect_period_percpu(p, pl);
-	percpu_counter_add_batch(&pl->events, 1, PROP_BATCH);
-	percpu_counter_add(&p->events, 1);
+	}
+
+	__fprop_inc_percpu(p, pl);
 }
_

Patches currently in -mm which might be from tan.hu@zte.com.cn are

lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch
lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [alternative-merged] lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base.patch removed from -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (68 preceding siblings ...)
  2020-05-11 21:23 ` + lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch " Andrew Morton
@ 2020-05-11 21:31 ` Andrew Morton
  2020-05-11 21:44 ` + xtensa-add-loglvl-to-show_trace-fix.patch added to " Andrew Morton
                   ` (10 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:31 UTC (permalink / raw)
  To: jack, mm-commits, tan.hu, wang.liang82, wang.yi59, xue.zhihong


The patch titled
     Subject: lib/flex_proportions.c: aging counts when fraction smaller than max_frac/FPROP_FRAC_BASE
has been removed from the -mm tree.  Its filename was
     lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base.patch

This patch was dropped because an alternative patch was merged

------------------------------------------------------
From: Tan Hu <tan.hu@zte.com.cn>
Subject: lib/flex_proportions.c: aging counts when fraction smaller than max_frac/FPROP_FRAC_BASE

If the given type has a fraction smaller than max_frac/FPROP_FRAC_BASE,
__fprop_inc_percpu_max() should follow the design formula and age the
fraction too.

Link: http://lkml.kernel.org/r/1588746088-38605-1-git-send-email-tan.hu@zte.com.cn
Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Cc: <xue.zhihong@zte.com.cn>
Cc: Yi Wang <wang.yi59@zte.com.cn>
Cc: <wang.liang82@zte.com.cn>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 lib/flex_proportions.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/lib/flex_proportions.c~lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base
+++ a/lib/flex_proportions.c
@@ -266,8 +266,7 @@ void __fprop_inc_percpu_max(struct fprop
 		if (numerator >
 		    (((u64)denominator) * max_frac) >> FPROP_FRAC_SHIFT)
 			return;
-	} else
-		fprop_reflect_period_percpu(p, pl);
-	percpu_counter_add_batch(&pl->events, 1, PROP_BATCH);
-	percpu_counter_add(&p->events, 1);
+	}
+
+	__fprop_inc_percpu(p, pl);
 }
_

Patches currently in -mm which might be from tan.hu@zte.com.cn are

lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + xtensa-add-loglvl-to-show_trace-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (69 preceding siblings ...)
  2020-05-11 21:31 ` [alternative-merged] lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base.patch removed from " Andrew Morton
@ 2020-05-11 21:44 ` Andrew Morton
  2020-05-11 22:44 ` mmotm 2020-05-11-15-43 uploaded Andrew Morton
                   ` (9 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 21:44 UTC (permalink / raw)
  To: dima, mm-commits, rppt


The patch titled
     Subject: xtensa-add-loglvl-to-show_trace-fix
has been added to the -mm tree.  Its filename is
     xtensa-add-loglvl-to-show_trace-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/xtensa-add-loglvl-to-show_trace-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/xtensa-add-loglvl-to-show_trace-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Mike Rapoport <rppt@kernel.org>
Subject: xtensa-add-loglvl-to-show_trace-fix

build fix

Link: http://lkml.kernel.org/r/20200511194534.GA1018386@kernel.org
Cc: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/xtensa/kernel/traps.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/xtensa/kernel/traps.c~xtensa-add-loglvl-to-show_trace-fix
+++ a/arch/xtensa/kernel/traps.c
@@ -515,7 +515,7 @@ void show_stack(struct task_struct *task
 	print_hex_dump(KERN_INFO, " ", DUMP_PREFIX_NONE,
 		       STACK_DUMP_LINE_SIZE, STACK_DUMP_ENTRY_SIZE,
 		       sp, len, false);
-	show_trace(task, stack, KERN_INFO);
+	show_trace(task, sp, KERN_INFO);
 }
 
 DEFINE_SPINLOCK(die_lock);
_

Patches currently in -mm which might be from rppt@kernel.org are

mm-free_area_init-allow-defining-max_zone_pfn-in-descending-order-fix.patch
xtensa-add-loglvl-to-show_trace-fix.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* mmotm 2020-05-11-15-43 uploaded
  2020-05-08  1:35 incoming Andrew Morton
                   ` (70 preceding siblings ...)
  2020-05-11 21:44 ` + xtensa-add-loglvl-to-show_trace-fix.patch added to " Andrew Morton
@ 2020-05-11 22:44 ` Andrew Morton
  2020-05-12 21:03 ` + kasan-consistently-disable-debugging-features.patch added to -mm tree Andrew Morton
                   ` (8 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-11 22:44 UTC (permalink / raw)
  To: broonie, linux-fsdevel, linux-kernel, linux-mm, linux-next,
	mhocko, mm-commits, sfr

The mm-of-the-moment snapshot 2020-05-11-15-43 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random, hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

	https://github.com/hnaz/linux-mm

The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is also available at

	https://github.com/hnaz/linux-mm



This mmotm tree contains the following patches against 5.7-rc5:
(patches marked "*" will be included in linux-next)

* checkpatch-test-git_dir-changes.patch
* proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
* proc-kpageflags-do-not-use-uninitialized-struct-pages.patch
* kcov-cleanup-debug-messages.patch
* kcov-fix-potential-use-after-free-in-kcov_remote_start.patch
* kcov-move-t-kcov-assignments-into-kcov_start-stop.patch
* kcov-move-t-kcov_sequence-assignment.patch
* kcov-use-t-kcov_mode-as-enabled-indicator.patch
* kcov-collect-coverage-from-interrupts.patch
* usb-core-kcov-collect-coverage-from-usb-complete-callback.patch
* memcg-optimize-memorynuma_stat-like-memorystat.patch
* mm-memcg-fix-inconsistent-oom-event-behavior.patch
* epoll-call-final-ep_events_available-check-under-the-lock.patch
* mm-gup-fix-fixup_user_fault-on-multiple-retries.patch
* lib-lzo-fix-ambiguous-encoding-bug-in-lzo-rle.patch
* userfaultfd-fix-remap-event-with-mremap_dontunmap.patch
* ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index.patch
* ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch
* device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
* scripts-support-compiled-source-improved-precise.patch
* scripts-add-a-intermediate-file-for-make-gtags.patch
* squashfs-migrate-from-ll_rw_block-usage-to-bio.patch
* squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
* ocfs2-add-missing-annotation-for-dlm_empty_lockres.patch
* ocfs2-mount-shared-volume-without-ha-stack.patch
* drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
* ramfs-support-o_tmpfile.patch
* vfs-track-per-sb-writeback-errors-and-report-them-to-syncfs.patch
* buffer-record-blockdev-write-errors-in-super_block-that-it-backs.patch
* kernel-watchdog-flush-all-printk-nmi-buffers-when-hardlockup-detected.patch
  mm.patch
* usercopy-mark-dma-kmalloc-caches-as-usercopy-caches.patch
* mm-slub-fix-corrupted-freechain-in-deactivate_slab.patch
* mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
* slub-remove-userspace-notifier-for-cache-add-remove.patch
* slub-remove-kmalloc-under-list_lock-from-list_slab_objects.patch
* mm-slub-fix-stack-overruns-with-slub_stats.patch
* mm-slub-add-panic_on_error-to-the-debug-facilities.patch
* mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
* mm-dump_page-do-not-crash-with-invalid-mapping-pointer.patch
* mm-move-readahead-prototypes-from-mmh.patch
* mm-return-void-from-various-readahead-functions.patch
* mm-ignore-return-value-of-readpages.patch
* mm-move-readahead-nr_pages-check-into-read_pages.patch
* mm-add-new-readahead_control-api.patch
* mm-use-readahead_control-to-pass-arguments.patch
* mm-rename-various-offset-parameters-to-index.patch
* mm-rename-readahead-loop-variable-to-i.patch
* mm-remove-page_offset-from-readahead-loop.patch
* mm-put-readahead-pages-in-cache-earlier.patch
* mm-add-readahead-address-space-operation.patch
* mm-move-end_index-check-out-of-readahead-loop.patch
* mm-add-page_cache_readahead_unbounded.patch
* mm-document-why-we-dont-set-pagereadahead.patch
* mm-use-memalloc_nofs_save-in-readahead-path.patch
* fs-convert-mpage_readpages-to-mpage_readahead.patch
* btrfs-convert-from-readpages-to-readahead.patch
* erofs-convert-uncompressed-files-from-readpages-to-readahead.patch
* erofs-convert-compressed-files-from-readpages-to-readahead.patch
* ext4-convert-from-readpages-to-readahead.patch
* ext4-pass-the-inode-to-ext4_mpage_readpages.patch
* f2fs-convert-from-readpages-to-readahead.patch
* f2fs-pass-the-inode-to-f2fs_mpage_readpages.patch
* fuse-convert-from-readpages-to-readahead.patch
* fuse-convert-from-readpages-to-readahead-fix.patch
* iomap-convert-from-readpages-to-readahead.patch
* mm-gupc-updating-the-documentation.patch
* mm-gupc-updating-the-documentation-fix.patch
* mm-swapfile-use-list_prevnext_entry-instead-of-open-coding.patch
* mm-swap_state-fix-a-data-race-in-swapin_nr_pages.patch
* mm-swap-properly-update-readahead-statistics-in-unuse_pte_range.patch
* mm-swapfilec-offset-is-only-used-when-there-is-more-slots.patch
* mm-swapfilec-explicitly-show-ssd-non-ssd-is-handled-mutually-exclusive.patch
* mm-swapfilec-remove-the-unnecessary-goto-for-ssd-case.patch
* mm-swapfilec-simplify-the-calculation-of-n_goal.patch
* mm-swapfilec-remove-the-extra-check-in-scan_swap_map_slots.patch
* mm-swapfilec-found_free-could-be-represented-by-tmp-max.patch
* mm-swapfilec-tmp-is-always-smaller-than-max.patch
* mm-swapfilec-omit-a-duplicate-code-by-compare-tmp-and-max-first.patch
* swap-try-to-scan-more-free-slots-even-when-fragmented.patch
* mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable.patch
* mm-swapfilec-classify-swap_map_xxx-to-make-it-more-readable-fix.patch
* mm-swapfilec-__swap_entry_free-always-free-1-entry.patch
* mm-memcg-add-workingset_restore-in-memorystat.patch
* mm-memcg-avoid-stale-protection-values-when-cgroup-is-above-protection.patch
* mm-memcg-decouple-elowmin-state-mutations-from-protection-checks.patch
* mm-memcontrol-simplify-value-comparison-between-count-and-limit.patch
* mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
* mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
* mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
* mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch
* mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
* mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
* mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
* mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
* mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
* mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
* mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
* mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
* mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
* mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
* mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
* mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
* mm-memcontrol-charge-swapin-pages-on-instantiation.patch
* mm-memcontrol-document-the-new-swap-control-behavior.patch
* mm-memcontrol-delete-unused-lrucare-handling.patch
* mm-memcontrol-update-page-mem_cgroup-stability-rules.patch
* memcg-expose-root-cgroups-memorystat.patch
* h8300-remove-usage-of-__arch_use_5level_hack.patch
* arm-add-support-for-folded-p4d-page-tables.patch
* arm-add-support-for-folded-p4d-page-tables-fix.patch
* arm64-add-support-for-folded-p4d-page-tables.patch
* arm64-add-support-for-folded-p4d-page-tables-fix.patch
* hexagon-remove-__arch_use_5level_hack.patch
* ia64-add-support-for-folded-p4d-page-tables.patch
* nios2-add-support-for-folded-p4d-page-tables.patch
* openrisc-add-support-for-folded-p4d-page-tables.patch
* powerpc-add-support-for-folded-p4d-page-tables.patch
* powerpc-add-support-for-folded-p4d-page-tables-fix.patch
* sh-fault-modernize-printing-of-kernel-messages.patch
* sh-drop-__pxd_offset-macros-that-duplicate-pxd_index-ones.patch
* sh-add-support-for-folded-p4d-page-tables.patch
* unicore32-remove-__arch_use_5level_hack.patch
* asm-generic-remove-pgtable-nop4d-hackh.patch
* mm-remove-__arch_has_5level_hack-and-include-asm-generic-5level-fixuph.patch
* mm-gupc-further-document-vma_permits_fault.patch
* proc-pid-smaps-add-pmd-migration-entry-parsing.patch
* mm-mmap-fix-the-adjusted-length-error.patch
* mm-memory-remove-unnecessary-pte_devmap-case-in-copy_one_pte.patch
* x86-hyperv-use-vmalloc_exec-for-the-hypercall-page.patch
* x86-fix-vmap-arguments-in-map_irq_stack.patch
* staging-android-ion-use-vmap-instead-of-vm_map_ram.patch
* staging-media-ipu3-use-vmap-instead-of-reimplementing-it.patch
* dma-mapping-use-vmap-insted-of-reimplementing-it.patch
* powerpc-add-an-ioremap_phb-helper.patch
* powerpc-remove-__ioremap_at-and-__iounmap_at.patch
* mm-remove-__get_vm_area.patch
* mm-unexport-unmap_kernel_range_noflush.patch
* mm-rename-config_pgtable_mapping-to-config_zsmalloc_pgtable_mapping.patch
* mm-only-allow-page-table-mappings-for-built-in-zsmalloc.patch
* mm-pass-addr-as-unsigned-long-to-vb_free.patch
* mm-remove-vmap_page_range_noflush-and-vunmap_page_range.patch
* mm-rename-vmap_page_range-to-map_kernel_range.patch
* mm-dont-return-the-number-of-pages-from-map_kernel_range_noflush.patch
* mm-remove-map_vm_range.patch
* mm-remove-unmap_vmap_area.patch
* mm-remove-the-prot-argument-from-vm_map_ram.patch
* mm-enforce-that-vmap-cant-map-pages-executable.patch
* gpu-drm-remove-the-powerpc-hack-in-drm_legacy_sg_alloc.patch
* mm-remove-the-pgprot-argument-to-__vmalloc.patch
* mm-remove-the-prot-argument-to-__vmalloc_node.patch
* mm-remove-both-instances-of-__vmalloc_node_flags.patch
* mm-remove-__vmalloc_node_flags_caller.patch
* mm-remove-__vmalloc_node_flags_caller-fix.patch
* mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node.patch
* mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix.patch
* mm-switch-the-test_vmalloc-module-to-use-__vmalloc_node-fix-fix.patch
* mm-remove-vmalloc_user_node_flags.patch
* mm-remove-vmalloc_user_node_flags-fix.patch
* arm64-use-__vmalloc_node-in-arch_alloc_vmap_stack.patch
* powerpc-use-__vmalloc_node-in-alloc_vm_stack.patch
* s390-use-__vmalloc_node-in-stack_alloc.patch
* mm-init-report-kasan-tag-information-stored-in-page-flags.patch
* kasan-stop-tests-being-eliminated-as-dead-code-with-fortify_source.patch
* kasan-stop-tests-being-eliminated-as-dead-code-with-fortify_source-v4.patch
* stringh-fix-incompatibility-between-fortify_source-and-kasan.patch
* mm-clarify-__gfp_memalloc-usage.patch
* mm-memblock-replace-dereferences-of-memblock_regionnid-with-api-calls.patch
* mm-make-early_pfn_to_nid-and-related-defintions-close-to-each-other.patch
* mm-remove-config_have_memblock_node_map-option.patch
* mm-free_area_init-use-maximal-zone-pfns-rather-than-zone-sizes.patch
* mm-use-free_area_init-instead-of-free_area_init_nodes.patch
* alpha-simplify-detection-of-memory-zone-boundaries.patch
* arm-simplify-detection-of-memory-zone-boundaries.patch
* arm64-simplify-detection-of-memory-zone-boundaries-for-uma-configs.patch
* csky-simplify-detection-of-memory-zone-boundaries.patch
* m68k-mm-simplify-detection-of-memory-zone-boundaries.patch
* parisc-simplify-detection-of-memory-zone-boundaries.patch
* sparc32-simplify-detection-of-memory-zone-boundaries.patch
* unicore32-simplify-detection-of-memory-zone-boundaries.patch
* xtensa-simplify-detection-of-memory-zone-boundaries.patch
* mm-memmap_init-iterate-over-memblock-regions-rather-that-check-each-pfn.patch
* mm-memmap_init-iterate-over-memblock-regions-rather-that-check-each-pfn-fix.patch
* mm-remove-early_pfn_in_nid-and-config_nodes_span_other_nodes.patch
* mm-free_area_init-allow-defining-max_zone_pfn-in-descending-order.patch
* mm-free_area_init-allow-defining-max_zone_pfn-in-descending-order-fix.patch
* mm-free_area_init-allow-defining-max_zone_pfn-in-descending-order-fix-2.patch
* mm-rename-free_area_init_node-to-free_area_init_memoryless_node.patch
* mm-clean-up-free_area_init_node-and-its-helpers.patch
* mm-simplify-find_min_pfn_with_active_regions.patch
* docs-vm-update-memory-models-documentation.patch
* mm-page_allocc-bad_-is-not-necessary-when-pagehwpoison.patch
* mm-page_allocc-bad_flags-is-not-necessary-for-bad_page.patch
* mm-page_allocc-rename-free_pages_check_bad-to-check_free_page_bad.patch
* mm-page_allocc-rename-free_pages_check-to-check_free_page.patch
* mm-page_allocc-extract-check__page_bad-common-part-to-page_bad_reason.patch
* mmpage_alloccma-conditionally-prefer-cma-pageblocks-for-movable-allocations.patch
* mmpage_alloccma-conditionally-prefer-cma-pageblocks-for-movable-allocations-fix.patch
* mm-call-touch_nmi_watchdog-on-max-order-boundaries-in-deferred-init.patch
* mm-initialize-deferred-pages-with-interrupts-enabled.patch
* mm-call-cond_resched-from-deferred_init_memmap.patch
* mm-remove-unused-free_bootmem_with_active_regions.patch
* mm-page_allocc-only-tune-sysctl_lowmem_reserve_ratio-value-once-when-changing-it.patch
* mm-page_allocc-clear-out-zone-lowmem_reserve-if-the-zone-is-empty.patch
* mm-vmstatc-do-not-show-lowmem-reserve-protection-information-of-empty-zone.patch
* mm-page_alloc-use-ac-high_zoneidx-for-classzone_idx.patch
* mm-page_alloc-integrate-classzone_idx-and-high_zoneidx.patch
* mm-page_allocc-use-node_mask_none-in-build_zonelists.patch
* mm-rename-gfpflags_to_migratetype-to-gfp_migratetype-for-same-convention.patch
* mm-reset-numa-stats-for-boot-pagesets.patch
* mm-reset-numa-stats-for-boot-pagesets-v3.patch
* mm-vmscanc-use-update_lru_size-in-update_lru_sizes.patch
* mm-vmscan-count-layzfree-pages-and-fix-nr_isolated_-mismatch.patch
* mm-vmscanc-change-prototype-for-shrink_page_list.patch
* mm-vmscan-update-the-comment-of-should_continue_reclaim.patch
* tools-vm-page_owner_sort-filter-out-unneeded-line.patch
* mm-mempolicy-fix-up-gup-usage-in-lookup_node.patch
* mm-memblock-fix-minor-typo-and-unclear-comment.patch
* hugetlb_cgroup-remove-unused-variable-i.patch
* khugepaged-add-self-test.patch
* khugepaged-add-self-test-fix.patch
* khugepaged-add-self-test-fix-2.patch
* khugepaged-add-self-test-fix-2-fix.patch
* khugepaged-do-not-stop-collapse-if-less-than-half-ptes-are-referenced.patch
* khugepaged-drain-all-lru-caches-before-scanning-pages.patch
* khugepaged-drain-lru-add-pagevec-after-swapin.patch
* khugepaged-allow-to-collapse-a-page-shared-across-fork.patch
* khugepaged-allow-to-collapse-pte-mapped-compound-pages.patch
* thp-change-cow-semantics-for-anon-thp.patch
* khugepaged-introduce-max_ptes_shared-tunable.patch
* khugepaged-introduce-max_ptes_shared-tunable-fix.patch
* hugetlbfs-add-arch_hugetlb_valid_size.patch
* hugetlbfs-move-hugepagesz=-parsing-to-arch-independent-code.patch
* hugetlbfs-remove-hugetlb_add_hstate-warning-for-existing-hstate.patch
* hugetlbfs-remove-hugetlb_add_hstate-warning-for-existing-hstate-fix.patch
* hugetlbfs-clean-up-command-line-processing.patch
* hugetlbfs-move-hugepagesz=-parsing-to-arch-independent-code-fix.patch
* mm-hugetlb-avoid-unnecessary-check-on-pud-and-pmd-entry-in-huge_pte_offset.patch
* arm64-mm-drop-__have_arch_huge_ptep_get.patch
* mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch
* mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch
* arch-kmap-remove-bug_on.patch
* arch-xtensa-move-kmap-build-bug-out-of-the-way.patch
* arch-kmap-remove-redundant-arch-specific-kmaps.patch
* arch-kunmap-remove-duplicate-kunmap-implementations.patch
* arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch
* x86powerpcmicroblaze-kmap-move-preempt-disable.patch
* arch-kmap_atomic-consolidate-duplicate-code.patch
* arch-kmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
* arch-kunmap_atomic-consolidate-duplicate-code.patch
* arch-kunmap_atomic-consolidate-duplicate-code-checkpatch-fixes.patch
* arch-kmap-ensure-kmap_prot-visibility.patch
* arch-kmap-dont-hard-code-kmap_prot-values.patch
* arch-kmap-define-kmap_atomic_prot-for-all-archs.patch
* drm-remove-drm-specific-kmap_atomic-code.patch
* kmap-remove-kmap_atomic_to_page.patch
* parisc-kmap-remove-duplicate-kmap-code.patch
* sparc-remove-unnecessary-includes.patch
* kmap-consolidate-kmap_prot-definitions.patch
* kmap-consolidate-kmap_prot-definitions-checkpatch-fixes.patch
* mm-thp-dont-need-drain-lru-cache-when-splitting-and-mlocking-thp.patch
* powerpc-mm-drop-platform-defined-pmd_mknotpresent.patch
* mm-thp-rename-pmd_mknotpresent-as-pmd_mknotvalid.patch
* mm-thp-rename-pmd_mknotpresent-as-pmd_mkinvalid-v2.patch
* drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup.patch
* drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
* mm-add-debug_wx-support.patch
* mm-add-debug_wx-support-fix.patch
* mm-add-debug_wx-support-fix-2.patch
* mm-add-debug_wx-support-fix-3.patch
* riscv-support-debug_wx.patch
* riscv-support-debug_wx-fix.patch
* x86-mm-use-arch_has_debug_wx-instead-of-arch-defined.patch
* arm64-mm-use-arch_has_debug_wx-instead-of-arch-defined.patch
* mm-add-kvfree_sensitive-for-freeing-sensitive-data-objects.patch
* mm-memory_hotplug-refrain-from-adding-memory-into-an-impossible-node.patch
* powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
* mm-memory_hotplug-remove-is_mem_section_removable.patch
* mm-memory_hotplug-set-node_start_pfn-of-hotadded-pgdat-to-0.patch
* mm-memory_hotplug-handle-memblocks-only-with-config_arch_keep_memblock.patch
* mm-memory_hotplug-introduce-add_memory_driver_managed.patch
* kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch
* device-dax-add-memory-via-add_memory_driver_managed.patch
* mm-replace-zero-length-array-with-flexible-array-member.patch
* mm-replace-zero-length-array-with-flexible-array-member-fix.patch
* mm-memory_hotplug-fix-a-typo-in-comment-recoreded-recorded.patch
* mm-ksm-fix-a-typo-in-comment-alreaady-already.patch
* mm-ksm-fix-a-typo-in-comment-alreaady-already-v2.patch
* mm-mmap-fix-a-typo-in-comment-compatbility-compatibility.patch
* mm-hugetlb-fix-a-typo-in-comment-manitained-maintained.patch
* mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2.patch
* mm-hugetlb-fix-a-typo-in-comment-manitained-maintained-v2-checkpatch-fixes.patch
* mm-vmsan-fix-some-typos-in-comment.patch
* mm-compaction-fix-a-typo-in-comment-pessemistic-pessimistic.patch
* mm-memblock-fix-a-typo-in-comment-implict-implicit.patch
* mm-list_lru-fix-a-typo-in-comment-numbesr-numbers.patch
* mm-filemap-fix-a-typo-in-comment-unneccssary-unnecessary.patch
* mm-frontswap-fix-some-typos-in-frontswapc.patch
* mm-memcg-fix-some-typos-in-memcontrolc.patch
* mm-fix-a-typo-in-comment-strucure-structure.patch
* mm-slub-fix-a-typo-in-comment-disambiguiation-disambiguation.patch
* mm-sparse-fix-a-typo-in-comment-convienence-convenience.patch
* mm-page-writeback-fix-a-typo-in-comment-effictive-effective.patch
* mm-memory-fix-a-typo-in-comment-attampt-attempt.patch
* mm-use-false-for-bool-variable.patch
* mm-return-true-in-cpupid_pid_unset.patch
* zcomp-use-array_size-for-backends-list.patch
* info-task-hung-in-generic_file_write_iter.patch
* info-task-hung-in-generic_file_write-fix.patch
* kernel-hung_taskc-monitor-killed-tasks.patch
* proc-rename-catch-function-argument.patch
* x86-mm-define-mm_p4d_folded.patch
* mm-debug-add-tests-validating-architecture-page-table-helpers.patch
* mm-debug-add-tests-validating-architecture-page-table-helpers-v17.patch
* mm-debug-add-tests-validating-architecture-page-table-helpers-v18.patch
* userc-make-uidhash_table-static.patch
* dynamic_debug-add-an-option-to-enable-dynamic-debug-for-modules-only.patch
* dynamic_debug-add-an-option-to-enable-dynamic-debug-for-modules-only-v2.patch
* get_maintainer-add-email-addresses-from-yaml-files.patch
* bitops-avoid-clang-shift-count-overflow-warnings.patch
* lib-math-avoid-trailing-n-hidden-in-pr_fmt.patch
* lib-add-might_fault-to-strncpy_from_user.patch
* lib-optimize-cpumask_local_spread.patch
* lib-test_lockupc-make-test_inode-static.patch
* lib-zlib-remove-outdated-and-incorrect-pre-increment-optimization.patch
* percpu_ref-use-a-more-common-logging-style.patch
* lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch
* checkpatch-additional-maintainer-section-entry-ordering-checks.patch
* checkpatch-look-for-c99-comments-in-ctx_locate_comment.patch
* checkpatch-disallow-git-and-file-fix.patch
* checkpatch-use-patch-subject-when-reading-from-stdin.patch
* checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch
* fs-binfmt_elf-remove-redundant-elf_map-ifndef.patch
* elfnote-mark-all-note-sections-shf_alloc.patch
* fs-binfmt_elfc-allocate-initialized-memory-in-fill_thread_core_info.patch
* fat-dont-allow-to-mount-if-the-fat-length-==-0.patch
* fat-improve-the-readahead-for-fat-entries.patch
* fs-seq_filec-seq_read-update-pr_info_ratelimited.patch
* seq_file-introduce-define_seq_attribute-helper-macro.patch
* seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch
* mm-vmstat-convert-to-use-define_seq_attribute-macro.patch
* kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch
* exec-simplify-the-copy_strings_kernel-calling-convention.patch
* exec-open-code-copy_string_kernel.patch
* umh-fix-refcount-underflow-in-fork_usermode_blob.patch
* rapidio-avoid-data-race-between-file-operation-callbacks-and-mport_cdev_add.patch
* kernel-relayc-fix-read_pos-error-when-multiple-readers.patch
* aio-simplify-read_events.patch
* add-kernel-config-option-for-twisting-kernel-behavior.patch
* twist-allow-disabling-k_spec-function-in-drivers-tty-vt-keyboardc.patch
* twist-add-option-for-selecting-twist-options-for-syzkallers-testing.patch
* selftests-x86-pkeys-move-selftests-to-arch-neutral-directory.patch
* selftests-vm-pkeys-rename-all-references-to-pkru-to-a-generic-name.patch
* selftests-vm-pkeys-move-generic-definitions-to-header-file.patch
* selftests-vm-pkeys-move-some-definitions-to-arch-specific-header.patch
* selftests-vm-pkeys-make-gcc-check-arguments-of-sigsafe_printf.patch
* selftests-vm-pkeys-use-sane-types-for-pkey-register.patch
* selftests-vm-pkeys-add-helpers-for-pkey-bits.patch
* selftests-vm-pkeys-fix-pkey_disable_clear.patch
* selftests-vm-pkeys-fix-assertion-in-pkey_disable_set-clear.patch
* selftests-vm-pkeys-fix-alloc_random_pkey-to-make-it-really-random.patch
* selftests-vm-pkeys-use-the-correct-huge-page-size.patch
* selftests-vm-pkeys-introduce-generic-pkey-abstractions.patch
* selftests-vm-pkeys-introduce-powerpc-support.patch
* selftests-vm-pkeys-introduce-powerpc-support-fix.patch
* selftests-vm-pkeys-fix-number-of-reserved-powerpc-pkeys.patch
* selftests-vm-pkeys-fix-assertion-in-test_pkey_alloc_exhaust.patch
* selftests-vm-pkeys-improve-checks-to-determine-pkey-support.patch
* selftests-vm-pkeys-associate-key-on-a-mapped-page-and-detect-access-violation.patch
* selftests-vm-pkeys-associate-key-on-a-mapped-page-and-detect-write-violation.patch
* selftests-vm-pkeys-detect-write-violation-on-a-mapped-access-denied-key-page.patch
* selftests-vm-pkeys-introduce-a-sub-page-allocator.patch
* selftests-vm-pkeys-test-correct-behaviour-of-pkey-0.patch
* selftests-vm-pkeys-override-access-right-definitions-on-powerpc.patch
* selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch
* selftests-vm-pkeys-use-the-correct-page-size-on-powerpc.patch
* selftests-vm-pkeys-fix-multilib-builds-for-x86.patch
* tools-testing-selftests-vm-remove-duplicate-headers.patch
* ubsan-fix-gcc-10-warnings.patch
* ipc-msg-add-missing-annotation-for-freeque.patch
* ipc-use-a-work-queue-to-free_ipc.patch
* ipc-convert-ipcs_idr-to-xarray.patch
* ipc-convert-ipcs_idr-to-xarray-update.patch
  linux-next.patch
  linux-next-rejects.patch
* xarrayh-correct-return-code-for-xa_store_bhirq.patch
* kernel-sysctl-support-setting-sysctl-parameters-from-kernel-command-line.patch
* kernel-sysctl-support-handling-command-line-aliases.patch
* kernel-hung_task-convert-hung_task_panic-boot-parameter-to-sysctl.patch
* tools-testing-selftests-sysctl-sysctlsh-support-config_test_sysctl=y.patch
* lib-test_sysctl-support-testing-of-sysctl-boot-parameter.patch
* kernel-watchdogc-convert-soft-hardlockup-boot-parameters-to-sysctl-aliases.patch
* kernel-hung_taskc-introduce-sysctl-to-print-all-traces-when-a-hung-task-is-detected.patch
* panic-add-sysctl-to-dump-all-cpus-backtraces-on-oops-event.patch
* stacktrace-cleanup-inconsistent-variable-type.patch
* amdgpu-a-null-mm-does-not-mean-a-thread-is-a-kthread.patch
* kernel-move-use_mm-unuse_mm-to-kthreadc.patch
* kernel-move-use_mm-unuse_mm-to-kthreadc-v2.patch
* kernel-better-document-the-use_mm-unuse_mm-api-contract.patch
* kernel-better-document-the-use_mm-unuse_mm-api-contract-v2.patch
* kernel-better-document-the-use_mm-unuse_mm-api-contract-v2-fix.patch
* kernel-set-user_ds-in-kthread_use_mm.patch
* mm-kmemleak-silence-kcsan-splats-in-checksum.patch
* kallsyms-printk-add-loglvl-to-print_ip_sym.patch
* alpha-add-show_stack_loglvl.patch
* arc-add-show_stack_loglvl.patch
* arm-asm-add-loglvl-to-c_backtrace.patch
* arm-add-loglvl-to-unwind_backtrace.patch
* arm-add-loglvl-to-dump_backtrace.patch
* arm-wire-up-dump_backtrace_entrystm.patch
* arm-add-show_stack_loglvl.patch
* arm64-add-loglvl-to-dump_backtrace.patch
* arm64-add-show_stack_loglvl.patch
* c6x-add-show_stack_loglvl.patch
* csky-add-show_stack_loglvl.patch
* h8300-add-show_stack_loglvl.patch
* hexagon-add-show_stack_loglvl.patch
* ia64-pass-log-level-as-arg-into-ia64_do_show_stack.patch
* ia64-add-show_stack_loglvl.patch
* m68k-add-show_stack_loglvl.patch
* microblaze-add-loglvl-to-microblaze_unwind_inner.patch
* microblaze-add-loglvl-to-microblaze_unwind.patch
* microblaze-add-show_stack_loglvl.patch
* mips-add-show_stack_loglvl.patch
* nds32-add-show_stack_loglvl.patch
* nios2-add-show_stack_loglvl.patch
* openrisc-add-show_stack_loglvl.patch
* parisc-add-show_stack_loglvl.patch
* powerpc-add-show_stack_loglvl.patch
* riscv-add-show_stack_loglvl.patch
* s390-add-show_stack_loglvl.patch
* sh-add-loglvl-to-dump_mem.patch
* sh-remove-needless-printk.patch
* sh-add-loglvl-to-printk_address.patch
* sh-add-loglvl-to-show_trace.patch
* sh-add-show_stack_loglvl.patch
* sparc-add-show_stack_loglvl.patch
* um-sysrq-remove-needless-variable-sp.patch
* um-add-show_stack_loglvl.patch
* unicore32-remove-unused-pmode-argument-in-c_backtrace.patch
* unicore32-add-loglvl-to-c_backtrace.patch
* unicore32-add-show_stack_loglvl.patch
* x86-add-missing-const-qualifiers-for-log_lvl.patch
* x86-add-show_stack_loglvl.patch
* xtensa-add-loglvl-to-show_trace.patch
* xtensa-add-loglvl-to-show_trace-fix.patch
* xtensa-add-show_stack_loglvl.patch
* sysrq-use-show_stack_loglvl.patch
* x86-amd_gart-print-stacktrace-for-a-leak-with-kern_err.patch
* power-use-show_stack_loglvl.patch
* kdb-dont-play-with-console_loglevel.patch
* sched-print-stack-trace-with-kern_info.patch
* kernel-use-show_stack_loglvl.patch
* kernel-rename-show_stack_loglvl-=-show_stack.patch
* mm-frontswap-mark-various-intentional-data-races.patch
* mm-page_io-mark-various-intentional-data-races.patch
* mm-page_io-mark-various-intentional-data-races-v2.patch
* mm-swap_state-mark-various-intentional-data-races.patch
* mm-filemap-fix-a-data-race-in-filemap_fault.patch
* mm-swapfile-fix-and-annotate-various-data-races.patch
* mm-swapfile-fix-and-annotate-various-data-races-v2.patch
* mm-page_counter-fix-various-data-races-at-memsw.patch
* mm-memcontrol-fix-a-data-race-in-scan-count.patch
* mm-list_lru-fix-a-data-race-in-list_lru_count_one.patch
* mm-mempool-fix-a-data-race-in-mempool_free.patch
* mm-util-annotate-an-data-race-at-vm_committed_as.patch
* mm-rmap-annotate-a-data-race-at-tlb_flush_batched.patch
* mm-annotate-a-data-race-in-page_zonenum.patch
* mm-swap-annotate-data-races-for-lru_rotate_pvecs.patch
* net-zerocopy-use-vm_insert_pages-for-tcp-rcv-zerocopy.patch
* mm-mmapc-add-more-sanity-checks-to-get_unmapped_area.patch
* mm-mmapc-do-not-allow-mappings-outside-of-allowed-limits.patch
* mm-pass-task-and-mm-to-do_madvise.patch
* mm-introduce-external-memory-hinting-api.patch
* mm-introduce-external-memory-hinting-api-fix.patch
* mm-introduce-external-memory-hinting-api-fix-2.patch
* mm-introduce-external-memory-hinting-api-fix-2-fix.patch
* mm-check-fatal-signal-pending-of-target-process.patch
* pid-move-pidfd_get_pid-function-to-pidc.patch
* mm-support-both-pid-and-pidfd-for-process_madvise.patch
* mm-madvise-allow-ksm-hints-for-remote-api.patch
* mm-support-vector-address-ranges-for-process_madvise.patch
* mm-support-vector-address-ranges-for-process_madvise-fix.patch
* mm-support-vector-address-ranges-for-process_madvise-fix-fix.patch
* mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix.patch
* mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix.patch
* mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch
* mm-remove-duplicated-include-from-madvisec.patch
* mm-expand-documentation-over-__read_mostly.patch
* doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch
* doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch
* fix-read-buffer-overflow-in-delta-ipc.patch
  make-sure-nobodys-leaking-resources.patch
  releasing-resources-with-children.patch
  mutex-subsystem-synchro-test-module.patch
  kernel-forkc-export-kernel_thread-to-modules.patch
  workaround-for-a-pci-restoring-bug.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + kasan-consistently-disable-debugging-features.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (71 preceding siblings ...)
  2020-05-11 22:44 ` mmotm 2020-05-11-15-43 uploaded Andrew Morton
@ 2020-05-12 21:03 ` Andrew Morton
  2020-05-12 21:03 ` + kasan-add-missing-functions-declarations-to-kasanh.patch " Andrew Morton
                   ` (7 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-12 21:03 UTC (permalink / raw)
  To: andreyknvl, aryabinin, dvyukov, glider, leonro, mm-commits


The patch titled
     Subject: kasan: consistently disable debugging features
has been added to the -mm tree.  Its filename is
     kasan-consistently-disable-debugging-features.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kasan-consistently-disable-debugging-features.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kasan-consistently-disable-debugging-features.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: consistently disable debugging features

KASAN is incompatible with some kernel debugging/tracing features.
There have been multiple patches that disable those features for some of
the KASAN files one by one.  Instead of prolonging that, disable these
features for all KASAN files at once.
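
The recursion concern is the same one behind the existing
KASAN_SANITIZE := n (a minimal sketch, assuming the generic KASAN
runtime of this tree, where check_memory_region() and __asan_load8()
are real entry points):

	/* If this file were built with instrumentation (KASAN, UBSAN,
	 * KCOV or ftrace hooks), every memory access or function entry
	 * inside the check would re-enter the runtime, recursing until
	 * the stack overflows. */
	void __asan_load8(unsigned long addr)
	{
		check_memory_region(addr, 8, false, _RET_IP_);
	}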

Link: http://lkml.kernel.org/r/29bd753d5ff5596425905b0b07f51153e2345cc1.1589297433.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/Makefile |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/mm/kasan/Makefile~kasan-consistently-disable-debugging-features
+++ a/mm/kasan/Makefile
@@ -1,23 +1,28 @@
 # SPDX-License-Identifier: GPL-2.0
 KASAN_SANITIZE := n
-UBSAN_SANITIZE_common.o := n
-UBSAN_SANITIZE_generic.o := n
-UBSAN_SANITIZE_generic_report.o := n
-UBSAN_SANITIZE_tags.o := n
+UBSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
 
+# Disable ftrace to avoid recursion.
 CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_generic.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_generic_report.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_init.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_quarantine.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_report.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_tags.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_tags_report.o = $(CC_FLAGS_FTRACE)
 
 # Function splitter causes unnecessary splits in __asan_load1/__asan_store1
 # see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63533

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + kasan-add-missing-functions-declarations-to-kasanh.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (72 preceding siblings ...)
  2020-05-12 21:03 ` + kasan-consistently-disable-debugging-features.patch added to -mm tree Andrew Morton
@ 2020-05-12 21:03 ` Andrew Morton
  2020-05-12 21:04 ` + kasan-move-kasan_report-into-reportc.patch " Andrew Morton
                   ` (6 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-12 21:03 UTC (permalink / raw)
  To: andreyknvl, aryabinin, dvyukov, glider, leon, mm-commits


The patch titled
     Subject: kasan: add missing functions declarations to kasan.h
has been added to the -mm tree.  Its filename is
     kasan-add-missing-functions-declarations-to-kasanh.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kasan-add-missing-functions-declarations-to-kasanh.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kasan-add-missing-functions-declarations-to-kasanh.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: add missing functions declarations to kasan.h

KASAN is currently missing declarations for __asan_report* and __hwasan*
functions.  This can lead to compiler warnings.
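
A minimal illustration of the warning class this avoids (a hypothetical
stand-alone example; the real definitions are generated by macros in
mm/kasan/generic_report.c):

	/* Defined here but called only from compiler-generated code,
	 * so no header declared it.  With -Wmissing-prototypes this
	 * produces: "warning: no previous prototype for
	 * '__asan_report_load1_noabort'". */
	void __asan_report_load1_noabort(unsigned long addr)
	{
		kasan_report(addr, 1, false, _RET_IP_);
	}

The declarations added to mm/kasan/kasan.h below give each of these
entry points a visible prototype.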

Link: http://lkml.kernel.org/r/45b445a76a79208918f0cc44bfabebaea909b54d.1589297433.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Leon Romanovsky <leon@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/kasan.h |   34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

--- a/mm/kasan/kasan.h~kasan-add-missing-functions-declarations-to-kasanh
+++ a/mm/kasan/kasan.h
@@ -212,8 +212,6 @@ static inline const void *arch_kasan_set
 asmlinkage void kasan_unpoison_task_stack_below(const void *watermark);
 void __asan_register_globals(struct kasan_global *globals, size_t size);
 void __asan_unregister_globals(struct kasan_global *globals, size_t size);
-void __asan_loadN(unsigned long addr, size_t size);
-void __asan_storeN(unsigned long addr, size_t size);
 void __asan_handle_no_return(void);
 void __asan_alloca_poison(unsigned long addr, size_t size);
 void __asan_allocas_unpoison(const void *stack_top, const void *stack_bottom);
@@ -228,6 +226,8 @@ void __asan_load8(unsigned long addr);
 void __asan_store8(unsigned long addr);
 void __asan_load16(unsigned long addr);
 void __asan_store16(unsigned long addr);
+void __asan_loadN(unsigned long addr, size_t size);
+void __asan_storeN(unsigned long addr, size_t size);
 
 void __asan_load1_noabort(unsigned long addr);
 void __asan_store1_noabort(unsigned long addr);
@@ -239,6 +239,21 @@ void __asan_load8_noabort(unsigned long
 void __asan_store8_noabort(unsigned long addr);
 void __asan_load16_noabort(unsigned long addr);
 void __asan_store16_noabort(unsigned long addr);
+void __asan_loadN_noabort(unsigned long addr, size_t size);
+void __asan_storeN_noabort(unsigned long addr, size_t size);
+
+void __asan_report_load1_noabort(unsigned long addr);
+void __asan_report_store1_noabort(unsigned long addr);
+void __asan_report_load2_noabort(unsigned long addr);
+void __asan_report_store2_noabort(unsigned long addr);
+void __asan_report_load4_noabort(unsigned long addr);
+void __asan_report_store4_noabort(unsigned long addr);
+void __asan_report_load8_noabort(unsigned long addr);
+void __asan_report_store8_noabort(unsigned long addr);
+void __asan_report_load16_noabort(unsigned long addr);
+void __asan_report_store16_noabort(unsigned long addr);
+void __asan_report_load_n_noabort(unsigned long addr, size_t size);
+void __asan_report_store_n_noabort(unsigned long addr, size_t size);
 
 void __asan_set_shadow_00(const void *addr, size_t size);
 void __asan_set_shadow_f1(const void *addr, size_t size);
@@ -247,4 +262,19 @@ void __asan_set_shadow_f3(const void *ad
 void __asan_set_shadow_f5(const void *addr, size_t size);
 void __asan_set_shadow_f8(const void *addr, size_t size);
 
+void __hwasan_load1_noabort(unsigned long addr);
+void __hwasan_store1_noabort(unsigned long addr);
+void __hwasan_load2_noabort(unsigned long addr);
+void __hwasan_store2_noabort(unsigned long addr);
+void __hwasan_load4_noabort(unsigned long addr);
+void __hwasan_store4_noabort(unsigned long addr);
+void __hwasan_load8_noabort(unsigned long addr);
+void __hwasan_store8_noabort(unsigned long addr);
+void __hwasan_load16_noabort(unsigned long addr);
+void __hwasan_store16_noabort(unsigned long addr);
+void __hwasan_loadN_noabort(unsigned long addr, size_t size);
+void __hwasan_storeN_noabort(unsigned long addr, size_t size);
+
+void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size);
+
 #endif
_

Patches currently in -mm which might be from andreyknvl@google.com are

kcov-cleanup-debug-messages.patch
kcov-fix-potential-use-after-free-in-kcov_remote_start.patch
kcov-move-t-kcov-assignments-into-kcov_start-stop.patch
kcov-move-t-kcov_sequence-assignment.patch
kcov-use-t-kcov_mode-as-enabled-indicator.patch
kcov-collect-coverage-from-interrupts.patch
usb-core-kcov-collect-coverage-from-usb-complete-callback.patch
kasan-consistently-disable-debugging-features.patch
kasan-add-missing-functions-declarations-to-kasanh.patch
kasan-move-kasan_report-into-reportc.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + kasan-move-kasan_report-into-reportc.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (73 preceding siblings ...)
  2020-05-12 21:03 ` + kasan-add-missing-functions-declarations-to-kasanh.patch " Andrew Morton
@ 2020-05-12 21:04 ` Andrew Morton
  2020-05-13 18:31 ` + mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch " Andrew Morton
                   ` (5 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-12 21:04 UTC (permalink / raw)
  To: andreyknvl, aryabinin, dvyukov, glider, leon, leonro, mm-commits


The patch titled
     Subject: kasan: move kasan_report() into report.c
has been added to the -mm tree.  Its filename is
     kasan-move-kasan_report-into-reportc.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kasan-move-kasan_report-into-reportc.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kasan-move-kasan_report-into-reportc.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: move kasan_report() into report.c

The kasan_report() function belongs in report.c, as it's a common
function that does error reporting.

Link: http://lkml.kernel.org/r/78a81fde6eeda9db72a7fd55fbc33173a515e4b1.1589297433.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Leon Romanovsky <leon@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/kasan/common.c |   19 -------------------
 mm/kasan/report.c |   22 ++++++++++++++++++++--
 2 files changed, 20 insertions(+), 21 deletions(-)

--- a/mm/kasan/common.c~kasan-move-kasan_report-into-reportc
+++ a/mm/kasan/common.c
@@ -33,7 +33,6 @@
 #include <linux/types.h>
 #include <linux/vmalloc.h>
 #include <linux/bug.h>
-#include <linux/uaccess.h>
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
@@ -613,24 +612,6 @@ void kasan_free_shadow(const struct vm_s
 }
 #endif
 
-extern void __kasan_report(unsigned long addr, size_t size, bool is_write, unsigned long ip);
-extern bool report_enabled(void);
-
-bool kasan_report(unsigned long addr, size_t size, bool is_write, unsigned long ip)
-{
-	unsigned long flags = user_access_save();
-	bool ret = false;
-
-	if (likely(report_enabled())) {
-		__kasan_report(addr, size, is_write, ip);
-		ret = true;
-	}
-
-	user_access_restore(flags);
-
-	return ret;
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static bool shadow_mapped(unsigned long addr)
 {
--- a/mm/kasan/report.c~kasan-move-kasan_report-into-reportc
+++ a/mm/kasan/report.c
@@ -29,6 +29,7 @@
 #include <linux/kasan.h>
 #include <linux/module.h>
 #include <linux/sched/task_stack.h>
+#include <linux/uaccess.h>
 
 #include <asm/sections.h>
 
@@ -454,7 +455,7 @@ static void print_shadow_for_address(con
 	}
 }
 
-bool report_enabled(void)
+static bool report_enabled(void)
 {
 	if (current->kasan_depth)
 		return false;
@@ -479,7 +480,8 @@ void kasan_report_invalid_free(void *obj
 	end_report(&flags);
 }
 
-void __kasan_report(unsigned long addr, size_t size, bool is_write, unsigned long ip)
+static void __kasan_report(unsigned long addr, size_t size, bool is_write,
+				unsigned long ip)
 {
 	struct kasan_access_info info;
 	void *tagged_addr;
@@ -518,6 +520,22 @@ void __kasan_report(unsigned long addr,
 	end_report(&flags);
 }
 
+bool kasan_report(unsigned long addr, size_t size, bool is_write,
+			unsigned long ip)
+{
+	unsigned long flags = user_access_save();
+	bool ret = false;
+
+	if (likely(report_enabled())) {
+		__kasan_report(addr, size, is_write, ip);
+		ret = true;
+	}
+
+	user_access_restore(flags);
+
+	return ret;
+}
+
 #ifdef CONFIG_KASAN_INLINE
 /*
  * With CONFIG_KASAN_INLINE, accesses to bogus pointers (outside the high
_

Patches currently in -mm which might be from andreyknvl@google.com are

kcov-cleanup-debug-messages.patch
kcov-fix-potential-use-after-free-in-kcov_remote_start.patch
kcov-move-t-kcov-assignments-into-kcov_start-stop.patch
kcov-move-t-kcov_sequence-assignment.patch
kcov-use-t-kcov_mode-as-enabled-indicator.patch
kcov-collect-coverage-from-interrupts.patch
usb-core-kcov-collect-coverage-from-usb-complete-callback.patch
kasan-consistently-disable-debugging-features.patch
kasan-add-missing-functions-declarations-to-kasanh.patch
kasan-move-kasan_report-into-reportc.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (74 preceding siblings ...)
  2020-05-12 21:04 ` + kasan-move-kasan_report-into-reportc.patch " Andrew Morton
@ 2020-05-13 18:31 ` Andrew Morton
  2020-05-13 20:56 ` + kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch " Andrew Morton
                   ` (4 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-13 18:31 UTC (permalink / raw)
  To: khlebnikov, mgorman, minchan, mm-commits, rientjes, vbabka


The patch titled
     Subject: mm/compaction: avoid VM_BUG_ON(PageSlab()) in page_mapcount()
has been added to the -mm tree.  Its filename is
     mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Subject: mm/compaction: avoid VM_BUG_ON(PageSlab()) in page_mapcount()

isolate_migratepages_block() runs some checks outside of lru_lock when
choosing pages for migration.  After checking PageLRU() it checks for
extra page references by comparing page_count() and page_mapcount().
Between these two checks the page could be removed from the LRU, freed,
and taken over by slab.

As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount().
The race window is tiny.  For a certain workload this happens around once
a year.

 page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0
 flags: 0x500000000008100(slab|head)
 raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180
 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
 page dumped because: VM_BUG_ON_PAGE(PageSlab(page))
 ------------[ cut here ]------------
 kernel BUG at ./include/linux/mm.h:628!
 invalid opcode: 0000 [#1] SMP NOPTI
 CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G        W         4.19.109-27 #1
 Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019
 RIP: 0010:isolate_migratepages_block+0x986/0x9b0

To fix this, open-code page_mapcount() in the racy check for the 0-order
case and recheck carefully under lru_lock, where the page cannot escape
from the LRU.

Also add checks for the extra references held for file pages and swap
cache.
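
The expected-reference arithmetic behind the racy check can be condensed
as follows (a minimal sketch, not part of the patch; the helper name
page_has_extra_refs() is hypothetical, everything else matches the code
below):

	/* page_mapcount() is _mapcount + 1, open-coded here to avoid
	 * the VM_BUG_ON(PageSlab()) it contains.  On top of the mapped
	 * references, file pages and swap-cache pages are allowed one
	 * extra reference held by the cache.  Anything beyond that is
	 * an unexplained pin, so isolation would be pointless. */
	static inline bool page_has_extra_refs(struct page *page)
	{
		return page_count(page) >
		       atomic_read(&page->_mapcount) + 1 +
		       (!PageAnon(page) || PageSwapCache(page));
	}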

Link: http://lkml.kernel.org/r/158937872515.474360.5066096871639561424.stgit@buzz
Fixes: 119d6d59dcc0 ("mm, compaction: avoid isolating pinned pages")
Fixes: 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/compaction.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

--- a/mm/compaction.c~mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount
+++ a/mm/compaction.c
@@ -935,12 +935,16 @@ isolate_migratepages_block(struct compac
 		}
 
 		/*
-		 * Migration will fail if an anonymous page is pinned in memory,
+		 * Migration will fail if a page is pinned in memory,
 		 * so avoid taking lru_lock and isolating it unnecessarily in an
-		 * admittedly racy check.
+		 * admittedly racy check for the simplest case of 0-order pages.
+		 *
+		 * Open code page_mapcount() to avoid VM_BUG_ON(PageSlab(page)).
+		 * Page could have extra reference from mapping or swap cache.
 		 */
-		if (!page_mapping(page) &&
-		    page_count(page) > page_mapcount(page))
+		if (!PageCompound(page) &&
+		    page_count(page) > atomic_read(&page->_mapcount) + 1 +
+				(!PageAnon(page) || PageSwapCache(page)))
 			goto isolate_fail;
 
 		/*
@@ -975,6 +979,11 @@ isolate_migratepages_block(struct compac
 				low_pfn += compound_nr(page) - 1;
 				goto isolate_fail;
 			}
+
+			/* Recheck page extra references under lock */
+			if (page_count(page) > page_mapcount(page) +
+				    (!PageAnon(page) || PageSwapCache(page)))
+				goto isolate_fail;
 		}
 
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
_

Patches currently in -mm which might be from khlebnikov@yandex-team.ru are

mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch
kernel-watchdog-flush-all-printk-nmi-buffers-when-hardlockup-detected.patch
doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (75 preceding siblings ...)
  2020-05-13 18:31 ` + mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch " Andrew Morton
@ 2020-05-13 20:56 ` Andrew Morton
  2020-05-13 21:26 ` + vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch " Andrew Morton
                   ` (3 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-13 20:56 UTC (permalink / raw)
  To: aquini, keescook, mcgrof, mm-commits, tytso, yzaikin


The patch titled
     Subject: kernel/sysctl.c: ignore out-of-range taint bits introduced via kernel.tainted
has been added to the -mm tree.  Its filename is
     kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Rafael Aquini <aquini@redhat.com>
Subject: kernel/sysctl.c: ignore out-of-range taint bits introduced via kernel.tainted

Users with SYS_ADMIN capability can add arbitrary taint flags to the
running kernel by writing to /proc/sys/kernel/tainted or issuing the
command 'sysctl -w kernel.tainted=...'.  This interface, however, accepts
any integer value, which might cause an invalid set of flags to be
committed to the tainted_mask bitset.

This patch introduces a simple way for proc_taint() to ignore any invalid
bits in the user input before committing the valid ones to the kernel's
tainted_mask.
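
As a worked example (illustrative values, not from the patch): writing a
value with bit 60 set used to loop up to BITS_PER_LONG and call
add_taint(60, ...), committing a taint bit that no flag definition backs.
With the change, the loop is bounded by the number of defined flags and
such bits are silently dropped:

	/* only bits 0..TAINT_FLAGS_COUNT-1 can reach add_taint() */
	for (i = 0; i < TAINT_FLAGS_COUNT; i++)
		if ((1UL << i) & tmptaint)
			add_taint(i, LOCKDEP_STILL_OK);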

Link: http://lkml.kernel.org/r/20200512223946.888020-1-aquini@redhat.com
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/sysctl.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/kernel/sysctl.c~kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted
+++ a/kernel/sysctl.c
@@ -870,10 +870,9 @@ static int proc_taint(struct ctl_table *
 		 * to everyone's atomic.h for this
 		 */
 		int i;
-		for (i = 0; i < BITS_PER_LONG && tmptaint >> i; i++) {
-			if ((tmptaint >> i) & 1)
+		for (i = 0; i < TAINT_FLAGS_COUNT; i++)
+			if ((1UL << i) & tmptaint)
 				add_taint(i, LOCKDEP_STILL_OK);
-		}
 	}
 
 	return err;
_

Patches currently in -mm which might be from aquini@redhat.com are

mm-slub-add-panic_on_error-to-the-debug-facilities.patch
kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (76 preceding siblings ...)
  2020-05-13 20:56 ` + kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch " Andrew Morton
@ 2020-05-13 21:26 ` Andrew Morton
  2020-05-13 22:00 ` + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch " Andrew Morton
                   ` (2 subsequent siblings)
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-13 21:26 UTC (permalink / raw)
  To: david, guro, hannes, laoar.shao, mhocko, mm-commits, viro


The patch titled
     Subject: Re: vfs: keep inodes with page cache off the inode shrinker LRU
has been added to the -mm tree.  Its filename is
     vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: vfs: keep inodes with page cache off the inode shrinker LRU

The VFS inode shrinker is currently allowed to reclaim cold inodes with
populated page cache.  This behavior goes back to CONFIG_HIGHMEM setups,
which required the ability to drop page cache in large highmem zones to
free up struct inodes in comparatively tiny lowmem zones.

However, it has significant side effects that are hard to justify on
systems without highmem:

- It can drop gigabytes of hot and active page cache on the floor
  without consulting the VM (recorded as "inodesteal" events in
  /proc/vmstat).  Such an "aging inversion" between unreferenced inodes
  holding hot cache easily happens in practice: for example, a git tree
  whose objects are accessed frequently but no open file descriptors are
  maintained throughout.

- It can also prematurely drop non-resident info stored inside the
  inodes, which can let massive cache thrashing go unnoticed.  This in
  turn means we're making the wrong decisions in page reclaim and can get
  stuck in pathological thrashing.  The thrashing also doesn't show up as
  memory pressure in psi, causing failures in the userspace OOM detection
  chain, which can leave a system (an Android device, for example)
  stranded in a livelock.

History

As this keeps causing problems for people, there have been several
attempts to address this.

One recent attempt was to make the inode shrinker simply skip over inodes
that still contain pages: a76cf1a474d7 ("mm: don't reclaim inodes with
many attached pages").

However, this change had to be reverted in 69056ee6a8a3 ("Revert "mm:
don't reclaim inodes with many attached pages"") because it caused severe
reclaim performance problems: Inodes that sit on the shrinker LRU are
attracting reclaim pressure away from the page cache and toward the VFS. 
If we then permanently exempt sizable portions of this pool from actually
getting reclaimed when looked at, this pressure accumulates as deferred
shrinker work (a mechanism for *temporarily* unreclaimable objects) until
it causes mayhem in the VFS cache pools.

In the bug quoted in 69056ee6a8a3 in particular, the excessive pressure
drove the XFS shrinker into dirty objects, where it caused synchronous,
IO-bound stalls, even as there was plenty of clean page cache that should
have been reclaimed instead.

Another variant of this problem was recently observed, where the kernel
violates cgroups' memory.low protection settings and reclaims page cache
way beyond the configured thresholds.  It was followed by a proposal of a
modified form of the reverted commit above that implements
memory.low-sensitive shrinker skipping over populated inodes on the LRU
[1].  However, this proposal continues to run the risk of attracting
disproportionate reclaim pressure to a pool of still-used inodes, while
not addressing the more generic reclaim inversion problem outside of a
very specific cgroup application.

[1] https://lore.kernel.org/linux-mm/1578499437-1664-1-git-send-email-laoar.shao@gmail.com/

Solution

This patch fixes the aging inversion described above on !CONFIG_HIGHMEM
systems, without reintroducing the problems associated with excessive
shrinker LRU rotations, by keeping populated inodes off the shrinker LRUs
entirely.

Currently, inodes are kept off the shrinker LRU as long as they have an
elevated i_count, indicating an active user.  Unfortunately, the page
cache cannot simply hold an i_count reference, because unlink() *should*
result in the inode being dropped and its cache invalidated.

Instead, this patch makes iput_final() consult the state of the page cache
and punt the LRU linking to the VM if the inode is still populated; the VM
in turn checks the inode state when it depopulates the page cache, and
adds the inode to the LRU if necessary.
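
In rough terms, the VFS side of the handoff looks like this (a condensed
sketch, relying only on the bool return of inode_add_lru() and the
I_PAGES flag introduced below, not on the patch's exact code):

	/* final iput(), inode->i_lock held: */
	if (!inode_add_lru(inode)) {
		/* e.g. I_PAGES is set: the inode stays off the LRU,
		 * and inode_pages_clear() will add it once the last
		 * page or exceptional entry is gone. */
	}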

This is not unlike what we do for dirty inodes, which are moved off the
LRU permanently until writeback completion puts them back on (iff still
unused).  We can reuse the same code -- inode_add_lru() -- here.

This is also not unlike page reclaim, where the lower VM layer has to
negotiate state with the higher VFS layer.  Follow existing precedence and
handle the inversion as much as possible on the VM side:

- introduce an I_PAGES flag that the VM maintains under the i_lock, so
  that any inode code holding that lock can check the page cache state
  without having to lock and inspect the struct address_space

- introduce inode_pages_set() and inode_pages_clear() to maintain the
  inode LRU state from the VM side, then update all cache mutators to use
  them when populating the first cache entry or clearing the last

With this, the concept of "inodesteal" - where the inode shrinker drops
page cache - is relegated to CONFIG_HIGHMEM systems only.  The VM is in
charge of the cache, the shrinker in charge of struct inode.
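
The mutator-side pattern is symmetrical (a minimal sketch, assuming the
mapping_empty(), inode_pages_set() and inode_pages_clear() helpers from
this patch; locking abbreviated):

	/* adding the first cache entry: */
	xa_lock_irq(&mapping->i_pages);
	was_empty = mapping_empty(mapping);
	/* ... store the page or shadow entry, bump nrpages ... */
	xa_unlock_irq(&mapping->i_pages);
	if (was_empty)
		inode_pages_set(mapping->host);

	/* removing the last one: inode_pages_clear(mapping->host),
	 * which links the unused inode back onto the shrinker LRU. */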

Footnotes

- For debuggability, add vmstat counters that track the number of times
  a new cache entry pulls a previously unused inode off the LRU
  (pginoderescue), as well as how many times existing cache deferred an
  LRU addition.  Keep the pginodesteal/kswapd_inodesteal counters for
  backwards compatibility, but they'll just show 0 now.

- Fix /proc/sys/vm/drop_caches to drop shadow entries from the page
  cache.  Not doing so has always been a bit strange, but since most
  people drop cache and metadata cache together, the inode shrinker would
  have taken care of them before - no more, so do it VM-side.

Link: http://lkml.kernel.org/r/20200512212936.GA450429@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/block_dev.c                |    2 
 fs/dax.c                      |   14 ++++
 fs/drop_caches.c              |    2 
 fs/inode.c                    |  112 +++++++++++++++++++++++++++++---
 fs/internal.h                 |    2 
 include/linux/fs.h            |   17 ++++
 include/linux/pagemap.h       |    2 
 include/linux/vm_event_item.h |    3 
 mm/filemap.c                  |   37 ++++++++--
 mm/huge_memory.c              |    3 
 mm/truncate.c                 |   34 +++++++--
 mm/vmscan.c                   |    6 +
 mm/vmstat.c                   |    4 -
 mm/workingset.c               |    4 +
 14 files changed, 208 insertions(+), 34 deletions(-)

--- a/fs/block_dev.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/fs/block_dev.c
@@ -79,7 +79,7 @@ void kill_bdev(struct block_device *bdev
 {
 	struct address_space *mapping = bdev->bd_inode->i_mapping;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (mapping_empty(mapping))
 		return;
 
 	invalidate_bh_lrus();
--- a/fs/dax.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/fs/dax.c
@@ -478,9 +478,11 @@ static void *grab_mapping_entry(struct x
 {
 	unsigned long index = xas->xa_index;
 	bool pmd_downgrade = false; /* splitting PMD entry into PTE entries? */
+	int populated;
 	void *entry;
 
 retry:
+	populated = 0;
 	xas_lock_irq(xas);
 	entry = get_unlocked_entry(xas, order);
 
@@ -526,6 +528,8 @@ retry:
 		xas_store(xas, NULL);	/* undo the PMD join */
 		dax_wake_entry(xas, entry, true);
 		mapping->nrexceptional--;
+		if (mapping_empty(mapping))
+			populated = -1;
 		entry = NULL;
 		xas_set(xas, index);
 	}
@@ -541,11 +545,17 @@ retry:
 		dax_lock_entry(xas, entry);
 		if (xas_error(xas))
 			goto out_unlock;
+		if (mapping_empty(mapping))
+			populated++;
 		mapping->nrexceptional++;
 	}
 
 out_unlock:
 	xas_unlock_irq(xas);
+	if (populated == -1)
+		inode_pages_clear(mapping->host);
+	else if (populated == 1)
+		inode_pages_set(mapping->host);
 	if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM))
 		goto retry;
 	if (xas->xa_node == XA_ERROR(-ENOMEM))
@@ -631,6 +641,7 @@ static int __dax_invalidate_entry(struct
 					  pgoff_t index, bool trunc)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
+	bool empty = false;
 	int ret = 0;
 	void *entry;
 
@@ -645,10 +656,13 @@ static int __dax_invalidate_entry(struct
 	dax_disassociate_entry(entry, mapping, trunc);
 	xas_store(&xas, NULL);
 	mapping->nrexceptional--;
+	empty = mapping_empty(mapping);
 	ret = 1;
 out:
 	put_unlocked_entry(&xas, entry);
 	xas_unlock_irq(&xas);
+	if (empty)
+		inode_pages_clear(mapping->host);
 	return ret;
 }
 
--- a/fs/drop_caches.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/fs/drop_caches.c
@@ -27,7 +27,7 @@ static void drop_pagecache_sb(struct sup
 		 * we need to reschedule to avoid softlockups.
 		 */
 		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
-		    (inode->i_mapping->nrpages == 0 && !need_resched())) {
+		    (mapping_empty(inode->i_mapping) && !need_resched())) {
 			spin_unlock(&inode->i_lock);
 			continue;
 		}
--- a/fs/inode.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/fs/inode.c
@@ -431,26 +431,101 @@ static void inode_lru_list_add(struct in
 		inode->i_state |= I_REFERENCED;
 }
 
+static void inode_lru_list_del(struct inode *inode)
+{
+	if (list_lru_del(&inode->i_sb->s_inode_lru, &inode->i_lru))
+		this_cpu_dec(nr_unused);
+}
+
 /*
  * Add inode to LRU if needed (inode is unused and clean).
  *
  * Needs inode->i_lock held.
  */
-void inode_add_lru(struct inode *inode)
+bool inode_add_lru(struct inode *inode)
 {
-	if (!(inode->i_state & (I_DIRTY_ALL | I_SYNC |
-				I_FREEING | I_WILL_FREE)) &&
-	    !atomic_read(&inode->i_count) && inode->i_sb->s_flags & SB_ACTIVE)
-		inode_lru_list_add(inode);
+	if (inode->i_state &
+	    (I_DIRTY_ALL | I_SYNC | I_FREEING | I_WILL_FREE | I_PAGES))
+		return false;
+	if (atomic_read(&inode->i_count))
+		return false;
+	if (!(inode->i_sb->s_flags & SB_ACTIVE))
+		return false;
+	inode_lru_list_add(inode);
+	return true;
 }
 
+/*
+ * Usually, inodes become reclaimable when they are no longer
+ * referenced and their page cache has been reclaimed. The following
+ * API allows the VM to communicate cache population state to the VFS.
+ *
+ * However, on CONFIG_HIGHMEM we can't wait for the page cache to go
+ * away: cache pages allocated in a large highmem zone could pin
+ * struct inode memory allocated in relatively small lowmem zones. So
+ * when CONFIG_HIGHMEM is enabled, we tie cache to the inode lifetime.
+ */
 
-static void inode_lru_list_del(struct inode *inode)
+#ifndef CONFIG_HIGHMEM
+/**
+ * inode_pages_set - mark the inode as holding page cache
+ * @inode: the inode whose first cache page was just added
+ *
+ * Tell the VFS that this inode has populated page cache and must not
+ * be reclaimed by the inode shrinker.
+ *
+ * The caller must hold the page lock of the just-added page: by
+ * pinning the page, the page cache cannot become depopulated, and we
+ * can safely set I_PAGES without a race check under the i_pages lock.
+ *
+ * This function acquires the i_lock.
+ */
+void inode_pages_set(struct inode *inode)
 {
+	spin_lock(&inode->i_lock);
+	if (!(inode->i_state & I_PAGES)) {
+		inode->i_state |= I_PAGES;
+		if (!list_empty(&inode->i_lru)) {
+			count_vm_event(PGINODERESCUE);
+			inode_lru_list_del(inode);
+		}
+	}
+	spin_unlock(&inode->i_lock);
+}
 
-	if (list_lru_del(&inode->i_sb->s_inode_lru, &inode->i_lru))
-		this_cpu_dec(nr_unused);
+/**
+ * inode_pages_clear - mark the inode as not holding page cache
+ * @inode: the inode whose last cache page was just removed
+ *
+ * Tell the VFS that the inode no longer holds page cache and that its
+ * lifetime is to be handed over to the inode shrinker LRU.
+ *
+ * This function acquires the i_lock and the i_pages lock.
+ */
+void inode_pages_clear(struct inode *inode)
+{
+	struct address_space *mapping = &inode->i_data;
+	bool add_to_lru = false;
+	unsigned long flags;
+
+	spin_lock(&inode->i_lock);
+
+	xa_lock_irqsave(&mapping->i_pages, flags);
+	if ((inode->i_state & I_PAGES) && mapping_empty(mapping)) {
+		inode->i_state &= ~I_PAGES;
+		add_to_lru = true;
+	}
+	xa_unlock_irqrestore(&mapping->i_pages, flags);
+
+	if (add_to_lru) {
+		WARN_ON_ONCE(!list_empty(&inode->i_lru));
+		if (inode_add_lru(inode))
+			__count_vm_event(PGINODEDELAYED);
+	}
+
+	spin_unlock(&inode->i_lock);
 }
+#endif /* !CONFIG_HIGHMEM */
 
 /**
  * inode_sb_list_add - add inode to the superblock list of inodes
@@ -743,6 +818,8 @@ static enum lru_status inode_lru_isolate
 	if (!spin_trylock(&inode->i_lock))
 		return LRU_SKIP;
 
+	WARN_ON_ONCE(inode->i_state & I_PAGES);
+
 	/*
 	 * Referenced or dirty inodes are still in use. Give them another pass
 	 * through the LRU as we cannot reclaim them now.
@@ -762,7 +839,18 @@ static enum lru_status inode_lru_isolate
 		return LRU_ROTATE;
 	}
 
-	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
+	/*
+	 * Usually, populated inodes shouldn't be on the shrinker LRU,
+	 * but they can be briefly visible when a new page is added to
+	 * an inode that was already linked but inode_pages_set()
+	 * hasn't run yet to move them off.
+	 *
+	 * The other exception is on HIGHMEM systems: highmem cache
+	 * can pin lowmem struct inodes, and we might be in dire
+	 * straits in the lower zones. Purge cache to free the inode.
+	 */
+	if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) {
+#ifdef CONFIG_HIGHMEM
 		__iget(inode);
 		spin_unlock(&inode->i_lock);
 		spin_unlock(lru_lock);
@@ -779,6 +867,12 @@ static enum lru_status inode_lru_isolate
 		iput(inode);
 		spin_lock(lru_lock);
 		return LRU_RETRY;
+#else
+		list_lru_isolate(lru, &inode->i_lru);
+		spin_unlock(&inode->i_lock);
+		this_cpu_dec(nr_unused);
+		return LRU_REMOVED;
+#endif
 	}
 
 	WARN_ON(inode->i_state & I_NEW);
--- a/fs/internal.h~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/fs/internal.h
@@ -137,7 +137,7 @@ extern int vfs_open(const struct path *,
  * inode.c
  */
 extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
-extern void inode_add_lru(struct inode *inode);
+extern bool inode_add_lru(struct inode *inode);
 extern int dentry_needs_remove_privs(struct dentry *dentry);
 
 /*
--- a/include/linux/fs.h~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/include/linux/fs.h
@@ -592,6 +592,11 @@ static inline void mapping_allow_writabl
 	atomic_inc(&mapping->i_mmap_writable);
 }
 
+static inline bool mapping_empty(struct address_space *mapping)
+{
+	return mapping->nrpages + mapping->nrexceptional == 0;
+}
+
 /*
  * Use sequence counter to get consistent i_size on 32-bit processors.
  */
@@ -2162,6 +2167,9 @@ static inline void kiocb_clone(struct ki
  *
  * I_CREATING		New object's inode in the middle of setting up.
  *
+ * I_PAGES		Inode is holding page cache that needs to get reclaimed
+ *			first before the inode can go onto the shrinker LRU.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -2184,6 +2192,7 @@ static inline void kiocb_clone(struct ki
 #define I_WB_SWITCH		(1 << 13)
 #define I_OVL_INUSE		(1 << 14)
 #define I_CREATING		(1 << 15)
+#define I_PAGES			(1 << 16)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3123,6 +3132,14 @@ static inline void remove_inode_hash(str
 		__remove_inode_hash(inode);
 }
 
+#ifndef CONFIG_HIGHMEM
+extern void inode_pages_set(struct inode *inode);
+extern void inode_pages_clear(struct inode *inode);
+#else
+static inline void inode_pages_set(struct inode *inode) {}
+static inline void inode_pages_clear(struct inode *inode) {}
+#endif
+
 extern void inode_sb_list_add(struct inode *inode);
 
 #ifdef CONFIG_BLOCK
--- a/include/linux/pagemap.h~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/include/linux/pagemap.h
@@ -613,7 +613,7 @@ int add_to_page_cache_locked(struct page
 int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 				pgoff_t index, gfp_t gfp_mask);
 extern void delete_from_page_cache(struct page *page);
-extern void __delete_from_page_cache(struct page *page, void *shadow);
+extern bool __delete_from_page_cache(struct page *page, void *shadow);
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 void delete_from_page_cache_batch(struct address_space *mapping,
 				  struct pagevec *pvec);
--- a/include/linux/vm_event_item.h~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/include/linux/vm_event_item.h
@@ -38,7 +38,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 #ifdef CONFIG_NUMA
 		PGSCAN_ZONE_RECLAIM_FAILED,
 #endif
-		PGINODESTEAL, SLABS_SCANNED, KSWAPD_INODESTEAL,
+		SLABS_SCANNED,
+		PGINODESTEAL, KSWAPD_INODESTEAL, PGINODERESCUE, PGINODEDELAYED,
 		KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
 		PAGEOUTRUN, PGROTATED,
 		DROP_PAGECACHE, DROP_SLAB,
--- a/mm/filemap.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/mm/filemap.c
@@ -116,8 +116,8 @@
  *   ->tasklist_lock            (memory_failure, collect_procs_ao)
  */
 
-static void page_cache_delete(struct address_space *mapping,
-				   struct page *page, void *shadow)
+static bool __must_check page_cache_delete(struct address_space *mapping,
+					   struct page *page, void *shadow)
 {
 	XA_STATE(xas, &mapping->i_pages, page->index);
 	unsigned int nr = 1;
@@ -151,6 +151,8 @@ static void page_cache_delete(struct add
 		smp_wmb();
 	}
 	mapping->nrpages -= nr;
+
+	return mapping_empty(mapping);
 }
 
 static void unaccount_page_cache_page(struct address_space *mapping,
@@ -227,15 +229,18 @@ static void unaccount_page_cache_page(st
  * Delete a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
  * is safe.  The caller must hold the i_pages lock.
+ *
+ * If this returns true, the caller must call inode_pages_clear()
+ * after dropping the i_pages lock.
  */
-void __delete_from_page_cache(struct page *page, void *shadow)
+bool __must_check __delete_from_page_cache(struct page *page, void *shadow)
 {
 	struct address_space *mapping = page->mapping;
 
 	trace_mm_filemap_delete_from_page_cache(page);
 
 	unaccount_page_cache_page(mapping, page);
-	page_cache_delete(mapping, page, shadow);
+	return page_cache_delete(mapping, page, shadow);
 }
 
 static void page_cache_free_page(struct address_space *mapping,
@@ -267,12 +272,16 @@ void delete_from_page_cache(struct page
 {
 	struct address_space *mapping = page_mapping(page);
 	unsigned long flags;
+	bool empty;
 
 	BUG_ON(!PageLocked(page));
 	xa_lock_irqsave(&mapping->i_pages, flags);
-	__delete_from_page_cache(page, NULL);
+	empty = __delete_from_page_cache(page, NULL);
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 
+	if (empty)
+		inode_pages_clear(mapping->host);
+
 	page_cache_free_page(mapping, page);
 }
 EXPORT_SYMBOL(delete_from_page_cache);
@@ -291,8 +300,8 @@ EXPORT_SYMBOL(delete_from_page_cache);
  *
  * The function expects the i_pages lock to be held.
  */
-static void page_cache_delete_batch(struct address_space *mapping,
-			     struct pagevec *pvec)
+static bool __must_check page_cache_delete_batch(struct address_space *mapping,
+						 struct pagevec *pvec)
 {
 	XA_STATE(xas, &mapping->i_pages, pvec->pages[0]->index);
 	int total_pages = 0;
@@ -337,12 +346,15 @@ static void page_cache_delete_batch(stru
 		total_pages++;
 	}
 	mapping->nrpages -= total_pages;
+
+	return mapping_empty(mapping);
 }
 
 void delete_from_page_cache_batch(struct address_space *mapping,
 				  struct pagevec *pvec)
 {
 	int i;
+	bool empty;
 	unsigned long flags;
 
 	if (!pagevec_count(pvec))
@@ -354,9 +366,12 @@ void delete_from_page_cache_batch(struct
 
 		unaccount_page_cache_page(mapping, pvec->pages[i]);
 	}
-	page_cache_delete_batch(mapping, pvec);
+	empty = page_cache_delete_batch(mapping, pvec);
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 
+	if (empty)
+		inode_pages_clear(mapping->host);
+
 	for (i = 0; i < pagevec_count(pvec); i++)
 		page_cache_free_page(mapping, pvec->pages[i]);
 }
@@ -833,6 +848,7 @@ static int __add_to_page_cache_locked(st
 	XA_STATE(xas, &mapping->i_pages, offset);
 	int huge = PageHuge(page);
 	struct mem_cgroup *memcg;
+	bool populated = false;
 	int error;
 	void *old;
 
@@ -860,6 +876,7 @@ static int __add_to_page_cache_locked(st
 		if (xas_error(&xas))
 			goto unlock;
 
+		populated = mapping_empty(mapping);
 		if (xa_is_value(old)) {
 			mapping->nrexceptional--;
 			if (shadowp)
@@ -880,6 +897,10 @@ unlock:
 	if (!huge)
 		mem_cgroup_commit_charge(page, memcg, false, false);
 	trace_mm_filemap_add_to_page_cache(page);
+
+	if (populated)
+		inode_pages_set(mapping->host);
+
 	return 0;
 error:
 	page->mapping = NULL;
--- a/mm/huge_memory.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/mm/huge_memory.c
@@ -2608,7 +2608,8 @@ static void __split_huge_page(struct pag
 		/* Some pages can be beyond i_size: drop them from page cache */
 		if (head[i].index >= end) {
 			ClearPageDirty(head + i);
-			__delete_from_page_cache(head + i, NULL);
+			/* We know we're not removing the last page */
+			(void)__delete_from_page_cache(head + i, NULL);
 			if (IS_ENABLED(CONFIG_SHMEM) && PageSwapBacked(head))
 				shmem_uncharge(head->mapping->host, 1);
 			put_page(head + i);
--- a/mm/truncate.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/mm/truncate.c
@@ -31,24 +31,31 @@
  * itself locked.  These unlocked entries need verification under the tree
  * lock.
  */
-static inline void __clear_shadow_entry(struct address_space *mapping,
-				pgoff_t index, void *entry)
+static bool __must_check __clear_shadow_entry(struct address_space *mapping,
+					      pgoff_t index, void *entry)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
 
 	xas_set_update(&xas, workingset_update_node);
 	if (xas_load(&xas) != entry)
-		return;
+		return false;
 	xas_store(&xas, NULL);
 	mapping->nrexceptional--;
+
+	return mapping_empty(mapping);
 }
 
 static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
 			       void *entry)
 {
+	bool empty;
+
 	xa_lock_irq(&mapping->i_pages);
-	__clear_shadow_entry(mapping, index, entry);
+	empty = __clear_shadow_entry(mapping, index, entry);
 	xa_unlock_irq(&mapping->i_pages);
+
+	if (empty)
+		inode_pages_clear(mapping->host);
 }
 
 /*
@@ -61,7 +68,7 @@ static void truncate_exceptional_pvec_en
 				pgoff_t end)
 {
 	int i, j;
-	bool dax, lock;
+	bool dax, lock, empty = false;
 
 	/* Handled by shmem itself */
 	if (shmem_mapping(mapping))
@@ -96,11 +103,16 @@ static void truncate_exceptional_pvec_en
 			continue;
 		}
 
-		__clear_shadow_entry(mapping, index, page);
+		if (__clear_shadow_entry(mapping, index, page))
+			empty = true;
 	}
 
 	if (lock)
 		xa_unlock_irq(&mapping->i_pages);
+
+	if (empty)
+		inode_pages_clear(mapping->host);
+
 	pvec->nr = j;
 }
 
@@ -300,7 +312,7 @@ void truncate_inode_pages_range(struct a
 	pgoff_t		index;
 	int		i;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (mapping_empty(mapping))
 		goto out;
 
 	/* Offsets within partial pages */
@@ -636,6 +648,7 @@ static int
 invalidate_complete_page2(struct address_space *mapping, struct page *page)
 {
 	unsigned long flags;
+	bool empty;
 
 	if (page->mapping != mapping)
 		return 0;
@@ -648,9 +661,12 @@ invalidate_complete_page2(struct address
 		goto failed;
 
 	BUG_ON(page_has_private(page));
-	__delete_from_page_cache(page, NULL);
+	empty = __delete_from_page_cache(page, NULL);
 	xa_unlock_irqrestore(&mapping->i_pages, flags);
 
+	if (empty)
+		inode_pages_clear(mapping->host);
+
 	if (mapping->a_ops->freepage)
 		mapping->a_ops->freepage(page);
 
@@ -692,7 +708,7 @@ int invalidate_inode_pages2_range(struct
 	int ret2 = 0;
 	int did_range_unmap = 0;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (mapping_empty(mapping))
 		goto out;
 
 	pagevec_init(&pvec);
--- a/mm/vmscan.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/mm/vmscan.c
@@ -901,6 +901,7 @@ static int __remove_mapping(struct addre
 	} else {
 		void (*freepage)(struct page *);
 		void *shadow = NULL;
+		bool empty;
 
 		freepage = mapping->a_ops->freepage;
 		/*
@@ -922,9 +923,12 @@ static int __remove_mapping(struct addre
 		if (reclaimed && page_is_file_lru(page) &&
 		    !mapping_exiting(mapping) && !dax_mapping(mapping))
 			shadow = workingset_eviction(page, target_memcg);
-		__delete_from_page_cache(page, shadow);
+		empty = __delete_from_page_cache(page, shadow);
 		xa_unlock_irqrestore(&mapping->i_pages, flags);
 
+		if (empty)
+			inode_pages_clear(mapping->host);
+
 		if (freepage != NULL)
 			freepage(page);
 	}
--- a/mm/vmstat.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/mm/vmstat.c
@@ -1205,9 +1205,11 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_NUMA
 	"zone_reclaim_failed",
 #endif
-	"pginodesteal",
 	"slabs_scanned",
+	"pginodesteal",
 	"kswapd_inodesteal",
+	"pginoderescue",
+	"pginodedelayed",
 	"kswapd_low_wmark_hit_quickly",
 	"kswapd_high_wmark_hit_quickly",
 	"pageoutrun",
--- a/mm/workingset.c~vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru
+++ a/mm/workingset.c
@@ -491,6 +491,7 @@ static enum lru_status shadow_lru_isolat
 	struct xa_node *node = container_of(item, struct xa_node, private_list);
 	XA_STATE(xas, node->array, 0);
 	struct address_space *mapping;
+	bool empty = false;
 	int ret;
 
 	/*
@@ -529,6 +530,7 @@ static enum lru_status shadow_lru_isolat
 	if (WARN_ON_ONCE(node->count != node->nr_values))
 		goto out_invalid;
 	mapping->nrexceptional -= node->nr_values;
+	empty = mapping_empty(mapping);
 	xas.xa_node = xa_parent_locked(&mapping->i_pages, node);
 	xas.xa_offset = node->offset;
 	xas.xa_shift = node->shift + XA_CHUNK_SHIFT;
@@ -542,6 +544,8 @@ static enum lru_status shadow_lru_isolat
 
 out_invalid:
 	xa_unlock_irq(&mapping->i_pages);
+	if (empty)
+		inode_pages_clear(mapping->host);
 	ret = LRU_REMOVED_RETRY;
 out:
 	cond_resched();
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch
mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (77 preceding siblings ...)
  2020-05-13 21:26 ` + vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch " Andrew Morton
@ 2020-05-13 22:00 ` Andrew Morton
  2020-05-13 22:48 ` + mm-swap-use-prandom_u32_max.patch " Andrew Morton
  2020-05-13 22:54 ` + mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch " Andrew Morton
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-13 22:00 UTC (permalink / raw)
  To: cai, hannes, mm-commits, sfr


The patch titled
     Subject: mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix
has been added to the -mm tree.  Its filename is
     mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix

Fix a crash: with CONFIG_NUMA, a failed hugepage allocation can leave
*hpage holding an ERR_PTR rather than NULL, so a bare *hpage check is
not enough before calling mem_cgroup_uncharge().
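
For context, the CONFIG_NUMA allocation path in mm/khugepaged.c looks
roughly like this (paraphrased from the tree, not part of this patch);
this is where the ERR_PTR comes from:

	static struct page *khugepaged_alloc_page(struct page **hpage,
						  gfp_t gfp, int node)
	{
		*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
		if (unlikely(!*hpage)) {
			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
			*hpage = ERR_PTR(-ENOMEM);	/* not NULL */
			return NULL;
		}
		prep_transhuge_page(*hpage);
		count_vm_event(THP_COLLAPSE_ALLOC);
		return *hpage;
	}

Hence the IS_ERR_OR_NULL() checks in the hunks below.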

Link: http://lkml.kernel.org/r/20200512215813.GA487759@cmpxchg.org
Reported-by: Qian Cai <cai@lca.pw>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/khugepaged.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/khugepaged.c~mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix
+++ a/mm/khugepaged.c
@@ -1097,7 +1097,7 @@ static void collapse_huge_page(struct mm
 out_up_write:
 	up_write(&mm->mmap_sem);
 out_nolock:
-	if (*hpage)
+	if (!IS_ERR_OR_NULL(*hpage))
 		mem_cgroup_uncharge(*hpage);
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
@@ -1834,7 +1834,7 @@ xa_unlocked:
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (*hpage)
+	if (!IS_ERR_OR_NULL(*hpage))
 		mem_cgroup_uncharge(*hpage);
 	/* TODO: tracepoints */
 }
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch
mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-swap-use-prandom_u32_max.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (78 preceding siblings ...)
  2020-05-13 22:00 ` + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch " Andrew Morton
@ 2020-05-13 22:48 ` Andrew Morton
  2020-05-13 22:54 ` + mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch " Andrew Morton
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-13 22:48 UTC (permalink / raw)
  To: hughd, mhocko, minchan, mm-commits, tim.c.chen, ying.huang


The patch titled
     Subject: mm/swapfile.c: use prandom_u32_max()
has been added to the -mm tree.  Its filename is
     mm-swap-use-prandom_u32_max.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-swap-use-prandom_u32_max.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-swap-use-prandom_u32_max.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Huang Ying <ying.huang@intel.com>
Subject: mm/swapfile.c: use prandom_u32_max()

Use prandom_u32_max() to improve code readability and to take
advantage of the common implementation.
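
For reference, the common helper is essentially the following (a
multiply+shift rather than a division; the exact header it lives in
has moved between random.h and prandom.h across kernel versions):

	static inline u32 prandom_u32_max(u32 ep_ro)
	{
		return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
	}

Both the old and the new expression yield a value in [0, highest_bit),
so the leading "1 +" keeps cluster_next in [1, highest_bit]; the helper
just avoids the modulo.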

Link: http://lkml.kernel.org/r/20200512081013.520201-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/swapfile.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/swapfile.c~mm-swap-use-prandom_u32_max
+++ a/mm/swapfile.c
@@ -3209,7 +3209,7 @@ SYSCALL_DEFINE2(swapon, const char __use
 		 * select a random position to start with to help wear leveling
 		 * SSD
 		 */
-		p->cluster_next = 1 + (prandom_u32() % p->highest_bit);
+		p->cluster_next = 1 + prandom_u32_max(p->highest_bit);
 		nr_cluster = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
 
 		cluster_info = kvcalloc(nr_cluster, sizeof(*cluster_info),
_

Patches currently in -mm which might be from ying.huang@intel.com are

swap-try-to-scan-more-free-slots-even-when-fragmented.patch
mm-swap-use-prandom_u32_max.patch
proc-pid-smaps-add-pmd-migration-entry-parsing.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

* + mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch added to -mm tree
  2020-05-08  1:35 incoming Andrew Morton
                   ` (79 preceding siblings ...)
  2020-05-13 22:48 ` + mm-swap-use-prandom_u32_max.patch " Andrew Morton
@ 2020-05-13 22:54 ` Andrew Morton
  80 siblings, 0 replies; 82+ messages in thread
From: Andrew Morton @ 2020-05-13 22:54 UTC (permalink / raw)
  To: geert, hannes, mm-commits, naresh.kamboju, rdunlap, sfr


The patch titled
     Subject: mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix
has been added to the -mm tree.  Its filename is
     mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix

The THP page size macros are CONFIG_TRANSPARENT_HUGEPAGE only.

We already ifdef most THP-related code in memcg, but not these
particular stats. Memcg used to track the pages as they came in, and
PageTransHuge() + hpage_nr_pages() work when THP is not compiled in.

After switching to native vmstat counters, memcg doesn't see the pages;
it only gets a count of THPs.  To translate that to bytes, it has to
know how big the THPs are -- and that is only available under
CONFIG_TRANSPARENT_HUGEPAGE.

Add the necessary ifdefs.  /proc/meminfo, smaps etc. likewise don't
show the THP counters when the feature is compiled out, and the event
counts (THP_FAULT_ALLOC, THP_COLLAPSE_ALLOC) were already conditional.

Style touchup: HPAGE_PMD_NR * PAGE_SIZE is silly. Use HPAGE_PMD_SIZE.
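
The identity behind that touchup comes from include/linux/huge_mm.h
(the values in the comments assume the common 4K page / 2M PMD
configuration):

	#define HPAGE_PMD_SHIFT	PMD_SHIFT			/* 21 */
	#define HPAGE_PMD_ORDER	(HPAGE_PMD_SHIFT-PAGE_SHIFT)	/* 9 */
	#define HPAGE_PMD_NR	(1<<HPAGE_PMD_ORDER)		/* 512 */
	#define HPAGE_PMD_SIZE	((1UL) << HPAGE_PMD_SHIFT)	/* 2M */

so HPAGE_PMD_NR * PAGE_SIZE == HPAGE_PMD_SIZE by construction.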

Link: http://lkml.kernel.org/r/20200512121750.GA397968@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>	[build-tested]
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

--- a/mm/memcontrol.c~mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix
+++ a/mm/memcontrol.c
@@ -1401,9 +1401,11 @@ static char *memory_stat_format(struct m
 		       (u64)memcg_page_state(memcg, NR_WRITEBACK) *
 		       PAGE_SIZE);
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	seq_buf_printf(&s, "anon_thp %llu\n",
 		       (u64)memcg_page_state(memcg, NR_ANON_THPS) *
-		       HPAGE_PMD_NR * PAGE_SIZE);
+		       HPAGE_PMD_SIZE);
+#endif
 
 	for (i = 0; i < NR_LRU_LISTS; i++)
 		seq_buf_printf(&s, "%s %llu\n", lru_list_name(i),
@@ -3752,7 +3754,9 @@ static int memcg_numa_stat_show(struct s
 static const unsigned int memcg1_stats[] = {
 	NR_FILE_PAGES,
 	NR_ANON_MAPPED,
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	NR_ANON_THPS,
+#endif
 	NR_SHMEM,
 	NR_FILE_MAPPED,
 	NR_FILE_DIRTY,
@@ -3763,7 +3767,9 @@ static const unsigned int memcg1_stats[]
 static const char *const memcg1_stat_names[] = {
 	"cache",
 	"rss",
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	"rss_huge",
+#endif
 	"shmem",
 	"mapped_file",
 	"dirty",
@@ -3794,8 +3800,10 @@ static int memcg_stat_show(struct seq_fi
 		if (memcg1_stats[i] == MEMCG_SWAP && !do_memsw_account())
 			continue;
 		nr = memcg_page_state_local(memcg, memcg1_stats[i]);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		if (memcg1_stats[i] == NR_ANON_THPS)
 			nr *= HPAGE_PMD_NR;
+#endif
 		seq_printf(m, "%s %lu\n", memcg1_stat_names[i], nr * PAGE_SIZE);
 	}
 
_

Patches currently in -mm which might be from hannes@cmpxchg.org are

vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch
mm-fix-numa-node-file-count-error-in-replace_page_cache.patch
mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch
mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch
mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch
mm-memcontrol-move-out-cgroup-swaprate-throttling.patch
mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch
mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch
mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch
mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch
mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch
mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch
mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch
mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch
mm-memcontrol-prepare-swap-controller-setup-for-integration.patch
mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch
mm-memcontrol-charge-swapin-pages-on-instantiation.patch
mm-memcontrol-delete-unused-lrucare-handling.patch
mm-memcontrol-update-page-mem_cgroup-stability-rules.patch

^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2020-05-13 22:54 UTC | newest]

Thread overview: 82+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-08  1:35 incoming Andrew Morton
2020-05-08  1:35 ` [patch 01/15] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Andrew Morton
2020-05-08  1:35 ` [patch 02/15] mm, memcg: fix error return value of mem_cgroup_css_alloc() Andrew Morton
2020-05-08  1:35 ` [patch 03/15] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() Andrew Morton
2020-05-08  1:35 ` [patch 04/15] kernel/kcov.c: fix typos in kcov_remote_start documentation Andrew Morton
2020-05-08  1:35 ` [patch 05/15] scripts/decodecode: fix trapping instruction formatting Andrew Morton
2020-05-08  1:35 ` [patch 06/15] arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() Andrew Morton
2020-05-08  1:35 ` [patch 07/15] eventpoll: fix missing wakeup for ovflist in ep_poll_callback Andrew Morton
2020-05-08  1:36 ` [patch 08/15] scripts/gdb: repair rb_first() and rb_last() Andrew Morton
2020-05-08  1:36 ` [patch 09/15] mm/slub: fix incorrect interpretation of s->offset Andrew Morton
2020-05-08  1:36 ` [patch 10/15] percpu: make pcpu_alloc() aware of current gfp context Andrew Morton
2020-05-08  1:36 ` [patch 11/15] kselftests: introduce new epoll60 testcase for catching lost wakeups Andrew Morton
2020-05-08  1:36 ` [patch 12/15] epoll: atomically remove wait entry on wake up Andrew Morton
2020-05-08  1:36 ` [patch 13/15] mm/vmscan: remove unnecessary argument description of isolate_lru_pages() Andrew Morton
2020-05-08  1:36 ` [patch 14/15] ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST Andrew Morton
2020-05-08  1:36 ` [patch 15/15] mm: limit boost_watermark on small zones Andrew Morton
2020-05-08  2:08 ` + arch-kunmap-remove-duplicate-kunmap-implementations-fix.patch added to -mm tree Andrew Morton
2020-05-08 20:46 ` + ipc-utilc-sysvipc_find_ipc-incorrectly-updates-position-index-fix.patch " Andrew Morton
2020-05-08 20:50 ` + mm-introduce-external-memory-hinting-api-fix-2.patch " Andrew Morton
2020-05-08 21:32 ` + checkpatch-use-patch-subject-when-reading-from-stdin-fix.patch " Andrew Morton
2020-05-08 23:19 ` + mm-fix-numa-node-file-count-error-in-replace_page_cache.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-fix-stat-corrupting-race-in-charge-moving.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-drop-compound-parameter-from-memcg-charging-api.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-move-out-cgroup-swaprate-throttling.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-convert-page-cache-to-a-new-mem_cgroup_charge-api.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-prepare-uncharging-for-removal-of-private-page-type-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-prepare-move_account-for-removal-of-private-page-type-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-prepare-cgroup-vmstat-infrastructure-for-native-anon-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_file_pages-and-nr_shmem-counters.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_anon_mapped-counter.patch " Andrew Morton
2020-05-08 23:19 ` + mm-memcontrol-switch-to-native-nr_anon_thps-counter.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-drop-unused-try-commit-cancel-charge-api.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-prepare-swap-controller-setup-for-integration.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-make-swap-tracking-an-integral-part-of-memory-control.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-charge-swapin-pages-on-instantiation.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-document-the-new-swap-control-behavior.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-delete-unused-lrucare-handling.patch " Andrew Morton
2020-05-08 23:20 ` + mm-memcontrol-update-page-mem_cgroup-stability-rules.patch " Andrew Morton
2020-05-08 23:38 ` + selftests-vm-pkeys-introduce-powerpc-support-fix.patch " Andrew Morton
2020-05-08 23:38 ` + selftests-vm-pkeys-override-access-right-definitions-on-powerpc-fix.patch " Andrew Morton
2020-05-08 23:40 ` + memcg-expose-root-cgroups-memorystat.patch " Andrew Morton
2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked.patch " Andrew Morton
2020-05-08 23:43 ` + doc-cgroup-update-note-about-conditions-when-oom-killer-is-invoked-fix.patch " Andrew Morton
2020-05-08 23:48 ` + zcomp-use-array_size-for-backends-list.patch " Andrew Morton
2020-05-08 23:53 ` + device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch " Andrew Morton
2020-05-08 23:56 ` + mm-memory_hotplug-introduce-add_memory_driver_managed.patch " Andrew Morton
2020-05-08 23:56 ` + kexec_file-dont-place-kexec-images-on-ioresource_mem_driver_managed.patch " Andrew Morton
2020-05-08 23:56 ` + device-dax-add-memory-via-add_memory_driver_managed.patch " Andrew Morton
2020-05-09  0:45 ` + mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
2020-05-11 19:08 ` + mm-reset-numa-stats-for-boot-pagesets-v3.patch " Andrew Morton
2020-05-11 19:54 ` + mm-shmem-remove-rare-optimization-when-swapin-races-with-hole-punching.patch " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] kexec-prevent-removal-of-memory-in-use-by-a-loaded-kexec-image.patch removed from " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names.patch " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix.patch " Andrew Morton
2020-05-11 20:31 ` [to-be-updated] arm64-memory-give-hotplug-memory-a-different-resource-name.patch " Andrew Morton
2020-05-11 20:38 ` + arm-add-support-for-folded-p4d-page-tables-fix.patch added to " Andrew Morton
2020-05-11 20:40 ` + mm-add-debug_wx-support-fix-2.patch " Andrew Morton
2020-05-11 20:41 ` + mm-add-debug_wx-support-fix-3.patch " Andrew Morton
2020-05-11 20:50 ` + arm64-mm-drop-__have_arch_huge_ptep_get.patch " Andrew Morton
2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-is_hugepage_only_range.patch " Andrew Morton
2020-05-11 20:50 ` + mm-hugetlb-define-a-generic-fallback-for-arch_clear_hugepage_flags.patch " Andrew Morton
2020-05-11 21:00 ` + mm-introduce-external-memory-hinting-api-fix-2-fix.patch " Andrew Morton
2020-05-11 21:02 ` + mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch " Andrew Morton
2020-05-11 21:12 ` [alternative-merged] mm-vmscan-consistent-update-to-pgsteal-and-pgscan.patch removed from " Andrew Morton
2020-05-11 21:21 ` + seq_file-introduce-define_seq_attribute-helper-macro.patch added to " Andrew Morton
2020-05-11 21:21 ` + mm-vmstat-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
2020-05-11 21:21 ` + kernel-kprobes-convert-to-use-define_seq_attribute-macro.patch " Andrew Morton
2020-05-11 21:22 ` + seq_file-introduce-define_seq_attribute-helper-macro-checkpatch-fixes.patch " Andrew Morton
2020-05-11 21:23 ` + lib-flex_proportionsc-cleanup-__fprop_inc_percpu_max.patch " Andrew Morton
2020-05-11 21:31 ` [alternative-merged] lib-flex_proportionsc-aging-counts-when-fraction-smaller-than-max_frac-fprop_frac_base.patch removed from " Andrew Morton
2020-05-11 21:44 ` + xtensa-add-loglvl-to-show_trace-fix.patch added to " Andrew Morton
2020-05-11 22:44 ` mmotm 2020-05-11-15-43 uploaded Andrew Morton
2020-05-12 21:03 ` + kasan-consistently-disable-debugging-features.patch added to -mm tree Andrew Morton
2020-05-12 21:03 ` + kasan-add-missing-functions-declarations-to-kasanh.patch " Andrew Morton
2020-05-12 21:04 ` + kasan-move-kasan_report-into-reportc.patch " Andrew Morton
2020-05-13 18:31 ` + mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch " Andrew Morton
2020-05-13 20:56 ` + kernel-sysctl-ignore-out-of-range-taint-bits-introduced-via-kerneltainted.patch " Andrew Morton
2020-05-13 21:26 ` + vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch " Andrew Morton
2020-05-13 22:00 ` + mm-memcontrol-convert-anon-and-file-thp-to-new-mem_cgroup_charge-api-fix.patch " Andrew Morton
2020-05-13 22:48 ` + mm-swap-use-prandom_u32_max.patch " Andrew Morton
2020-05-13 22:54 ` + mm-memcontrol-switch-to-native-nr_anon_thps-counter-fix.patch " Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).