* [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
@ 2021-07-01 21:03 Marcelo Tosatti
  2021-07-01 21:03 ` [patch 1/5] sched: isolation: introduce vmstat_sync isolcpu flags Marcelo Tosatti
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-01 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal

The logic that disables the vmstat worker thread when entering
nohz_full does not cover all scenarios. For example, the following
sequence is possible:

1) enter nohz_full, which calls refresh_cpu_vm_stats(), syncing the stats.
2) the app runs mlock(), which increases counters for mlock'ed pages.
3) the app starts its -RT loop.

Since the refresh_cpu_vm_stats() call from the nohz_full logic can
happen _before_ the mlock(), the vmstat shepherd can restart the
vmstat worker thread on the CPU in question.
 
To fix this, optionally sync the vmstat counters when returning
to userspace, controlled by a new "vmstat_sync" isolcpus
flag (default off).

See individual patches for details.
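
As a concrete illustration, a kernel command line using the new flag might look like the fragment below. The CPU list (2-5) and the companion parameters are hypothetical choices for this example; only the "vmstat_sync" isolcpus sub-parameter comes from this series.

```shell
# GRUB_CMDLINE_LINUX fragment (illustrative): CPUs 2-5 are isolated
# from the scheduler domain and will flush their per-CPU vmstat
# counters on each return to userspace.
isolcpus=vmstat_sync,domain,2-5 nohz_full=2-5 rcu_nocbs=2-5
```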



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [patch 1/5] sched: isolation: introduce vmstat_sync isolcpu flags
  2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
@ 2021-07-01 21:03 ` Marcelo Tosatti
  2021-07-01 21:03 ` [patch 2/5] common entry: add hook for isolation to __syscall_exit_to_user_mode_work Marcelo Tosatti
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-01 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal, Marcelo Tosatti

Add a new isolcpus flag "vmstat_sync" to control whether
to sync per-CPU counters to global counters when returning
to userspace.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6-vmstat-update/include/linux/sched/isolation.h
===================================================================
--- linux-2.6-vmstat-update.orig/include/linux/sched/isolation.h
+++ linux-2.6-vmstat-update/include/linux/sched/isolation.h
@@ -15,6 +15,7 @@ enum hk_flags {
 	HK_FLAG_WQ		= (1 << 6),
 	HK_FLAG_MANAGED_IRQ	= (1 << 7),
 	HK_FLAG_KTHREAD		= (1 << 8),
+	HK_FLAG_VMSTAT_SYNC	= (1 << 9),
 };
 
 #ifdef CONFIG_CPU_ISOLATION
Index: linux-2.6-vmstat-update/kernel/sched/isolation.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/sched/isolation.c
+++ linux-2.6-vmstat-update/kernel/sched/isolation.c
@@ -173,6 +173,12 @@ static int __init housekeeping_isolcpus_
 			continue;
 		}
 
+		if (!strncmp(str, "vmstat_sync,", 12)) {
+			str += 12;
+			flags |= HK_FLAG_VMSTAT_SYNC;
+			continue;
+		}
+
 		/*
 		 * Skip unknown sub-parameter and validate that it is not
 		 * containing an invalid character.
Index: linux-2.6-vmstat-update/Documentation/admin-guide/kernel-parameters.txt
===================================================================
--- linux-2.6-vmstat-update.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-2.6-vmstat-update/Documentation/admin-guide/kernel-parameters.txt
@@ -2124,6 +2124,23 @@
 
 			The format of <cpu-list> is described above.
 
+                        vmstat_sync
+
+			  Page counters are maintained per-CPU for
+			  performance: when a CPU modifies a page counter,
+			  the modification is kept in that CPU's local counter.
+			  Certain activities require a global count, which
+			  involves requesting each CPU to flush its local
+			  counters to the global VM counters.
+			  This flush is implemented via a workqueue item, which
+			  requires scheduling the workqueue task on isolated CPUs.
+
+			  To avoid this interruption, this option syncs the
+			  page counters on each return from a system call.
+			  To ensure the application returns to userspace
+			  with no modified per-CPU counters, it is necessary
+			  to use mlockall() in addition to this isolcpus flag.
+
 	iucv=		[HW,NET]
 
 	ivrs_ioapic	[HW,X86-64]
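
The sub-parameter match added above follows the existing isolcpus parsing style: each known flag name, with its trailing comma, is consumed from the front of the argument string until only the CPU list remains. A minimal userspace sketch of that scheme (the helper name and the subset of flags are invented for illustration; the flag values mirror the patch's enum hk_flags):

```c
#include <string.h>

/* Flag bits mirroring the patch's enum hk_flags (illustrative subset). */
#define HK_FLAG_DOMAIN       (1u << 0)
#define HK_FLAG_MANAGED_IRQ  (1u << 7)
#define HK_FLAG_VMSTAT_SYNC  (1u << 9)

/*
 * Consume comma-terminated flag names from the front of an isolcpus=
 * argument, stopping at the <cpu-list>.  Returns the accumulated flags.
 */
static unsigned int parse_isolcpus_flags(const char *str)
{
	unsigned int flags = 0;

	for (;;) {
		if (!strncmp(str, "domain,", 7)) {
			str += 7;
			flags |= HK_FLAG_DOMAIN;
		} else if (!strncmp(str, "managed_irq,", 12)) {
			str += 12;
			flags |= HK_FLAG_MANAGED_IRQ;
		} else if (!strncmp(str, "vmstat_sync,", 12)) {
			str += 12;
			flags |= HK_FLAG_VMSTAT_SYNC;
		} else {
			break;	/* remainder is the <cpu-list> */
		}
	}
	return flags;
}
```

The kernel version additionally validates that the remainder is a well-formed CPU list; that part is omitted here.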




* [patch 2/5] common entry: add hook for isolation to __syscall_exit_to_user_mode_work
  2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
  2021-07-01 21:03 ` [patch 1/5] sched: isolation: introduce vmstat_sync isolcpu flags Marcelo Tosatti
@ 2021-07-01 21:03 ` Marcelo Tosatti
  2021-07-01 21:03 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace Marcelo Tosatti
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-01 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal, Marcelo Tosatti

This hook will be used by the next patch to perform synchronization
of per-CPU vmstats.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6-vmstat-update/kernel/entry/common.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/entry/common.c
+++ linux-2.6-vmstat-update/kernel/entry/common.c
@@ -284,9 +284,18 @@ static void syscall_exit_to_user_mode_pr
 		syscall_exit_work(regs, work);
 }
 
+/*
+ * Isolation specific exit to user mode preparation. Runs with interrupts
+ * enabled.
+ */
+static void isolation_exit_to_user_mode_prepare(void)
+{
+}
+
 static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
 {
 	syscall_exit_to_user_mode_prepare(regs);
+	isolation_exit_to_user_mode_prepare();
 	local_irq_disable_exit_to_user();
 	exit_to_user_mode_prepare(regs);
 }




* [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
  2021-07-01 21:03 ` [patch 1/5] sched: isolation: introduce vmstat_sync isolcpu flags Marcelo Tosatti
  2021-07-01 21:03 ` [patch 2/5] common entry: add hook for isolation to __syscall_exit_to_user_mode_work Marcelo Tosatti
@ 2021-07-01 21:03 ` Marcelo Tosatti
  2021-07-01 23:11     ` kernel test robot
  2021-07-02  6:50     ` kernel test robot
  2021-07-01 21:03 ` [patch 4/5] mm: vmstat: move need_update Marcelo Tosatti
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-01 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal, Marcelo Tosatti

The logic that disables the vmstat worker thread when entering
nohz_full does not cover all scenarios. For example, the following
sequence is possible:

1) enter nohz_full, which calls refresh_cpu_vm_stats(), syncing the stats.
2) the app runs mlock(), which increases counters for mlock'ed pages.
3) the app starts its -RT loop.

Since the refresh_cpu_vm_stats() call from the nohz_full logic can
happen _before_ the mlock(), the vmstat shepherd can restart the
vmstat worker thread on the CPU in question.

To fix this, optionally sync the vmstat counters when returning
to userspace, controlled by a new "vmstat_sync" isolcpus
flag (default off).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6-vmstat-update/kernel/sched/isolation.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/sched/isolation.c
+++ linux-2.6-vmstat-update/kernel/sched/isolation.c
@@ -8,6 +8,7 @@
  *
  */
 #include "sched.h"
+#include <linux/vmstat.h>
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -129,6 +130,9 @@ static int __init housekeeping_setup(cha
 		}
 	}
 
+	if (flags & HK_FLAG_VMSTAT_SYNC)
+		static_branch_enable(&vmstat_sync_enabled);
+
 	housekeeping_flags |= flags;
 
 	free_bootmem_cpumask_var(non_housekeeping_mask);
Index: linux-2.6-vmstat-update/include/linux/vmstat.h
===================================================================
--- linux-2.6-vmstat-update.orig/include/linux/vmstat.h
+++ linux-2.6-vmstat-update/include/linux/vmstat.h
@@ -21,6 +21,15 @@ int sysctl_vm_numa_stat_handler(struct c
 		void *buffer, size_t *length, loff_t *ppos);
 #endif
 
+DECLARE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+
+extern void __sync_vmstat(void);
+static inline void sync_vmstat(void)
+{
+	if (static_branch_unlikely(&vmstat_sync_enabled))
+		__sync_vmstat();
+}
+
 struct reclaim_stat {
 	unsigned nr_dirty;
 	unsigned nr_unqueued_dirty;
Index: linux-2.6-vmstat-update/mm/vmstat.c
===================================================================
--- linux-2.6-vmstat-update.orig/mm/vmstat.c
+++ linux-2.6-vmstat-update/mm/vmstat.c
@@ -28,6 +28,7 @@
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -308,6 +309,24 @@ void set_pgdat_percpu_threshold(pg_data_
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+static inline void mark_vmstat_dirty(void)
+{
+	int cpu;
+
+	if (!static_branch_unlikely(&vmstat_sync_enabled))
+		return;
+
+	cpu = smp_processor_id();
+
+	if (housekeeping_cpu(cpu, HK_FLAG_VMSTAT_SYNC))
+		return;
+
+	per_cpu(vmstat_dirty, cpu) = true;
+}
+
 /*
  * For use when we know that interrupts are disabled,
  * or when we know that preemption is disabled and that
@@ -330,6 +349,7 @@ void __mod_zone_page_state(struct zone *
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_zone_page_state);
 
@@ -361,6 +381,7 @@ void __mod_node_page_state(struct pglist
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_node_page_state);
 
@@ -401,6 +422,7 @@ void __inc_zone_state(struct zone *zone,
 		zone_page_state_add(v + overstep, zone, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -419,6 +441,7 @@ void __inc_node_state(struct pglist_data
 		node_page_state_add(v + overstep, pgdat, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -447,6 +470,7 @@ void __dec_zone_state(struct zone *zone,
 		zone_page_state_add(v - overstep, zone, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -465,6 +489,7 @@ void __dec_node_state(struct pglist_data
 		node_page_state_add(v - overstep, pgdat, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -528,6 +553,7 @@ static inline void mod_zone_state(struct
 
 	if (z)
 		zone_page_state_add(z, zone, item);
+	mark_vmstat_dirty();
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
@@ -596,6 +622,7 @@ static inline void mod_node_state(struct
 
 	if (z)
 		node_page_state_add(z, pgdat, item);
+	mark_vmstat_dirty();
 }
 
 void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
@@ -2006,6 +2033,32 @@ static void vmstat_shepherd(struct work_
 		round_jiffies_relative(sysctl_stat_interval));
 }
 
+void __sync_vmstat(void)
+{
+	int cpu;
+
+	cpu = get_cpu();
+	if (!per_cpu(vmstat_dirty, cpu)) {
+		put_cpu();
+		return;
+	}
+
+	refresh_cpu_vm_stats(false);
+	per_cpu(vmstat_dirty, cpu) = false;
+	put_cpu();
+
+	/*
+	 * If the task is migrated to another CPU between put_cpu
+	 * and cancel_delayed_work_sync, the code below might
+	 * cancel the vmstat_update work for a different CPU
+	 * (than the one from which the vmstats were flushed).
+	 *
+	 * However, the vmstat shepherd will re-enable it later,
+	 * so it's harmless.
+	 */
+	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+}
+
 static void __init start_shepherd_timer(void)
 {
 	int cpu;
Index: linux-2.6-vmstat-update/kernel/entry/common.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/entry/common.c
+++ linux-2.6-vmstat-update/kernel/entry/common.c
@@ -6,6 +6,7 @@
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 #include <linux/tick.h>
+#include <linux/vmstat.h>
 
 #include "common.h"
 
@@ -290,6 +291,7 @@ static void syscall_exit_to_user_mode_pr
  */
 static void isolation_exit_to_user_mode_prepare(void)
 {
+	sync_vmstat();
 }
 
 static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)
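
The mechanism this patch adds is a classic dirty-flag scheme: every counter update on an isolated CPU sets a per-CPU boolean, and the syscall-exit hook flushes only when that boolean is set, keeping the common fast path to a single branch. A userspace sketch of the idea using thread-local state (all names here are invented stand-ins; the kernel version uses per-CPU data, a static key, and refresh_cpu_vm_stats()):

```c
#include <stdbool.h>

/* Thread-local stand-ins for the patch's per-CPU vmstat state. */
static _Thread_local long local_counter;  /* per-CPU diff            */
static _Thread_local bool vmstat_dirty;   /* set on each modification */
static long global_counter;               /* global VM counter        */

/* Analogue of __mod_zone_page_state() followed by mark_vmstat_dirty(). */
static void mod_counter(long delta)
{
	local_counter += delta;
	vmstat_dirty = true;
}

/* Analogue of sync_vmstat() on return to userspace: flush only if dirty. */
static void sync_on_user_return(void)
{
	if (!vmstat_dirty)
		return;		/* fast path: counters already clean */
	global_counter += local_counter;
	local_counter = 0;
	vmstat_dirty = false;
}
```

Once the flush has run and no further counter updates happen, subsequent returns to userspace take only the fast path, which is exactly why the commit message insists on mlockall(): it prevents page faults from dirtying the counters again mid-loop.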




* [patch 4/5] mm: vmstat: move need_update
  2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2021-07-01 21:03 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace Marcelo Tosatti
@ 2021-07-01 21:03 ` Marcelo Tosatti
  2021-07-01 21:03 ` [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-01 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal, Marcelo Tosatti

Move the need_update() function earlier in vmstat.c, as it is
needed by the next patch. No functional changes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


Index: linux-2.6-vmstat-update/mm/vmstat.c
===================================================================
--- linux-2.6-vmstat-update.orig/mm/vmstat.c
+++ linux-2.6-vmstat-update/mm/vmstat.c
@@ -1860,6 +1860,40 @@ static const struct seq_operations vmsta
 static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
 int sysctl_stat_interval __read_mostly = HZ;
 
+/*
+ * Check if the diffs for a certain cpu indicate that
+ * an update is needed.
+ */
+static bool need_update(int cpu)
+{
+	pg_data_t *last_pgdat = NULL;
+	struct zone *zone;
+
+	for_each_populated_zone(zone) {
+		struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu);
+		struct per_cpu_nodestat *n;
+		/*
+		 * The fast way of checking if there are any vmstat diffs.
+		 */
+		if (memchr_inv(p->vm_stat_diff, 0, NR_VM_ZONE_STAT_ITEMS *
+			       sizeof(p->vm_stat_diff[0])))
+			return true;
+#ifdef CONFIG_NUMA
+		if (memchr_inv(p->vm_numa_stat_diff, 0, NR_VM_NUMA_STAT_ITEMS *
+			       sizeof(p->vm_numa_stat_diff[0])))
+			return true;
+#endif
+		if (last_pgdat == zone->zone_pgdat)
+			continue;
+		last_pgdat = zone->zone_pgdat;
+		n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu);
+		if (memchr_inv(n->vm_node_stat_diff, 0, NR_VM_NODE_STAT_ITEMS *
+			       sizeof(n->vm_node_stat_diff[0])))
+		    return true;
+	}
+	return false;
+}
+
 #ifdef CONFIG_PROC_FS
 static void refresh_vm_stats(struct work_struct *work)
 {
@@ -1945,40 +1979,6 @@ static void vmstat_update(struct work_st
  * invoked when tick processing is not active.
  */
 /*
- * Check if the diffs for a certain cpu indicate that
- * an update is needed.
- */
-static bool need_update(int cpu)
-{
-	pg_data_t *last_pgdat = NULL;
-	struct zone *zone;
-
-	for_each_populated_zone(zone) {
-		struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu);
-		struct per_cpu_nodestat *n;
-		/*
-		 * The fast way of checking if there are any vmstat diffs.
-		 */
-		if (memchr_inv(p->vm_stat_diff, 0, NR_VM_ZONE_STAT_ITEMS *
-			       sizeof(p->vm_stat_diff[0])))
-			return true;
-#ifdef CONFIG_NUMA
-		if (memchr_inv(p->vm_numa_stat_diff, 0, NR_VM_NUMA_STAT_ITEMS *
-			       sizeof(p->vm_numa_stat_diff[0])))
-			return true;
-#endif
-		if (last_pgdat == zone->zone_pgdat)
-			continue;
-		last_pgdat = zone->zone_pgdat;
-		n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu);
-		if (memchr_inv(n->vm_node_stat_diff, 0, NR_VM_NODE_STAT_ITEMS *
-			       sizeof(n->vm_node_stat_diff[0])))
-		    return true;
-	}
-	return false;
-}
-
-/*
  * Switch off vmstat processing and then fold all the remaining differentials
  * until the diffs stay at zero. The function is used by NOHZ and can only be
  * invoked when tick processing is not active.
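
The moved need_update() leans on memchr_inv() to ask "is this byte range entirely zero?" in one call per diff array. memchr_inv() is kernel-internal; a userspace equivalent (helper name invented for this sketch) can simply scan the range:

```c
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's memchr_inv(buf, 0, len) != NULL:
 * returns nonzero iff some byte in buf differs from zero -- i.e. the
 * per-CPU diff array has pending deltas and the CPU "needs an update".
 */
static int any_nonzero(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i])
			return 1;
	return 0;
}
```

In need_update() this check is applied to vm_stat_diff, vm_numa_stat_diff, and vm_node_stat_diff in turn, returning true on the first array with a nonzero byte.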




* [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean
  2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2021-07-01 21:03 ` [patch 4/5] mm: vmstat: move need_update Marcelo Tosatti
@ 2021-07-01 21:03 ` Marcelo Tosatti
  2021-07-02  4:10     ` kernel test robot
  2021-07-02  4:43     ` kernel test robot
  2021-07-02  8:00 ` [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Christoph Lameter
  2021-07-02 12:30 ` Frederic Weisbecker
  6 siblings, 2 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-01 21:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal, Marcelo Tosatti

It is not necessary to queue a work item to run refresh_vm_stats
on a remote CPU if that CPU has no dirty stats and no per-CPU
allocations for remote nodes.

This fixes a sosreport hang (sosreport uses vmstat_refresh) with a
spinning SCHED_FIFO process.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6-vmstat-update/mm/vmstat.c
===================================================================
--- linux-2.6-vmstat-update.orig/mm/vmstat.c
+++ linux-2.6-vmstat-update/mm/vmstat.c
@@ -1895,17 +1895,39 @@ static bool need_update(int cpu)
 }
 
 #ifdef CONFIG_PROC_FS
-static void refresh_vm_stats(struct work_struct *work)
+static bool need_drain_remote_zones(int cpu)
+{
+	struct zone *zone;
+
+	for_each_populated_zone(zone) {
+		struct per_cpu_pageset *p;
+
+		p = per_cpu_ptr(zone->pageset, cpu);
+
+		if (!p->pcp.count)
+			continue;
+		if (!p->expire)
+			continue;
+		if (zone_to_nid(zone) == cpu_to_node(cpu))
+			continue;
+
+		return true;
+	}
+
+	return false;
+}
+
+static long refresh_vm_stats(void *arg)
 {
 	refresh_cpu_vm_stats(true);
+	return 0;
 }
 
 int vmstat_refresh(struct ctl_table *table, int write,
 		   void *buffer, size_t *lenp, loff_t *ppos)
 {
 	long val;
-	int err;
-	int i;
+	int i, cpu;
 
 	/*
 	 * The regular update, every sysctl_stat_interval, may come later
@@ -1919,9 +1941,15 @@ int vmstat_refresh(struct ctl_table *tab
 	 * transiently negative values, report an error here if any of
 	 * the stats is negative, so we know to go looking for imbalance.
 	 */
-	err = schedule_on_each_cpu(refresh_vm_stats);
-	if (err)
-		return err;
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		if (need_update(cpu) || need_drain_remote_zones(cpu))
+			work_on_cpu(cpu, refresh_vm_stats, NULL);
+
+		cond_resched();
+	}
+	put_online_cpus();
+
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
 		/*
 		 * Skip checking stats known to go negative occasionally.
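
The change above turns an unconditional schedule_on_each_cpu() into a filtered loop: a work item is sent to a CPU only if that CPU's stats are dirty or it caches pages for remote nodes, so clean isolated CPUs are never interrupted. The control flow can be sketched in userspace as follows (the per-CPU arrays and helpers are invented placeholders; the real predicates are need_update() and need_drain_remote_zones()):

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Placeholder per-CPU state standing in for vmstat diffs / pcp caches. */
static bool cpu_dirty[NR_CPUS];
static int  work_queued[NR_CPUS];	/* counts work items sent per CPU */

static bool need_update_sketch(int cpu) { return cpu_dirty[cpu]; }

/* Remote-zone draining is simplified away in this sketch. */
static bool need_drain_remote_zones_sketch(int cpu) { (void)cpu; return false; }

/* Analogue of work_on_cpu(cpu, refresh_vm_stats, NULL). */
static void refresh_on(int cpu)
{
	work_queued[cpu]++;
	cpu_dirty[cpu] = false;
}

/* Filtered refresh: only disturb CPUs that actually have work to do. */
static void vmstat_refresh_sketch(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (need_update_sketch(cpu) || need_drain_remote_zones_sketch(cpu))
			refresh_on(cpu);
}
```

With work_on_cpu() the caller blocks until each selected CPU has run the flush, which is why the kernel loop also calls cond_resched() between CPUs.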




* Re: [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-01 21:03 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace Marcelo Tosatti
@ 2021-07-01 23:11     ` kernel test robot
  2021-07-02  6:50     ` kernel test robot
  1 sibling, 0 replies; 33+ messages in thread
From: kernel test robot @ 2021-07-01 23:11 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: kbuild-all, Christoph Lameter, Thomas Gleixner,
	Frederic Weisbecker, Juri Lelli, Nitesh Lal, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 1993 bytes --]

Hi Marcelo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master linus/master v5.13 next-20210701]
[cannot apply to hnaz-linux-mm/master tip/core/entry]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
config: i386-tinyconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/b973a70c0670675073265d2cbee70a36bda3273e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
        git checkout b973a70c0670675073265d2cbee70a36bda3273e
        # save the attached .config to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: kernel/entry/common.o: in function `syscall_exit_to_user_mode_work':
>> common.c:(.text+0x1d8): undefined reference to `vmstat_sync_enabled'
>> ld: common.c:(.text+0x1e1): undefined reference to `__sync_vmstat'
   ld: kernel/entry/common.o: in function `syscall_exit_to_user_mode':
>> common.c:(.noinstr.text+0x71): undefined reference to `vmstat_sync_enabled'
>> ld: common.c:(.noinstr.text+0x7a): undefined reference to `__sync_vmstat'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7417 bytes --]



* Re: [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean
  2021-07-01 21:03 ` [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
@ 2021-07-02  4:10     ` kernel test robot
  2021-07-02  4:43     ` kernel test robot
  1 sibling, 0 replies; 33+ messages in thread
From: kernel test robot @ 2021-07-02  4:10 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: clang-built-linux, kbuild-all, Christoph Lameter,
	Thomas Gleixner, Frederic Weisbecker, Juri Lelli, Nitesh Lal,
	Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 3100 bytes --]

Hi Marcelo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master v5.13]
[cannot apply to hnaz-linux-mm/master linus/master tip/core/entry next-20210701]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
config: riscv-randconfig-r013-20210630 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9eb613b2de3163686b1a4bd1160f15ac56a4b083)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/e9eaf0981b74e6c29c7691ffb25b6d6613632f4f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
        git checkout e9eaf0981b74e6c29c7691ffb25b6d6613632f4f
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/vmstat.c:1909:11: error: no member named 'expire' in 'struct per_cpu_pageset'
                   if (!p->expire)
                        ~  ^
   1 error generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for LOCKDEP
   Depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT && (FRAME_POINTER || MIPS || PPC || S390 || MICROBLAZE || ARM || ARC || X86)
   Selected by
   - DEBUG_LOCK_ALLOC && DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
   WARNING: unmet direct dependencies detected for ERRATA_SIFIVE
   Depends on RISCV_ERRATA_ALTERNATIVE
   Selected by
   - SOC_SIFIVE


vim +1909 mm/vmstat.c

  1896	
  1897	#ifdef CONFIG_PROC_FS
  1898	static bool need_drain_remote_zones(int cpu)
  1899	{
  1900		struct zone *zone;
  1901	
  1902		for_each_populated_zone(zone) {
  1903			struct per_cpu_pageset *p;
  1904	
  1905			p = per_cpu_ptr(zone->pageset, cpu);
  1906	
  1907			if (!p->pcp.count)
  1908				continue;
> 1909			if (!p->expire)
  1910				continue;
  1911			if (zone_to_nid(zone) == cpu_to_node(cpu))
  1912				continue;
  1913	
  1914			return true;
  1915		}
  1916	
  1917		return false;
  1918	}
  1919	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23877 bytes --]



* Re: [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean
  2021-07-01 21:03 ` [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
@ 2021-07-02  4:43     ` kernel test robot
  2021-07-02  4:43     ` kernel test robot
  1 sibling, 0 replies; 33+ messages in thread
From: kernel test robot @ 2021-07-02  4:43 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: kbuild-all, Christoph Lameter, Thomas Gleixner,
	Frederic Weisbecker, Juri Lelli, Nitesh Lal, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 3014 bytes --]

Hi Marcelo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master v5.13]
[cannot apply to hnaz-linux-mm/master linus/master tip/core/entry next-20210701]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
config: sparc64-randconfig-s031-20210630 (attached as .config)
compiler: sparc64-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/e9eaf0981b74e6c29c7691ffb25b6d6613632f4f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
        git checkout e9eaf0981b74e6c29c7691ffb25b6d6613632f4f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=sparc64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/vmstat.c: In function 'need_drain_remote_zones':
>> mm/vmstat.c:1909:9: error: 'struct per_cpu_pageset' has no member named 'expire'
    1909 |   if (!p->expire)
         |         ^~

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for LOCKDEP
   Depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT && (FRAME_POINTER || MIPS || PPC || S390 || MICROBLAZE || ARM || ARC || X86)
   Selected by
   - PROVE_LOCKING && DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
   - LOCK_STAT && DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
   - DEBUG_LOCK_ALLOC && DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT


vim +1909 mm/vmstat.c

  1896	
  1897	#ifdef CONFIG_PROC_FS
  1898	static bool need_drain_remote_zones(int cpu)
  1899	{
  1900		struct zone *zone;
  1901	
  1902		for_each_populated_zone(zone) {
  1903			struct per_cpu_pageset *p;
  1904	
  1905			p = per_cpu_ptr(zone->pageset, cpu);
  1906	
  1907			if (!p->pcp.count)
  1908				continue;
> 1909			if (!p->expire)
  1910				continue;
  1911			if (zone_to_nid(zone) == cpu_to_node(cpu))
  1912				continue;
  1913	
  1914			return true;
  1915		}
  1916	
  1917		return false;
  1918	}
  1919	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36489 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-01 21:03 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace Marcelo Tosatti
@ 2021-07-02  6:50     ` kernel test robot
  2021-07-02  6:50     ` kernel test robot
  1 sibling, 0 replies; 33+ messages in thread
From: kernel test robot @ 2021-07-02  6:50 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel
  Cc: kbuild-all, Christoph Lameter, Thomas Gleixner,
	Frederic Weisbecker, Juri Lelli, Nitesh Lal, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 2685 bytes --]

Hi Marcelo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on tip/master linus/master v5.13 next-20210701]
[cannot apply to hnaz-linux-mm/master tip/core/entry]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
config: arc-randconfig-r024-20210630 (attached as .config)
compiler: arceb-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b973a70c0670675073265d2cbee70a36bda3273e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
        git checkout b973a70c0670675073265d2cbee70a36bda3273e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arceb-elf-ld: kernel/sched/isolation.o: in function `housekeeping_setup':
   isolation.c:(.init.text+0xa0): undefined reference to `vmstat_sync_enabled'
>> arceb-elf-ld: isolation.c:(.init.text+0xa0): undefined reference to `vmstat_sync_enabled'
   arceb-elf-ld: lib/stackdepot.o: in function `filter_irq_stacks':
   (.text+0x5a): undefined reference to `__irqentry_text_start'
   arceb-elf-ld: (.text+0x5a): undefined reference to `__irqentry_text_start'
   arceb-elf-ld: (.text+0x62): undefined reference to `__irqentry_text_end'
   arceb-elf-ld: (.text+0x62): undefined reference to `__irqentry_text_end'
   arceb-elf-ld: (.text+0x6a): undefined reference to `__softirqentry_text_start'
   arceb-elf-ld: (.text+0x6a): undefined reference to `__softirqentry_text_start'
   arceb-elf-ld: (.text+0x72): undefined reference to `__softirqentry_text_end'
   arceb-elf-ld: (.text+0x72): undefined reference to `__softirqentry_text_end'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32501 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
                   ` (4 preceding siblings ...)
  2021-07-01 21:03 ` [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
@ 2021-07-02  8:00 ` Christoph Lameter
  2021-07-02 11:52   ` Marcelo Tosatti
  2021-07-02 11:59   ` Marcelo Tosatti
  2021-07-02 12:30 ` Frederic Weisbecker
  6 siblings, 2 replies; 33+ messages in thread
From: Christoph Lameter @ 2021-07-02  8:00 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal

On Thu, 1 Jul 2021, Marcelo Tosatti wrote:

> The logic to disable vmstat worker thread, when entering
> nohz full, does not cover all scenarios. For example, it is possible
> for the following to happen:
>
> 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> 2) app runs mlock, which increases counters for mlock'ed pages.
> 3) start -RT loop
>
> Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> the mlock, vmstat shepherd can restart vmstat worker thread on
> the CPU in question.

Can we enter nohz_full after the app runs mlock?

> To fix this, optionally sync the vmstat counters when returning
> from userspace, controllable by a new "vmstat_sync" isolcpus
> flags (default off).
>
> See individual patches for details.

Wow... This is going into some performance-sensitive VM counters here and
adds code to their primitives.

Isn't there a simpler solution that does not require this many
changes?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-02  8:00 ` [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Christoph Lameter
@ 2021-07-02 11:52   ` Marcelo Tosatti
  2021-07-02 11:59   ` Marcelo Tosatti
  1 sibling, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-02 11:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal

Hi Christoph,

On Fri, Jul 02, 2021 at 10:00:11AM +0200, Christoph Lameter wrote:
> On Thu, 1 Jul 2021, Marcelo Tosatti wrote:
> 
> > The logic to disable vmstat worker thread, when entering
> > nohz full, does not cover all scenarios. For example, it is possible
> > for the following to happen:
> >
> > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > 2) app runs mlock, which increases counters for mlock'ed pages.
> > 3) start -RT loop
> >
> > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > the mlock, vmstat shepherd can restart vmstat worker thread on
> > the CPU in question.
> 
> Can we enter nohz_full after the app runs mlock?
> 
> > To fix this, optionally sync the vmstat counters when returning
> > from userspace, controllable by a new "vmstat_sync" isolcpus
> > flags (default off).
> >
> > See individual patches for details.
> 
> Wow... This is going into some performance sensitive VM counters here and
> adds code to their primitives.

Yes, but it should all be behind a static key, so the performance
impact when isolcpus=vmstat_sync,CPULIST is not enabled should be
zero (if the patchset is correct! ...).

For the case where isolcpus=vmstat_sync is enabled, the most important
performance aspect is the latency spike, which this patch is dealing
with.

> Isnt there a simpler solution that does not require this amount of
> changes?

The only other change I can think of which could solve this problem
would be allowing remote access to the per-CPU vmstat counters
(requiring a local_lock to be added), which seems more complex
than this.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-02  8:00 ` [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Christoph Lameter
  2021-07-02 11:52   ` Marcelo Tosatti
@ 2021-07-02 11:59   ` Marcelo Tosatti
  2021-07-05 14:26     ` Christoph Lameter
  1 sibling, 1 reply; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-02 11:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal

Hi Christoph,

Forgot to reply to this question...

On Fri, Jul 02, 2021 at 10:00:11AM +0200, Christoph Lameter wrote:
> On Thu, 1 Jul 2021, Marcelo Tosatti wrote:
> 
> > The logic to disable vmstat worker thread, when entering
> > nohz full, does not cover all scenarios. For example, it is possible
> > for the following to happen:
> >
> > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > 2) app runs mlock, which increases counters for mlock'ed pages.
> > 3) start -RT loop
> >
> > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > the mlock, vmstat shepherd can restart vmstat worker thread on
> > the CPU in question.
> 
> Can we enter nohz_full after the app runs mlock?

Hum, I don't think it's a good idea to use that route, because
entering or exiting nohz_full depends on a number of variables
outside of one's control (and additional variables might be
added in the future).

So preparing the system to function when entering nohz_full at any
location seems the sane thing to do.

And that would be at return to userspace (since, if mlocked, after
that point there will be no more changes to propagate to the vmstat
counters).

Or am I missing something else you can think of?



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
                   ` (5 preceding siblings ...)
  2021-07-02  8:00 ` [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Christoph Lameter
@ 2021-07-02 12:30 ` Frederic Weisbecker
  2021-07-02 15:28   ` Marcelo Tosatti
  6 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2021-07-02 12:30 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Christoph Lameter, Thomas Gleixner, Juri Lelli, Nitesh Lal

On Thu, Jul 01, 2021 at 06:03:36PM -0300, Marcelo Tosatti wrote:
> The logic to disable vmstat worker thread, when entering
> nohz full, does not cover all scenarios. For example, it is possible
> for the following to happen:
> 
> 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> 2) app runs mlock, which increases counters for mlock'ed pages.
> 3) start -RT loop
> 
> Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> the mlock, vmstat shepherd can restart vmstat worker thread on
> the CPU in question.
>  
> To fix this, optionally sync the vmstat counters when returning
> from userspace, controllable by a new "vmstat_sync" isolcpus
> flags (default off).

Wasn't the plan for such finegrained isolation features to do it at
the per task level using prctl()?

Thanks.

> 
> See individual patches for details.
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-02 12:30 ` Frederic Weisbecker
@ 2021-07-02 15:28   ` Marcelo Tosatti
  2021-07-06 13:09     ` Frederic Weisbecker
  0 siblings, 1 reply; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-02 15:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Christoph Lameter, Thomas Gleixner, Juri Lelli, Nitesh Lal


Hi Frederic,

On Fri, Jul 02, 2021 at 02:30:32PM +0200, Frederic Weisbecker wrote:
> On Thu, Jul 01, 2021 at 06:03:36PM -0300, Marcelo Tosatti wrote:
> > The logic to disable vmstat worker thread, when entering
> > nohz full, does not cover all scenarios. For example, it is possible
> > for the following to happen:
> > 
> > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > 2) app runs mlock, which increases counters for mlock'ed pages.
> > 3) start -RT loop
> > 
> > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > the mlock, vmstat shepherd can restart vmstat worker thread on
> > the CPU in question.
> >  
> > To fix this, optionally sync the vmstat counters when returning
> > from userspace, controllable by a new "vmstat_sync" isolcpus
> > flags (default off).
> 
> Wasn't the plan for such finegrained isolation features to do it at
> the per task level using prctl()?

Yes, but it's orthogonal: when we integrate the fine-grained isolation
interface, we will be able to use this code (to sync vmstat counters
on return to userspace) only when userspace informs us that it has
entered isolated mode, so you don't incur the performance penalty of
frequent vmstat counter writes when not using isolated apps.

This is what the full task isolation patchset is doing
as well (CC'ing Alex, BTW).

This will require modifying applications (and a new kernel with the
exposed interface).

But there is demand for fixing this now, for currently existing
binary-only applications.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-02 11:59   ` Marcelo Tosatti
@ 2021-07-05 14:26     ` Christoph Lameter
  2021-07-05 14:45       ` Marcelo Tosatti
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2021-07-05 14:26 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal

On Fri, 2 Jul 2021, Marcelo Tosatti wrote:

> > > The logic to disable vmstat worker thread, when entering
> > > nohz full, does not cover all scenarios. For example, it is possible
> > > for the following to happen:
> > >
> > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > 3) start -RT loop
> > >
> > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > the CPU in question.
> >
> > Can we enter nohz_full after the app runs mlock?
>
> Hum, i don't think its a good idea to use that route, because
> entering or exiting nohz_full depends on a number of variable
> outside of one's control (and additional variables might be
> added in the future).

Then I do not see any need for this patch, because after a certain time
of inactivity (after the mlock) the system will enter nohz_full again.
If userspace has no direct control over nohz_full and can only wait, then
it just has to do so.

> So preparing the system to function
> while entering nohz_full at any location seems the sane thing to do.
>
> And that would be at return to userspace (since, if mlocked, after
> that point there will be no more changes to propagate to vmstat
> counters).
>
> Or am i missing something else you can think of ?

I assumed that "enter nohz full" was an action by the userspace
app, because I saw some earlier patches introducing such
functionality in the past.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-05 14:26     ` Christoph Lameter
@ 2021-07-05 14:45       ` Marcelo Tosatti
  0 siblings, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-05 14:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal

On Mon, Jul 05, 2021 at 04:26:48PM +0200, Christoph Lameter wrote:
> On Fri, 2 Jul 2021, Marcelo Tosatti wrote:
> 
> > > > The logic to disable vmstat worker thread, when entering
> > > > nohz full, does not cover all scenarios. For example, it is possible
> > > > for the following to happen:
> > > >
> > > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > > 3) start -RT loop
> > > >
> > > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > > the CPU in question.
> > >
> > > Can we enter nohz_full after the app runs mlock?
> >
> > Hum, i don't think its a good idea to use that route, because
> > entering or exiting nohz_full depends on a number of variable
> > outside of one's control (and additional variables might be
> > added in the future).
> 
> Then I do not see any need for this patch. Because after a certain time
> of inactivity (after the mlock) the system will enter nohz_full again.
> If userspace has no direct control over nohz_full and can only wait then
> it just has to do so.

Sorry, I fail to see what you mean.

The problem (well, it's not a bug per se) is that the current
disablement of the vmstat worker thread is not aggressive enough.

From the initial message:

1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
2) app runs mlock, which increases counters for mlock'ed pages.
3) start -RT loop

Note that any activity that triggers stat counter changes will cause
this (it just happens that it was mlock in the test application I was
using; replace it with any other system call that writes to the
per-CPU vmstat counters).

You said:

"Because after a certain time of inactivity (after the mlock) the 
system will enter nohz_full again."

Yes, but we can't tolerate any activity from the vmstat worker thread
on this particular CPU.

Do you want the app to wait for an event saying: "vmstat_worker is now
disabled; as long as you don't dirty vmstat counters, vmstat_shepherd
won't wake it up"?

Rather than that, what this patch does is sync the vmstat counters on
return to userspace, so that:

"We synced the per-CPU vmstat counters to the global counters, and
disabled the local CPU's vmstat worker (on return to userspace). As
long as you don't dirty vmstat counters, vmstat_shepherd won't wake
it up."

Makes sense?

> > So preparing the system to function
> > while entering nohz_full at any location seems the sane thing to do.
> >
> > And that would be at return to userspace (since, if mlocked, after
> > that point there will be no more changes to propagate to vmstat
> > counters).
> >
> > Or am i missing something else you can think of ?
> 
> I assumed that the "enter nohz full" was an action by the user
> space app because I saw some earlier patches to introduce such
> functionality in the past.

No, it meant "enter nohz full" (in the current Linux codebase, for
existing applications). 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-02 15:28   ` Marcelo Tosatti
@ 2021-07-06 13:09     ` Frederic Weisbecker
  2021-07-06 14:05       ` Marcelo Tosatti
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2021-07-06 13:09 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Christoph Lameter, Thomas Gleixner, Juri Lelli,
	Nitesh Lal, Peter Zijlstra

On Fri, Jul 02, 2021 at 12:28:16PM -0300, Marcelo Tosatti wrote:
> 
> Hi Frederic,
> 
> On Fri, Jul 02, 2021 at 02:30:32PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jul 01, 2021 at 06:03:36PM -0300, Marcelo Tosatti wrote:
> > > The logic to disable vmstat worker thread, when entering
> > > nohz full, does not cover all scenarios. For example, it is possible
> > > for the following to happen:
> > > 
> > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > 3) start -RT loop
> > > 
> > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > the CPU in question.
> > >  
> > > To fix this, optionally sync the vmstat counters when returning
> > > from userspace, controllable by a new "vmstat_sync" isolcpus
> > > flags (default off).
> > 
> > Wasn't the plan for such finegrained isolation features to do it at
> > the per task level using prctl()?
> 
> Yes, but its orthogonal: when we integrate the finegrained isolation
> interface, will be able to use this code (to sync vmstat counters
> on return to userspace) only when userspace informs that it has entered
> isolated mode, so you don't incur the performance penalty of frequent
> vmstat counter writes when not using isolated apps.
> 
> This is what the full task isolation task patchset mode is doing
> as well (CC'ing Alex BTW).

Right, there can be two ways:

* A prctl request to sync vmstat only on exit from that prctl
* A prctl request to sync vmstat on all subsequent exit from
  kernel space.

> 
> This will require modifying applications (and the new kernel with the
> exposed interface).
> 
> But there is demand for fixing this now, for currently existing
> binary only applications.

I would agree if it were a regression, but it's not. It's merely
a new feature, and we don't want to rush a broken interface.

And I suspect some other people won't much like a new extension
to isolcpus.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-06 13:09     ` Frederic Weisbecker
@ 2021-07-06 14:05       ` Marcelo Tosatti
  2021-07-06 14:09         ` Marcelo Tosatti
  0 siblings, 1 reply; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-06 14:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Christoph Lameter, Thomas Gleixner, Juri Lelli,
	Nitesh Lal, Peter Zijlstra

On Tue, Jul 06, 2021 at 03:09:25PM +0200, Frederic Weisbecker wrote:
> On Fri, Jul 02, 2021 at 12:28:16PM -0300, Marcelo Tosatti wrote:
> > 
> > Hi Frederic,
> > 
> > On Fri, Jul 02, 2021 at 02:30:32PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Jul 01, 2021 at 06:03:36PM -0300, Marcelo Tosatti wrote:
> > > > The logic to disable vmstat worker thread, when entering
> > > > nohz full, does not cover all scenarios. For example, it is possible
> > > > for the following to happen:
> > > > 
> > > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > > 3) start -RT loop
> > > > 
> > > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > > the CPU in question.
> > > >  
> > > > To fix this, optionally sync the vmstat counters when returning
> > > > from userspace, controllable by a new "vmstat_sync" isolcpus
> > > > flags (default off).
> > > 
> > > Wasn't the plan for such finegrained isolation features to do it at
> > > the per task level using prctl()?
> > 
> > Yes, but its orthogonal: when we integrate the finegrained isolation
> > interface, will be able to use this code (to sync vmstat counters
> > on return to userspace) only when userspace informs that it has entered
> > isolated mode, so you don't incur the performance penalty of frequent
> > vmstat counter writes when not using isolated apps.
> > 
> > This is what the full task isolation task patchset mode is doing
> > as well (CC'ing Alex BTW).
> 
> Right there can be two ways:


  * An isolcpus flag to request sync of vmstat on all exits
    to userspace.
> * A prctl request to sync vmstat only on exit from that prctl
> * A prctl request to sync vmstat on all subsequent exit from
>   kernel space.

* A prctl to expose "vmstat is out of sync" information
  to userspace, so that it can be queried and flushed
  (Christoph's suggestion:
  https://www.spinics.net/lists/linux-mm/msg243788.html).

> > This will require modifying applications (and the new kernel with the
> > exposed interface).
> > 
> > But there is demand for fixing this now, for currently existing
> > binary only applications.
> 
> I would agree if it were a regression but it's not. It's merely
> a new feature and we don't want to rush on a broken interface.

Well, people out there need it in some form (vmstat sync).
Can we please agree on an acceptable way to allow this?

Why is it a broken interface? It has good qualities IMO:

- It's well contained (if you don't need it, don't use it).
- Does not require modifying -RT applications.
- Works well for a set of applications (where the overhead of
syncing vmstat is largely irrelevant, but the vmstat_worker
interruption is not).

And this patchset integrates another piece of the full task isolation work.

> And I suspect some other people won't like much a new extension
> to isolcpus.

Why is that so? 

---

Regarding the prctl interface: the suggestion to allow
system calls (https://www.spinics.net/lists/linux-mm/msg241750.html)
conflicts with "full task isolation": when entering the kernel,
one might be the target of an interruption (for example, a TLB flush).

Thomas wrote on that thread:

"So you say some code can tolerate a few interrupts, then comes Alex and
says 'no disturbance' at all.

The point is that all of this shares the mechanisms to quiesce certain
parts of the kernel so this wants to build common infrastructure and the
prctl(ISOLATION, MODE) mode argument defines the scope of isolation
which the task asks for and the infrastructure decides whether it can be
granted and if so orchestrates the operation and provides a common
infrastructure for instrumentation, violation monitoring etc.

We really need to stop to look at particular workloads and defining
adhoc solutions tailored to their particular itch if we don't want to
end up with an uncoordinated and unmaintainable zoo of interfaces, hooks
and knobs.

Just looking at the problem at hand as an example. NOHZ already issues
quiet_vmstat(), but it does not cancel already scheduled work. Now
Marcelo wants a new mechanism which is supposed to cancel the work and
then Alex want's to prevent it from being rescheduled. If that's not
properly coordinated this goes down the drain very fast."

Not allowing the vmstat_sync to happen for unmodified applications seems
undesirable, as Matthew Wilcox mentioned:

From: Matthew Wilcox <willy@infradead.org>

"Subject: Re: [PATCH] mm: introduce sysctl file to flush per-cpu vmstat statistics

On Tue, Nov 17, 2020 at 01:28:06PM -0300, Marcelo Tosatti wrote:
> For isolated applications that busy loop (packet processing with DPDK,                                  
> for example), workqueue functions either stall (if the -rt app priority                                 
> is higher than kworker thread priority) or interrupt the -rt app                                        
> (if the -rt app priority is lower than kworker thread priority.                                         

This seems a bit obscure to expect an application to do.  Can we make
this happen automatically when we bind an rt task to a group of CPUs?"

It turns out that is what would make the most sense in the field.

And even if a prctl interface is added, a mode where the "flushing of
pending activities" happens automatically on return to userspace
would still be desirable (to allow unmodified applications to benefit
from the decreased interruptions by the OS).

So the isolcpus flag is one way to enable/disable this feature.
A prctl interface would be another.

Would you prefer a more generic "quiesce OS activities on return 
from system calls" type of flag?



* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-06 14:05       ` Marcelo Tosatti
@ 2021-07-06 14:09         ` Marcelo Tosatti
  2021-07-06 14:17           ` Marcelo Tosatti
  2021-07-06 16:15           ` Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-06 14:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Christoph Lameter, Thomas Gleixner, Juri Lelli,
	Nitesh Lal, Peter Zijlstra

On Tue, Jul 06, 2021 at 11:05:50AM -0300, Marcelo Tosatti wrote:
> On Tue, Jul 06, 2021 at 03:09:25PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 02, 2021 at 12:28:16PM -0300, Marcelo Tosatti wrote:
> > > 
> > > Hi Frederic,
> > > 
> > > On Fri, Jul 02, 2021 at 02:30:32PM +0200, Frederic Weisbecker wrote:
> > > > On Thu, Jul 01, 2021 at 06:03:36PM -0300, Marcelo Tosatti wrote:
> > > > > The logic to disable vmstat worker thread, when entering
> > > > > nohz full, does not cover all scenarios. For example, it is possible
> > > > > for the following to happen:
> > > > > 
> > > > > 1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
> > > > > 2) app runs mlock, which increases counters for mlock'ed pages.
> > > > > 3) start -RT loop
> > > > > 
> > > > > Since refresh_cpu_vm_stats from nohz_full logic can happen _before_
> > > > > the mlock, vmstat shepherd can restart vmstat worker thread on
> > > > > the CPU in question.
> > > > >  
> > > > > To fix this, optionally sync the vmstat counters when returning
> > > > > from userspace, controllable by a new "vmstat_sync" isolcpus
> > > > > flags (default off).
> > > > 
> > > > Wasn't the plan for such finegrained isolation features to do it at
> > > > the per task level using prctl()?
> > > 
> > > Yes, but its orthogonal: when we integrate the finegrained isolation
> > > interface, will be able to use this code (to sync vmstat counters
> > > on return to userspace) only when userspace informs that it has entered
> > > isolated mode, so you don't incur the performance penalty of frequent
> > > vmstat counter writes when not using isolated apps.
> > > 
> > > This is what the full task isolation task patchset mode is doing
> > > as well (CC'ing Alex BTW).
> > 
> > Right there can be two ways:
> 
> 
>   * An isolcpus flag to request sync of vmstat on all exits
>     to userspace.
> > * A prctl request to sync vmstat only on exit from that prctl
> > * A prctl request to sync vmstat on all subsequent exit from
> >   kernel space.
> 
> * A prctl to expose "vmstat is out of sync" information 
> to userspace, so that it can be queried and flushed
> (Christoph's suggestion:
> https://www.spinics.net/lists/linux-mm/msg243788.html).
> 
> > > This will require modifying applications (and the new kernel with the
> > > exposed interface).
> > > 
> > > But there is demand for fixing this now, for currently existing
> > > binary only applications.
> > 
> > I would agree if it were a regression but it's not. It's merely
> > a new feature and we don't want to rush on a broken interface.
> 
> Well, people out there need it in some form (vmstat sync).
> Can we please agree on an acceptable way to allow this.
> 
> Why its a broken interface? It has good qualities IMO:
> 
> - Its well contained (if you don't need, don't use it).
> - Does not require modifying -RT applications.
> - Works well for a set of applications (where the overhead of
> syncing vmstat is largely irrelevant, but the vmstat_worker 
> interruption is).
> 
> And its patchset integrates part another piece of full task isolation.
> 
> > And I suspect some other people won't like much a new extension
> > to isolcpus.
> 
> Why is that so? 

Ah, yes, that would be PeterZ.

IIRC his main point was that it's not runtime changeable.
We can (partially) fix that, if that is the case.

Peter, was that the only problem you saw with the isolcpus interface?



* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-06 14:09         ` Marcelo Tosatti
@ 2021-07-06 14:17           ` Marcelo Tosatti
  2021-07-06 16:15           ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-06 14:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Christoph Lameter, Thomas Gleixner, Juri Lelli,
	Nitesh Lal, Peter Zijlstra

On Tue, Jul 06, 2021 at 11:09:20AM -0300, Marcelo Tosatti wrote:
> > > And I suspect some other people won't like much a new extension
> > > to isolcpus.
> > 
> > Why is that so? 
> 
> Ah, yes, that would be PeterZ.
> 
> IIRC his main point was that its not runtime changeable.
> We can (partially fix that), if that is the case.
> 
> Peter, was that the only problem you saw with isolcpus interface?

Oh, and BTW, the isolcpus=managed_irq flag was recently added due to another
isolation bug.

This problem is in the same category, so I don't see why it should be
treated specially (yes, I agree the isolcpus= interface should be
improved, but that's what is available today).



* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-06 14:09         ` Marcelo Tosatti
  2021-07-06 14:17           ` Marcelo Tosatti
@ 2021-07-06 16:15           ` Peter Zijlstra
  2021-07-06 16:53             ` Marcelo Tosatti
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2021-07-06 16:15 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Frederic Weisbecker, linux-kernel, Christoph Lameter,
	Thomas Gleixner, Juri Lelli, Nitesh Lal

On Tue, Jul 06, 2021 at 11:09:20AM -0300, Marcelo Tosatti wrote:
> Peter, was that the only problem you saw with isolcpus interface?

It needs to die, it's a piece of crap. Use cpusets already.


* Re: [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace
  2021-07-06 16:15           ` Peter Zijlstra
@ 2021-07-06 16:53             ` Marcelo Tosatti
  0 siblings, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-06 16:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, linux-kernel, Christoph Lameter,
	Thomas Gleixner, Juri Lelli, Nitesh Lal

On Tue, Jul 06, 2021 at 06:15:24PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 06, 2021 at 11:09:20AM -0300, Marcelo Tosatti wrote:
> > Peter, was that the only problem you saw with isolcpus interface?
> 
> It needs to die, it's a piece of crap. Use cpusets already.

OK, can do that. So how about doing so in addition to this patch (which,
again, is needed for current systems, so we will have to keep extending
it for the kernels that patches are backported to, as was done with
managed_irqs... note that most of the integrated code will be reused,
just with a different path that enables it)?

What was discussed before was the following:

https://lkml.org/lkml/2020/9/9/1120

Do you have any other comments
(on the "new file per isolation feature" structure)?

We would probably want to split the flags per-CPU as well.




* [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-14 20:42 [patch 0/5] optionally perform deferred actions on return to userspace (v3) Marcelo Tosatti
@ 2021-07-14 20:42 ` Marcelo Tosatti
  0 siblings, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-14 20:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal, Peter Zijlstra, Nicolas Saenz,
	Marcelo Tosatti

The logic to disable the vmstat worker thread when entering
nohz full does not cover all scenarios. For example, it is possible
for the following to happen:

1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
2) app runs mlock, which increases counters for mlock'ed pages.
3) start -RT loop

Since refresh_cpu_vm_stats from the nohz_full logic can happen _before_
the mlock, the vmstat shepherd can restart the vmstat worker thread on
the CPU in question.

To fix this, optionally sync the vmstat counters when returning
to userspace, controllable by a new "quiesce_on_exit_to_usermode" isolcpus
flag (default off).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6-vmstat-update/kernel/sched/isolation.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/sched/isolation.c
+++ linux-2.6-vmstat-update/kernel/sched/isolation.c
@@ -8,6 +8,7 @@
  *
  */
 #include "sched.h"
+#include <linux/vmstat.h>
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -129,6 +130,11 @@ static int __init housekeeping_setup(cha
 		}
 	}
 
+#ifdef CONFIG_SMP
+	if (flags & HK_FLAG_QUIESCE_URET)
+		static_branch_enable(&vmstat_sync_enabled);
+#endif
+
 	housekeeping_flags |= flags;
 
 	free_bootmem_cpumask_var(non_housekeeping_mask);
Index: linux-2.6-vmstat-update/include/linux/vmstat.h
===================================================================
--- linux-2.6-vmstat-update.orig/include/linux/vmstat.h
+++ linux-2.6-vmstat-update/include/linux/vmstat.h
@@ -21,6 +21,23 @@ int sysctl_vm_numa_stat_handler(struct c
 		void *buffer, size_t *length, loff_t *ppos);
 #endif
 
+#ifdef CONFIG_SMP
+DECLARE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+
+extern void __sync_vmstat(void);
+static inline void sync_vmstat(void)
+{
+	if (static_branch_unlikely(&vmstat_sync_enabled))
+		__sync_vmstat();
+}
+#else
+
+static inline void sync_vmstat(void)
+{
+}
+
+#endif
+
 struct reclaim_stat {
 	unsigned nr_dirty;
 	unsigned nr_unqueued_dirty;
Index: linux-2.6-vmstat-update/mm/vmstat.c
===================================================================
--- linux-2.6-vmstat-update.orig/mm/vmstat.c
+++ linux-2.6-vmstat-update/mm/vmstat.c
@@ -28,6 +28,7 @@
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -308,6 +309,17 @@ void set_pgdat_percpu_threshold(pg_data_
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+static inline void mark_vmstat_dirty(void)
+{
+	if (!static_branch_unlikely(&vmstat_sync_enabled))
+		return;
+
+	raw_cpu_write(vmstat_dirty, true);
+}
+
 /*
  * For use when we know that interrupts are disabled,
  * or when we know that preemption is disabled and that
@@ -330,6 +342,7 @@ void __mod_zone_page_state(struct zone *
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_zone_page_state);
 
@@ -361,6 +374,7 @@ void __mod_node_page_state(struct pglist
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_node_page_state);
 
@@ -401,6 +415,7 @@ void __inc_zone_state(struct zone *zone,
 		zone_page_state_add(v + overstep, zone, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -419,6 +434,7 @@ void __inc_node_state(struct pglist_data
 		node_page_state_add(v + overstep, pgdat, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -447,6 +463,7 @@ void __dec_zone_state(struct zone *zone,
 		zone_page_state_add(v - overstep, zone, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -465,6 +482,7 @@ void __dec_node_state(struct pglist_data
 		node_page_state_add(v - overstep, pgdat, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -528,6 +546,7 @@ static inline void mod_zone_state(struct
 
 	if (z)
 		zone_page_state_add(z, zone, item);
+	mark_vmstat_dirty();
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
@@ -596,6 +615,7 @@ static inline void mod_node_state(struct
 
 	if (z)
 		node_page_state_add(z, pgdat, item);
+	mark_vmstat_dirty();
 }
 
 void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
@@ -2006,6 +2026,37 @@ static void vmstat_shepherd(struct work_
 		round_jiffies_relative(sysctl_stat_interval));
 }
 
+void __sync_vmstat(void)
+{
+	int cpu;
+
+	cpu = get_cpu();
+	if (housekeeping_cpu(cpu, HK_FLAG_QUIESCE_URET)) {
+		put_cpu();
+		return;
+	}
+
+	if (!raw_cpu_read(vmstat_dirty)) {
+		put_cpu();
+		return;
+	}
+
+	refresh_cpu_vm_stats(false);
+	raw_cpu_write(vmstat_dirty, false);
+	put_cpu();
+
+	/*
+	 * If task is migrated to another CPU between put_cpu
+	 * and cancel_delayed_work_sync, the code below might
+	 * cancel vmstat_update work for a different cpu
+	 * (than the one from which the vmstats were flushed).
+	 *
+	 * However, vmstat shepherd will re-enable it later,
+	 * so it's harmless.
+	 */
+	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+}
+
 static void __init start_shepherd_timer(void)
 {
 	int cpu;
Index: linux-2.6-vmstat-update/kernel/entry/common.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/entry/common.c
+++ linux-2.6-vmstat-update/kernel/entry/common.c
@@ -6,6 +6,7 @@
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 #include <linux/tick.h>
+#include <linux/vmstat.h>
 
 #include "common.h"
 
@@ -290,6 +291,7 @@ static void syscall_exit_to_user_mode_pr
  */
 static void isolation_exit_to_user_mode_prepare(void)
 {
+	sync_vmstat();
 }
 
 static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)




* Re: [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-12  9:05   ` Christoph Lameter
  2021-07-12 10:30     ` Marcelo Tosatti
@ 2021-07-13 19:30     ` Marcelo Tosatti
  1 sibling, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-13 19:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal, Peter Zijlstra, Nicolas Saenz

On Mon, Jul 12, 2021 at 11:05:58AM +0200, Christoph Lameter wrote:
> On Fri, 9 Jul 2021, Marcelo Tosatti wrote:
> 
> > +
> > +	if (!static_branch_unlikely(&vmstat_sync_enabled))
> > +		return;
> > +
> > +	cpu = smp_processor_id();
> > +
> > +	if (housekeeping_cpu(cpu, HK_FLAG_QUIESCE_URET))
> > +		return;
> > +
> > +	per_cpu(vmstat_dirty, smp_processor_id()) = true;
> > +}
> 
> And you are going to insert this into all the performance critical VM
> statistics handling. Inline?
> 
> And why do you need to do such things as to determine the processor? At
> mininum do this using this cpu operations like the vmstat functions
> currently do. And, lucky us, now we also have
> more issues why we should disable preemption etc etc while handling vm
> counters.

OK, hopefully this is what you mean.

Any other comments?

Index: linux-2.6-vmstat-update/kernel/sched/isolation.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/sched/isolation.c
+++ linux-2.6-vmstat-update/kernel/sched/isolation.c
@@ -8,6 +8,7 @@
  *
  */
 #include "sched.h"
+#include <linux/vmstat.h>
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -129,6 +130,11 @@ static int __init housekeeping_setup(cha
 		}
 	}
 
+#ifdef CONFIG_SMP
+	if (flags & HK_FLAG_QUIESCE_URET)
+		static_branch_enable(&vmstat_sync_enabled);
+#endif
+
 	housekeeping_flags |= flags;
 
 	free_bootmem_cpumask_var(non_housekeeping_mask);
Index: linux-2.6-vmstat-update/include/linux/vmstat.h
===================================================================
--- linux-2.6-vmstat-update.orig/include/linux/vmstat.h
+++ linux-2.6-vmstat-update/include/linux/vmstat.h
@@ -21,6 +21,23 @@ int sysctl_vm_numa_stat_handler(struct c
 		void *buffer, size_t *length, loff_t *ppos);
 #endif
 
+#ifdef CONFIG_SMP
+DECLARE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+
+extern void __sync_vmstat(void);
+static inline void sync_vmstat(void)
+{
+	if (static_branch_unlikely(&vmstat_sync_enabled))
+		__sync_vmstat();
+}
+#else
+
+static inline void sync_vmstat(void)
+{
+}
+
+#endif
+
 struct reclaim_stat {
 	unsigned nr_dirty;
 	unsigned nr_unqueued_dirty;
Index: linux-2.6-vmstat-update/mm/vmstat.c
===================================================================
--- linux-2.6-vmstat-update.orig/mm/vmstat.c
+++ linux-2.6-vmstat-update/mm/vmstat.c
@@ -28,6 +28,7 @@
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -308,6 +309,17 @@ void set_pgdat_percpu_threshold(pg_data_
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+static inline void mark_vmstat_dirty(void)
+{
+	if (!static_branch_unlikely(&vmstat_sync_enabled))
+		return;
+
+	raw_cpu_write(vmstat_dirty, true);
+}
+
 /*
  * For use when we know that interrupts are disabled,
  * or when we know that preemption is disabled and that
@@ -330,6 +342,7 @@ void __mod_zone_page_state(struct zone *
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_zone_page_state);
 
@@ -361,6 +374,7 @@ void __mod_node_page_state(struct pglist
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_node_page_state);
 
@@ -401,6 +415,7 @@ void __inc_zone_state(struct zone *zone,
 		zone_page_state_add(v + overstep, zone, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -419,6 +434,7 @@ void __inc_node_state(struct pglist_data
 		node_page_state_add(v + overstep, pgdat, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -447,6 +463,7 @@ void __dec_zone_state(struct zone *zone,
 		zone_page_state_add(v - overstep, zone, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -465,6 +482,7 @@ void __dec_node_state(struct pglist_data
 		node_page_state_add(v - overstep, pgdat, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -528,6 +546,7 @@ static inline void mod_zone_state(struct
 
 	if (z)
 		zone_page_state_add(z, zone, item);
+	mark_vmstat_dirty();
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
@@ -596,6 +615,7 @@ static inline void mod_node_state(struct
 
 	if (z)
 		node_page_state_add(z, pgdat, item);
+	mark_vmstat_dirty();
 }
 
 void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
@@ -2006,6 +2026,32 @@ static void vmstat_shepherd(struct work_
 		round_jiffies_relative(sysctl_stat_interval));
 }
 
+void __sync_vmstat(void)
+{
+	int cpu;
+
+	cpu = get_cpu();
+	if (raw_cpu_read(vmstat_dirty) == false) {
+		put_cpu();
+		return;
+	}
+
+	refresh_cpu_vm_stats(false);
+	raw_cpu_write(vmstat_dirty, false);
+	put_cpu();
+
+	/*
+	 * If task is migrated to another CPU between put_cpu
+	 * and cancel_delayed_work_sync, the code below might
+	 * cancel vmstat_update work for a different cpu
+	 * (than the one from which the vmstats were flushed).
+	 *
+	 * However, vmstat shepherd will re-enable it later,
+	 * so it's harmless.
+	 */
+	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+}
+
 static void __init start_shepherd_timer(void)
 {
 	int cpu;
Index: linux-2.6-vmstat-update/kernel/entry/common.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/entry/common.c
+++ linux-2.6-vmstat-update/kernel/entry/common.c
@@ -6,6 +6,7 @@
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 #include <linux/tick.h>
+#include <linux/vmstat.h>
 
 #include "common.h"
 
@@ -290,6 +291,7 @@ static void syscall_exit_to_user_mode_pr
  */
 static void isolation_exit_to_user_mode_prepare(void)
 {
+	sync_vmstat();
 }
 
 static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)



* Re: [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-12  9:05   ` Christoph Lameter
@ 2021-07-12 10:30     ` Marcelo Tosatti
  2021-07-13 19:30     ` Marcelo Tosatti
  1 sibling, 0 replies; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-12 10:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal, Peter Zijlstra, Nicolas Saenz

On Mon, Jul 12, 2021 at 11:05:58AM +0200, Christoph Lameter wrote:
> On Fri, 9 Jul 2021, Marcelo Tosatti wrote:
> 
> > +
> > +	if (!static_branch_unlikely(&vmstat_sync_enabled))
> > +		return;
> > +
> > +	cpu = smp_processor_id();
> > +
> > +	if (housekeeping_cpu(cpu, HK_FLAG_QUIESCE_URET))
> > +		return;
> > +
> > +	per_cpu(vmstat_dirty, smp_processor_id()) = true;
> > +}
> 
> And you are going to insert this into all the performance critical VM
> statistics handling. Inline?

Yes, this is what the patch below is supposed to do (maybe it missed
some statistics?).

The alternative would be some equivalent of need_update on return to
userspace (for all system call returns) (when the HK_FLAG_QUIESCE_URET 
flag is enabled).

> And why do you need to do such things as to determine the processor? At
> mininum do this using this cpu operations like the vmstat functions
> currently do.

OK, will do that and resend.

> And, lucky us, now we also have
> more issues why we should disable preemption etc etc while handling vm
> counters.



* Re: [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-09 17:37 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters " Marcelo Tosatti
@ 2021-07-12  9:05   ` Christoph Lameter
  2021-07-12 10:30     ` Marcelo Tosatti
  2021-07-13 19:30     ` Marcelo Tosatti
  0 siblings, 2 replies; 33+ messages in thread
From: Christoph Lameter @ 2021-07-12  9:05 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Thomas Gleixner, Frederic Weisbecker, Juri Lelli,
	Nitesh Lal, Peter Zijlstra, Nicolas Saenz

On Fri, 9 Jul 2021, Marcelo Tosatti wrote:

> +
> +	if (!static_branch_unlikely(&vmstat_sync_enabled))
> +		return;
> +
> +	cpu = smp_processor_id();
> +
> +	if (housekeeping_cpu(cpu, HK_FLAG_QUIESCE_URET))
> +		return;
> +
> +	per_cpu(vmstat_dirty, smp_processor_id()) = true;
> +}

And you are going to insert this into all the performance critical VM
statistics handling. Inline?

And why do you need to do things such as determining the processor? At
minimum, do this using this_cpu operations like the vmstat functions
currently do. And, lucky us, now we also have
more issues as to why we should disable preemption etc. while handling vm
counters.




* [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
  2021-07-09 17:37 [patch 0/5] optionally perform deferred actions " Marcelo Tosatti
@ 2021-07-09 17:37 ` Marcelo Tosatti
  2021-07-12  9:05   ` Christoph Lameter
  0 siblings, 1 reply; 33+ messages in thread
From: Marcelo Tosatti @ 2021-07-09 17:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Christoph Lameter, Thomas Gleixner, Frederic Weisbecker,
	Juri Lelli, Nitesh Lal, Peter Zijlstra, Nicolas Saenz,
	Marcelo Tosatti

The logic to disable the vmstat worker thread when entering
nohz full does not cover all scenarios. For example, it is possible
for the following to happen:

1) enter nohz_full, which calls refresh_cpu_vm_stats, syncing the stats.
2) app runs mlock, which increases counters for mlock'ed pages.
3) start -RT loop

Since refresh_cpu_vm_stats from the nohz_full logic can happen _before_
the mlock, the vmstat shepherd can restart the vmstat worker thread on
the CPU in question.

To fix this, optionally sync the vmstat counters when returning
to userspace, controllable by a new "quiesce_on_exit_to_usermode" isolcpus
flag (default off).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: linux-2.6-vmstat-update/kernel/sched/isolation.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/sched/isolation.c
+++ linux-2.6-vmstat-update/kernel/sched/isolation.c
@@ -8,6 +8,7 @@
  *
  */
 #include "sched.h"
+#include <linux/vmstat.h>
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -129,6 +130,11 @@ static int __init housekeeping_setup(cha
 		}
 	}
 
+#ifdef CONFIG_SMP
+	if (flags & HK_FLAG_QUIESCE_URET)
+		static_branch_enable(&vmstat_sync_enabled);
+#endif
+
 	housekeeping_flags |= flags;
 
 	free_bootmem_cpumask_var(non_housekeeping_mask);
Index: linux-2.6-vmstat-update/include/linux/vmstat.h
===================================================================
--- linux-2.6-vmstat-update.orig/include/linux/vmstat.h
+++ linux-2.6-vmstat-update/include/linux/vmstat.h
@@ -21,6 +21,23 @@ int sysctl_vm_numa_stat_handler(struct c
 		void *buffer, size_t *length, loff_t *ppos);
 #endif
 
+#ifdef CONFIG_SMP
+DECLARE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+
+extern void __sync_vmstat(void);
+static inline void sync_vmstat(void)
+{
+	if (static_branch_unlikely(&vmstat_sync_enabled))
+		__sync_vmstat();
+}
+#else
+
+static inline void sync_vmstat(void)
+{
+}
+
+#endif
+
 struct reclaim_stat {
 	unsigned nr_dirty;
 	unsigned nr_unqueued_dirty;
Index: linux-2.6-vmstat-update/mm/vmstat.c
===================================================================
--- linux-2.6-vmstat-update.orig/mm/vmstat.c
+++ linux-2.6-vmstat-update/mm/vmstat.c
@@ -28,6 +28,7 @@
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -308,6 +309,24 @@ void set_pgdat_percpu_threshold(pg_data_
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(vmstat_sync_enabled);
+static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+static inline void mark_vmstat_dirty(void)
+{
+	int cpu;
+
+	if (!static_branch_unlikely(&vmstat_sync_enabled))
+		return;
+
+	cpu = smp_processor_id();
+
+	if (housekeeping_cpu(cpu, HK_FLAG_QUIESCE_URET))
+		return;
+
+	per_cpu(vmstat_dirty, cpu) = true;
+}
+
 /*
  * For use when we know that interrupts are disabled,
  * or when we know that preemption is disabled and that
@@ -330,6 +349,7 @@ void __mod_zone_page_state(struct zone *
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_zone_page_state);
 
@@ -361,6 +381,7 @@ void __mod_node_page_state(struct pglist
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	mark_vmstat_dirty();
 }
 EXPORT_SYMBOL(__mod_node_page_state);
 
@@ -401,6 +422,7 @@ void __inc_zone_state(struct zone *zone,
 		zone_page_state_add(v + overstep, zone, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -419,6 +441,7 @@ void __inc_node_state(struct pglist_data
 		node_page_state_add(v + overstep, pgdat, item);
 		__this_cpu_write(*p, -overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __inc_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -447,6 +470,7 @@ void __dec_zone_state(struct zone *zone,
 		zone_page_state_add(v - overstep, zone, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_node_state(struct pglist_data *pgdat, enum node_stat_item item)
@@ -465,6 +489,7 @@ void __dec_node_state(struct pglist_data
 		node_page_state_add(v - overstep, pgdat, item);
 		__this_cpu_write(*p, overstep);
 	}
+	mark_vmstat_dirty();
 }
 
 void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
@@ -528,6 +553,7 @@ static inline void mod_zone_state(struct
 
 	if (z)
 		zone_page_state_add(z, zone, item);
+	mark_vmstat_dirty();
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
@@ -596,6 +622,7 @@ static inline void mod_node_state(struct
 
 	if (z)
 		node_page_state_add(z, pgdat, item);
+	mark_vmstat_dirty();
 }
 
 void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
@@ -2006,6 +2033,32 @@ static void vmstat_shepherd(struct work_
 		round_jiffies_relative(sysctl_stat_interval));
 }
 
+void __sync_vmstat(void)
+{
+	int cpu;
+
+	cpu = get_cpu();
+	if (!per_cpu(vmstat_dirty, cpu)) {
+		put_cpu();
+		return;
+	}
+
+	refresh_cpu_vm_stats(false);
+	per_cpu(vmstat_dirty, cpu) = false;
+	put_cpu();
+
+	/*
+	 * If the task is migrated to another CPU between put_cpu
+	 * and cancel_delayed_work_sync, the code below might
+	 * cancel the vmstat_update work for a different CPU
+	 * (than the one from which the vmstats were flushed).
+	 *
+	 * However, vmstat shepherd will re-enable it later,
+	 * so it's harmless.
+	 */
+	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+}
+
 static void __init start_shepherd_timer(void)
 {
 	int cpu;
Index: linux-2.6-vmstat-update/kernel/entry/common.c
===================================================================
--- linux-2.6-vmstat-update.orig/kernel/entry/common.c
+++ linux-2.6-vmstat-update/kernel/entry/common.c
@@ -6,6 +6,7 @@
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 #include <linux/tick.h>
+#include <linux/vmstat.h>
 
 #include "common.h"
 
@@ -290,6 +291,7 @@ static void syscall_exit_to_user_mode_pr
  */
 static void isolation_exit_to_user_mode_prepare(void)
 {
+	sync_vmstat();
 }
 
 static __always_inline void __syscall_exit_to_user_mode_work(struct pt_regs *regs)



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace
@ 2021-07-03  4:54 kernel test robot
  0 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2021-07-03  4:54 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 3220 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210701210458.350881923@fuller.cnet>
References: <20210701210458.350881923@fuller.cnet>
TO: Marcelo Tosatti <mtosatti@redhat.com>
TO: linux-kernel(a)vger.kernel.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Frederic Weisbecker <frederic@kernel.org>
CC: Juri Lelli <juri.lelli@redhat.com>
CC: Nitesh Lal <nilal@redhat.com>
CC: Marcelo Tosatti <mtosatti@redhat.com>

Hi Marcelo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on tip/master linus/master v5.13 next-20210701]
[cannot apply to hnaz-linux-mm/master tip/core/entry]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
:::::: branch date: 32 hours ago
:::::: commit date: 32 hours ago
config: x86_64-randconfig-b001-20210630 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 9eb613b2de3163686b1a4bd1160f15ac56a4b083)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # apt-get install iwyu # include-what-you-use
        # https://github.com/0day-ci/linux/commit/b973a70c0670675073265d2cbee70a36bda3273e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Marcelo-Tosatti/optionally-sync-per-CPU-vmstats-counter-on-return-to-userspace/20210702-050826
        git checkout b973a70c0670675073265d2cbee70a36bda3273e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross C=1 CHECK=iwyu O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


iwyu warnings: (new ones prefixed by >>)
   mm/vmstat.c:18:1: iwyu: warning: superfluous #include <linux/cpu.h>
   mm/vmstat.c:13:1: iwyu: warning: superfluous #include <linux/fs.h>
   mm/vmstat.c:28:1: iwyu: warning: superfluous #include <linux/mm_inline.h>
   mm/vmstat.c:29:1: iwyu: warning: superfluous #include <linux/page_ext.h>
   mm/vmstat.c:30:1: iwyu: warning: superfluous #include <linux/page_owner.h>
>> mm/vmstat.c:31:1: iwyu: warning: superfluous #include <linux/sched/isolation.h>

vim +31 mm/vmstat.c

b973a70c067067 Marcelo Tosatti 2021-07-01 @31  #include <linux/sched/isolation.h>
6e543d5780e36f Lisa Du         2013-09-11  32  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 41897 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-07-14 20:43 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-01 21:03 [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Marcelo Tosatti
2021-07-01 21:03 ` [patch 1/5] sched: isolation: introduce vmstat_sync isolcpu flags Marcelo Tosatti
2021-07-01 21:03 ` [patch 2/5] common entry: add hook for isolation to __syscall_exit_to_user_mode_work Marcelo Tosatti
2021-07-01 21:03 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace Marcelo Tosatti
2021-07-01 23:11   ` kernel test robot
2021-07-01 23:11     ` kernel test robot
2021-07-02  6:50   ` kernel test robot
2021-07-02  6:50     ` kernel test robot
2021-07-01 21:03 ` [patch 4/5] mm: vmstat: move need_update Marcelo Tosatti
2021-07-01 21:03 ` [patch 5/5] mm: vmstat_refresh: avoid queueing work item if cpu stats are clean Marcelo Tosatti
2021-07-02  4:10   ` kernel test robot
2021-07-02  4:10     ` kernel test robot
2021-07-02  4:43   ` kernel test robot
2021-07-02  4:43     ` kernel test robot
2021-07-02  8:00 ` [patch 0/5] optionally sync per-CPU vmstats counter on return to userspace Christoph Lameter
2021-07-02 11:52   ` Marcelo Tosatti
2021-07-02 11:59   ` Marcelo Tosatti
2021-07-05 14:26     ` Christoph Lameter
2021-07-05 14:45       ` Marcelo Tosatti
2021-07-02 12:30 ` Frederic Weisbecker
2021-07-02 15:28   ` Marcelo Tosatti
2021-07-06 13:09     ` Frederic Weisbecker
2021-07-06 14:05       ` Marcelo Tosatti
2021-07-06 14:09         ` Marcelo Tosatti
2021-07-06 14:17           ` Marcelo Tosatti
2021-07-06 16:15           ` Peter Zijlstra
2021-07-06 16:53             ` Marcelo Tosatti
2021-07-03  4:54 [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters " kernel test robot
2021-07-09 17:37 [patch 0/5] optionally perform deferred actions " Marcelo Tosatti
2021-07-09 17:37 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters " Marcelo Tosatti
2021-07-12  9:05   ` Christoph Lameter
2021-07-12 10:30     ` Marcelo Tosatti
2021-07-13 19:30     ` Marcelo Tosatti
2021-07-14 20:42 [patch 0/5] optionally perform deferred actions on return to userspace (v3) Marcelo Tosatti
2021-07-14 20:42 ` [patch 3/5] mm: vmstat: optionally flush per-CPU vmstat counters on return to userspace Marcelo Tosatti
