All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, kernel-team@fb.com,
	linux-kernel@vger.kernel.org, Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v2] mm: vmstat: fix /proc/sys/vm/stat_refresh generating false warnings
Date: Mon, 20 Jul 2020 10:03:49 +0200	[thread overview]
Message-ID: <20200720080349.GC18535@dhcp22.suse.cz> (raw)
In-Reply-To: <20200714173920.3319063-1-guro@fb.com>

On Tue 14-07-20 10:39:20, Roman Gushchin wrote:
> I've noticed a number of warnings like "vmstat_refresh: nr_free_cma
> -5" or "vmstat_refresh: nr_zone_write_pending -11" on our production
> hosts. The numbers of these warnings were relatively low and stable,
> so it didn't look like we are systematically leaking the counters.
> The corresponding vmstat counters also looked sane.
> 
> These warnings are generated by the vmstat_refresh() function, which
> assumes that atomic zone and numa counters can't go below zero.
> However, on a SMP machine it's not quite right: due to per-cpu
> caching it can in theory be as low as -(zone threshold) * NR_CPUs.
> 
> For instance, let's say all cma pages are in use and NR_FREE_CMA_PAGES
> reached 0. Then we've reclaimed a small number of cma pages on each
> CPU except CPU0, so that most percpu NR_FREE_CMA_PAGES counters are
> slightly positive (the atomic counter is still 0). Then somebody on
> CPU0 consumes all these pages. The number of pages can easily exceed
> the threshold and a negative value will be committed to the atomic
> counter.
> 
> To fix the problem and avoid generating false warnings, let's just
> relax the condition and warn only if the value is less than minus
> the maximum theoretically possible drift value, which is 125 *
> number of online CPUs. It will still allow to catch systematic leaks,
> but will not generate bogus warnings.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Hugh Dickins <hughd@google.com>

Acked-by: Michal Hocko <mhocko@suse.com>

One minor nit which can be handled by a separate patch but now that you
are touching this code...

> ---
>  Documentation/admin-guide/sysctl/vm.rst |  4 ++--
>  mm/vmstat.c                             | 30 ++++++++++++++++---------
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 4b9d2e8e9142..95fb80d0c606 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -822,8 +822,8 @@ e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
>  
>  As a side-effect, it also checks for negative totals (elsewhere reported
>  as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
> -(At time of writing, a few stats are known sometimes to be found negative,
> -with no ill effects: errors and warnings on these stats are suppressed.)
> +(On a SMP machine some stats can temporarily become negative, with no ill
> +effects: errors and warnings on these stats are suppressed.)
>  
>  
>  numa_stat
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index a21140373edb..8f0ef8aaf8ee 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -169,6 +169,8 @@ EXPORT_SYMBOL(vm_node_stat);
>  
>  #ifdef CONFIG_SMP
>  
> +#define MAX_THRESHOLD 125

This would deserve a comment. 88f5acf88ae6a didn't really explain why
this specific value has been selected but the specific value shouldn't
really matter much. I would go with the following at least.
"
Maximum sync threshold for per-cpu vmstat counters. 
"
> +
>  int calculate_pressure_threshold(struct zone *zone)
>  {
>  	int threshold;
> @@ -186,11 +188,9 @@ int calculate_pressure_threshold(struct zone *zone)
>  	threshold = max(1, (int)(watermark_distance / num_online_cpus()));
>  
>  	/*
> -	 * Maximum threshold is 125
> +	 * Threshold is capped by MAX_THRESHOLD
>  	 */
> -	threshold = min(125, threshold);
> -
> -	return threshold;
> +	return min(MAX_THRESHOLD, threshold);
>  }
>  
>  int calculate_normal_threshold(struct zone *zone)
> @@ -610,6 +610,9 @@ void dec_node_page_state(struct page *page, enum node_stat_item item)
>  }
>  EXPORT_SYMBOL(dec_node_page_state);
>  #else
> +
> +#define MAX_THRESHOLD 0
> +
>  /*
>   * Use interrupt disable to serialize counter updates
>   */
> @@ -1810,7 +1813,7 @@ static void refresh_vm_stats(struct work_struct *work)
>  int vmstat_refresh(struct ctl_table *table, int write,
>  		   void *buffer, size_t *lenp, loff_t *ppos)
>  {
> -	long val;
> +	long val, max_drift;
>  	int err;
>  	int i;
>  
> @@ -1821,17 +1824,22 @@ int vmstat_refresh(struct ctl_table *table, int write,
>  	 * pages, immediately after running a test.  /proc/sys/vm/stat_refresh,
>  	 * which can equally be echo'ed to or cat'ted from (by root),
>  	 * can be used to update the stats just before reading them.
> -	 *
> -	 * Oh, and since global_zone_page_state() etc. are so careful to hide
> -	 * transiently negative values, report an error here if any of
> -	 * the stats is negative, so we know to go looking for imbalance.
>  	 */
>  	err = schedule_on_each_cpu(refresh_vm_stats);
>  	if (err)
>  		return err;
> +
> +	/*
> +	 * Since global_zone_page_state() etc. are so careful to hide
> +	 * transiently negative values, report an error here if any of
> +	 * the stats is negative and are less than the maximum drift value,
> +	 * so we know to go looking for imbalance.
> +	 */
> +	max_drift = num_online_cpus() * MAX_THRESHOLD;
> +
>  	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
>  		val = atomic_long_read(&vm_zone_stat[i]);
> -		if (val < 0) {
> +		if (val < -max_drift) {
>  			pr_warn("%s: %s %ld\n",
>  				__func__, zone_stat_name(i), val);
>  			err = -EINVAL;
> @@ -1840,7 +1848,7 @@ int vmstat_refresh(struct ctl_table *table, int write,
>  #ifdef CONFIG_NUMA
>  	for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) {
>  		val = atomic_long_read(&vm_numa_stat[i]);
> -		if (val < 0) {
> +		if (val < -max_drift) {
>  			pr_warn("%s: %s %ld\n",
>  				__func__, numa_stat_name(i), val);
>  			err = -EINVAL;
> -- 
> 2.26.2

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2020-07-20  8:03 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-14 17:39 [PATCH v2] mm: vmstat: fix /proc/sys/vm/stat_refresh generating false warnings Roman Gushchin
2020-07-20  8:03 ` Michal Hocko [this message]
2020-07-20 20:20   ` Roman Gushchin
2020-07-30  3:45 ` Hugh Dickins
2020-07-30  3:45   ` Hugh Dickins
2020-07-30 16:23   ` Roman Gushchin
2020-07-31  4:06     ` Hugh Dickins
2020-07-31  4:06       ` Hugh Dickins
2020-08-01  1:18       ` Roman Gushchin
2020-08-01  2:17         ` Hugh Dickins
2020-08-01  2:17           ` Hugh Dickins
2020-08-04  0:40           ` Roman Gushchin
2020-08-06  3:01             ` Hugh Dickins
2020-08-06  3:01               ` Hugh Dickins
2020-08-06  3:51               ` Roman Gushchin
2020-08-06 16:41                 ` Hugh Dickins
2020-08-06 16:41                   ` Hugh Dickins
2020-08-06 23:38               ` Roman Gushchin
2020-08-07  0:16                 ` Hugh Dickins
2020-08-07  0:16                   ` Hugh Dickins
2020-08-07  1:25                 ` Andrew Morton
2021-02-24  7:24                   ` Hugh Dickins
2021-02-24  7:24                     ` Hugh Dickins
2021-02-25  1:53                     ` Roman Gushchin
2021-02-25 17:21                       ` Hugh Dickins
2021-02-25 17:21                         ` Hugh Dickins
2021-02-25 18:06                         ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200720080349.GC18535@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.