All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>, <linux-mm@kvack.org>,
	<kernel-team@fb.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] mm: vmstat: fix /proc/sys/vm/stat_refresh generating false warnings
Date: Mon, 3 Aug 2020 17:40:12 -0700	[thread overview]
Message-ID: <20200804004012.GA1049259@carbon.dhcp.thefacebook.com> (raw)
In-Reply-To: <alpine.LSU.2.11.2007311915130.9716@eggly.anvils>

On Fri, Jul 31, 2020 at 07:17:05PM -0700, Hugh Dickins wrote:
> On Fri, 31 Jul 2020, Roman Gushchin wrote:
> > On Thu, Jul 30, 2020 at 09:06:55PM -0700, Hugh Dickins wrote:
> > > 
> > > Though another alternative did occur to me overnight: we could
> > > scrap the logged warning, and show "nr_whatever -53" as output
> > > from /proc/sys/vm/stat_refresh: that too would be acceptable
> > > to me, and you redirect to /dev/null.
> > 
> > It sounds like a good idea to me. Do you want me to prepare a patch?
> 
> Yes, if you like that one best, please do prepare a patch - thanks!

Hi Hugh,

I mastered a patch (attached below), but honestly I can't say I like it.
The resulting interface is confusing: we don't generally use sysctls to
print debug data and/or warnings.

I thought about treating a write to this sysctls as setting the threshold,
so that "echo 0 > /proc/sys/vm/stat_refresh" would warn on all negative
entries, and "cat /proc/sys/vm/stat_refresh" would use the default threshold
as in my patch. But this breaks  to some extent the current ABI, as passing
an incorrect value will result in -EINVAL instead of passing (as now).

Overall I still think we shouldn't warn on any values inside the possible
range, as it's not an indication of any kind of error. The only reason
why we see some values going negative and some not, is that some of them
are updated more frequently than others, and some are bouncing around
zero, while other can't reach zero too easily (like the number of free pages).

Actually, if someone wants to ensure that numbers are accurate,
we have to temporarily set the threshold to 0, then flush the percpu data
and only then check atomics. In the current design flushing percpu data
matters for only slowly updated counters, as all others will run away while
we're waiting for the flush. So if we're targeting some slowly updating
counters, maybe we should warn only on them being negative, Idk.

Thanks!

--

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6eecfcbbfe79..1d2f2471bb78 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1815,13 +1815,28 @@ static void refresh_vm_stats(struct work_struct *work)
        refresh_cpu_vm_stats(true);
 }
 
+static void warn_vmstat(void *buffer, size_t *lenp, loff_t *ppos,
+                       const char *name, long val)
+{
+       int len;
+
+       len = snprintf(buffer + *ppos, *lenp, "%s %lu\n", name, val);
+       *lenp -= len;
+       *ppos += len;
+}
+
 int vmstat_refresh(struct ctl_table *table, int write,
                   void *buffer, size_t *lenp, loff_t *ppos)
 {
-       long val, max_drift;
+       long val;
        int err;
        int i;
 
+       if (!*lenp || (*ppos && !write)) {
+               *lenp = 0;
+               return 0;
+       }
+
        /*
         * The regular update, every sysctl_stat_interval, may come later
         * than expected: leaving a significant amount in per_cpu buckets.
@@ -1837,35 +1852,24 @@ int vmstat_refresh(struct ctl_table *table, int write,
        /*
         * Since global_zone_page_state() etc. are so careful to hide
         * transiently negative values, report an error here if any of
-        * the stats is negative and are less than the maximum drift value,
-        * so we know to go looking for imbalance.
+        * the stats is negative, so we know to go looking for imbalance.
         */
-       max_drift = num_online_cpus() * MAX_THRESHOLD;
-
        for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
                val = atomic_long_read(&vm_zone_stat[i]);
-               if (val < -max_drift) {
-                       pr_warn("%s: %s %ld\n",
-                               __func__, zone_stat_name(i), val);
-                       err = -EINVAL;
-               }
+               if (!write && val < 0)
+                       warn_vmstat(buffer, lenp, ppos, zone_stat_name(i), val);
        }
 #ifdef CONFIG_NUMA
        for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) {
                val = atomic_long_read(&vm_numa_stat[i]);
-               if (val < -max_drift) {
-                       pr_warn("%s: %s %ld\n",
-                               __func__, numa_stat_name(i), val);
-                       err = -EINVAL;
-               }
+               if (!write && val < 0)
+                       warn_vmstat(buffer, lenp, ppos, numa_stat_name(i), val);
        }
 #endif
        if (err)
                return err;
        if (write)
                *ppos += *lenp;
-       else
-               *lenp = 0;
        return 0;
 }
 #endif /* CONFIG_PROC_FS */

  reply	other threads:[~2020-08-04  0:40 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-14 17:39 [PATCH v2] mm: vmstat: fix /proc/sys/vm/stat_refresh generating false warnings Roman Gushchin
2020-07-20  8:03 ` Michal Hocko
2020-07-20 20:20   ` Roman Gushchin
2020-07-30  3:45 ` Hugh Dickins
2020-07-30  3:45   ` Hugh Dickins
2020-07-30 16:23   ` Roman Gushchin
2020-07-31  4:06     ` Hugh Dickins
2020-07-31  4:06       ` Hugh Dickins
2020-08-01  1:18       ` Roman Gushchin
2020-08-01  2:17         ` Hugh Dickins
2020-08-01  2:17           ` Hugh Dickins
2020-08-04  0:40           ` Roman Gushchin [this message]
2020-08-06  3:01             ` Hugh Dickins
2020-08-06  3:01               ` Hugh Dickins
2020-08-06  3:51               ` Roman Gushchin
2020-08-06 16:41                 ` Hugh Dickins
2020-08-06 16:41                   ` Hugh Dickins
2020-08-06 23:38               ` Roman Gushchin
2020-08-07  0:16                 ` Hugh Dickins
2020-08-07  0:16                   ` Hugh Dickins
2020-08-07  1:25                 ` Andrew Morton
2021-02-24  7:24                   ` Hugh Dickins
2021-02-24  7:24                     ` Hugh Dickins
2021-02-25  1:53                     ` Roman Gushchin
2021-02-25 17:21                       ` Hugh Dickins
2021-02-25 17:21                         ` Hugh Dickins
2021-02-25 18:06                         ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200804004012.GA1049259@carbon.dhcp.thefacebook.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.