From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 1 Mar 2021 14:08:17 -0800 (PST)
From: Hugh Dickins
To: Roman Gushchin
Cc: Hugh Dickins, Andrew Morton, Johannes Weiner, Michal Hocko,
    Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: /proc/sys/vm/stat_refresh skip checking known negative stats
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
List-ID: linux-kernel@vger.kernel.org

On Sun, 28 Feb 2021, Roman Gushchin wrote:
> On Thu, Feb 25, 2021 at 03:14:03PM -0800, Hugh Dickins wrote:
> > vmstat_refresh() can occasionally catch nr_zone_write_pending and
> > nr_writeback when they are transiently negative.  The reason is partly
> > that the interrupt which decrements them in test_clear_page_writeback()
> > can come in before __test_set_page_writeback() got to increment them;
> > but transient negatives are still seen even when that is prevented, and
> > we have not yet resolved why (Roman believes that it is an unavoidable
> > consequence of the refresh scheduled on each cpu).  But those stats are
> > not buggy, they have never been seen to drift away from 0 permanently:
> > so just avoid the annoyance of showing a warning on them.
> >
> > Similarly avoid showing a warning on nr_free_cma: CMA users have seen
> > that one reported negative from /proc/sys/vm/stat_refresh too, but it
> > does drift away permanently: I believe that's because its incrementation
> > and decrementation are decided by page migratetype, but the migratetype
> > of a pageblock is not guaranteed to be constant.
> >
> > Use switch statements so we can most easily add or remove cases later.
>
> I'm OK with the code, but I can't fully agree with the commit log. I don't
> think there is any mystery around negative values. Let me copy-paste the
> explanation from my original patch:
>
> These warnings* are generated by the vmstat_refresh() function, which
> assumes that atomic zone and numa counters can't go below zero. However,
> on a SMP machine it's not quite right: due to per-cpu caching it can in
> theory be as low as -(zone threshold) * NR_CPUs.
>
> For instance, let's say all cma pages are in use and NR_FREE_CMA_PAGES
> reached 0. Then we've reclaimed a small number of cma pages on each CPU
> except CPU0, so that most percpu NR_FREE_CMA_PAGES counters are slightly
> positive (the atomic counter is still 0). Then somebody on CPU0 consumes
> all these pages. The number of pages can easily exceed the threshold and
> a negative value will be committed to the atomic counter.
>
> * warnings about negative NR_FREE_CMA_PAGES

Hi Roman, thanks for your Acks on the others - and indeed this is the one
on which disagreement was more to be expected.

I certainly wanted (and included below) a Link to your original patch; and
even wondered whether to paste your description into mine.  But I read it
again and still have issues with it.  Mainly, it does not convey at all
that touching stat_refresh adds the per-cpu counts into the global atomics,
resetting the per-cpu counts to 0.  That does not invalidate your
explanation: races might still manage to underflow; but it does take the
"easily" out of "can easily exceed".

Since I don't use CMA on any machine, I cannot be sure, but nr_free_cma
looked like a bad example to rely upon, because of its migratetype-based
accounting.
If you use /proc/sys/vm/stat_refresh frequently enough, without suppressing
the warning, I guess that uncertainty could be resolved by checking whether
nr_free_cma is seen with a negative value in consecutive refreshes - which
would tend to support my migratetype theory - or only singly - which would
support your raciness theory.

>
> Actually, the same is almost true for ANY other counter. What differs CMA,
> dirty and write pending counters is that they can reach 0 value under
> normal conditions. Other counters are usually not reaching values small
> enough to see negative values on a reasonable sized machine.

Looking through /proc/vmstat now, yes, I can see that there are fewer
counters which hover near 0 than I had imagined: more have a positive bias,
or are monotonically increasing.  And I'd be lying if I said I'd never seen
any others than nr_writeback or nr_zone_write_pending caught negative.

But what are you asking for?  Should the patch be changed, to retry the
refresh_vm_stats() before warning, if it sees any negative?  Depends on how
terrible one line in dmesg is considered!

>
> Does it make sense?

I'm not sure: you were not asking for the patch to be changed, but its
commit log: and I had better not say "Roman believes that it is an
unavoidable consequence of the refresh scheduled on each cpu" if that's
untrue (or unclear: now it reads to me as if we're accusing the refresh of
messing things up, whereas it's the non-atomic nature of the refresh which
leaves it vulnerable to races).
Hugh > > > > > Link: https://lore.kernel.org/linux-mm/20200714173747.3315771-1-guro@fb.com/ > > Reported-by: Roman Gushchin > > Signed-off-by: Hugh Dickins > > --- > > > > mm/vmstat.c | 15 +++++++++++++++ > > 1 file changed, 15 insertions(+) > > > > --- vmstat2/mm/vmstat.c 2021-02-25 11:56:18.000000000 -0800 > > +++ vmstat3/mm/vmstat.c 2021-02-25 12:42:15.000000000 -0800 > > @@ -1840,6 +1840,14 @@ int vmstat_refresh(struct ctl_table *tab > > if (err) > > return err; > > for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) { > > + /* > > + * Skip checking stats known to go negative occasionally. > > + */ > > + switch (i) { > > + case NR_ZONE_WRITE_PENDING: > > + case NR_FREE_CMA_PAGES: > > + continue; > > + } > > val = atomic_long_read(&vm_zone_stat[i]); > > if (val < 0) { > > pr_warn("%s: %s %ld\n", > > @@ -1856,6 +1864,13 @@ int vmstat_refresh(struct ctl_table *tab > > } > > #endif > > for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { > > + /* > > + * Skip checking stats known to go negative occasionally. 
> > +		 */
> > +		switch (i) {
> > +		case NR_WRITEBACK:
> > +			continue;
> > +		}
> >  		val = atomic_long_read(&vm_node_stat[i]);
> >  		if (val < 0) {
> >  			pr_warn("%s: %s %ld\n",
>