All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Nikolay Borisov <kernel@kyup.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux MM <linux-mm@kvack.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: Softlockup during memory allocation
Date: Thu, 3 Nov 2016 20:46:04 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.1611032013440.5863@eggly.anvils> (raw)
In-Reply-To: <89ee3413-71a3-403d-48fa-af325d40f8db@suse.cz>

On Wed, 2 Nov 2016, Vlastimil Babka wrote:
> On 11/01/2016 09:12 AM, Nikolay Borisov wrote:
> > In addition to that I believe there is something wrong
> > with the NR_PAGES_SCANNED stats since they are being negative. 
> > I haven't looked into the code to see how this value is being 
> > synchronized and if there is a possibility of it temporary going negative. 
> 
> This is because there's a shared counter and percpu diffs, and crash
> only looks at the shared counter.

Actually no, as I found when adding vmstat_refresh().  Coincidentally,
I spent some of last weekend trying to understand why, then wrote a
long comment about it (we thought it might be responsible for a hang).
Let me share that comment; but I was writing about an earlier release,
so some of the "zone"s have become "node"s since then - and I'm not
sure whether my comment is comprehensible to anyone but the writer!

This comment attempts to explain the NR_PAGES_SCANNED underflow.  I doubt
it's crucial to the bug in question: it's unsightly, and it may double the
margin of error involved in using per-cpu accounting, but I don't think it
makes a crucial difference to the approximation already inherent there.
If that approximation is a problem, we could consider reverting the commit
0d5d823ab4e "mm: move zone->pages_scanned into a vmstat counter" which
introduced it; or perhaps we could reconsider the stat_threshold
calculation on small zones (if that would make a difference -
I've not gone so far as to think about the hang itself).

The NR_PAGES_SCANNED underflow does not come from any race: it comes from
the way free_pcppages_bulk() and free_one_page() attempt to reset the
counter to 0 by __mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned)
with a "fuzzy" nr_scanned obtained from zone_page_state(zone,
NR_PAGES_SCANNED).

Normally __mod_zone_page_state() is used to adjust a counter by a certain
well-known amount, but here it is being used with an approximate amount -
the cheaply-visible approximate total, before correction by per-cpu diffs
(and getting that total from zone_page_state_snapshot() instead would
defeat most of the optimization of using per-cpu here, if not regress
it to worse than the global per-zone counter used before).

The problem starts on an occasion when nr_scanned there is perhaps
perfectly correct, but (factoring in the current cpu diff) is less than
the stat_threshold: __mod_zone_page_state() then updates the cpu diff and
doesn't touch the globally visible counter - but the negative diff implies
that the globally visible counter is larger than the correct value.  Then
later that too-large value is fed back into __mod_zone_pages_state() as if
it were correct, tending towards underflow of the full counter.  (Or the
same could all happen in reverse, with the "reset to 0" leaving a positive
residue in the full counter.)

This would be more serious (unbounded) without the periodic
refresh_cpu_vm_stats(): which every second folds the per-cpu diffs
back into the cheaply-visible totals.  When the diffs are 0, then
free_pcppages_bulk() and free_one_page() will properly reset the total to
0, even if the value it had before was incorrect (negative or positive).

I can eliminate the negative NR_PAGES_SCANNED reports by a one-line change
to __mod_zone_page_state(), to stop ever putting a negative into the
per-cpu diff for NR_PAGES_SCANNED; or by copying most of
__mod_zone_page_state() to a separate __reset_zone_page_state(), called
only on NR_PAGES_SCANNED, again avoiding the negative diffs.  But as I said
in the first paragraph, I doubt the underflow is worse in effect than the
approximation already inherent in the per-cpu counting here.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-11-04  3:46 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-01  8:12 Softlockup during memory allocation Nikolay Borisov
2016-11-01  8:16 ` Nikolay Borisov
2016-11-02 19:00 ` Vlastimil Babka
2016-11-04  3:46   ` Hugh Dickins [this message]
2016-11-04 12:18 ` Nikolay Borisov
2016-11-13 22:02   ` Nikolay Borisov
2016-11-21  5:31     ` Michal Hocko
2016-11-22  8:56       ` Nikolay Borisov
2016-11-22 14:30         ` Michal Hocko
2016-11-22 14:32           ` Michal Hocko
2016-11-22 14:46             ` Nikolay Borisov
2016-11-22 14:35           ` Nikolay Borisov
2016-11-22 17:02             ` Michal Hocko
2016-11-23  7:44               ` Nikolay Borisov
2016-11-23  7:49                 ` Michal Hocko
2016-11-23  7:50                   ` Michal Hocko
2016-11-24 11:45                   ` Nikolay Borisov
2016-11-24 12:12                     ` Michal Hocko
2016-11-24 13:09                       ` Nikolay Borisov
2016-11-25  9:00                         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1611032013440.5863@eggly.anvils \
    --to=hughd@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel@kyup.com \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.