linux-parisc.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@techsingularity.net>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Helge Deller <deller@gmx.de>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	John David Anglin <dave.anglin@bell.net>,
	linux-parisc@vger.kernel.org, linux-mm@kvack.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Zi Yan <zi.yan@cs.rutgers.edu>
Subject: Re: Memory management broken by "mm: reclaim small amounts of memory when an external fragmentation event occurs"
Date: Mon, 8 Apr 2019 10:52:24 +0100	[thread overview]
Message-ID: <20190408095224.GA18914@techsingularity.net> (raw)
In-Reply-To: <alpine.LRH.2.02.1904061042490.9597@file01.intranet.prod.int.rdu2.redhat.com>

On Sat, Apr 06, 2019 at 11:20:35AM -0400, Mikulas Patocka wrote:
> Hi
> 
> The patch 1c30844d2dfe272d58c8fc000960b835d13aa2ac ("mm: reclaim small 
> amounts of memory when an external fragmentation event occurs") breaks 
> memory management on parisc.
> 
> I have a parisc machine with 7GiB RAM, the chipset maps the physical 
> memory to three zones:
> 	0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
> 	1) Start 0x0000000100000000 End 0x00000001bfdfffff Size   3070 MB
> 	2) Start 0x0000004040000000 End 0x00000040ffffffff Size   3072 MB
> (but it is not NUMA)
> 
> With the patch 1c30844d2, the kernel will incorrectly reclaim the first 
> zone when it fills up, ignoring the fact that there are two completely 
> free zones. Basically, it limits cache size to 1GiB.
> 
> For example, if I run:
> # dd if=/dev/sda of=/dev/null bs=1M count=2048
> 
> - with the proper kernel, there should be "Buffers - 2GiB" when this 
> command finishes. With the patch 1c30844d2, buffers will consume just 1GiB 
> or slightly more, because the kernel was incorrectly reclaiming them.
> 

I could argue that the feature is behaving as expected for separate
pgdats but that's neither here nor there. The bug is real but I have a
few questions.

First, if pa-risc is !NUMA then why are separate local ranges
represented as separate nodes? Is it because of DISCONTIGMEM or something
else? DISCONTIGMEM is before my time so I'm not familiar with it and
I consider it "essentially dead" but the arch init code seems to setup
pgdats for each physically contiguous range so it's a possibility. The most
likely explanation is that pa-risc does not have hardware with addressing
limitations smaller than the CPU's physical address limits, and it's
possible to have more ranges than available zones, but clarification would
be nice. By rights, SPARSEMEM could be supported on pa-risc, but that
would be a time-consuming and somewhat futile exercise.  Regardless of the
explanation, as pa-risc does not appear to support transparent hugepages,
an option is to special case watermark_boost_factor to be 0 on DISCONTIGMEM
as that commit was primarily about THP with secondary concerns around
SLUB. This is probably the most straightforward solution but it'd
obviously need a comment. I do not know what the distro configurations
for pa-risc set, as I'm not a user of gentoo or debian.

Second, if you set the sysctl vm.watermark_boost_factor=0, does the
problem go away? If so, an option would be to set this sysctl to 0 by
default on distros that support pa-risc. Would that be suitable?
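For anyone testing this, the sysctl can be flipped at runtime without a
reboot (needs root; the drop-in filename below is just an example):

```shell
# Read the current value (15000, i.e. 150% of the high watermark, by default)
sysctl vm.watermark_boost_factor

# Disable watermark boosting for this boot
sysctl -w vm.watermark_boost_factor=0

# Or make it persistent across reboots
echo 'vm.watermark_boost_factor = 0' >> /etc/sysctl.d/99-no-boost.conf
```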

Finally, I'm sure this has been asked before but why is pa-risc alive?
It appears a new CPU has not been manufactured since 2005. Even Alpha
I can understand being semi-alive since it's an interesting case for
weakly-ordered memory models. pa-risc appears to be supported and active
for debian at least so someone cares. It's not the only feature like this
that is bizarrely alive but it is curious -- 32-bit NUMA support on x86,
I'm looking at you; your machines have all been dead since the early 2000s
AFAIK, and anyone else using NUMA on 32-bit x86 needs their head examined.

-- 
Mel Gorman
SUSE Labs


Thread overview: 9+ messages
2019-04-06 15:20 Memory management broken by "mm: reclaim small amounts of memory when an external fragmentation event occurs" Mikulas Patocka
2019-04-06 17:26 ` Mikulas Patocka
2019-04-08  9:52 ` Mel Gorman [this message]
2019-04-08 11:10   ` Mikulas Patocka
2019-04-08 12:54     ` Mel Gorman
2019-04-08 14:29   ` James Bottomley
2019-04-08 15:22     ` Helge Deller
2019-04-08 19:44       ` James Bottomley
2019-04-09 20:09       ` Helge Deller
