From: Mel Gorman <mgorman@techsingularity.net>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>, Rik van Riel <riel@surriel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	LKML <linux-kernel@vger.kernel.org>,
	daniel.vetter@intel.com, intel-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, David Airlie <airlied@linux.ie>
Subject: Re: [PATCH 00/31] Move LRU page reclaim from zones to nodes v8
Date: Mon, 4 Jul 2016 10:55:09 +0100
Message-ID: <20160704095509.GC11498@techsingularity.net>
In-Reply-To: <20160704080412.GA24605@bbox>

On Mon, Jul 04, 2016 at 05:04:12PM +0900, Minchan Kim wrote:
> > > How big ratio between highmem:lowmem do you think a problem?
> > > 
> > 
> > That's a "how long is a piece of string" type question.  The ratio does
> > not matter as much as whether the workload is both under memory pressure
> > and requires large amounts of lowmem pages. Even on systems with very high
> > ratios, it may not be a problem if HIGHPTE is enabled.
> 
> As well as page tables, pgd/kernel stack/zbud/slab and so on, every kernel
> allocation that masks __GFP_HIGHMEM off would be a problem on a
> 32-bit system.
> 

The same point applies -- it depends on the rate of these allocations,
not the ratio of highmem:lowmem per se.

> It also depends on that how many drivers needed lowmem only we have
> in the system.
> 
> I don't know how many such driver in the world. When I simply do grep,
> I found several cases which mask __GFP_HIGHMEM off and among them,
> I guess DRM might be a popular for us. However, it might be really rare
> usecase among various i915 usecases.
> 

It's also perfectly possible that such allocations are long-lived, in which
case they are not going to cause many skips. That is why I cannot make a
general prediction.

> > > > Conceptually, moving to node LRUs should be easier to understand. The
> > > > page allocator plays fewer tricks to game reclaim and reclaim behaves
> > > > similarly on all nodes. 
> > > > 
> > > > The series has been tested on a 16 core UMA machine and a 2-socket 48
> > > > core NUMA machine. The UMA results are presented in most cases as the NUMA
> > > > machine behaved similarly.
> > > 
> > > I guess you would already test below with various highmem system(e.g.,
> > > 2:1, 3:1, 4:1 and so on). If you have, could you mind sharing it?
> > > 
> > 
> > I haven't that data, the baseline distribution used doesn't even have
> > 32-bit support. Even if it was, the results may not be that interesting.
> > The workloads used were not necessarily going to trigger lowmem pressure
> > as HIGHPTE was set on the 32-bit configs.
> 
> That means we didn't test this on 32-bit with highmem.
> 

No. I tested the skip logic and noticed that, when it was forced on purpose,
system CPU usage was higher but it functionally worked.

> I'm not sure it's really too rare case to spend a time for testing.
> In fact, I really want to test all series to our production system
> which is 32bit and highmem but as we know well, most of embedded
> system kernel is rather old so backporting needs lots of time and
> care. However, if we miss testing in those system at the moment,
> we will be surprised after 1~2 years.
> 

It would be appreciated if it could be tested on such platforms if at all
possible. Even if I did set up a 32-bit x86 system, it would not have the
same allocation/reclaim profile as the platforms you are considering.

> I don't know what kind of benchmark we can use to check it so I cannot
> insist on it but you might know it.
> 

One method would be to use fsmark with very large numbers of small files
to force slab to require low memory. It's not representative of many real
workloads unfortunately. Usually such a configuration is for checking the
slab shrinker is working as expected.

> Okay, do you have any idea to fix it if we see such regression report
> in 32-bit system in future?

Two options, neither of whose complexity is justified without a "real"
workload to use as a reference.

1. Long-term isolation of highmem pages when reclaim is for lowmem

   When pages are skipped, they are immediately added back onto the LRU
   list. If lowmem reclaim persisted for long periods of time, the same
   highmem pages get continually scanned. The idea would be that lowmem
   keeps those pages on a separate list until a reclaim for highmem pages
   arrives that splices the highmem pages back onto the LRU.

   That would reduce the skip rate. The potential corner case is that
   highmem pages have to be scanned and reclaimed to free lowmem slab pages.

2. Linearly scan lowmem pages if the initial LRU shrink fails

   This will break LRU ordering but may be preferable and faster during
   memory pressure than skipping LRU pages.
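
To make the difference in option 1 concrete, here is a rough userspace toy,
not kernel code and not part of the series. Every name in it (toy_lru,
toy_lowmem_scan, toy_splice_parked, TOY_MAX_PAGES) is hypothetical, the LRU
is a plain array with index 0 as the cold end, and putting parked pages back
at the cold end is an assumption. It only counts how often highmem pages are
skipped by lowmem-constrained reclaim with and without a side list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PAGES 64

/* A toy page only remembers whether it lives in highmem. */
struct toy_page {
	bool highmem;
};

struct toy_lru {
	struct toy_page pages[TOY_MAX_PAGES];	/* index 0 is the cold end */
	size_t nr;
	struct toy_page parked[TOY_MAX_PAGES];	/* side list for option 1 */
	size_t nr_parked;
	size_t nr_reclaimed;
};

/* Add a page at the hot end of the LRU. */
void toy_lru_add(struct toy_lru *lru, bool highmem)
{
	assert(lru->nr < TOY_MAX_PAGES);
	lru->pages[lru->nr++].highmem = highmem;
}

/*
 * One lowmem-constrained pass over up to nr_to_scan pages from the cold
 * end. Lowmem pages are reclaimed. Highmem pages are skipped: with
 * park == false they stay on the LRU and will be scanned again next pass,
 * with park == true they move to the side list. Returns the skip count.
 */
size_t toy_lowmem_scan(struct toy_lru *lru, size_t nr_to_scan, bool park)
{
	size_t window = nr_to_scan < lru->nr ? nr_to_scan : lru->nr;
	size_t kept = 0, skipped = 0;

	for (size_t i = 0; i < lru->nr; i++) {
		struct toy_page page = lru->pages[i];

		if (i >= window) {
			lru->pages[kept++] = page;	/* outside the scan window */
		} else if (!page.highmem) {
			lru->nr_reclaimed++;		/* eligible, reclaim it */
		} else {
			skipped++;
			if (park) {
				assert(lru->nr_parked < TOY_MAX_PAGES);
				lru->parked[lru->nr_parked++] = page;
			} else {
				lru->pages[kept++] = page;
			}
		}
	}
	lru->nr = kept;
	return skipped;
}

/* A highmem-capable reclaim arrives: put parked pages back on the LRU. */
void toy_splice_parked(struct toy_lru *lru)
{
	assert(lru->nr + lru->nr_parked <= TOY_MAX_PAGES);
	memmove(&lru->pages[lru->nr_parked], &lru->pages[0],
		lru->nr * sizeof(lru->pages[0]));
	memcpy(&lru->pages[0], &lru->parked[0],
	       lru->nr_parked * sizeof(lru->parked[0]));
	lru->nr += lru->nr_parked;
	lru->nr_parked = 0;
}
```

With four highmem and four lowmem pages on the list, repeated calls with
park == false skip the same four highmem pages on every pass, while with
park == true the first pass skips four and later passes skip none until
toy_splice_parked() is called. The toy says nothing about the corner case
above, where the parked highmem pages are the ones that need reclaiming.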

-- 
Mel Gorman
SUSE Labs
