* [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v2
From: Mel Gorman @ 2016-07-21 14:10 UTC
  To: Andrew Morton
  Cc: Johannes Weiner, Minchan Kim, Michal Hocko, Vlastimil Babka,
	Linux-MM, LKML, Mel Gorman

Both Joonsoo Kim and Minchan Kim have reported premature OOM kills.
The common element is a zone-constrained allocation failing. Two factors
appear to be at fault -- pgdat being considered unreclaimable prematurely
and insufficient rotation of the active list.

The series is in three basic parts:

Patches 1-3 add per-zone stats back in. The actual stats patch is different
	to Minchan's as the original patch did not account for the
	unevictable LRU, which would corrupt counters. Patches 2 and 3 remove
	approximations based on pgdat statistics. It's effectively a
	revert of "mm, vmstat: remove zone and node double accounting
	by approximating retries" but different LRU stats are used. This
	is better than a full revert or a reworking of the series as it
	preserves history of why the zone stats are necessary.

	If this works out, we may have to leave the double accounting in
	place for now until an alternative cheap solution presents itself.

Patch 4 rotates inactive/active lists for lowmem allocations. This is also
	quite different to Minchan's patch as the original patch did not
	account for memcg and would rotate if *any* eligible zone needed
	rotation which may rotate excessively. The new patch considers the
	ratio for all eligible zones which is more in line with node-lru
	in general.

Patch 5 accounts for skipped pages as partially scanned. This avoids the pgdat
	being prematurely marked unreclaimable while still allowing it to
	be marked unreclaimable if there are no reclaimable pages.
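
For readers who want the ideas behind patches 4 and 5 in concrete form,
here is a rough userspace sketch. It is not the code from the patches.
struct zone_lru, the three function names, the inactive_ratio argument
and the skip_divisor argument are all hypothetical and exist only for
illustration; the actual ratio and the weight given to skipped pages are
whatever the patches themselves use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone LRU counts; not the kernel's data structures. */
struct zone_lru {
	unsigned long inactive;
	unsigned long active;
};

/*
 * Patch 4 idea: sum the list sizes over every zone the allocation can
 * use (zones 0..reclaim_idx) and apply the ratio test once to the totals.
 * inactive_ratio is an assumed, caller-supplied value.
 */
bool inactive_is_low_all_eligible(const struct zone_lru *zones,
				  size_t reclaim_idx,
				  unsigned long inactive_ratio)
{
	unsigned long inactive = 0, active = 0;
	size_t zid;

	assert(zones != NULL);
	for (zid = 0; zid <= reclaim_idx; zid++) {
		inactive += zones[zid].inactive;
		active += zones[zid].active;
	}
	return inactive * inactive_ratio < active;
}

/*
 * For contrast: rotate as soon as any single eligible zone looks low.
 * One small zone can trigger rotation even when the eligible zones as a
 * whole have plenty of inactive pages.
 */
bool inactive_is_low_any_zone(const struct zone_lru *zones,
			      size_t reclaim_idx,
			      unsigned long inactive_ratio)
{
	size_t zid;

	assert(zones != NULL);
	for (zid = 0; zid <= reclaim_idx; zid++) {
		if (zones[zid].inactive * inactive_ratio < zones[zid].active)
			return true;
	}
	return false;
}

/*
 * Patch 5 idea: pages skipped because they sit in an ineligible zone
 * still add to the scan count, but at a reduced weight. skip_divisor is
 * an assumed parameter; a value of 0 ignores skipped pages entirely.
 */
unsigned long scan_count_with_skipped(unsigned long scanned,
				      unsigned long skipped,
				      unsigned long skip_divisor)
{
	if (skip_divisor == 0)
		return scanned;
	return scanned + skipped / skip_divisor;
}
```

With two eligible zones holding 10/100 and 1000/100 inactive/active pages
and a ratio of 1, the any-zone check asks for rotation because of the
first zone alone, while the all-eligible check does not. For the scan
count, skipped pages contribute less than pages that were really scanned
but never nothing, so a pgdat with no reclaimable pages still accumulates
scan activity over time.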

These patches did not OOM for me on a 2G 32-bit KVM instance while running
a stress test for an hour. Preliminary tests on a 64-bit system using a
parallel dd workload did not show anything alarming.

If an OOM is detected then please post the full OOM message.

Optionally please test without patch 5 if an OOM occurs.

 include/linux/mm_inline.h | 19 ++---------
 include/linux/mmzone.h    |  7 ++++
 include/linux/swap.h      |  1 +
 mm/compaction.c           | 20 +----------
 mm/migrate.c              |  2 ++
 mm/page-writeback.c       | 17 +++++-----
 mm/page_alloc.c           | 59 +++++++++++----------------------
 mm/vmscan.c               | 84 ++++++++++++++++++++++++++++++++++++++---------
 mm/vmstat.c               |  6 ++++
 9 files changed, 116 insertions(+), 99 deletions(-)

-- 
2.6.4


Thread overview: 44+ messages
2016-07-21 14:10 [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v2 Mel Gorman
2016-07-21 14:10 ` Mel Gorman
2016-07-21 14:10 ` [PATCH 1/5] mm: add per-zone lru list stat Mel Gorman
2016-07-21 14:10   ` Mel Gorman
2016-07-22 15:51   ` Johannes Weiner
2016-07-22 15:51     ` Johannes Weiner
2016-07-21 14:10 ` [PATCH 2/5] mm, vmscan: Remove highmem_file_pages Mel Gorman
2016-07-21 14:10   ` Mel Gorman
2016-07-22 15:53   ` Johannes Weiner
2016-07-22 15:53     ` Johannes Weiner
2016-07-25  8:09   ` Minchan Kim
2016-07-25  8:09     ` Minchan Kim
2016-07-25  9:23     ` [PATCH] mm, vmscan: remove highmem_file_pages -fix Mel Gorman
2016-07-25  9:23       ` Mel Gorman
2016-07-21 14:10 ` [PATCH 3/5] mm: Remove reclaim and compaction retry approximations Mel Gorman
2016-07-21 14:10   ` Mel Gorman
2016-07-22 15:57   ` Johannes Weiner
2016-07-22 15:57     ` Johannes Weiner
2016-07-25  8:18   ` Minchan Kim
2016-07-25  8:18     ` Minchan Kim
2016-07-21 14:11 ` [PATCH 4/5] mm: consider per-zone inactive ratio to deactivate Mel Gorman
2016-07-21 14:11   ` Mel Gorman
2016-07-21 15:52   ` Johannes Weiner
2016-07-21 15:52     ` Johannes Weiner
2016-07-21 14:11 ` [PATCH 5/5] mm, vmscan: Account for skipped pages as a partial scan Mel Gorman
2016-07-21 14:11   ` Mel Gorman
2016-07-22 16:02   ` Johannes Weiner
2016-07-22 16:02     ` Johannes Weiner
2016-07-25  8:39   ` Minchan Kim
2016-07-25  8:39     ` Minchan Kim
2016-07-25  9:52     ` Mel Gorman
2016-07-25  9:52       ` Mel Gorman
2016-07-26  8:16   ` Joonsoo Kim
2016-07-26  8:16     ` Joonsoo Kim
2016-07-26  8:26     ` Joonsoo Kim
2016-07-26  8:26       ` Joonsoo Kim
2016-07-26  8:11 ` [PATCH 0/5] Candidate fixes for premature OOM kills with node-lru v2 Joonsoo Kim
2016-07-26  8:11   ` Joonsoo Kim
2016-07-26 12:50   ` Mel Gorman
2016-07-26 12:50     ` Mel Gorman
2016-07-28  6:44     ` Joonsoo Kim
2016-07-28  6:44       ` Joonsoo Kim
2016-07-28 10:27       ` Mel Gorman
2016-07-28 10:27         ` Mel Gorman
