linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] cpuset/mempolicies related fixes and cleanups
@ 2017-05-17  8:11 Vlastimil Babka
  2017-05-17  8:11 ` [PATCH v2 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update Vlastimil Babka
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Vlastimil Babka @ 2017-05-17  8:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA, Li Zefan, Michal Hocko,
	Mel Gorman, David Rientjes, Christoph Lameter, Hugh Dickins,
	Andrea Arcangeli, Anshuman Khandual, Kirill A. Shutemov,
	Vlastimil Babka

Changes since RFC v1 [3]:

- Reworked patch 2 after discussion with Christoph Lameter.
- Fix bug in patch 5 spotted by Hillf Danton.
- Rebased to mmotm-2017-05-12-15-53

I would like to stress that this patchset aims to fix issues and cleanup the
code *within the existing documented semantics*, i.e. patch 1 ignores mempolicy
restrictions if the set of allowed nodes has no intersection with set of nodes
allowed by cpuset. I believe discussing potential changes of the semantics can
be better done once we have a baseline with no known bugs of the current
semantics.

===

I've recently summarized the cpuset/mempolicy issues in a LSF/MM proposal [1]
and the discussion itself [2]. I've been trying to rewrite the handling as
proposed, with the idea that changing semantics to make all mempolicies static
wrt cpuset updates (and discarding the relative and default modes) can be tried
on top, as there's a high risk of being rejected/reverted because somebody
might still care about the removed modes.

However I haven't yet figured out how to properly:

1) make mempolicies swappable instead of rebinding in place. I thought mbind()
already works that way and uses refcounting to avoid use-after-free of the old
policy by a parallel allocation, but turns out true refcounting is only done
for shared (shmem) mempolicies, and the actual protection for mbind() comes
from mmap_sem. Extending the refcounting means more overhead in allocator hot
path. Also swapping whole mempolicies means that we have to allocate the new
ones, which can fail, and reverting of the partially done work also means
allocating (note that mbind() doesn't care and will just leave part of the
range updated and part not updated when returning -ENOMEM...).

2) make cpuset's task->mems_allowed also swappable (after converting it from
nodemask to zonelist, which is the easy part) for mostly the same reasons.

The good news is that while trying to do the above, I've at least figured out
how to hopefully close the remaining premature OOM's, and do a buch of cleanups
on top, removing quite some of the code that was also supposed to prevent the
cpuset update races, but doesn't work anymore nowadays. This should fix the
most pressing concerns with this topic and give us a better baseline before
either proceeding with the original proposal, or pushing a change of semantics
that removes the problem 1) above. I'd be then fine with trying to change the
semantic first and rewrite later.

Patchset is based on next-20170411 and has been tested with the LTP cpuset01
stress test.

[1] https://lkml.kernel.org/r/4c44a589-5fd8-08d0-892c-e893bb525b71-AlSwsSmVLrQ@public.gmane.org
[2] https://lwn.net/Articles/717797/
[3] https://marc.info/?l=linux-mm&m=149191957922828&w=2

Vlastimil Babka (6):
  mm, page_alloc: fix more premature OOM due to race with cpuset update
  mm, mempolicy: stop adjusting current->il_next in
    mpol_rebind_nodemask()
  mm, page_alloc: pass preferred nid instead of zonelist to allocator
  mm, mempolicy: simplify rebinding mempolicies when updating cpusets
  mm, cpuset: always use seqlock when changing task's nodemask
  mm, mempolicy: don't check cpuset seqlock where it doesn't matter

 include/linux/gfp.h            |  11 ++-
 include/linux/mempolicy.h      |  12 ++-
 include/linux/sched.h          |   2 +-
 include/uapi/linux/mempolicy.h |   8 --
 kernel/cgroup/cpuset.c         |  33 ++------
 mm/hugetlb.c                   |  15 ++--
 mm/memory_hotplug.c            |   6 +-
 mm/mempolicy.c                 | 181 ++++++++++-------------------------------
 mm/page_alloc.c                |  61 ++++++++++----
 9 files changed, 118 insertions(+), 211 deletions(-)

-- 
2.12.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-05-23 11:32 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-17  8:11 [PATCH v2 0/6] cpuset/mempolicies related fixes and cleanups Vlastimil Babka
2017-05-17  8:11 ` [PATCH v2 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update Vlastimil Babka
     [not found]   ` <20170517081140.30654-2-vbabka-AlSwsSmVLrQ@public.gmane.org>
2017-05-19 11:51     ` Michal Hocko
     [not found]       ` <20170519115110.GB29839-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
2017-05-23 11:32         ` Vlastimil Babka
2017-05-17  8:11 ` [PATCH v2 2/6] mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask() Vlastimil Babka
     [not found]   ` <20170517081140.30654-3-vbabka-AlSwsSmVLrQ@public.gmane.org>
2017-05-17 15:07     ` Christoph Lameter
2017-05-17  8:11 ` [PATCH v2 3/6] mm, page_alloc: pass preferred nid instead of zonelist to allocator Vlastimil Babka
2017-05-17 15:19   ` Christoph Lameter
     [not found]     ` <alpine.DEB.2.20.1705171009340.8714-wcBtFHqTun5QOdAKl3ChDw@public.gmane.org>
2017-05-18 10:25       ` Vlastimil Babka
     [not found]   ` <20170517081140.30654-4-vbabka-AlSwsSmVLrQ@public.gmane.org>
2017-05-19 11:59     ` Michal Hocko
2017-05-17  8:11 ` [PATCH v2 4/6] mm, mempolicy: simplify rebinding mempolicies when updating cpusets Vlastimil Babka
2017-05-23  7:11   ` Michal Hocko
2017-05-17  8:11 ` [PATCH v2 5/6] mm, cpuset: always use seqlock when changing task's nodemask Vlastimil Babka
     [not found]   ` <20170517081140.30654-6-vbabka-AlSwsSmVLrQ@public.gmane.org>
2017-05-23  7:15     ` Michal Hocko
2017-05-17  8:11 ` [PATCH v2 6/6] mm, mempolicy: don't check cpuset seqlock where it doesn't matter Vlastimil Babka
2017-05-23  7:16   ` Michal Hocko
     [not found] ` <20170517081140.30654-1-vbabka-AlSwsSmVLrQ@public.gmane.org>
2017-05-17 16:24   ` [PATCH v2 0/6] cpuset/mempolicies related fixes and cleanups Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).