linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v19 00/20] per memcg lru_lock
@ 2020-09-24  3:28 Alex Shi
  2020-09-24  3:28 ` [PATCH v19 01/20] mm/memcg: warning on !memcg after readahead page charged Alex Shi
                   ` (20 more replies)
  0 siblings, 21 replies; 31+ messages in thread
From: Alex Shi @ 2020-09-24  3:28 UTC (permalink / raw)
  To: akpm, mgorman, tj, hughd, khlebnikov, daniel.m.jordan, willy,
	hannes, lkp, linux-mm, linux-kernel, cgroups, shakeelb,
	iamjoonsoo.kim, richard.weiyang, kirill, alexander.duyck,
	rong.a.chen, mhocko, vdavydov.dev, shy828301, aaron.lwe

The new version rebased on v5.9-rc6 with line by line review by Hugh Dickins.
Millions thanks! And removed the 4th part from last version which do post
optimization, that we can repost after the main part get merged. About one
year coding and review, now I believe it's ready.

So now this patchset includes 3 parts:
1, some code cleanup and minimum optimization as a preparation. 
2, use TestCleanPageLRU as page isolation's precondition.
3, replace per node lru_lock with per memcg per node lru_lock.

Current lru_lock is one for each of node, pgdat->lru_lock, that guard for
lru lists, but now we had moved the lru lists into memcg for long time. Still
using per node lru_lock is clearly unscalable, pages on each of memcgs have
to compete each others for a whole lru_lock. This patchset try to use per
lruvec/memcg lru_lock to repleace per node lru lock to guard lru lists, make
it scalable for memcgs and get performance gain.

Currently lru_lock still guards both lru list and page's lru bit, that's ok.
but if we want to use specific lruvec lock on the page, we need to pin down
the page's lruvec/memcg during locking. Just taking lruvec lock first may be
undermined by the page's memcg charge/migration. To fix this problem, we could
take out the page's lru bit clear and use it as pin down action to block the
memcg changes. That's the reason for new atomic func TestClearPageLRU.
So now isolating a page need both actions: TestClearPageLRU and hold the
lru_lock.

The typical usage of this is isolate_migratepages_block() in compaction.c
we have to take lru bit before lru lock, that serialized the page isolation
in memcg page charge/migration which will change page's lruvec and new 
lru_lock in it.

The above solution suggested by Johannes Weiner, and based on his new memcg 
charge path, then have this patchset. (Hugh Dickins tested and contributed much
code from compaction fix to general code polish, thanks a lot!).

Daniel Jordan's testing show 62% improvement on modified readtwice case
on his 2P * 10 core * 2 HT broadwell box.
https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/

Thanks Hugh Dickins and Konstantin Khlebnikov, they both brought this
idea 8 years ago, and others who give comments as well: Daniel Jordan, 
Mel Gorman, Shakeel Butt, Matthew Wilcox, Alexander Duyck etc.

Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu,
and Yun Wang. Hugh Dickins also shared his kbuild-swap case. Thanks!


Alex Shi (17):
  mm/memcg: warning on !memcg after readahead page charged
  mm/memcg: bail early from swap accounting if memcg disabled
  mm/thp: move lru_add_page_tail func to huge_memory.c
  mm/thp: use head for head page in lru_add_page_tail
  mm/thp: Simplify lru_add_page_tail()
  mm/thp: narrow lru locking
  mm/vmscan: remove unnecessary lruvec adding
  mm/memcg: add debug checking in lock_page_memcg
  mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn
  mm/lru: move lock into lru_note_cost
  mm/vmscan: remove lruvec reget in move_pages_to_lru
  mm/mlock: remove lru_lock on TestClearPageMlocked
  mm/mlock: remove __munlock_isolate_lru_page
  mm/lru: introduce TestClearPageLRU
  mm/compaction: do page isolation first in compaction
  mm/swap.c: serialize memcg changes in pagevec_lru_move_fn
  mm/lru: replace pgdat lru_lock with lruvec lock

Alexander Duyck (1):
  mm/lru: introduce the relock_page_lruvec function

Hugh Dickins (2):
  mm: page_idle_get_page() does not need lru_lock
  mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |  15 +-
 Documentation/admin-guide/cgroup-v1/memory.rst     |  21 +--
 Documentation/trace/events-kmem.rst                |   2 +-
 Documentation/vm/unevictable-lru.rst               |  22 +--
 include/linux/memcontrol.h                         | 110 +++++++++++
 include/linux/mm_types.h                           |   2 +-
 include/linux/mmdebug.h                            |  13 ++
 include/linux/mmzone.h                             |   6 +-
 include/linux/page-flags.h                         |   1 +
 include/linux/swap.h                               |   4 +-
 mm/compaction.c                                    |  94 +++++++---
 mm/filemap.c                                       |   4 +-
 mm/huge_memory.c                                   |  45 +++--
 mm/memcontrol.c                                    |  85 ++++++++-
 mm/mlock.c                                         |  63 ++-----
 mm/mmzone.c                                        |   1 +
 mm/page_alloc.c                                    |   1 -
 mm/page_idle.c                                     |   4 -
 mm/rmap.c                                          |   4 +-
 mm/swap.c                                          | 199 ++++++++------------
 mm/vmscan.c                                        | 203 +++++++++++----------
 mm/workingset.c                                    |   2 -
 22 files changed, 523 insertions(+), 378 deletions(-)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-10-27  0:37 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-24  3:28 [PATCH v19 00/20] per memcg lru_lock Alex Shi
2020-09-24  3:28 ` [PATCH v19 01/20] mm/memcg: warning on !memcg after readahead page charged Alex Shi
2020-09-24  3:28 ` [PATCH v19 02/20] mm/memcg: bail early from swap accounting if memcg disabled Alex Shi
2020-09-24  3:28 ` [PATCH v19 03/20] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-09-26  6:06   ` Alex Shi
2020-09-24  3:28 ` [PATCH v19 04/20] mm/thp: use head for head page in lru_add_page_tail Alex Shi
2020-09-24  3:28 ` [PATCH v19 05/20] mm/thp: Simplify lru_add_page_tail() Alex Shi
2020-09-24  3:28 ` [PATCH v19 06/20] mm/thp: narrow lru locking Alex Shi
2020-09-26  6:08   ` Alex Shi
2020-09-24  3:28 ` [PATCH v19 07/20] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-09-26  6:14   ` Alex Shi
2020-09-24  3:28 ` [PATCH v19 08/20] mm: page_idle_get_page() does not need lru_lock Alex Shi
2020-09-24  3:28 ` [PATCH v19 09/20] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-09-24  3:28 ` [PATCH v19 10/20] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Alex Shi
2020-09-24  3:28 ` [PATCH v19 11/20] mm/lru: move lock into lru_note_cost Alex Shi
2020-09-24  3:28 ` [PATCH v19 12/20] mm/vmscan: remove lruvec reget in move_pages_to_lru Alex Shi
2020-09-24  3:28 ` [PATCH v19 13/20] mm/mlock: remove lru_lock on TestClearPageMlocked Alex Shi
2020-09-24  3:28 ` [PATCH v19 14/20] mm/mlock: remove __munlock_isolate_lru_page Alex Shi
2020-09-24  3:28 ` [PATCH v19 15/20] mm/lru: introduce TestClearPageLRU Alex Shi
2020-09-24  3:28 ` [PATCH v19 16/20] mm/compaction: do page isolation first in compaction Alex Shi
2020-09-24  3:28 ` [PATCH v19 17/20] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Alex Shi
2020-09-24  3:28 ` [PATCH v19 18/20] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-10-21  7:23   ` Alex Shi
2020-10-23  2:00     ` Rong Chen
2020-10-25 21:51     ` Hugh Dickins
2020-10-26  1:41       ` Alex Shi
2020-10-27  0:36         ` Rong Chen
2020-09-24  3:28 ` [PATCH v19 19/20] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-09-24  3:28 ` [PATCH v19 20/20] mm/lru: revise the comments of lru_lock Alex Shi
2020-09-24 18:06 ` [PATCH v19 00/20] per memcg lru_lock Daniel Jordan
2020-09-25  0:01   ` Alex Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).