All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Shi <alex.shi@linux.alibaba.com>
To: akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com,
	willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, shakeelb@google.com,
	iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com
Subject: [PATCH v13 00/18] per memcg lru lock
Date: Fri, 19 Jun 2020 16:33:38 +0800	[thread overview]
Message-ID: <1592555636-115095-1-git-send-email-alex.shi@linux.alibaba.com> (raw)

This is a new version which bases on linux-next, merged much suggestion
from Hugh Dickins, from compaction fix to less TestClearPageLRU and
comments reverse etc. Thank a lot, Hugh!

Johannes Weiner has suggested:
"So here is a crazy idea that may be worth exploring:

Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
linked list.

Can we make PageLRU atomic and use it to stabilize the lru_lock
instead, and then use the lru_lock only serialize list operations?
..."

With new memcg charge path and this solution, we could isolate
LRU pages to exclusive visit them in compaction, page migration, reclaim,
memcg move_accunt, huge page split etc scenarios while keeping pages' 
memcg stable. Then possible to change per node lru locking to per memcg
lru locking. As to pagevec_lru_move_fn funcs, it would be safe to let
pages remain on lru list, lru lock could guard them for list integrity.

The patchset includes 3 parts:
1, some code cleanup and minimum optimization as a preparation.
2, use TestCleanPageLRU as page isolation's precondition
3, replace per node lru_lock with per memcg per node lru_lock

The 3rd part moves per node lru_lock into lruvec, thus bring a lru_lock for
each of memcg per node. So on a large machine, each of memcg don't
have to suffer from per node pgdat->lru_lock competition. They could go
fast with their self lru_lock

Following Daniel Jordan's suggestion, I have run 208 'dd' with on 104
containers on a 2s * 26cores * HT box with a modefied case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this patchset, the readtwice performance increased about 80%
in concurrent containers.

Thanks Hugh Dickins and Konstantin Khlebnikov, they both brought this
idea 8 years ago, and others who give comments as well: Daniel Jordan, 
Mel Gorman, Shakeel Butt, Matthew Wilcox etc.

Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu,
and Yun Wang. Hugh Dickins also shared his kbuild-swap case. Thanks!

Alex Shi (16):
  mm/vmscan: remove unnecessary lruvec adding
  mm/page_idle: no unlikely double check for idle page counting
  mm/compaction: correct the comments of compact_defer_shift
  mm/compaction: rename compact_deferred as compact_should_defer
  mm/thp: move lru_add_page_tail func to huge_memory.c
  mm/thp: clean up lru_add_page_tail
  mm/thp: narrow lru locking
  mm/memcg: add debug checking in lock_page_memcg
  mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn
  mm/lru: introduce TestClearPageLRU
  mm/compaction: do page isolation first in compaction
  mm/mlock: reorder isolation sequence during munlock
  mm/swap: serialize memcg changes during pagevec_lru_move_fn
  mm/lru: replace pgdat lru_lock with lruvec lock
  mm/lru: introduce the relock_page_lruvec function
  mm/pgdat: remove pgdat lru_lock

Hugh Dickins (2):
  mm/vmscan: use relock for move_pages_to_lru
  mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |  15 +-
 Documentation/admin-guide/cgroup-v1/memory.rst     |  21 ++-
 Documentation/trace/events-kmem.rst                |   2 +-
 Documentation/vm/unevictable-lru.rst               |  22 +--
 include/linux/compaction.h                         |   4 +-
 include/linux/memcontrol.h                         |  95 +++++++++++
 include/linux/mm_types.h                           |   2 +-
 include/linux/mmzone.h                             |   6 +-
 include/linux/page-flags.h                         |   1 +
 include/linux/swap.h                               |   4 +-
 include/trace/events/compaction.h                  |   2 +-
 mm/compaction.c                                    | 113 ++++++++-----
 mm/filemap.c                                       |   4 +-
 mm/huge_memory.c                                   |  54 +++++--
 mm/memcontrol.c                                    |  56 ++++++-
 mm/mlock.c                                         |  93 +++++------
 mm/mmzone.c                                        |   1 +
 mm/page_alloc.c                                    |   1 -
 mm/page_idle.c                                     |   8 -
 mm/rmap.c                                          |   4 +-
 mm/swap.c                                          | 175 +++++++--------------
 mm/swap_state.c                                    |   5 +-
 mm/vmscan.c                                        | 165 ++++++++++---------
 mm/workingset.c                                    |   4 +-
 24 files changed, 500 insertions(+), 357 deletions(-)

-- 
1.8.3.1


WARNING: multiple messages have this Message-ID (diff)
From: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt@public.gmane.org,
	tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org,
	daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	yang.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org,
	willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
	lkp-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org,
	richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Subject: [PATCH v13 00/18] per memcg lru lock
Date: Fri, 19 Jun 2020 16:33:38 +0800	[thread overview]
Message-ID: <1592555636-115095-1-git-send-email-alex.shi@linux.alibaba.com> (raw)

This is a new version which bases on linux-next, merged much suggestion
from Hugh Dickins, from compaction fix to less TestClearPageLRU and
comments reverse etc. Thank a lot, Hugh!

Johannes Weiner has suggested:
"So here is a crazy idea that may be worth exploring:

Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
linked list.

Can we make PageLRU atomic and use it to stabilize the lru_lock
instead, and then use the lru_lock only serialize list operations?
..."

With new memcg charge path and this solution, we could isolate
LRU pages to exclusive visit them in compaction, page migration, reclaim,
memcg move_accunt, huge page split etc scenarios while keeping pages' 
memcg stable. Then possible to change per node lru locking to per memcg
lru locking. As to pagevec_lru_move_fn funcs, it would be safe to let
pages remain on lru list, lru lock could guard them for list integrity.

The patchset includes 3 parts:
1, some code cleanup and minimum optimization as a preparation.
2, use TestCleanPageLRU as page isolation's precondition
3, replace per node lru_lock with per memcg per node lru_lock

The 3rd part moves per node lru_lock into lruvec, thus bring a lru_lock for
each of memcg per node. So on a large machine, each of memcg don't
have to suffer from per node pgdat->lru_lock competition. They could go
fast with their self lru_lock

Following Daniel Jordan's suggestion, I have run 208 'dd' with on 104
containers on a 2s * 26cores * HT box with a modefied case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this patchset, the readtwice performance increased about 80%
in concurrent containers.

Thanks Hugh Dickins and Konstantin Khlebnikov, they both brought this
idea 8 years ago, and others who give comments as well: Daniel Jordan, 
Mel Gorman, Shakeel Butt, Matthew Wilcox etc.

Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu,
and Yun Wang. Hugh Dickins also shared his kbuild-swap case. Thanks!

Alex Shi (16):
  mm/vmscan: remove unnecessary lruvec adding
  mm/page_idle: no unlikely double check for idle page counting
  mm/compaction: correct the comments of compact_defer_shift
  mm/compaction: rename compact_deferred as compact_should_defer
  mm/thp: move lru_add_page_tail func to huge_memory.c
  mm/thp: clean up lru_add_page_tail
  mm/thp: narrow lru locking
  mm/memcg: add debug checking in lock_page_memcg
  mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn
  mm/lru: introduce TestClearPageLRU
  mm/compaction: do page isolation first in compaction
  mm/mlock: reorder isolation sequence during munlock
  mm/swap: serialize memcg changes during pagevec_lru_move_fn
  mm/lru: replace pgdat lru_lock with lruvec lock
  mm/lru: introduce the relock_page_lruvec function
  mm/pgdat: remove pgdat lru_lock

Hugh Dickins (2):
  mm/vmscan: use relock for move_pages_to_lru
  mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |  15 +-
 Documentation/admin-guide/cgroup-v1/memory.rst     |  21 ++-
 Documentation/trace/events-kmem.rst                |   2 +-
 Documentation/vm/unevictable-lru.rst               |  22 +--
 include/linux/compaction.h                         |   4 +-
 include/linux/memcontrol.h                         |  95 +++++++++++
 include/linux/mm_types.h                           |   2 +-
 include/linux/mmzone.h                             |   6 +-
 include/linux/page-flags.h                         |   1 +
 include/linux/swap.h                               |   4 +-
 include/trace/events/compaction.h                  |   2 +-
 mm/compaction.c                                    | 113 ++++++++-----
 mm/filemap.c                                       |   4 +-
 mm/huge_memory.c                                   |  54 +++++--
 mm/memcontrol.c                                    |  56 ++++++-
 mm/mlock.c                                         |  93 +++++------
 mm/mmzone.c                                        |   1 +
 mm/page_alloc.c                                    |   1 -
 mm/page_idle.c                                     |   8 -
 mm/rmap.c                                          |   4 +-
 mm/swap.c                                          | 175 +++++++--------------
 mm/swap_state.c                                    |   5 +-
 mm/vmscan.c                                        | 165 ++++++++++---------
 mm/workingset.c                                    |   4 +-
 24 files changed, 500 insertions(+), 357 deletions(-)

-- 
1.8.3.1


             reply	other threads:[~2020-06-19  8:34 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-19  8:33 Alex Shi [this message]
2020-06-19  8:33 ` [PATCH v13 00/18] per memcg lru lock Alex Shi
2020-06-19  8:33 ` [PATCH v13 01/18] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-19  8:33 ` [PATCH v13 02/18] mm/page_idle: no unlikely double check for idle page counting Alex Shi
2020-06-19  8:33 ` [PATCH v13 03/18] mm/compaction: correct the comments of compact_defer_shift Alex Shi
2020-06-19  8:33 ` [PATCH v13 04/18] mm/compaction: rename compact_deferred as compact_should_defer Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-19  8:33 ` [PATCH v13 05/18] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-06-19  8:33 ` [PATCH v13 06/18] mm/thp: clean up lru_add_page_tail Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-19  8:33 ` [PATCH v13 07/18] mm/thp: narrow lru locking Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-19  8:33 ` [PATCH v13 08/18] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-06-19  8:33 ` [PATCH v13 09/18] mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn Alex Shi
2020-06-19  8:33 ` [PATCH v13 10/18] mm/lru: introduce TestClearPageLRU Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-19  8:33 ` [PATCH v13 11/18] mm/compaction: do page isolation first in compaction Alex Shi
2020-06-19  8:33 ` [PATCH v13 12/18] mm/mlock: reorder isolation sequence during munlock Alex Shi
2020-06-19  8:33 ` [PATCH v13 13/18] mm/swap: serialize memcg changes during pagevec_lru_move_fn Alex Shi
2020-06-19  8:33 ` [PATCH v13 14/18] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-06-19  8:33 ` [PATCH v13 15/18] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-06-19  8:33 ` [PATCH v13 16/18] mm/vmscan: use relock for move_pages_to_lru Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-19  8:33 ` [PATCH v13 17/18] mm/pgdat: remove pgdat lru_lock Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-19  8:33 ` [PATCH v13 18/18] mm/lru: revise the comments of lru_lock Alex Shi
2020-06-19  8:33   ` Alex Shi
2020-06-20 23:08 ` [PATCH v13 00/18] per memcg lru lock Andrew Morton
2020-06-20 23:08   ` Andrew Morton
2020-06-21 15:44   ` Alex Shi
2020-06-21 15:44     ` Alex Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1592555636-115095-1-git-send-email-alex.shi@linux.alibaba.com \
    --to=alex.shi@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=daniel.m.jordan@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=mgorman@techsingularity.net \
    --cc=richard.weiyang@gmail.com \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=willy@infradead.org \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.