From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA68AC4727E for ; Thu, 24 Sep 2020 03:28:45 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3410923600 for ; Thu, 24 Sep 2020 03:28:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3410923600 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 452F46B005D; Wed, 23 Sep 2020 23:28:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4021F6B0062; Wed, 23 Sep 2020 23:28:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2C9A96B0068; Wed, 23 Sep 2020 23:28:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0102.hostedemail.com [216.40.44.102]) by kanga.kvack.org (Postfix) with ESMTP id 141126B005D for ; Wed, 23 Sep 2020 23:28:44 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id BDC7B181AE862 for ; Thu, 24 Sep 2020 03:28:43 +0000 (UTC) X-FDA: 77296522926.05.wheel67_2716daf2715b Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id 9D8A518022BC0 for ; Thu, 24 Sep 2020 03:28:43 +0000 (UTC) X-HE-Tag: wheel67_2716daf2715b X-Filterd-Recvd-Size: 6616 Received: from out30-45.freemail.mail.aliyun.com (out30-45.freemail.mail.aliyun.com [115.124.30.45]) by imf35.hostedemail.com (Postfix) with ESMTP for ; Thu, 24 Sep 2020 03:28:42 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R721e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04394;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=22;SR=0;TI=SMTPD_---0U9w0Ss1_1600918116; Received: from aliy80.localdomain(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0U9w0Ss1_1600918116) by smtp.aliyun-inc.com(127.0.0.1); Thu, 24 Sep 2020 11:28:37 +0800 From: Alex Shi To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name, alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com, aaron.lwe@gmail.com Subject: [PATCH v19 00/20] per memcg lru_lock Date: Thu, 24 Sep 2020 11:28:15 +0800 Message-Id: <1600918115-22007-1-git-send-email-alex.shi@linux.alibaba.com> X-Mailer: git-send-email 1.8.3.1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The new version rebased on v5.9-rc6 with line by line review by Hugh Dick= ins. Millions thanks! And removed the 4th part from last version which do post optimization, that we can repost after the main part get merged. About on= e year coding and review, now I believe it's ready. So now this patchset includes 3 parts: 1, some code cleanup and minimum optimization as a preparation.=20 2, use TestCleanPageLRU as page isolation's precondition. 3, replace per node lru_lock with per memcg per node lru_lock. Current lru_lock is one for each of node, pgdat->lru_lock, that guard for lru lists, but now we had moved the lru lists into memcg for long time. S= till using per node lru_lock is clearly unscalable, pages on each of memcgs ha= ve to compete each others for a whole lru_lock. This patchset try to use per lruvec/memcg lru_lock to repleace per node lru lock to guard lru lists, m= ake it scalable for memcgs and get performance gain. Currently lru_lock still guards both lru list and page's lru bit, that's = ok. but if we want to use specific lruvec lock on the page, we need to pin do= wn the page's lruvec/memcg during locking. Just taking lruvec lock first may= be undermined by the page's memcg charge/migration. To fix this problem, we = could take out the page's lru bit clear and use it as pin down action to block = the memcg changes. That's the reason for new atomic func TestClearPageLRU. So now isolating a page need both actions: TestClearPageLRU and hold the lru_lock. The typical usage of this is isolate_migratepages_block() in compaction.c we have to take lru bit before lru lock, that serialized the page isolati= on in memcg page charge/migration which will change page's lruvec and new=20 lru_lock in it. The above solution suggested by Johannes Weiner, and based on his new mem= cg=20 charge path, then have this patchset. (Hugh Dickins tested and contribute= d much code from compaction fix to general code polish, thanks a lot!). Daniel Jordan's testing show 62% improvement on modified readtwice case on his 2P * 10 core * 2 HT broadwell box. https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1= .us.oracle.com/ Thanks Hugh Dickins and Konstantin Khlebnikov, they both brought this idea 8 years ago, and others who give comments as well: Daniel Jordan,=20 Mel Gorman, Shakeel Butt, Matthew Wilcox, Alexander Duyck etc. Thanks for Testing support from Intel 0day and Rong Chen, Fengguang Wu, and Yun Wang. Hugh Dickins also shared his kbuild-swap case. Thanks! Alex Shi (17): mm/memcg: warning on !memcg after readahead page charged mm/memcg: bail early from swap accounting if memcg disabled mm/thp: move lru_add_page_tail func to huge_memory.c mm/thp: use head for head page in lru_add_page_tail mm/thp: Simplify lru_add_page_tail() mm/thp: narrow lru locking mm/vmscan: remove unnecessary lruvec adding mm/memcg: add debug checking in lock_page_memcg mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn mm/lru: move lock into lru_note_cost mm/vmscan: remove lruvec reget in move_pages_to_lru mm/mlock: remove lru_lock on TestClearPageMlocked mm/mlock: remove __munlock_isolate_lru_page mm/lru: introduce TestClearPageLRU mm/compaction: do page isolation first in compaction mm/swap.c: serialize memcg changes in pagevec_lru_move_fn mm/lru: replace pgdat lru_lock with lruvec lock Alexander Duyck (1): mm/lru: introduce the relock_page_lruvec function Hugh Dickins (2): mm: page_idle_get_page() does not need lru_lock mm/lru: revise the comments of lru_lock Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 +- Documentation/admin-guide/cgroup-v1/memory.rst | 21 +-- Documentation/trace/events-kmem.rst | 2 +- Documentation/vm/unevictable-lru.rst | 22 +-- include/linux/memcontrol.h | 110 +++++++++++ include/linux/mm_types.h | 2 +- include/linux/mmdebug.h | 13 ++ include/linux/mmzone.h | 6 +- include/linux/page-flags.h | 1 + include/linux/swap.h | 4 +- mm/compaction.c | 94 +++++++--- mm/filemap.c | 4 +- mm/huge_memory.c | 45 +++-- mm/memcontrol.c | 85 ++++++++- mm/mlock.c | 63 ++----- mm/mmzone.c | 1 + mm/page_alloc.c | 1 - mm/page_idle.c | 4 - mm/rmap.c | 4 +- mm/swap.c | 199 ++++++++-------= ----- mm/vmscan.c | 203 +++++++++++----= ------ mm/workingset.c | 2 - 22 files changed, 523 insertions(+), 378 deletions(-) --=20 1.8.3.1