From: Alex Shi <alex.shi@linux.alibaba.com>
To: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org,
 mgorman@techsingularity.net, tj@kernel.org, hughd@google.com,
 khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
 yang.shi@linux.alibaba.com, willy@infradead.org, shakeelb@google.com,
 hannes@cmpxchg.org
Subject: [PATCH v8 00/10] per lruvec lru_lock for memcg
Date: Thu, 16 Jan 2020 11:04:59 +0800
Message-Id: <1579143909-156105-1-git-send-email-alex.shi@linux.alibaba.com>

Hi all,

This patchset moves lru_lock into lruvec, giving each lruvec its own
lru_lock and thus one lru_lock per memcg per node. On a large
multi-node machine, each memcg then no longer has to wait on the
per-node pgdat->lru_lock; it can proceed quickly under its own
lru_lock.
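The data-structure side of the change is conceptually small: the
spinlock moves out of struct pglist_data and into struct lruvec. A
minimal sketch of the idea (field placement here is illustrative; see
the patches for the real layout):

	/* include/linux/mmzone.h -- sketch only */
	struct lruvec {
		struct list_head	lists[NR_LRU_LISTS];
		/* Replaces pgdat->lru_lock: each lruvec, i.e. each
		 * memcg on each node, gets its own LRU lock, so LRU
		 * operations in different memcgs no longer serialize
		 * on a single per-node lock.
		 */
		spinlock_t		lru_lock;
		/* ... remaining lruvec fields unchanged ... */
	};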
We introduce the function lock_page_lruvec(), which locks the page's
memcg and then the memcg's lruvec->lru_lock (thanks to Johannes
Weiner, Hugh Dickins and Konstantin Khlebnikov for the
suggestion/reminder), replacing the old pgdat->lru_lock. A rough
sketch of this helper appears after the change log below.

Following Daniel Jordan's suggestion, I ran 208 'dd' tasks in 104
containers on a 2-socket * 26-core * HT box with a modified case:

  https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this patchset, readtwice performance increased by about 80% with
containers, and there is no performance drop without containers.

Another way to guard move_account is by lru_lock instead of move_lock.
Consider the memcg move task path:

	mem_cgroup_move_task:
	  mem_cgroup_move_charge:
		lru_add_drain_all();
		atomic_inc(&mc.from->moving_account);	// ask lruvec's move_lock
		synchronize_rcu();
		walk_page_range:
		  charge_walk_ops(mem_cgroup_move_charge_pte_range):
			isolate_lru_page();
			mem_cgroup_move_account(page,)
			  spin_lock(&from->move_lock)
			  page->mem_cgroup = to;
			  spin_unlock(&from->move_lock)
			putback_lru_page(page)

Guarding 'page->mem_cgroup = to' with to_vec->lru_lock has a similar
effect to move_lock, so performance-wise the two solutions are the
same.

Thanks to Hugh Dickins and Konstantin Khlebnikov, who both brought up
the same idea 8 years ago.

Thanks for all the comments from Hugh Dickins, Konstantin Khlebnikov,
Daniel Jordan, Johannes Weiner, Mel Gorman, Shakeel Butt, Rong Chen,
Fengguang Wu, Yun Wang etc., and for the testing support from Intel
0day!

v8:
 a, redo the lock_page_lru cleanup as Konstantin Khlebnikov suggested.
 b, fix a bug in lruvec_memcg_debug, reported by Hugh Dickins.

v7:
 a, rebase on v5.5-rc3.
 b, move the lock_page_lru() cleanup before the lock replacement.

v6:
 a, rebase on v5.5-rc2, and redo performance testing.
 b, pick up Johannes' comment changes and a lock_page_lru cleanup.

v5:
 a, lock the page's memcg, following Johannes Weiner's suggestion.
 b, use a macro for the non-memcg case, following Matthew Wilcox's
    suggestion.

v4:
 a, fix the page->mem_cgroup dereferencing issue, thanks Johannes
    Weiner.
 b, remove the irqsave flags changes, thanks Matthew Wilcox.
 c, merge/split patches for better understanding and bisection
    purposes.

v3: rebase on linux-next, and fold the relock fix patch into the
    introducing patch.

v2: bypass a performance regression bug and fix some function issues.

v1: initial version; aim testing showed a 5% performance increase on a
    16-thread box.
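For reference, the core helper described above looks roughly like the
sketch below. It follows the cover-letter description (lock the page's
memcg, then take that memcg's lruvec->lru_lock); the exact names, irq
variants and debug hooks live in the patches, so treat this as an
illustration rather than the final code:

	/* Sketch only: stabilize page->mem_cgroup first, then take
	 * the per-memcg, per-node lru_lock in place of the old
	 * pgdat->lru_lock.
	 */
	static struct lruvec *lock_page_lruvec_irq(struct page *page)
	{
		struct lruvec *lruvec;

		lock_page_memcg(page);	/* pin the page->memcg binding */
		lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
		spin_lock_irq(&lruvec->lru_lock);
		return lruvec;
	}

relock_page_lruvec (patch 4) builds on this: when walking a pagevec it
only drops and retakes the lock if the next page belongs to a
different lruvec than the one currently held, which keeps batched
paths such as munlock cheap.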
Alex Shi (9):
  mm/vmscan: remove unnecessary lruvec adding
  mm/memcg: fold lock_page_lru into commit_charge
  mm/lru: replace pgdat lru_lock with lruvec lock
  mm/lru: introduce the relock_page_lruvec function
  mm/mlock: optimize munlock_pagevec by relocking
  mm/swap: only change the lru_lock iff page's lruvec is different
  mm/pgdat: remove pgdat lru_lock
  mm/lru: add debug checking for page memcg moving
  mm/memcg: add debug checking in lock_page_memcg

Hugh Dickins (1):
  mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |  15 +--
 Documentation/admin-guide/cgroup-v1/memory.rst     |   6 +-
 Documentation/trace/events-kmem.rst                |   2 +-
 Documentation/vm/unevictable-lru.rst               |  22 ++--
 include/linux/memcontrol.h                         |  68 ++++++++++++
 include/linux/mm_types.h                           |   2 +-
 include/linux/mmzone.h                             |   5 +-
 mm/compaction.c                                    |  57 ++++++----
 mm/filemap.c                                       |   4 +-
 mm/huge_memory.c                                   |  18 ++--
 mm/memcontrol.c                                    | 115 ++++++++++++--------
 mm/mlock.c                                         |  28 ++---
 mm/mmzone.c                                        |   1 +
 mm/page_alloc.c                                    |   1 -
 mm/page_idle.c                                     |   7 +-
 mm/rmap.c                                          |   2 +-
 mm/swap.c                                          |  75 ++++++--------
 mm/vmscan.c                                        | 115 +++++++++++---------
 18 files changed, 326 insertions(+), 217 deletions(-)

-- 
1.8.3.1