Subject: Re: [PATCH v17 18/21] mm/lru: introduce the relock_page_lruvec function
From: Alex Shi <alex.shi@linux.alibaba.com>
To: Alexander Duyck
Cc: Andrew Morton, Mel Gorman, Tejun Heo, Hugh Dickins, Konstantin Khlebnikov,
    Daniel Jordan, Yang Shi, Matthew Wilcox, Johannes Weiner, kbuild test robot,
    linux-mm, LKML, cgroups@vger.kernel.org, Shakeel Butt, Joonsoo Kim, Wei Yang,
    "Kirill A. Shutemov", Rong Chen, Thomas Gleixner, Andrey Ryabinin
Date: Thu, 30 Jul 2020 14:08:41 +0800
Message-ID: <3345bfbf-ebe9-b5e0-a731-77dd7d76b0c9@linux.alibaba.com>
References: <1595681998-19193-1-git-send-email-alex.shi@linux.alibaba.com>
 <1595681998-19193-19-git-send-email-alex.shi@linux.alibaba.com>
Shutemov" , Rong Chen , Thomas Gleixner , Andrey Ryabinin References: <1595681998-19193-1-git-send-email-alex.shi@linux.alibaba.com> <1595681998-19193-19-git-send-email-alex.shi@linux.alibaba.com> From: Alex Shi Message-ID: <3345bfbf-ebe9-b5e0-a731-77dd7d76b0c9@linux.alibaba.com> Date: Thu, 30 Jul 2020 14:08:41 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 X-Rspamd-Queue-Id: F044B18020318 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: =E5=9C=A8 2020/7/30 =E4=B8=8A=E5=8D=881:52, Alexander Duyck =E5=86=99=E9=81= =93: >> + rcu_read_lock(); >> + locked =3D mem_cgroup_page_lruvec(page, pgdat) =3D=3D locked_l= ruvec; >> + rcu_read_unlock(); >> + >> + if (locked) >> + return locked_lruvec; >> + >> + if (locked_lruvec) >> + unlock_page_lruvec_irqrestore(locked_lruvec, *flags); >> + >> + return lock_page_lruvec_irqsave(page, flags); >> +} >> + > So looking these over they seem to be pretty inefficient for what they > do. Basically in worst case (locked_lruvec =3D=3D NULL) you end up call= ing > mem_cgoup_page_lruvec and all the rcu_read_lock/unlock a couple times > for a single page. It might make more sense to structure this like: > if (locked_lruvec) { Uh, we still need to check this page's lruvec, that needs a rcu_read_lock= . to save a mem_cgroup_page_lruvec call, we have to open lock_page_lruvec as your mentained before. > if (lruvec_holds_page_lru_lock(page, locked_lruvec)) > return locked_lruvec; >=20 > unlock_page_lruvec_irqrestore(locked_lruvec, *flags); > } > return lock_page_lruvec_irqsave(page, flags); >=20 > The other piece that has me scratching my head is that I wonder if we > couldn't do this without needing the rcu_read_lock. For example, what > if we were to compare the page mem_cgroup pointer to the memcg back > pointer stored in the mem_cgroup_per_node? It seems like ordering > things this way would significantly reduce the overhead due to the > pointer chasing to see if the page is in the locked lruvec or not. >=20 If page->mem_cgroup always be charged. the following could be better. +/* Don't lock again iff page's lruvec locked */ +static inline struct lruvec *relock_page_lruvec_irqsave(struct page *pag= e, + struct lruvec *locked_lruvec, unsigned long *flags) +{ + struct lruvec *lruvec; + + if (mem_cgroup_disabled()) + return locked_lruvec; + + /* user page always be charged */ + VM_BUG_ON_PAGE(!page->mem_cgroup, page); + + rcu_read_lock(); + if (likely(lruvec_memcg(locked_lruvec) =3D=3D page->mem_cgroup)) = { + rcu_read_unlock(); + return locked_lruvec; + } + + if (locked_lruvec) + unlock_page_lruvec_irqrestore(locked_lruvec, *flags); + + lruvec =3D mem_cgroup_page_lruvec(page, page_pgdat(page)); + spin_lock_irqsave(&lruvec->lru_lock, *flags); + rcu_read_unlock(); + lruvec_memcg_debug(lruvec, page); + + return lruvec; +} + The user page is always be charged since readahead page is charged now. and looks we also can apply this patch. I will test it to see if there is other exception. commit 826128346e50f6c60c513e166998466b593becad Author: Alex Shi Date: Thu Jul 30 13:58:38 2020 +0800 mm/memcg: remove useless check on page->mem_cgroup Since readahead page will be charged on memcg too. 
commit 826128346e50f6c60c513e166998466b593becad
Author: Alex Shi <alex.shi@linux.alibaba.com>
Date:   Thu Jul 30 13:58:38 2020 +0800

    mm/memcg: remove useless check on page->mem_cgroup

    Since readahead pages are now charged to a memcg too, we don't need
    to check for this exception anymore.

    Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index af96217f2ec5..0c7f6bed199b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1336,12 +1336,6 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
 	memcg = READ_ONCE(page->mem_cgroup);
-	/*
-	 * Swapcache readahead pages are added to the LRU - and
-	 * possibly migrated - before they are charged.
-	 */
-	if (!memcg)
-		memcg = root_mem_cgroup;
 
 	mz = mem_cgroup_page_nodeinfo(memcg, page);
 	lruvec = &mz->lruvec;
@@ -6962,10 +6956,7 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 	if (newpage->mem_cgroup)
 		return;
 
-	/* Swapcache readahead pages can get replaced before being charged */
 	memcg = oldpage->mem_cgroup;
-	if (!memcg)
-		return;
 
 	/* Force-charge the new page. The old one will be freed soon */
 	nr_pages = thp_nr_pages(newpage);
@@ -7160,10 +7151,6 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	memcg = page->mem_cgroup;
 
-	/* Readahead page, never charged */
-	if (!memcg)
-		return;
-
 	/*
 	 * In case the memcg owning these pages has been offlined and doesn't
 	 * have an ID allocated to it anymore, charge the closest online
 	 * ancestor instead.
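As a usage note, the relock helper is meant for callers that walk a
batch of pages and want to keep the lru_lock held across consecutive
pages in the same lruvec. A sketch of that pattern, with
walk_pages_example as a hypothetical caller rather than a function from
the series:

/* Hypothetical caller showing the batched-relock pattern. */
static void walk_pages_example(struct pagevec *pvec)
{
	struct lruvec *lruvec = NULL;
	unsigned long flags;
	int i;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];

		/* only drops/retakes the lock when the lruvec changes */
		lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);

		/* ... operate on page under lruvec->lru_lock ... */
	}
	if (lruvec)
		unlock_page_lruvec_irqrestore(lruvec, flags);
}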