From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 15 Nov 2019 20:38:06 -0800
From: Matthew Wilcox
To: Alex Shi
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	mgorman@techsingularity.net, tj@kernel.org, hughd@google.com,
	khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
	yang.shi@linux.alibaba.com, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, Roman Gushchin, Shakeel Butt, Chris Down,
	Thomas Gleixner, Vlastimil Babka, Qian Cai, Andrey Ryabinin,
	"Kirill A. Shutemov", Jérôme Glisse, Andrea Arcangeli,
	David Rientjes, "Aneesh Kumar K.V", swkhack, "Potyra, Stefan",
	Mike Rapoport, Stephen Rothwell, Colin Ian King, Jason Gunthorpe,
	Mauro Carvalho Chehab, Peng Fan, Nikolay Borisov, Ira Weiny,
	Kirill Tkhai, Yafang Shao
Subject: Re: [PATCH v3 3/7] mm/lru: replace pgdat lru_lock with lruvec lock
Message-ID: <20191116043806.GD20752@bombadil.infradead.org>
References: <1573874106-23802-1-git-send-email-alex.shi@linux.alibaba.com>
 <1573874106-23802-4-git-send-email-alex.shi@linux.alibaba.com>
In-Reply-To: <1573874106-23802-4-git-send-email-alex.shi@linux.alibaba.com>

On Sat, Nov 16, 2019 at 11:15:02AM +0800, Alex Shi wrote:
> This is the main patch to replace per node lru_lock with per memcg
> lruvec lock. It also fold the irqsave flags into lruvec.

I have to say, I don't love the part where we fold the irqsave flags
into the lruvec.  I know it saves us an argument, but it opens up the
possibility of mismatched expectations.  eg we currently have:

static void __split_huge_page(struct page *page, struct list_head *list,
		struct lruvec *lruvec, pgoff_t end)
{
...
	spin_unlock_irqrestore(&lruvec->lru_lock, lruvec->irqflags);

so if we introduce a new caller, we have to be certain that this
caller is also using lock_page_lruvec_irqsave() and not
lock_page_lruvec_irq().  I can't think of a way to make the compiler
enforce that, and if we don't, then we can get some odd crashes with
interrupts being unexpectedly enabled or disabled, depending on how
->irqflags was used last.

So it makes the code more subtle.  And that's not a good thing.
> +static inline struct lruvec *lock_page_lruvec_irq(struct page *page,
> +						struct pglist_data *pgdat)
> +{
> +	struct lruvec *lruvec = mem_cgroup_page_lruvec(page, pgdat);
> +
> +	spin_lock_irq(&lruvec->lru_lock);
> +
> +	return lruvec;
> +}

...

> +static struct lruvec *lock_page_lru(struct page *page, int *isolated)
>  {
>  	pg_data_t *pgdat = page_pgdat(page);
> +	struct lruvec *lruvec = lock_page_lruvec_irq(page, pgdat);
>
> -	spin_lock_irq(&pgdat->lru_lock);
>  	if (PageLRU(page)) {
> -		struct lruvec *lruvec;
>
> -		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>  		ClearPageLRU(page);
>  		del_page_from_lru_list(page, lruvec, page_lru(page));
>  		*isolated = 1;
>  	} else
>  		*isolated = 0;
> +
> +	return lruvec;
>  }

But what if the page is !PageLRU?  What lruvec did we just lock?
According to the comments on mem_cgroup_page_lruvec(),

 * This function is only safe when following the LRU page isolation
 * and putback protocol: the LRU lock must be held, and the page must
 * either be PageLRU() or the caller must have isolated/allocated it.

and now it's being called in order to find out which LRU lock to take.
So either this comment needs to be updated, because it's wrong, or
this patch has a race.