From: Alexander Duyck
Date: Thu, 10 Sep 2020 07:24:56 -0700
Subject: Re: [PATCH v18 00/32] per memcg lru_lock: reviews
To: Hugh Dickins
Cc: Alex Shi, Andrew Morton, Mel Gorman, Tejun Heo, Konstantin Khlebnikov,
 Daniel Jordan, Matthew Wilcox, Johannes Weiner, kbuild test robot,
 linux-mm, LKML, cgroups@vger.kernel.org, Shakeel Butt, Joonsoo Kim,
 Wei Yang, "Kirill A. Shutemov", Rong Chen, Michal Hocko,
 Vladimir Davydov, shy828301@gmail.com, Vlastimil Babka, Minchan Kim,
 Qian Cai

On Wed, Sep 9, 2020 at 5:32 PM Hugh Dickins wrote:
>
> On Wed, 9 Sep 2020, Alexander Duyck wrote:
> > On Tue, Sep 8, 2020 at 4:41 PM Hugh Dickins wrote:
> > > [PATCH v18 28/32] mm/compaction: Drop locked from isolate_migratepages_block
> > > Most of this consists of replacing "locked" by "lruvec", which is good:
> > > but please fold those changes back into 20/32 (or would it be 17/32?
> > > I've not yet looked into the relationship between those two), so we
> > > can then see more clearly what change this 28/32 (will need renaming!)
> > > actually makes, to use lruvec_holds_page_lru_lock(). That may be a
> > > good change, but it's mixed up with the "locked"->"lruvec" at present,
> > > and I think you could have just used lruvec for locked all along
> > > (but of course there's a place where you'll need new_lruvec too).
> >
> > I am good with my patch being folded in. No need to keep it separate.
>
> Thanks. Though it was only the "locked"->"lruvec" changes I was
> suggesting to fold back, to minimize the diff, so that we could
> see your use of lruvec_holds_page_lru_lock() more clearly - you
> had not introduced that function at the stage of the earlier patches.
>
> But now that I stare at it again, using lruvec_holds_page_lru_lock()
> there doesn't look like an advantage to me: when it decides no, the
> same calculation is made all over again in mem_cgroup_page_lruvec(),
> whereas the code before only had to calculate it once.
>
> So, the code before looks better to me: I wonder, do you think that
> rcu_read_lock() is more expensive than I think it is? There can be
> debug instrumentation that makes it heavier, but by itself it is
> very cheap (by design) - not worth branching around.

Actually what I was more concerned with was the pointer chase that
required the RCU lock. With this function we are able to compare a
pair of pointers from the page and the lruvec and avoid the need for
the RCU lock. The way the old code worked, we had to crawl through
the memcg to get to the lruvec before we could compare it to the one
we currently hold. The general idea is to use the data we already
have instead of having to pull in additional cache lines to perform
the test.
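Roughly, the comparison I am describing looks something like this
(paraphrased from memory of the v18 series, so take the exact field
names with a grain of salt):

	static inline bool lruvec_holds_page_lru_lock(struct page *page,
						      struct lruvec *lruvec)
	{
		pg_data_t *pgdat = page_pgdat(page);
		const struct mem_cgroup *memcg;
		struct mem_cgroup_per_node *mz;

		if (mem_cgroup_disabled())
			return lruvec == &pgdat->__lruvec;

		mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
		memcg = page->mem_cgroup ? : root_mem_cgroup;

		/*
		 * Pure pointer comparisons: memcg is never dereferenced,
		 * so no rcu_read_lock() is needed here.
		 */
		return lruvec->pgdat == pgdat && mz->memcg == memcg;
	}

Everything compared here lives either in the page itself or in the
lruvec we already hold, which is what lets us skip the
mem_cgroup_page_lruvec() walk entirely in the common case.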
> > > [PATCH v18 29/32] mm: Identify compound pages sooner in isolate_migratepages_block
> > > NAK. I agree that isolate_migratepages_block() looks nicer this way, but
> > > take a look at prep_new_page() in mm/page_alloc.c: post_alloc_hook() is
> > > where set_page_refcounted() changes page->_refcount from 0 to 1, allowing
> > > a racing get_page_unless_zero() to succeed; then later prep_compound_page()
> > > is where PageHead and PageTails get set. So there's a small race window in
> > > which this patch could deliver a compound page when it should not.
> >
> > So the main motivation for the patch was to avoid the case where we
> > are having to reset the LRU flag.
>
> That would be satisfying. Not necessary, but I agree satisfying.
> Maybe depends also on your "skip" change, which I've not looked at yet?

My concern is that we have scenarios where isolate_migratepages_block
could possibly prevent another thread from being able to isolate a
page. I'm mostly concerned with us potentially creating something like
an isolation leak if multiple threads are clearing and then resetting
the LRU flag. In my mind, if we clear the LRU flag we should be certain
we are going to remove the page, as otherwise another thread would have
isolated it had it been allowed access.

> > One question I would have is what if
> > we swapped the code block with the __isolate_lru_page_prepare section?
> > With that we would be taking a reference on the page, then verifying
> > the LRU flag is set, and then testing for the compound page flag bit.
> > Would doing that close the race window, since the LRU flag being set
> > should indicate that the allocation has already been completed, has it
> > not?
>
> Yes, I think that would be safe, and would look better. But I am
> very hesitant to give snap assurances here (I've twice missed out
> a vital PageLRU check from this sequence myself): it is very easy
> to deceive myself and only see it later.

I'm not looking for assurances, just sanity checks to make sure I am
not missing something obvious.

> If you can see a bug in what's there before these patches, certainly
> we need to fix it. But adding non-essential patches to the already
> overlong series risks delaying it.

My concern ends up being that if we are clearing the bit and restoring
it while holding the LRU lock, we can effectively cause pages to become
pseudo-pinned on the LRU. In my mind, I would want us to avoid clearing
the LRU flag until we know we are going to be pulling the page from the
list once we take the lruvec lock. I interpret clearing of the flag to
indicate the page has already been pulled; it just hasn't left the list
yet. By resetting the bit we are violating that, which I worry will
lead to issues.
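Just so we are looking at the same thing, the ordering I was asking
about above would be roughly the following (a hand-written sketch, not
a tested diff; the isolate_fail/isolate_fail_put labels are the ones
used in the series' version of isolate_migratepages_block()):

	/* Pin the page so it cannot be freed out from under us. */
	if (unlikely(!get_page_unless_zero(page)))
		goto isolate_fail;

	/*
	 * PageLRU is only set once the allocator has completely
	 * finished with the page, so after this check a racing
	 * prep_compound_page() can no longer be rewriting the
	 * flags underneath us.
	 */
	if (!PageLRU(page))
		goto isolate_fail_put;

	/* The compound test is now stable against page allocation. */
	if (PageCompound(page)) {
		const unsigned int order = compound_order(page);

		if (likely(order < MAX_ORDER))
			low_pfn += (1UL << order) - 1;
		goto isolate_fail_put;
	}

	/*
	 * Only now do we commit: clearing PageLRU is the point of no
	 * return, after which we must take the page off the list.
	 */
	if (!TestClearPageLRU(page))
		goto isolate_fail_put;

That last hunk is the part I care most about: TestClearPageLRU comes
after every check that could still make us back out, so we never clear
the flag for a page we then hand back.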