linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Hugh Dickins <hughd@google.com>,
	Alex Shi <alex.shi@linux.alibaba.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Tejun Heo <tj@kernel.org>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	Matthew Wilcox <willy@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	kbuild test robot <lkp@intel.com>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	cgroups@vger.kernel.org, Shakeel Butt <shakeelb@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Wei Yang <richard.weiyang@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Rong Chen <rong.a.chen@intel.com>, Michal Hocko <mhocko@suse.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	shy828301@gmail.com, Vlastimil Babka <vbabka@suse.cz>,
	Minchan Kim <minchan@kernel.org>, Qian Cai <cai@lca.pw>
Subject: Re: [PATCH v18 00/32] per memcg lru_lock: reviews
Date: Fri, 11 Sep 2020 22:12:27 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.2009112058040.23961@eggly.anvils> (raw)
In-Reply-To: <CAKgT0Ucive3RreD3TJt1Fjch_BH2ygFfUnpAJ_1BhsHy74w88g@mail.gmail.com>

On Thu, 10 Sep 2020, Alexander Duyck wrote:
> On Wed, Sep 9, 2020 at 5:32 PM Hugh Dickins <hughd@google.com> wrote:
> > On Wed, 9 Sep 2020, Alexander Duyck wrote:
> > > On Tue, Sep 8, 2020 at 4:41 PM Hugh Dickins <hughd@google.com> wrote:
> > > > [PATCH v18 28/32] mm/compaction: Drop locked from isolate_migratepages_block
> > > > Most of this consists of replacing "locked" by "lruvec", which is good:
> > > > but please fold those changes back into 20/32 (or would it be 17/32?
> > > > I've not yet looked into the relationship between those two), so we
> > > > can then see more clearly what change this 28/32 (will need renaming!)
> > > > actually makes, to use lruvec_holds_page_lru_lock(). That may be a
> > > > good change, but it's mixed up with the "locked"->"lruvec" at present,
> > > > and I think you could have just used lruvec for locked all along
> > > > (but of course there's a place where you'll need new_lruvec too).
> > >
> > > I am good with my patch being folded in. No need to keep it separate.
> >
> > Thanks.  Though it was only the "locked"->"lruvec" changes I was
> > suggesting to fold back, to minimize the diff, so that we could
> > see your use of lruvec_holds_page_lru_lock() more clearly - you
> > had not introduced that function at the stage of the earlier patches.
> >
> > But now that I stare at it again, using lruvec_holds_page_lru_lock()
> > there doesn't look like an advantage to me: when it decides no, the
> > same calculation is made all over again in mem_cgroup_page_lruvec(),
> > whereas the code before only had to calculate it once.
> >
> > So, the code before looks better to me: I wonder, do you think that
> > rcu_read_lock() is more expensive than I think it?  There can be
> > debug instrumentation that makes it heavier, but by itself it is
> > very cheap (by design) - not worth branching around.
> 
> Actually what I was more concerned with was the pointer chase that
> required the RCU lock. With this function we are able to compare a
> pair of pointers from the page and the lruvec and avoid the need for
> the RCU lock. The way the old code was working we had to crawl through
> the memcg to get to the lruvec before we could compare it to the one
> we currently hold. The general idea is to use the data we have instead
> of having to pull in some additional cache lines to perform the test.

When you say "With this function...", I think you are referring to
lruvec_holds_page_lru_lock().  Yes, I appreciate what you're doing
there, making calculations from known-stable data, and taking it no
further than the required comparison; and I think (I don't yet claim
to have reviewed 21/32) what you do with it in relock_page_lruvec*()
is an improvement over what we had there before.

But here I'm talking about using it in isolate_migratepages_block()
in 28/32: in this case, the code before evaluated the new lruvec,
compared against the old, and immediately used the new lruvec if
different; whereas using lruvec_holds_page_lru_lock() makes an
almost (I agree not entirely, and I haven't counted cachelines)
equivalent evaluation, but its results have to thrown away when
it's false, then the new lruvec actually calculated and used.

The same "results thrown away" criticism can be made of
relock_page_lruvec*(), but what was done there before your rewrite
in v18 was no better: they both resort to lock_page_lruvec*(page),
working it all out again from page.  And I'm not suggesting that
be changed, not at this point anyway; but 28/32 looks to me
like a regression from what was done there before 28/32.

> 
> > >
> > > > [PATCH v18 29/32] mm: Identify compound pages sooner in isolate_migratepages_block
> > > > NAK. I agree that isolate_migratepages_block() looks nicer this way, but
> > > > take a look at prep_new_page() in mm/page_alloc.c: post_alloc_hook() is
> > > > where set_page_refcounted() changes page->_refcount from 0 to 1, allowing
> > > > a racing get_page_unless_zero() to succeed; then later prep_compound_page()
> > > > is where PageHead and PageTails get set. So there's a small race window in
> > > > which this patch could deliver a compound page when it should not.
> > >
> > > So the main motivation for the patch was to avoid the case where we
> > > are having to reset the LRU flag.
> >
> > That would be satisfying.  Not necessary, but I agree satisfying.
> > Maybe depends also on your "skip" change, which I've not looked at yet?
> 
> My concern is that we have scenarios where isolate_migratepages_block
> could possibly prevent another page from being able to isolate a page.
> I'm mostly concerned with us potentially creating something like an
> isolation leak if multiple threads are doing something like clearing
> and then resetting the LRU flag. In my mind if we clear the LRU flag
> we should be certain we are going to remove the page as otherwise
> another thread would have done it if it would have been allowed
> access.

I agree it's nicer not to TestClearPageLRU unnecessarily; but if the
occasional unnecessary TestClearPageLRU were really a concern, then
there's a lot of more serious places to worry about - page reclaim
is the great isolator that comes first to my mind.

> 
> > > One question I would have is what if
> > > we swapped the code block with the __isolate_lru_page_prepare section?
> > > WIth that we would be taking a reference on the page, then verifying
> > > the LRU flag is set, and then testing for compound page flag bit.
> > > Would doing that close the race window since the LRU flag being set
> > > should indicate that the allocation has already been completed has it
> > > not?
> >
> > Yes, I think that would be safe, and would look better.  But I am
> > very hesitant to give snap assurances here (I've twice missed out
> > a vital PageLRU check from this sequence myself): it is very easy
> > to deceive myself and only see it later.
> 
> I'm not looking for assurances, just sanity checks to make sure I am
> not missing something obvious.
> 
> > If you can see a bug in what's there before these patches, certainly
> > we need to fix it.  But adding non-essential patches to the already
> > overlong series risks delaying it.
> 
> My concern ends up being that if we are clearing the bit and restoring
> it while holding the LRU lock we can effectively cause pages to become
> pseudo-pinned on the LRU. In my mind I would want us to avoid clearing
> the LRU flag until we know we are going to be pulling the page from
> the list once we take the lruvec lock. I interpret clearing of the
> flag to indicate the page has already been pulled, it just hasn't left
> the list yet. With us resetting the bit we are violating that which I
> worry will lead to issues.

Your concern and my concern are different, but we are "on the same page".

I've said repeatedly (to Alex) that I am not at ease with this
TestClearPageLRU() technique: he has got it working, reliably, but
I find it hard to reason about.  Perhaps I'm just too used to what
was there before, but clearing PageLRU and removing from LRU while
holding lru_lock seems natural to me; whereas disconnecting them
leaves us on shaky ground, adding comments and warnings about the
peculiar races involved.  And it adds a pair of atomic operations
on each page in pagevec_lru_move_fn(), which were not needed before.

I want us to go ahead with TestClearPageLRU, to get the series into
mmotm and under wider testing.  But if we accept the lock reordering
in 06/32, then it becomes possible to replace those TestClearPageLRUs
by lock_page_memcg()s: which in principle should be cheaper, but that
will have to be measured.

Hugh

  reply	other threads:[~2020-09-12  5:12 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-24 12:54 [PATCH v18 00/32] per memcg lru_lock Alex Shi
2020-08-24 12:54 ` [PATCH v18 01/32] mm/memcg: warning on !memcg after readahead page charged Alex Shi
2020-08-24 12:54 ` [PATCH v18 02/32] mm/memcg: bail out early from swap accounting when memcg is disabled Alex Shi
2020-08-24 12:54 ` [PATCH v18 03/32] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-08-24 12:54 ` [PATCH v18 04/32] mm/thp: clean up lru_add_page_tail Alex Shi
2020-08-24 12:54 ` [PATCH v18 05/32] mm/thp: remove code path which never got into Alex Shi
2020-08-24 12:54 ` [PATCH v18 06/32] mm/thp: narrow lru locking Alex Shi
2020-09-10 13:49   ` Matthew Wilcox
2020-09-11  3:37     ` Alex Shi
2020-09-13 15:27       ` Matthew Wilcox
2020-09-19  1:00         ` Hugh Dickins
2020-08-24 12:54 ` [PATCH v18 07/32] mm/swap.c: stop deactivate_file_page if page not on lru Alex Shi
2020-08-24 12:54 ` [PATCH v18 08/32] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-08-24 12:54 ` [PATCH v18 09/32] mm/page_idle: no unlikely double check for idle page counting Alex Shi
2020-08-24 12:54 ` [PATCH v18 10/32] mm/compaction: rename compact_deferred as compact_should_defer Alex Shi
2020-08-24 12:54 ` [PATCH v18 11/32] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-08-24 12:54 ` [PATCH v18 12/32] mm/memcg: optimize mem_cgroup_page_lruvec Alex Shi
2020-08-24 12:54 ` [PATCH v18 13/32] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Alex Shi
2020-08-24 12:54 ` [PATCH v18 14/32] mm/lru: move lru_lock holding in func lru_note_cost_page Alex Shi
2020-08-24 12:54 ` [PATCH v18 15/32] mm/lru: move lock into lru_note_cost Alex Shi
2020-09-21 21:36   ` Hugh Dickins
2020-09-21 22:03     ` Hugh Dickins
2020-09-22  3:39       ` Alex Shi
2020-09-22  3:38     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 16/32] mm/lru: introduce TestClearPageLRU Alex Shi
2020-09-21 23:16   ` Hugh Dickins
2020-09-22  3:53     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 17/32] mm/compaction: do page isolation first in compaction Alex Shi
2020-09-21 23:49   ` Hugh Dickins
2020-09-22  4:57     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 18/32] mm/thp: add tail pages into lru anyway in split_huge_page() Alex Shi
2020-08-24 12:54 ` [PATCH v18 19/32] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Alex Shi
2020-09-22  0:42   ` Hugh Dickins
2020-09-22  5:00     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 20/32] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-09-22  5:27   ` Hugh Dickins
2020-09-22  8:58     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 21/32] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-09-22  5:40   ` Hugh Dickins
2020-08-24 12:54 ` [PATCH v18 22/32] mm/vmscan: use relock for move_pages_to_lru Alex Shi
2020-09-22  5:44   ` Hugh Dickins
2020-09-23  1:55     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 23/32] mm/lru: revise the comments of lru_lock Alex Shi
2020-09-22  5:48   ` Hugh Dickins
2020-08-24 12:54 ` [PATCH v18 24/32] mm/pgdat: remove pgdat lru_lock Alex Shi
2020-09-22  5:53   ` Hugh Dickins
2020-09-23  1:55     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 25/32] mm/mlock: remove lru_lock on TestClearPageMlocked in munlock_vma_page Alex Shi
2020-08-26  5:52   ` Alex Shi
2020-09-22  6:13   ` Hugh Dickins
2020-09-23  1:58     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 26/32] mm/mlock: remove __munlock_isolate_lru_page Alex Shi
2020-08-24 12:55 ` [PATCH v18 27/32] mm/swap.c: optimizing __pagevec_lru_add lru_lock Alex Shi
2020-08-26  9:07   ` Alex Shi
2020-08-24 12:55 ` [PATCH v18 28/32] mm/compaction: Drop locked from isolate_migratepages_block Alex Shi
2020-08-24 12:55 ` [PATCH v18 29/32] mm: Identify compound pages sooner in isolate_migratepages_block Alex Shi
2020-08-24 12:55 ` [PATCH v18 30/32] mm: Drop use of test_and_set_skip in favor of just setting skip Alex Shi
2020-08-24 12:55 ` [PATCH v18 31/32] mm: Add explicit page decrement in exception path for isolate_lru_pages Alex Shi
2020-09-09  1:01   ` Matthew Wilcox
2020-09-09 15:43     ` Alexander Duyck
2020-09-09 17:07       ` Matthew Wilcox
2020-09-09 18:24       ` Hugh Dickins
2020-09-09 20:15         ` Matthew Wilcox
2020-09-09 21:05           ` Hugh Dickins
2020-09-09 21:17         ` Alexander Duyck
2020-08-24 12:55 ` [PATCH v18 32/32] mm: Split release_pages work into 3 passes Alex Shi
2020-08-24 18:42 ` [PATCH v18 00/32] per memcg lru_lock Andrew Morton
2020-08-24 19:50   ` Qian Cai
2020-08-24 20:24   ` Hugh Dickins
2020-08-25  1:56     ` Daniel Jordan
2020-08-25  3:26       ` Alex Shi
2020-08-25 11:39         ` Matthew Wilcox
2020-08-26  1:19         ` Daniel Jordan
2020-08-26  8:59           ` Alex Shi
2020-08-28  1:40             ` Daniel Jordan
2020-08-28  5:22               ` Alex Shi
2020-09-09  2:44               ` Aaron Lu
2020-09-09 11:40                 ` Michal Hocko
2020-08-25  8:52       ` Alex Shi
2020-08-25 13:00         ` Alex Shi
2020-08-27  7:01     ` Hugh Dickins
2020-08-27 12:20       ` Race between freeing and waking page Matthew Wilcox
2020-09-08 23:41       ` [PATCH v18 00/32] per memcg lru_lock: reviews Hugh Dickins
2020-09-09  2:24         ` Wei Yang
2020-09-09 15:08         ` Alex Shi
2020-09-09 23:16           ` Hugh Dickins
2020-09-11  2:50             ` Alex Shi
2020-09-12  2:13               ` Hugh Dickins
2020-09-13 14:21                 ` Alex Shi
2020-09-15  8:21                   ` Hugh Dickins
2020-09-15 16:58                     ` Daniel Jordan
2020-09-17  2:37                       ` Alex Shi
2020-09-17 14:35                         ` Daniel Jordan
2020-09-17 15:39                           ` Alexander Duyck
2020-09-17 16:48                             ` Daniel Jordan
2020-09-12  8:38           ` Hugh Dickins
2020-09-13 14:22             ` Alex Shi
2020-09-09 16:11         ` Alexander Duyck
2020-09-10  0:32           ` Hugh Dickins
2020-09-10 14:24             ` Alexander Duyck
2020-09-12  5:12               ` Hugh Dickins [this message]
2020-08-25  7:21   ` [PATCH v18 00/32] per memcg lru_lock Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.2009112058040.23961@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=alexander.duyck@gmail.com \
    --cc=cai@lca.pw \
    --cc=cgroups@vger.kernel.org \
    --cc=daniel.m.jordan@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=richard.weiyang@gmail.com \
    --cc=rong.a.chen@intel.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).