From: Hugh Dickins <hughd@google.com>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	mgorman@techsingularity.net, tj@kernel.org,
	khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
	willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, shakeelb@google.com,
	iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com,
	kirill@shutemov.name, alexander.duyck@gmail.com,
	rong.a.chen@intel.com, mhocko@suse.com, vdavydov.dev@gmail.com,
	shy828301@gmail.com, vbabka@suse.cz, minchan@kernel.org,
	cai@lca.pw
Subject: Re: [PATCH v18 00/32] per memcg lru_lock: reviews
Date: Fri, 11 Sep 2020 19:13:39 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2009111634020.22739@eggly.anvils>
In-Reply-To: <855ad6ee-dba4-9729-78bd-23e392905cf6@linux.alibaba.com>

On Fri, 11 Sep 2020, Alex Shi wrote:
> On 2020/9/10 7:16 AM, Hugh Dickins wrote:
> > On Wed, 9 Sep 2020, Alex Shi wrote:
> >> On 2020/9/9 7:41 AM, Hugh Dickins wrote:
> >>>
> >>> [PATCH v18 05/32] mm/thp: remove code path which never got into
> >>> This is a good simplification, but I see no sign that you understand
> >>> why it's valid: it relies on lru_add_page_tail() being called while
> >>> head refcount is frozen to 0: we would not get this far if someone
> >>> else holds a reference to the THP - which they must hold if they have
> >>> isolated the page from its lru (and that's true before or after your
> >>> per-memcg changes - but even truer after those changes, since PageLRU
> >>> can then be flipped without lru_lock at any instant): please explain
> >>> something of this in the commit message.
> >>
> >> Is the following commit log better?
> >>
> >>     split_huge_page() will never call on a page which isn't on lru list, so
> >>     this code never got a chance to run, and should not be run, to add tail
> >>     pages on a lru list which head page isn't there.
> >>
> >>     Hugh Dickins' mentioned:
> >>     The path should never be called since lru_add_page_tail() being called
> >>     while head refcount is frozen to 0: we would not get this far if someone
> >>     else holds a reference to the THP - which they must hold if they have
> >>     isolated the page from its lru.
> >>
> >>     Although the bug was never triggered, it had better be removed for code
> >>     correctness, and add a warn for unexpected calling.
> > 
> > Not much better, no.  split_huge_page() can easily be called for a page
> > which is not on the lru list at the time, 
> 
> Hi Hugh,
> 
> Thanks for comments!
> 
> There are some discussion on this point a couple of weeks ago,
> https://lkml.org/lkml/2020/7/9/760
> 
> Matthew Wilcox and Kirill have the following comments,
> > I don't understand how we get to split_huge_page() with a page that's
> > not on an LRU list.  Both anonymous and page cache pages should be on
> > an LRU list.  What am I missing?
> 
> Right, and it's never got removed from LRU during the split. The tail
> pages have to be added to LRU because they now separate from the tail
> page.
> 
> -- 
>  Kirill A. Shutemov

Yes, those were among the mails that I read through before getting
down to review.  I was surprised by their not understanding, but
it was a bit late to reply to that thread.

Perhaps everybody had been focused on pages which have been and
naturally belong on an LRU list, rather than pages which are on
the LRU list at the instant that split_huge_page() is called.

There are a number of places where PageLRU gets cleared, and a
number of places where we del_page_from_lru_list(), I think you'll
agree: your patches touch all or most of them.  Let's think of a
common one, isolate_lru_pages() used by page reclaim, but the same
would apply to most of the others.

Then there are a number of places where split_huge_page() is called:
I am having difficulty finding any of those which cannot race with
page reclaim, but shall we choose anon THP's deferred_split_scan(),
or shmem THP's shmem_punch_compound()?

What prevents either of those from calling split_huge_page() at
a time when isolate_lru_pages() has removed the page from LRU?

But there's no problem in this race, because anyone isolating the
page from LRU must hold their own reference to the page (to prevent
it from being freed independently), and the can_split_huge_page() or
page_ref_freeze() in split_huge_page_to_list() will detect that and
fail the split with -EBUSY (or else succeed and prevent new references
from being acquired).  So this case never reaches lru_add_page_tail().
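
To illustrate the argument above, here is a minimal userspace sketch
(not kernel code) of why an isolator's extra reference makes the split
fail before lru_add_page_tail() could be reached.  The toy_* names and
struct toy_page are hypothetical; toy_ref_freeze() only models the
compare-and-exchange idea behind page_ref_freeze(), and the "expected"
count is supplied by the caller as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count matters here. */
struct toy_page {
	atomic_int refcount;
};

/*
 * Model of the freeze step: succeed only if the count is exactly what
 * the splitter expects, and in that case pin it at 0.
 */
static bool toy_ref_freeze(struct toy_page *page, int expected)
{
	int old = expected;

	assert(expected > 0);
	return atomic_compare_exchange_strong(&page->refcount, &old, 0);
}

/* Model of taking a new reference: refused while the count is 0. */
static bool toy_get_unless_zero(struct toy_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Model of the split: any unexpected reference means -EBUSY. */
static int toy_split(struct toy_page *page, int expected)
{
	if (!toy_ref_freeze(page, expected))
		return -EBUSY;
	/* Only here would tail pages be added to the LRU. */
	return 0;
}
```

With one expected reference plus one held by an isolator, toy_split()
returns -EBUSY and leaves the count untouched; with only the expected
reference it succeeds, and toy_get_unless_zero() then fails until the
count is unfrozen.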

> 
> > and I don't know what was the
> > bug which was never triggered.  
> 
> So the only path to the removed part should be a bug, like  sth here,
> https://lkml.org/lkml/2020/7/10/118
> or
> https://lkml.org/lkml/2020/7/10/972

Oh, the use of split_huge_page() in __iommu_dma_alloc_pages() is just
nonsense, I thought it had already been removed - perhaps some debate
over __GFP_COMP held it up.  Not something you need worry about in
this patchset.

> 
> > Stick with whatever text you end up with
> > for the combination of 05/32 and 18/32, and I'll rewrite it after.
> 
> I am not object to merge them into one, I just don't know how to say
> clear about 2 patches in commit log. As patch 18, TestClearPageLRU
> add the incorrect possibility of remove lru bit during split, that's
> the reason of code path rewrite and a WARN there.

I did not know that was why you were putting 18/32 in at that
point, it does not mention TestClearPageLRU at all.  But the fact
remains that it's a nice cleanup, contains a reassuring WARN if we
got it wrong (and I've suggested a WARN on the other branch too),
it was valid before your changes, and it's valid after your changes.
Please merge it back into the uglier 05/32, and again I'll rewrite
whatever comment you come up with if necessary.

> > 
> >>> [PATCH v18 06/32] mm/thp: narrow lru locking
> >>> Why? What part does this play in the series? "narrow lru locking" can
> >>> also be described as "widen page cache locking": 
> >>
> >> Uh, the page cache locking isn't widen, it's still on the old place.
> > 
> > I'm not sure if you're joking there. Perhaps just a misunderstanding.
> > 
> > Yes, patch 06/32 does not touch the xa_lock(&mapping->i_pages) and
> > xa_lock(&swap_cache->i_pages) lines (odd how we've arrived at two of
> > those, but please do not get into cleaning it up now); but it removes
> > the spin_lock_irqsave(&pgdat->lru_lock, flags) which used to come
> > before them, and inserts a spin_lock(&pgdat->lru_lock) after them.
> > 
> > You call that narrowing the lru locking, okay, but I see it as also
> > pushing the page cache locking outwards: before this patch, page cache
> > lock was taken inside lru_lock; after this patch, page cache lock is
> > taken outside lru_lock.  If you cannot see that, then I think you
> > should not have touched this code at all; but it's what we have
> > been testing, and I think we should go forward with it.
> > 
> >>> But I wish you could give some reason for it in the commit message!
> >>
> >> It's a head scratch task. Would you like to tell me what's detailed info 
> >> should be there? Thanks!
> > 
> > So, you don't know why you did it either: then it will be hard to
> > justify.  I guess I'll have to write something for it later.  I'm
> > strongly tempted just to drop the patch, but expect it will become
> > useful later, for using lock_page_memcg() before getting lru_lock.
> > 
> 
> I thought the xa_lock and lru_lock relationship was described clear
> in the commit log,

You say "lru_lock and page cache xa_lock have no reason with current
sequence", but you give no reason for inverting their sequence:
"let's" is not a reason.

> and still no idea of the move_lock in the chain.

memcg->move_lock is what's at the heart of lock_page_memcg(), but
as much as possible that tries to avoid the overhead of actually
taking it, since moving memcg is a rare operation.  For lock ordering,
see the diagram in mm/rmap.c, which 23/32 updates to match this change.

Before this commit: lru_lock > move_lock > i_pages lock was the
expected lock ordering (but it looks as if the lru_lock > move_lock
requirement came from my per-memcg lru_lock patches).

After this commit:  move_lock > i_pages lock > lru_lock is the
required lock ordering, since there are strong reasons (in dirty
writeback) for move_lock > i_pages lock.
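
Reading "A > B" as "A is taken outside B", the ordering after this
commit can be modelled with a toy rank checker.  This is only a sketch
under that assumption: the toy_* names are hypothetical, it is not how
lockdep works, and it tracks a single context with no real locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Ranks follow the post-06/32 order: lower rank is taken first. */
enum toy_lock {
	TOY_MOVE_LOCK,		/* memcg->move_lock */
	TOY_I_PAGES_LOCK,	/* page cache xa_lock */
	TOY_LRU_LOCK,		/* lru_lock */
	TOY_NR_LOCKS
};

struct toy_ctx {
	int depth;
	enum toy_lock held[TOY_NR_LOCKS];
};

/* Refuse any acquisition that does not nest strictly inside the last. */
static bool toy_acquire(struct toy_ctx *ctx, enum toy_lock lock)
{
	if (ctx->depth > 0 && ctx->held[ctx->depth - 1] >= lock)
		return false;	/* ordering violation (or recursion) */
	assert(ctx->depth < TOY_NR_LOCKS);
	ctx->held[ctx->depth++] = lock;
	return true;
}

/* Release the most recently acquired lock. */
static void toy_release(struct toy_ctx *ctx)
{
	assert(ctx->depth > 0);
	ctx->depth--;
}
```

Taking move_lock, then i_pages lock, then lru_lock is accepted; taking
i_pages lock while already holding lru_lock (the pre-06/32 nesting) is
rejected by the checker.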

> Please refill them for what I overlooked.

Will do, but not before reviewing your remaining patches.

> Thanks!
> 
> >>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >>> Is that correct? Or Wei Yang suggested some part of it perhaps?
> >>
> >> Yes, we talked a lot to confirm the locking change is safe.
> > 
> > Okay, but the patch was written by you, and sent by you to Andrew:
> > that is not a case for "Signed-off-by: Someone Else".
> > 
> 
> Ok. let's remove his signed-off.
> 
> >>> [PATCH v18 27/32] mm/swap.c: optimizing __pagevec_lru_add lru_lock
> >>> Could we please drop this one for the moment? And come back to it later
> >>> when the basic series is safely in.  It's a good idea to try sorting
> >>> together those pages which come under the same lock (though my guess is
> >>> that they naturally gather themselves together quite well already); but
> >>> I'm not happy adding 360 bytes to the kernel stack here (and that in
> >>> addition to 192 bytes of horrid pseudo-vma in the shmem swapin case),
> >>> though that could be avoided by making it per-cpu. But I hope there's
> >>> a simpler way of doing it, as efficient, but also useful for the other
> >>> pagevec operations here: perhaps scanning the pagevec for same page->
> >>> mem_cgroup (and flags node bits), NULLing entries as they are done.
> >>> Another, easily fixed, minor defect in this patch: if I'm reading it
> >>> right, it reverses the order in which the pages are put on the lru?
> >>
> >> this patch could give about 10+% performance gain on my multiple memcg
> >> readtwice testing. fairness locking cost the performance much.
> > 
> > Good to know, should have been mentioned.  s/fairness/Repeated/
> > 
> > But what was the gain or loss on your multiple memcg readtwice
> > testing without this patch, compared against node-only lru_lock?
> > The 80% gain mentioned before, I presume.  So this further
> > optimization can wait until the rest is solid.
> 
> the gain based on the patch 26.

If I understand your brief comment there, you're saying that
in a fixed interval of time, the baseline 5.9-rc did 100 runs,
the patches up to and including 26/32 did 180 runs, then with
27/32 on top, did 198 runs?

That's a good improvement by 27/32, but not essential for getting
the patchset in: I don't think 27/32 is the right way to do it,
so I'd still prefer to hold it back from the "initial offering".

> 
> > 
> >>
> >> I also tried per cpu solution but that cause much trouble of per cpu func
> >> things, and looks no benefit except a bit struct size of stack, so if 
> >> stack size still fine. May we could use the solution and improve it better.
> >> like, functionlize, fix the reverse issue etc.
> > 
> > I don't know how important the stack depth consideration is nowadays:
> > I still care, maybe others don't, since VMAP_STACK became an option.
> > 
> > Yes, please fix the reversal (if I was right on that); and I expect
> > you could use a singly linked list instead of the double.
> 
> single linked list is more saving, but do we have to reverse walking to seek
> the head or tail for correct sequence?

I imagine all you need is to start off with a
	for (i = pagevec_count(pvec) - 1; i >= 0; i--)
loop.
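
As a rough userspace sketch of that idea (struct toy_node and
toy_build_list() are hypothetical names, and a plain array stands in
for the pagevec): pushing each entry at the head of a singly linked
list while walking the array from last to first leaves the list in the
original array order, so no reverse walk is needed afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list node standing in for a page queued under one lock. */
struct toy_node {
	int page_id;
	struct toy_node *next;
};

/*
 * Head-insert every array entry, iterating backwards.  Head insertion
 * reverses the visiting order, so a backwards walk yields a list whose
 * order matches the array.
 */
static struct toy_node *toy_build_list(struct toy_node *nodes, int count)
{
	struct toy_node *head = NULL;
	int i;

	assert(count >= 0);
	for (i = count - 1; i >= 0; i--) {
		nodes[i].next = head;
		head = &nodes[i];
	}
	return head;
}
```

Walking the returned list from head then visits the entries in the
same order as the array, with a single next pointer per entry.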

> 
> > 
> > But I'll look for an alternative - later, once the urgent stuff
> > is completed - and leave the acks on this patch to others.
> 
> Ok, looking forward for your new solution!
> 
> Thanks
> Alex
