From: Hugh Dickins <hughd@google.com>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>,
	akpm@linux-foundation.org,  mgorman@techsingularity.net,
	tj@kernel.org, khlebnikov@yandex-team.ru,
	 daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com,
	 willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, shakeelb@google.com,
	 iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com
Subject: Re: [PATCH v11 00/16] per memcg lru lock
Date: Tue, 9 Jun 2020 20:22:13 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2006091904530.2779@eggly.anvils>
In-Reply-To: <31943f08-a8e8-be38-24fb-ab9d25fd96ff@linux.alibaba.com>

On Mon, 8 Jun 2020, Alex Shi wrote:
> On 2020/6/8 at 12:15 PM, Hugh Dickins wrote:
> >>  24 files changed, 487 insertions(+), 312 deletions(-)
> > Hi Alex,
> > 
> > I didn't get to try v10 at all, waited until Johannes's preparatory
> > memcg swap cleanup was in mmotm; but I have spent a while thrashing
> > this v11, and can happily report that it is much better than v9 etc:
> > I believe this memcg lru_lock work will soon be ready for v5.9.
> > 
> > I've not yet found any flaw at the swapping end, but fixes are needed
> > for isolate_migratepages_block() and mem_cgroup_move_account(): I've
> > got a series of 4 fix patches to send you (I guess two to fold into
> > existing patches of yours, and two to keep as separate from me).
> > 
> > I haven't yet written the patch descriptions, will return to that
> > tomorrow.  I expect you will be preparing a v12 rebased on v5.8-rc1
> > or v5.8-rc2, and will be able to include these fixes in that.
> 
> I am very glad to get your help on this feature! 
> 
> and I'm looking forward to your fixes tomorrow. :)
> 
> Thanks a lot!
> Alex

Sorry, Alex, the news is not so good today.

You'll have noticed I sent nothing yesterday. That's because I got
stuck on my second patch: could not quite convince myself that it
was safe.

I keep hinting at these patches, and I can't complete their writeups
until I'm convinced; but to give you a better idea of what they do:

1. Fixes isolate_fail and isolate_abort in isolate_migratepages_block().
2. Fixes unsafe use of trylock_page() in __isolate_lru_page_prepare().
3. Reverts 07/16 inversion of lock ordering in split_huge_page_to_list().
4. Adds lruvec lock protection in mem_cgroup_move_account().

In the second, I was using rcu_read_lock() instead of trylock_page()
(as in my own patchset), but could not quite be sure of the case when
PageSwapCache gets set at the wrong moment. I gave up for the night,
and in the morning abandoned that approach, instead just shifting the
call to __isolate_lru_page_prepare() to after the
get_page_unless_zero(), where that trylock_page() becomes safe (no
danger of stomping on the page flags while the page is being freed or
newly allocated to another owner).
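
To make that concrete, the reordered sequence looks roughly like this
(a simplified sketch of isolate_migratepages_block(), not the actual
patch; error handling elided and label names illustrative):

        /*
         * Take the reference first: once we hold it, the page cannot
         * be freed or handed to a new owner under us, so the
         * trylock_page() inside __isolate_lru_page_prepare() can no
         * longer stomp on someone else's page flags.
         */
        if (unlikely(!get_page_unless_zero(page)))
                goto isolate_fail;

        if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
                goto isolate_fail_put;  /* drop the reference taken above */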

I thought that a very safe change, but best to do some test runs with
it in before finalizing. And was then unpleasantly surprised to hit a
VM_BUG_ON_PAGE(lruvec_memcg(lruvec) != page->mem_cgroup) from
lock_page_lruvec_irqsave < relock_page_lruvec < pagevec_lru_move_fn <
pagevec_move_tail < lru_add_drain_cpu after 6 hours on one machine.
Then similar but < rotate_reclaimable_page after 8 hours on another.
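
If I'm reading those crashes right, the window would be something like
this (a hypothetical interleaving, using names from your series; note
that nothing on the left clears PageLRU or otherwise excludes the move):

        pagevec_move_tail()                  mem_cgroup_move_account()
          relock_page_lruvec()
            memcg = page->mem_cgroup;
                                               page->mem_cgroup = to;
            lruvec = mem_cgroup_lruvec(memcg, pgdat);
            spin_lock_irqsave(&lruvec->lru_lock, flags);
            VM_BUG_ON_PAGE(lruvec_memcg(lruvec) !=
                           page->mem_cgroup, page);   <- fires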

I had seen that VM_BUG_ON only once before: it's what drove me to add
patch 4 (with patch 3 reverting the locking before it). Somehow, when
adding the lruvec locking there, I just took it for granted that your
patchset would have the appropriate locking (or TestClearPageLRU magic)
at the other end.

But apparently not. And I'm beginning to think that TestClearPageLRU
was just to distract the audience from the lack of proper locking.
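
Contrast that with the pattern the series does use elsewhere: roughly
the shape of isolate_lru_page() in your v11 (a sketch, details elided),
where only the winner of TestClearPageLRU goes on to lock the lruvec:

        int isolate_lru_page(struct page *page)
        {
                int ret = -EBUSY;

                if (TestClearPageLRU(page)) {
                        struct lruvec *lruvec;

                        /* We won TestClearPageLRU: nobody else can be
                         * isolating this page, which is what's supposed
                         * to keep its lruvec stable under the lock.
                         */
                        get_page(page);
                        lruvec = lock_page_lruvec_irq(page);
                        del_page_from_lru_list(page, lruvec, page_lru(page));
                        unlock_page_lruvec_irq(lruvec);
                        ret = 0;
                }
                return ret;
        }

pagevec_move_tail() takes no such step before relocking the lruvec.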

I have certainly not concluded that yet, but I'm having to think about
an area of the code which I'd imagined you had under control (and I'm
puzzled why my testing has found it so very hard to hit). If we're
lucky, I'll find that pagevec_move_tail is a special case, and
nothing much else needs changing; but I doubt that will be so.

There's one other unexplained and unfixed bug I've seen several times
while exercising mem_cgroup_move_account(): a refcount_warn_saturate()
from where __mem_cgroup_clear_mc() calls mem_cgroup_id_get_many().
I'll be glad if that goes away when the lruvec locking is fixed,
but I don't understand the connection. And it's quite possible that
this refcounting bug has nothing to do with your changes: I have
not succeeded in reproducing it on 5.7 or on 5.7-rc7-mm1,
but I didn't really try long enough to be sure.
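
For reference, that callee is trivial (unchanged by your series; this
is 5.7), so the saturation warning means the memcg id refcount either
overflowed or was incremented after already having dropped to zero:

        /* mm/memcontrol.c: __mem_cgroup_clear_mc() passes mc.moved_swap as n */
        static void mem_cgroup_id_get_many(struct mem_cgroup *memcg,
                                           unsigned int n)
        {
                refcount_add(n, &memcg->id.ref);
        }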

(I should also warn that I'm surprised by the amount of change
11/16 makes to mm/mlock.c: I've not been exercising mlock at all.)

Taking a break for the evening,
Hugh

Thread overview: 24+ messages
2020-05-28 11:00 [PATCH v11 00/16] per memcg lru lock Alex Shi
2020-05-28 11:00 ` [PATCH v11 01/16] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-05-28 11:00 ` [PATCH v11 02/16] mm/page_idle: no unlikely double check for idle page counting Alex Shi
2020-05-28 11:00 ` [PATCH v11 03/16] mm/compaction: correct the comments of compact_defer_shift Alex Shi
2020-05-28 11:00 ` [PATCH v11 04/16] mm/compaction: rename compact_deferred as compact_should_defer Alex Shi
2020-05-28 11:00 ` [PATCH v11 05/16] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-05-28 11:00 ` [PATCH v11 06/16] mm/thp: clean up lru_add_page_tail Alex Shi
2020-05-28 11:00 ` [PATCH v11 07/16] mm/thp: narrow lru locking Alex Shi
2020-05-28 11:00 ` [PATCH v11 08/16] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-05-28 11:00 ` [PATCH v11 09/16] mm/lru: introduce TestClearPageLRU Alex Shi
2020-05-28 11:00 ` [PATCH v11 10/16] mm/compaction: do page isolation first in compaction Alex Shi
2020-05-28 11:00 ` [PATCH v11 11/16] mm/mlock: reorder isolation sequence during munlock Alex Shi
2020-05-28 11:00 ` [PATCH v11 12/16] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-05-28 11:00 ` [PATCH v11 13/16] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-05-28 11:00 ` [PATCH v11 14/16] mm/vmscan: use relock for move_pages_to_lru Alex Shi
2020-05-28 11:00 ` [PATCH v11 15/16] mm/pgdat: remove pgdat lru_lock Alex Shi
2020-05-28 11:00 ` [PATCH v11 16/16] mm/lru: revise the comments of lru_lock Alex Shi
2020-06-08  4:15 ` [PATCH v11 00/16] per memcg lru lock Hugh Dickins
2020-06-08  6:13   ` Alex Shi
2020-06-10  3:22     ` Hugh Dickins [this message]
2020-06-11  6:06       ` Alex Shi
2020-06-11 22:09         ` Hugh Dickins
2020-06-12 10:43           ` Alex Shi
2020-06-16  6:14           ` Alex Shi
