From: Alex Shi <alex.shi@linux.alibaba.com>
To: Vlastimil Babka <vbabka@suse.cz>,
	akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, willy@infradead.org,
	hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	shakeelb@google.com, iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com, kirill@shutemov.name,
	alexander.duyck@gmail.com, rong.a.chen@intel.com,
	mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com
Cc: Michal Hocko <mhocko@kernel.org>, Yang Shi <yang.shi@linux.alibaba.com>
Subject: Re: [PATCH v21 17/19] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Thu, 12 Nov 2020 22:19:33 +0800
Message-ID: <fe584528-9d9b-ac6d-bc9a-4be2d6b98cf4@linux.alibaba.com>
In-Reply-To: <f9cfab13-fae2-c384-90b2-9e3107273734@suse.cz>



On 2020/11/12 8:19 PM, Vlastimil Babka wrote:
> On 11/5/20 9:55 AM, Alex Shi wrote:
>> This patch moves the per-node lru_lock into the lruvec, thus giving each
>> memcg its own lru_lock per node. So on a large machine, memcgs no longer
>> have to suffer from contention on the per-node pgdat->lru_lock. Each can
>> go fast with its own lru_lock.
>>
>> Now that the memcg charge is moved before LRU insertion, page isolation can
>> serialize the page's memcg, so the per-memcg lruvec lock is stable and can
>> replace the per-node lru lock.
>>
>> In isolate_migratepages_block(), compact_unlock_should_abort() and
>> lock_page_lruvec_irqsave() are open coded to work with compact_control.
>> Also add a debug function in the locking path, which may give some clues
>> if something gets out of hand.
>>
>> Daniel Jordan's testing shows a 62% improvement on a modified readtwice case
>> on his 2P * 10 core * 2 HT Broadwell box.
>> https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/
>>
>> On a large machine with memcg enabled but not used, looking up the page's
>> lruvec has to follow a few pointers, which may increase lru_lock hold time
>> and cause a slight regression.
>>
>> Hugh Dickins helped polish the patch, thanks!
>>
>> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
>> Acked-by: Hugh Dickins <hughd@google.com>
>> Cc: Rong Chen <rong.a.chen@intel.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
>> Cc: Yang Shi <yang.shi@linux.alibaba.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: linux-mm@kvack.org
>> Cc: cgroups@vger.kernel.org
> 
> I think I need some explanation about the rcu_read_lock() usage in lock_page_lruvec*() (and places effectively open-coding it).
> Preferably in the form of a code comment, but that can also be added as an additional patch later; I don't want to block the series.
> 

Hi Vlastimil, 

Thanks for the comments!

Oh, we did talk about the rcu_read_lock(), which is used to block memcg destruction while the lock is being taken,
and spin_lock() actually includes an rcu_read_lock(). Yes, we could add these comments later.
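
For reference, the kind of comment being asked for could look roughly like the sketch below. It is a single-threaded userspace model, not kernel code: struct lruvec, struct page, the rcu_read_lock()/rcu_read_unlock() counter, the spin_lock() built on a C11 atomic_flag and demo_lock_page_lruvec() are all hypothetical stand-ins, and the comments only restate the explanation given in this thread.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel definitions. */
struct lruvec {
	atomic_flag lru_lock;		/* models spinlock_t lru_lock */
};

struct page {
	struct lruvec *lruvec;		/* models the page -> memcg -> lruvec lookup */
};

static int rcu_nesting;			/* models RCU read-side nesting depth */

static void rcu_read_lock(void)   { rcu_nesting++; }
static void rcu_read_unlock(void) { rcu_nesting--; }

static void spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;			/* spin until the flag was clear */
}

static void spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static struct lruvec *lock_page_lruvec(struct page *page)
{
	struct lruvec *lruvec;

	/*
	 * As explained in this thread: rcu_read_lock() blocks memcg
	 * destruction, so the lruvec found by the lookup cannot be
	 * freed before its lru_lock has been taken.
	 */
	rcu_read_lock();
	lruvec = page->lruvec;
	spin_lock(&lruvec->lru_lock);
	/*
	 * As explained in this thread: the held spinlock is taken to
	 * cover the rest, so the explicit RCU section can end here.
	 */
	rcu_read_unlock();

	return lruvec;
}

/* Returns 0 when the model behaves as described, 1 otherwise. */
int demo_lock_page_lruvec(void)
{
	struct lruvec lruvec;
	struct page page = { .lruvec = &lruvec };
	struct lruvec *locked;
	int ok;

	atomic_flag_clear(&lruvec.lru_lock);	/* start unlocked */

	locked = lock_page_lruvec(&page);
	ok = (locked == &lruvec) && (rcu_nesting == 0);
	/* The lock is held, so test-and-set reports "already set". */
	ok = ok && atomic_flag_test_and_set(&lruvec.lru_lock);

	spin_unlock(&locked->lru_lock);
	/* After unlocking, test-and-set reports "was clear". */
	ok = ok && !atomic_flag_test_and_set(&lruvec.lru_lock);
	atomic_flag_clear(&lruvec.lru_lock);

	return ok ? 0 : 1;
}
```

The model only shows the ordering (RCU section opened before the lookup, closed after the lock is held); it does not model memcg destruction itself.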

> mem_cgroup_page_lruvec() comment says
> 
>  * This function relies on page->mem_cgroup being stable - see the
>  * access rules in commit_charge().
> 
> commit_charge() comment:
> 
>          * Any of the following ensures page->mem_cgroup stability:
>          *
>          * - the page lock
>          * - LRU isolation
>          * - lock_page_memcg()
>          * - exclusive reference
> 
> "LRU isolation" used to be quite clear, but now is it after TestClearPageLRU(page) or after deleting from the lru list as well?
> Also it doesn't mention rcu_read_lock(), should it?

LRU isolation is still the same concept as before: a set of actions that take a page off an LRU list, and
commit_charge() does need the page to be isolated.

But the conditions for page_memcg stability could be changed, since we don't rely on LRU isolation for it.
The comments could be changed later.

> 
> So what exactly are we protecting by rcu_read_lock() in e.g. lock_page_lruvec()?
> 
>         rcu_read_lock();
>         lruvec = mem_cgroup_page_lruvec(page, pgdat);
>         spin_lock(&lruvec->lru_lock);
>         rcu_read_unlock();
> 
> Looks like we are protecting the lruvec from going away and it can't go away anymore after we take the lru_lock?
> 
> But then e.g. in __munlock_pagevec() we are doing this without an rcu_read_lock():
> 
>     new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));

TestClearPageLRU() could block the page from memcg migration/destruction.
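
A minimal userspace sketch of the test-and-clear behaviour this relies on, assuming the flag is cleared with an atomic read-modify-write so that only one caller can observe it set. PG_LRU_BIT, struct page, test_clear_page_lru(), set_page_lru() and demo_test_clear_page_lru() are hypothetical names for illustration, not the kernel's definitions, and the model says nothing about memcg itself:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LRU_BIT (1UL << 0)	/* hypothetical bit position */

/* Hypothetical stand-in for the flags word of struct page. */
struct page {
	atomic_ulong flags;
};

/*
 * Atomically clear the LRU bit and report whether it was set.
 * Of several concurrent callers, at most one sees "true"; that
 * caller is the one that owns the isolation of the page.
 */
static bool test_clear_page_lru(struct page *page)
{
	unsigned long old = atomic_fetch_and(&page->flags, ~PG_LRU_BIT);

	return (old & PG_LRU_BIT) != 0;
}

/* Put the bit back, as done when the page returns to an LRU list. */
static void set_page_lru(struct page *page)
{
	atomic_fetch_or(&page->flags, PG_LRU_BIT);
}

/* Returns 0 when the model behaves as described, 1 otherwise. */
int demo_test_clear_page_lru(void)
{
	struct page page;
	bool first, second, third;

	atomic_init(&page.flags, PG_LRU_BIT);

	first = test_clear_page_lru(&page);	/* bit was set: wins */
	second = test_clear_page_lru(&page);	/* already cleared: loses */
	set_page_lru(&page);
	third = test_clear_page_lru(&page);	/* set again: wins */

	return (first && !second && third) ? 0 : 1;
}
```

In this model, a path that has won the test-and-clear knows no other path can isolate the same page until the bit is set again, which is the property the reply above depends on.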

Thanks
Alex

> 
> where new_lruvec is potentially not the one that we have locked
> 
> And the last thing mem_cgroup_page_lruvec() is doing is:
> 
>         if (unlikely(lruvec->pgdat != pgdat))
>                 lruvec->pgdat = pgdat;
>         return lruvec;
> 
> So without the rcu_read_lock() is this potentially accessing the pgdat field of lruvec that might have just gone away?
> 
> Thanks,
> Vlastimil

