From: Vlastimil Babka <vbabka@suse.cz>
To: Alex Shi <alex.shi@linux.alibaba.com>,
	akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, willy@infradead.org,
	hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	shakeelb@google.com, iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com, kirill@shutemov.name,
	alexander.duyck@gmail.com, rong.a.chen@intel.com,
	mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com
Cc: Michal Hocko <mhocko@kernel.org>, Yang Shi <yang.shi@linux.alibaba.com>
Subject: Re: [PATCH v21 17/19] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Thu, 12 Nov 2020 13:19:18 +0100
Message-ID: <f9cfab13-fae2-c384-90b2-9e3107273734@suse.cz>
In-Reply-To: <1604566549-62481-18-git-send-email-alex.shi@linux.alibaba.com>

On 11/5/20 9:55 AM, Alex Shi wrote:
> This patch moves per node lru_lock into lruvec, thus bring a lru_lock for
> each of memcg per node. So on a large machine, each of memcg don't
> have to suffer from per node pgdat->lru_lock competition. They could go
> fast with their self lru_lock.
> 
> After move memcg charge before lru inserting, page isolation could
> serialize page's memcg, then per memcg lruvec lock is stable and could
> replace per node lru lock.
> 
> In func isolate_migratepages_block, compact_unlock_should_abort and
> lock_page_lruvec_irqsave are open coded to work with compact_control.
> Also add a debug func in locking which may give some clues if there are
> sth out of hands.
> 
> Daniel Jordan's testing show 62% improvement on modified readtwice case
> on his 2P * 10 core * 2 HT broadwell box.
> https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/
> 
> On a large machine with memcg enabled but not used, the page's lruvec
> seeking pass a few pointers, that may lead to lru_lock holding time
> increase and a bit regression.
> 
> Hugh Dickins helped on the patch polish, thanks!
> 
> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> Acked-by: Hugh Dickins <hughd@google.com>
> Cc: Rong Chen <rong.a.chen@intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: cgroups@vger.kernel.org

I think I need some explanation about the rcu_read_lock() usage in 
lock_page_lruvec*() (and places effectively open-coding it).
Preferably in the form of a code comment, but that can also be added as an 
additional patch later; I don't want to block the series.

mem_cgroup_page_lruvec() comment says

  * This function relies on page->mem_cgroup being stable - see the
  * access rules in commit_charge().

commit_charge() comment:

          * Any of the following ensures page->mem_cgroup stability:
          *
          * - the page lock
          * - LRU isolation
          * - lock_page_memcg()
          * - exclusive reference

"LRU isolation" used to be quite clear, but does it now mean after 
TestClearPageLRU(page), or only after deleting from the lru list as well?
It also doesn't mention rcu_read_lock(); should it?

So what exactly are we protecting by rcu_read_lock() in e.g. lock_page_lruvec()?

         rcu_read_lock();
         lruvec = mem_cgroup_page_lruvec(page, pgdat);
         spin_lock(&lruvec->lru_lock);
         rcu_read_unlock();

Looks like we are protecting the lruvec from going away and it can't go away 
anymore after we take the lru_lock?

But then e.g. in __munlock_pagevec() we are doing this without an rcu_read_lock():

	new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));

where new_lruvec is potentially not the one that we have locked.

And the last thing mem_cgroup_page_lruvec() is doing is:

         if (unlikely(lruvec->pgdat != pgdat))
                 lruvec->pgdat = pgdat;
         return lruvec;

So without the rcu_read_lock(), is this potentially accessing the pgdat field 
of an lruvec that might have just gone away?
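
To make the concern concrete, here is a toy, single-threaded userspace model of
the ordering I am worried about. It is not kernel code and not real RCU: every
toy_* name is made up for illustration, the "concurrent" release is simply
called in sequence, and the deferred free is reduced to a flag. It only shows
why a lookup that may write lruvec->pgdat would want to sit inside a read-side
section if the lruvec can be released underneath it. It says nothing about
whether holding lru_lock keeps the lruvec alive afterwards, which is the part
I am asking about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct lruvec; hypothetical, not the kernel layout. */
struct toy_lruvec {
	bool freed;	/* set once the object has been "released" */
	int pgdat_id;	/* models lruvec->pgdat */
};

static int toy_readers;			/* depth of the modelled read-side section */
static struct toy_lruvec *toy_pending;	/* release deferred until readers leave */

static void toy_read_lock(void)
{
	toy_readers++;
}

static void toy_read_unlock(void)
{
	assert(toy_readers > 0);
	/* When the last reader leaves, a deferred release may complete. */
	if (--toy_readers == 0 && toy_pending) {
		toy_pending->freed = true;
		toy_pending = NULL;
	}
}

/* Models a deferred free: it must wait while any reader is active. */
static void toy_release(struct toy_lruvec *lv)
{
	if (toy_readers > 0)
		toy_pending = lv;
	else
		lv->freed = true;
}

/*
 * Mirrors the tail of the quoted lookup, which may write the pgdat field.
 * Returns false if it would have touched an already released object.
 */
static bool toy_lookup(struct toy_lruvec *lv, int pgdat_id)
{
	if (lv->freed)
		return false;
	if (lv->pgdat_id != pgdat_id)
		lv->pgdat_id = pgdat_id;
	return true;
}

/* Lookup with no read-side section: the release completes first. */
static bool toy_scenario_unprotected(void)
{
	struct toy_lruvec lv = { .freed = false, .pgdat_id = 0 };

	toy_release(&lv);
	return toy_lookup(&lv, 1);
}

/* Lookup inside a read-side section: the release is deferred past it. */
static bool toy_scenario_protected(void)
{
	struct toy_lruvec lv = { .freed = false, .pgdat_id = 0 };
	bool ok;

	toy_read_lock();
	toy_release(&lv);
	ok = toy_lookup(&lv, 1);
	toy_read_unlock();
	return ok && lv.freed;
}
```

In this model toy_scenario_unprotected() reports that the lookup reached a
released object, while toy_scenario_protected() completes the lookup first and
the release only happens at toy_read_unlock().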

Thanks,
Vlastimil


