From: Alex Shi <alex.shi@linux.alibaba.com> To: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com, willy@infradead.org, hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com, iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com Cc: Alex Shi <alex.shi@linux.alibaba.com>, "Kirill A. Shutemov" <kirill@shutemov.name> Subject: [PATCH v11 11/16] mm/mlock: reorder isolation sequence during munlock Date: Thu, 28 May 2020 19:00:53 +0800 [thread overview] Message-ID: <1590663658-184131-12-git-send-email-alex.shi@linux.alibaba.com> (raw) In-Reply-To: <1590663658-184131-1-git-send-email-alex.shi@linux.alibaba.com> This patch reorder the isolation steps during munlock, move the lru lock to guard each pages, unfold __munlock_isolate_lru_page func, to do the preparation for lru lock change. __split_huge_page_refcount doesn't exist, but we still have to guard PageMlocked and PageLRU in __split_huge_page_tail. [lkp@intel.com: found a sleeping function bug ... at mm/rmap.c] Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org --- mm/mlock.c | 93 ++++++++++++++++++++++++++++++++++---------------------------- 1 file changed, 51 insertions(+), 42 deletions(-) diff --git a/mm/mlock.c b/mm/mlock.c index 03b3a5d99ad7..a0856085c4b7 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -103,25 +103,6 @@ void mlock_vma_page(struct page *page) } /* - * Isolate a page from LRU with optional get_page() pin. - * Assumes lru_lock already held and page already pinned. - */ -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) -{ - if (TestClearPageLRU(page)) { - struct lruvec *lruvec; - - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (getpage) - get_page(page); - del_page_from_lru_list(page, lruvec, page_lru(page)); - return true; - } - - return false; -} - -/* * Finish munlock after successful page isolation * * Page must be locked. This is a wrapper for try_to_munlock() @@ -181,6 +162,7 @@ static void __munlock_isolation_failed(struct page *page) unsigned int munlock_vma_page(struct page *page) { int nr_pages; + bool clearlru = false; pg_data_t *pgdat = page_pgdat(page); /* For try_to_munlock() and to serialize with page migration */ @@ -189,32 +171,42 @@ unsigned int munlock_vma_page(struct page *page) VM_BUG_ON_PAGE(PageTail(page), page); /* - * Serialize with any parallel __split_huge_page_refcount() which + * Serialize with any parallel __split_huge_page_tail() which * might otherwise copy PageMlocked to part of the tail pages before * we clear it in the head page. It also stabilizes hpage_nr_pages(). */ + get_page(page); spin_lock_irq(&pgdat->lru_lock); + clearlru = TestClearPageLRU(page); if (!TestClearPageMlocked(page)) { - /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ - nr_pages = 1; - goto unlock_out; + if (clearlru) + SetPageLRU(page); + /* + * Potentially, PTE-mapped THP: do not skip the rest PTEs + * Reuse lock as memory barrier for release_pages racing. + */ + spin_unlock_irq(&pgdat->lru_lock); + put_page(page); + return 0; } nr_pages = hpage_nr_pages(page); __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); - if (__munlock_isolate_lru_page(page, true)) { + if (clearlru) { + struct lruvec *lruvec; + + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + del_page_from_lru_list(page, lruvec, page_lru(page)); spin_unlock_irq(&pgdat->lru_lock); __munlock_isolated_page(page); - goto out; + } else { + spin_unlock_irq(&pgdat->lru_lock); + put_page(page); + __munlock_isolation_failed(page); } - __munlock_isolation_failed(page); - -unlock_out: - spin_unlock_irq(&pgdat->lru_lock); -out: return nr_pages - 1; } @@ -297,34 +289,51 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) pagevec_init(&pvec_putback); /* Phase 1: page isolation */ - spin_lock_irq(&zone->zone_pgdat->lru_lock); for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; + struct lruvec *lruvec; + bool clearlru; - if (TestClearPageMlocked(page)) { - /* - * We already have pin from follow_page_mask() - * so we can spare the get_page() here. - */ - if (__munlock_isolate_lru_page(page, false)) - continue; - else - __munlock_isolation_failed(page); - } else { + clearlru = TestClearPageLRU(page); + spin_lock_irq(&zone->zone_pgdat->lru_lock); + + if (!TestClearPageMlocked(page)) { delta_munlocked++; + if (clearlru) + SetPageLRU(page); + goto putback; + } + + if (!clearlru) { + __munlock_isolation_failed(page); + goto putback; } /* + * Isolate this page. + * We already have pin from follow_page_mask() + * so we can spare the get_page() here. + */ + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + del_page_from_lru_list(page, lruvec, page_lru(page)); + spin_unlock_irq(&zone->zone_pgdat->lru_lock); + continue; + + /* * We won't be munlocking this page in the next phase * but we still need to release the follow_page_mask() * pin. We cannot do it under lru_lock however. If it's * the last pin, __page_cache_release() would deadlock. */ +putback: + spin_unlock_irq(&zone->zone_pgdat->lru_lock); pagevec_add(&pvec_putback, pvec->pages[i]); pvec->pages[i] = NULL; } + /* tempary disable irq, will remove later */ + local_irq_disable(); __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); - spin_unlock_irq(&zone->zone_pgdat->lru_lock); + local_irq_enable(); /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); -- 1.8.3.1
WARNING: multiple messages have this Message-ID (diff)
From: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org> To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org, mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt@public.gmane.org, tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org, daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org, yang.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org, willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, lkp-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org, richard.weiyang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org Cc: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>, "Kirill A. Shutemov" <kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org> Subject: [PATCH v11 11/16] mm/mlock: reorder isolation sequence during munlock Date: Thu, 28 May 2020 19:00:53 +0800 [thread overview] Message-ID: <1590663658-184131-12-git-send-email-alex.shi@linux.alibaba.com> (raw) In-Reply-To: <1590663658-184131-1-git-send-email-alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org> This patch reorder the isolation steps during munlock, move the lru lock to guard each pages, unfold __munlock_isolate_lru_page func, to do the preparation for lru lock change. __split_huge_page_refcount doesn't exist, but we still have to guard PageMlocked and PageLRU in __split_huge_page_tail. [lkp-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org: found a sleeping function bug ... at mm/rmap.c] Signed-off-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org> Cc: Kirill A. Shutemov <kirill-oKw7cIdHH8eLwutG50LtGA@public.gmane.org> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> Cc: Matthew Wilcox <willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> Cc: Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org --- mm/mlock.c | 93 ++++++++++++++++++++++++++++++++++---------------------------- 1 file changed, 51 insertions(+), 42 deletions(-) diff --git a/mm/mlock.c b/mm/mlock.c index 03b3a5d99ad7..a0856085c4b7 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -103,25 +103,6 @@ void mlock_vma_page(struct page *page) } /* - * Isolate a page from LRU with optional get_page() pin. - * Assumes lru_lock already held and page already pinned. - */ -static bool __munlock_isolate_lru_page(struct page *page, bool getpage) -{ - if (TestClearPageLRU(page)) { - struct lruvec *lruvec; - - lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); - if (getpage) - get_page(page); - del_page_from_lru_list(page, lruvec, page_lru(page)); - return true; - } - - return false; -} - -/* * Finish munlock after successful page isolation * * Page must be locked. This is a wrapper for try_to_munlock() @@ -181,6 +162,7 @@ static void __munlock_isolation_failed(struct page *page) unsigned int munlock_vma_page(struct page *page) { int nr_pages; + bool clearlru = false; pg_data_t *pgdat = page_pgdat(page); /* For try_to_munlock() and to serialize with page migration */ @@ -189,32 +171,42 @@ unsigned int munlock_vma_page(struct page *page) VM_BUG_ON_PAGE(PageTail(page), page); /* - * Serialize with any parallel __split_huge_page_refcount() which + * Serialize with any parallel __split_huge_page_tail() which * might otherwise copy PageMlocked to part of the tail pages before * we clear it in the head page. It also stabilizes hpage_nr_pages(). */ + get_page(page); spin_lock_irq(&pgdat->lru_lock); + clearlru = TestClearPageLRU(page); if (!TestClearPageMlocked(page)) { - /* Potentially, PTE-mapped THP: do not skip the rest PTEs */ - nr_pages = 1; - goto unlock_out; + if (clearlru) + SetPageLRU(page); + /* + * Potentially, PTE-mapped THP: do not skip the rest PTEs + * Reuse lock as memory barrier for release_pages racing. + */ + spin_unlock_irq(&pgdat->lru_lock); + put_page(page); + return 0; } nr_pages = hpage_nr_pages(page); __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); - if (__munlock_isolate_lru_page(page, true)) { + if (clearlru) { + struct lruvec *lruvec; + + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + del_page_from_lru_list(page, lruvec, page_lru(page)); spin_unlock_irq(&pgdat->lru_lock); __munlock_isolated_page(page); - goto out; + } else { + spin_unlock_irq(&pgdat->lru_lock); + put_page(page); + __munlock_isolation_failed(page); } - __munlock_isolation_failed(page); - -unlock_out: - spin_unlock_irq(&pgdat->lru_lock); -out: return nr_pages - 1; } @@ -297,34 +289,51 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone) pagevec_init(&pvec_putback); /* Phase 1: page isolation */ - spin_lock_irq(&zone->zone_pgdat->lru_lock); for (i = 0; i < nr; i++) { struct page *page = pvec->pages[i]; + struct lruvec *lruvec; + bool clearlru; - if (TestClearPageMlocked(page)) { - /* - * We already have pin from follow_page_mask() - * so we can spare the get_page() here. - */ - if (__munlock_isolate_lru_page(page, false)) - continue; - else - __munlock_isolation_failed(page); - } else { + clearlru = TestClearPageLRU(page); + spin_lock_irq(&zone->zone_pgdat->lru_lock); + + if (!TestClearPageMlocked(page)) { delta_munlocked++; + if (clearlru) + SetPageLRU(page); + goto putback; + } + + if (!clearlru) { + __munlock_isolation_failed(page); + goto putback; } /* + * Isolate this page. + * We already have pin from follow_page_mask() + * so we can spare the get_page() here. + */ + lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page)); + del_page_from_lru_list(page, lruvec, page_lru(page)); + spin_unlock_irq(&zone->zone_pgdat->lru_lock); + continue; + + /* * We won't be munlocking this page in the next phase * but we still need to release the follow_page_mask() * pin. We cannot do it under lru_lock however. If it's * the last pin, __page_cache_release() would deadlock. */ +putback: + spin_unlock_irq(&zone->zone_pgdat->lru_lock); pagevec_add(&pvec_putback, pvec->pages[i]); pvec->pages[i] = NULL; } + /* tempary disable irq, will remove later */ + local_irq_disable(); __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked); - spin_unlock_irq(&zone->zone_pgdat->lru_lock); + local_irq_enable(); /* Now we can release pins of pages that we are not munlocking */ pagevec_release(&pvec_putback); -- 1.8.3.1
next prev parent reply other threads:[~2020-05-28 11:01 UTC|newest] Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-05-28 11:00 [PATCH v11 00/16] per memcg lru lock Alex Shi 2020-05-28 11:00 ` [PATCH v11 01/16] mm/vmscan: remove unnecessary lruvec adding Alex Shi 2020-05-28 11:00 ` [PATCH v11 02/16] mm/page_idle: no unlikely double check for idle page counting Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 03/16] mm/compaction: correct the comments of compact_defer_shift Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 04/16] mm/compaction: rename compact_deferred as compact_should_defer Alex Shi 2020-05-28 11:00 ` [PATCH v11 05/16] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 06/16] mm/thp: clean up lru_add_page_tail Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 07/16] mm/thp: narrow lru locking Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 08/16] mm/memcg: add debug checking in lock_page_memcg Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 09/16] mm/lru: introduce TestClearPageLRU Alex Shi 2020-05-28 11:00 ` [PATCH v11 10/16] mm/compaction: do page isolation first in compaction Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` Alex Shi [this message] 2020-05-28 11:00 ` [PATCH v11 11/16] mm/mlock: reorder isolation sequence during munlock Alex Shi 2020-05-28 11:00 ` [PATCH v11 12/16] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 13/16] mm/lru: introduce the relock_page_lruvec function Alex Shi 2020-05-28 11:00 ` [PATCH v11 14/16] mm/vmscan: use relock for move_pages_to_lru Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 15/16] mm/pgdat: remove pgdat lru_lock Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-05-28 11:00 ` [PATCH v11 16/16] mm/lru: revise the comments of lru_lock Alex Shi 2020-05-28 11:00 ` Alex Shi 2020-06-08 4:15 ` [PATCH v11 00/16] per memcg lru lock Hugh Dickins 2020-06-08 4:15 ` Hugh Dickins 2020-06-08 4:15 ` Hugh Dickins 2020-06-08 6:13 ` Alex Shi 2020-06-08 6:13 ` Alex Shi 2020-06-10 3:22 ` Hugh Dickins 2020-06-10 3:22 ` Hugh Dickins 2020-06-10 3:22 ` Hugh Dickins 2020-06-11 6:06 ` Alex Shi 2020-06-11 6:06 ` Alex Shi 2020-06-11 22:09 ` Hugh Dickins 2020-06-11 22:09 ` Hugh Dickins 2020-06-11 22:09 ` Hugh Dickins 2020-06-12 10:43 ` Alex Shi 2020-06-12 10:43 ` Alex Shi 2020-06-16 6:14 ` Alex Shi 2020-06-16 6:14 ` Alex Shi
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1590663658-184131-12-git-send-email-alex.shi@linux.alibaba.com \ --to=alex.shi@linux.alibaba.com \ --cc=akpm@linux-foundation.org \ --cc=cgroups@vger.kernel.org \ --cc=daniel.m.jordan@oracle.com \ --cc=hannes@cmpxchg.org \ --cc=hughd@google.com \ --cc=iamjoonsoo.kim@lge.com \ --cc=khlebnikov@yandex-team.ru \ --cc=kirill@shutemov.name \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=lkp@intel.com \ --cc=mgorman@techsingularity.net \ --cc=richard.weiyang@gmail.com \ --cc=shakeelb@google.com \ --cc=tj@kernel.org \ --cc=willy@infradead.org \ --cc=yang.shi@linux.alibaba.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.