All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Shi <alex.shi@linux.alibaba.com>
To: akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, willy@infradead.org,
	hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	shakeelb@google.com, iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com, kirill@shutemov.name,
	alexander.duyck@gmail.com, rong.a.chen@intel.com,
	mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Subject: [PATCH v21 04/19] mm/thp: narrow lru locking
Date: Thu,  5 Nov 2020 16:55:34 +0800	[thread overview]
Message-ID: <1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com> (raw)
In-Reply-To: <1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com>

lru_lock and page cache xa_lock have no obvious reason to be taken
one way round or the other: until now, lru_lock has been taken before
page cache xa_lock, when splitting a THP; but nothing else takes them
together.  Reverse that ordering: let's narrow the lru locking - but
leave local_irq_disable to block interrupts throughout, like before.

Hugh Dickins point: split_huge_page_to_list() was already silly, to be
using the _irqsave variant: it's just been taking sleeping locks, so
would already be broken if entered with interrupts enabled.  So we
can save passing flags argument down to __split_huge_page().

Why change the lock ordering here? That was hard to decide. One reason:
when this series reaches per-memcg lru locking, it relies on the THP's
memcg to be stable when taking the lru_lock: that is now done after the
THP's refcount has been frozen, which ensures page memcg cannot change.

Another reason: previously, lock_page_memcg()'s move_lock was presumed
to nest inside lru_lock; but now lru_lock must nest inside (page cache
lock inside) move_lock, so it becomes possible to use lock_page_memcg()
to stabilize page memcg before taking its lru_lock.  That is not the
mechanism used in this series, but it is an option we want to keep open.

[Hugh Dickins: rewrite commit log]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 79318d7f7d5d..b70ec0c6076b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2435,7 +2435,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-		pgoff_t end, unsigned long flags)
+		pgoff_t end)
 {
 	struct page *head = compound_head(page);
 	pg_data_t *pgdat = page_pgdat(head);
@@ -2445,8 +2445,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned int nr = thp_nr_pages(head);
 	int i;
 
-	lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
 	/* complete memcg works before add pages to LRU */
 	mem_cgroup_split_huge_fixup(head);
 
@@ -2458,6 +2456,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
+	/* prevent PageLRU to go away from under us, and freeze lru stats */
+	spin_lock(&pgdat->lru_lock);
+
+	lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
 	for (i = nr - 1; i >= 1; i--) {
 		__split_huge_page_tail(head, i, lruvec, list);
 		/* Some pages can be beyond i_size: drop them from page cache */
@@ -2477,6 +2480,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	}
 
 	ClearPageCompound(head);
+	spin_unlock(&pgdat->lru_lock);
+	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, nr);
 
@@ -2494,8 +2499,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		page_ref_add(head, 2);
 		xa_unlock(&head->mapping->i_pages);
 	}
-
-	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+	local_irq_enable();
 
 	remap_page(head, nr);
 
@@ -2641,12 +2645,10 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
 	struct page *head = compound_head(page);
-	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
 	struct deferred_split *ds_queue = get_deferred_split_queue(head);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
 	int count, mapcount, extra_pins, ret;
-	unsigned long flags;
 	pgoff_t end;
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2707,9 +2709,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	unmap_page(head);
 	VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-	/* prevent PageLRU to go away from under us, and freeze lru stats */
-	spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+	/* block interrupt reentry in xa_lock and spinlock */
+	local_irq_disable();
 	if (mapping) {
 		XA_STATE(xas, &mapping->i_pages, page_index(head));
 
@@ -2739,7 +2740,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 				__dec_lruvec_page_state(head, NR_FILE_THPS);
 		}
 
-		__split_huge_page(page, list, end, flags);
+		__split_huge_page(page, list, end);
 		ret = 0;
 	} else {
 		if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
@@ -2753,7 +2754,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:		if (mapping)
 			xa_unlock(&mapping->i_pages);
-		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+		local_irq_enable();
 		remap_page(head, thp_nr_pages(head));
 		ret = -EBUSY;
 	}
-- 
1.8.3.1


  parent reply	other threads:[~2020-11-05  8:57 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-05  8:55 [PATCH v21 00/19] per memcg lru lock Alex Shi
2020-11-05  8:55 ` Alex Shi
2020-11-05  8:55 ` [PATCH v21 01/19] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-05  8:55 ` [PATCH v21 02/19] mm/thp: use head for head page in lru_add_page_tail Alex Shi
2020-11-05  8:55 ` [PATCH v21 03/19] mm/thp: Simplify lru_add_page_tail() Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-05  8:55 ` Alex Shi [this message]
2020-11-05  8:55 ` [PATCH v21 05/19] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-11 12:36   ` Vlastimil Babka
2020-11-11 12:36     ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 06/19] mm/rmap: stop store reordering issue on page->mapping Alex Shi
2020-11-06  1:20   ` Alex Shi
2020-11-06  1:20     ` Alex Shi
2020-11-10 19:06     ` Johannes Weiner
2020-11-11  7:41     ` Hugh Dickins
2020-11-11  7:41       ` Hugh Dickins
2020-11-05  8:55 ` [PATCH v21 07/19] mm: page_idle_get_page() does not need lru_lock Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-10 19:01   ` Johannes Weiner
2020-11-11  8:17   ` huang ying
2020-11-11  8:17     ` huang ying
2020-11-11  8:17     ` huang ying
2020-11-11 12:52     ` Vlastimil Babka
2020-11-11 12:52       ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 08/19] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-05  8:55 ` [PATCH v21 09/19] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Alex Shi
2020-11-05  8:55 ` [PATCH v21 10/19] mm/lru: move lock into lru_note_cost Alex Shi
2020-11-05  8:55 ` [PATCH v21 11/19] mm/vmscan: remove lruvec reget in move_pages_to_lru Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-05  8:55 ` [PATCH v21 12/19] mm/mlock: remove lru_lock on TestClearPageMlocked Alex Shi
2020-11-11 13:03   ` Vlastimil Babka
2020-11-11 13:03     ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 13/19] mm/mlock: remove __munlock_isolate_lru_page Alex Shi
2020-11-11 13:07   ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 14/19] mm/lru: introduce TestClearPageLRU Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-11 13:36   ` Vlastimil Babka
2020-11-12  2:03     ` Hugh Dickins
2020-11-12  2:03       ` Hugh Dickins
2020-11-12 11:24       ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 15/19] mm/compaction: do page isolation first in compaction Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-11 17:12   ` Vlastimil Babka
2020-11-11 17:12     ` Vlastimil Babka
2020-11-12  2:28     ` Hugh Dickins
2020-11-12  2:28       ` Hugh Dickins
2020-11-12  3:35       ` Alex Shi
2020-11-12  3:35         ` Alex Shi
2020-11-12 11:25       ` Vlastimil Babka
2020-11-12 11:25         ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 16/19] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Alex Shi
2020-11-11 18:00   ` Vlastimil Babka
2020-11-11 18:00     ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 17/19] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-05 13:43   ` Alex Shi
2020-11-05 13:43     ` Alex Shi
2020-11-06  7:48     ` Alex Shi
2020-11-06  7:48       ` Alex Shi
2020-11-10 18:54       ` Johannes Weiner
2020-11-10 18:54         ` Johannes Weiner
2020-11-11 17:46   ` Vlastimil Babka
2020-11-11 17:46     ` Vlastimil Babka
2020-11-11 17:59     ` Vlastimil Babka
2020-11-12 12:19   ` Vlastimil Babka
2020-11-12 12:19     ` Vlastimil Babka
2020-11-12 14:19     ` Alex Shi
2020-11-12 14:19       ` Alex Shi
2020-11-05  8:55 ` [PATCH v21 18/19] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-11-05  8:55   ` Alex Shi
2020-11-06  7:50   ` Alex Shi
2020-11-06  7:50     ` Alex Shi
2020-11-10 18:59     ` Johannes Weiner
2020-11-10 18:59       ` Johannes Weiner
2020-11-12 12:31   ` Vlastimil Babka
2020-11-12 12:31     ` Vlastimil Babka
2020-11-05  8:55 ` [PATCH v21 19/19] mm/lru: revise the comments of lru_lock Alex Shi
2020-11-12 12:37   ` Vlastimil Babka
2020-11-12 12:37     ` Vlastimil Babka
2020-11-10 12:14 ` [PATCH v21 00/19] per memcg lru lock Alex Shi
2020-11-10 12:14   ` Alex Shi
2020-11-16  3:45 ` Alex Shi
2020-11-16  3:45   ` Alex Shi
2020-12-15  0:47 ` Andrew Morton
2020-12-15  0:47   ` Andrew Morton
2020-12-15  2:16   ` Hugh Dickins
2020-12-15  2:16     ` Hugh Dickins
2020-12-15  2:16     ` Hugh Dickins
2020-12-15  2:28     ` Andrew Morton
2020-12-15  2:28       ` Andrew Morton
2021-01-05 19:30 ` Qian Cai
2021-01-05 19:30   ` Qian Cai
2021-01-05 19:30   ` Qian Cai
2021-01-05 19:42   ` Shakeel Butt
2021-01-05 19:42     ` Shakeel Butt
2021-01-05 19:42     ` Shakeel Butt
2021-01-05 20:11     ` Qian Cai
2021-01-05 20:11       ` Qian Cai
2021-01-05 20:11       ` Qian Cai
2021-01-05 21:35       ` Hugh Dickins
2021-01-05 21:35         ` Hugh Dickins
2021-01-05 21:35         ` Hugh Dickins
2021-01-05 22:01         ` Qian Cai
2021-01-05 22:01           ` Qian Cai
2021-01-05 22:01           ` Qian Cai
2021-01-06  3:10           ` Hugh Dickins
2021-01-06  3:10             ` Hugh Dickins
2021-01-06  3:10             ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1604566549-62481-5-git-send-email-alex.shi@linux.alibaba.com \
    --to=alex.shi@linux.alibaba.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.duyck@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=daniel.m.jordan@oracle.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=richard.weiyang@gmail.com \
    --cc=rong.a.chen@intel.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=tj@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.