From: Hugh Dickins <hughd@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, hughd@google.com, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, yang.shi@linux.alibaba.com,
	lkp@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, shakeelb@google.com,
	iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com
Subject: Re: [PATCH v14 07/20] mm/thp: narrow lru locking
Date: Mon, 6 Jul 2020 21:52:34 -0700 (PDT)
Message-ID: <alpine.LSU.2.11.2007062059420.2793@eggly.anvils>
In-Reply-To: <20200706113513.GY25523@casper.infradead.org>


On Mon, 6 Jul 2020, Matthew Wilcox wrote:
> On Mon, Jul 06, 2020 at 05:15:09PM +0800, Alex Shi wrote:
> > Hi Kirill & Johannes & Matthew,

Adding Kirill, who was in patch's Cc list but not mail's Cc list.

I asked Alex to direct this one particularly to Kirill and Johannes
and Matthew because (and I regret that the commit message still does
not make this at all clear) this patch changes the lock ordering:
which for years has been lru_lock outside memcg move_lock outside
i_pages lock, but here inverted to lru_lock inside i_pages lock.

I don't see a strong reason to have them one way round or the other,
and think Alex is right that they can safely be reversed here: but
he doesn't actually give any reason for doing so (if cleanup, then
I think the cleanup should have been taken further), and no reason
for doing so as part of this series.

I had more need to know which way round they should go, when adding
lru_lock into mem_cgroup_move_account (inside or outside move_lock?):
but Alex's use of TestClearPageLRU appears to have successfully
eliminated the need for that; so I only need to know for the final
Doc patch in the series (credited to my name), where mm/rmap.c
documents the lock ordering.

I'm okay with leaving this patch in the series (and the final patch
currently documents this new order); but wondered if someone else
(especially Kirill or Johannes or Matthew) sees a reason against it?

And I have to admit that, in researching this, I discovered that
actually we unconsciously departed from the supposed lock ordering
years ago: back in 3.18's 8186eb6a799e, Johannes did a cleanup which
moved a clear_page_mlock() call to inside memcg move_lock, and in
principle clear_page_mlock() can take lru_lock. But we have never
seen a lockdep complaint about this, so I suspect that the page is
(almost?) always already isolated from lru when that is called,
and the issue therefore hypothetical.

My vote, for dispatch of the series, is to leave this patch in;
but cannot object if consensus were that it should be taken out.

Hugh

> > 
> > Would you like to give some comments, or share your concerns about this
> > patchset, especially the THP part?
> 
> I don't have the brain space to understand this patch set fully at
> the moment.  I'll note that the realtime folks are doing their best to
> stamp out users of local_irq_disable(), so they won't be pleased to see
> you adding a new one.  Also, you removed the comment explaining why the
> lock needed to be taken.
> 
> > Many Thanks
> > Alex
> > 
> > 在 2020/7/3 下午1:07, Alex Shi 写道:
> > > With the current sequence, lru_lock and the page cache xa_lock have
> > > no reason to be taken together, so let's narrow the lru locking; but
> > > leave local_irq_disable() in place, to block interrupt re-entry and
> > > statistics update.
> > > 
> > > Hugh Dickins' point: split_huge_page_to_list() was already silly to
> > > be using the _irqsave variant: it had just been taking sleeping locks,
> > > so would already be broken if entered with interrupts enabled.
> > > So we can save passing the flags argument down to __split_huge_page().
> > > 
> > > Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> > > Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> > > Cc: Hugh Dickins <hughd@google.com>
> > > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-kernel@vger.kernel.org
> > > ---
> > >  mm/huge_memory.c | 24 ++++++++++++------------
> > >  1 file changed, 12 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index b18f21da4dac..607869330329 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -2433,7 +2433,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
> > >  }
> > >  
> > >  static void __split_huge_page(struct page *page, struct list_head *list,
> > > -		pgoff_t end, unsigned long flags)
> > > +			      pgoff_t end)
> > >  {
> > >  	struct page *head = compound_head(page);
> > >  	pg_data_t *pgdat = page_pgdat(head);
> > > @@ -2442,8 +2442,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> > >  	unsigned long offset = 0;
> > >  	int i;
> > >  
> > > -	lruvec = mem_cgroup_page_lruvec(head, pgdat);
> > > -
> > >  	/* complete memcg works before add pages to LRU */
> > >  	mem_cgroup_split_huge_fixup(head);
> > >  
> > > @@ -2455,6 +2453,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> > >  		xa_lock(&swap_cache->i_pages);
> > >  	}
> > >  
> > > +	/* lock lru list/PageCompound, ref freezed by page_ref_freeze */
> > > +	spin_lock(&pgdat->lru_lock);
> > > +
> > > +	lruvec = mem_cgroup_page_lruvec(head, pgdat);
> > > +
> > >  	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
> > >  		__split_huge_page_tail(head, i, lruvec, list);
> > >  		/* Some pages can be beyond i_size: drop them from page cache */
> > > @@ -2474,6 +2477,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> > >  	}
> > >  
> > >  	ClearPageCompound(head);
> > > +	spin_unlock(&pgdat->lru_lock);
> > > +	/* Caller disabled irqs, so they are still disabled here */
> > >  
> > >  	split_page_owner(head, HPAGE_PMD_ORDER);
> > >  
> > > @@ -2491,8 +2496,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> > >  		page_ref_add(head, 2);
> > >  		xa_unlock(&head->mapping->i_pages);
> > >  	}
> > > -
> > > -	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
> > > +	local_irq_enable();
> > >  
> > >  	remap_page(head);
> > >  
> > > @@ -2631,12 +2635,10 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
> > >  int split_huge_page_to_list(struct page *page, struct list_head *list)
> > >  {
> > >  	struct page *head = compound_head(page);
> > > -	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
> > >  	struct deferred_split *ds_queue = get_deferred_split_queue(head);
> > >  	struct anon_vma *anon_vma = NULL;
> > >  	struct address_space *mapping = NULL;
> > >  	int count, mapcount, extra_pins, ret;
> > > -	unsigned long flags;
> > >  	pgoff_t end;
> > >  
> > >  	VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
> > > @@ -2697,9 +2699,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> > >  	unmap_page(head);
> > >  	VM_BUG_ON_PAGE(compound_mapcount(head), head);
> > >  
> > > -	/* prevent PageLRU to go away from under us, and freeze lru stats */
> > > -	spin_lock_irqsave(&pgdata->lru_lock, flags);
> > > -
> > > +	local_irq_disable();
> > >  	if (mapping) {
> > >  		XA_STATE(xas, &mapping->i_pages, page_index(head));
> > >  
> > > @@ -2729,7 +2729,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> > >  				__dec_node_page_state(head, NR_FILE_THPS);
> > >  		}
> > >  
> > > -		__split_huge_page(page, list, end, flags);
> > > +		__split_huge_page(page, list, end);
> > >  		if (PageSwapCache(head)) {
> > >  			swp_entry_t entry = { .val = page_private(head) };
> > >  
> > > @@ -2748,7 +2748,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
> > >  		spin_unlock(&ds_queue->split_queue_lock);
> > >  fail:		if (mapping)
> > >  			xa_unlock(&mapping->i_pages);
> > > -		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
> > > +		local_irq_enable();
> > >  		remap_page(head);
> > >  		ret = -EBUSY;
> > >  	}


Thread overview: 35+ messages
2020-07-03  5:07 [PATCH v14 00/20] per memcg lru lock Alex Shi
2020-07-03  5:07 ` [PATCH v14 01/20] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-07-03  5:07 ` [PATCH v14 02/20] mm/page_idle: no unlikely double check for idle page counting Alex Shi
2020-07-03  5:07 ` [PATCH v14 03/20] mm/compaction: correct the comments of compact_defer_shift Alex Shi
2020-07-03  5:07 ` [PATCH v14 04/20] mm/compaction: rename compact_deferred as compact_should_defer Alex Shi
2020-07-03  5:07 ` [PATCH v14 05/20] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-07-03  5:07 ` [PATCH v14 06/20] mm/thp: clean up lru_add_page_tail Alex Shi
2020-07-03  5:07 ` [PATCH v14 07/20] mm/thp: narrow lru locking Alex Shi
2020-07-06  9:15   ` Alex Shi
2020-07-06 11:35     ` Matthew Wilcox
2020-07-07  4:52       ` Hugh Dickins [this message]
2020-07-09 14:02         ` Alex Shi
2020-07-09 15:48         ` Kirill A. Shutemov
2020-07-10  8:23           ` Alex Shi
     [not found]             ` <20200710112831.jrv4hzjzjqtxtc7u@box>
2020-07-10 14:09               ` Alex Shi
2020-07-07 10:51       ` Alex Shi
2020-07-03  5:07 ` [PATCH v14 08/20] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-07-03  5:07 ` [PATCH v14 09/20] mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn Alex Shi
2020-07-03  5:07 ` [PATCH v14 10/20] mm/lru: move lru_lock holding in func lru_note_cost_page Alex Shi
2020-07-03  5:07 ` [PATCH v14 11/20] mm/lru: move lock into lru_note_cost Alex Shi
2020-07-03  5:07 ` [PATCH v14 12/20] mm/lru: introduce TestClearPageLRU Alex Shi
2020-07-03  5:07 ` [PATCH v14 13/20] mm/compaction: do page isolation first in compaction Alex Shi
2020-07-03  5:07 ` [PATCH v14 14/20] mm/mlock: reorder isolation sequence during munlock Alex Shi
2020-07-03  5:07 ` [PATCH v14 15/20] mm/swap: serialize memcg changes during pagevec_lru_move_fn Alex Shi
2020-07-03  9:13   ` Konstantin Khlebnikov
2020-07-04 11:34     ` Alex Shi
2020-07-04 11:39       ` Matthew Wilcox
2020-07-04 13:12         ` Alex Shi
2020-07-04 13:33           ` Matthew Wilcox
2020-07-04 15:47             ` Alex Shi
2020-07-03  5:07 ` [PATCH v14 16/20] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-07-03  5:07 ` [PATCH v14 17/20] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-07-03  5:07 ` [PATCH v14 18/20] mm/vmscan: use relock for move_pages_to_lru Alex Shi
2020-07-03  5:07 ` [PATCH v14 19/20] mm/pgdat: remove pgdat lru_lock Alex Shi
2020-07-03  5:07 ` [PATCH v14 20/20] mm/lru: revise the comments of lru_lock Alex Shi
