[PATCH v3 0/2] Reduce lock contention related with large folio

From: Yin Fengwei <fengwei.yin@intel.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org,
	willy@infradead.org, kirill@shutemov.name, yuzhao@google.com,
	ryan.roberts@arm.com, ying.huang@intel.com
Cc: fengwei.yin@intel.com
Subject: [PATCH v3 0/2] Reduce lock contention related with large folio
Date: Sat, 29 Apr 2023 16:27:57 +0800
Message-ID: <20230429082759.1600796-1-fengwei.yin@intel.com>

Ryan tried to enable large folios for anonymous mappings [1].

Unlike large folios for the page cache, which do not trigger frequent page
allocation/free, large folios for anonymous mappings are allocated and freed
more frequently. So large folios for anonymous mappings expose some lock
contention.

Ryan mentioned the deferred queue lock in [1]. We also hit two other
contended locks: the lru lock and the zone lock.

This series tries to mitigate the deferred queue lock contention and to
reduce the lru lock contention to some extent.

Patch 1 tries to reduce deferred queue lock contention by not acquiring the
queue lock when checking whether the folio is on the deferred list. The
page_fault1 test of will-it-scale showed a 60% reduction in deferred queue
lock contention.
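
To illustrate the idea behind patch 1, here is a minimal userspace sketch of
"check without the lock, then recheck under the lock". It is not the kernel
code: struct deferred_queue, struct item, the queued flag (standing in for a
non-empty list head) and the lock_taken counter are all hypothetical names
used only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the deferred split queue. */
struct deferred_queue {
	pthread_mutex_t lock;
	unsigned long len;        /* protected by lock */
	unsigned long lock_taken; /* instrumentation for this sketch only */
};

/* Hypothetical stand-in for a folio; 'queued' models "is on the list". */
struct item {
	atomic_bool queued;       /* written under lock, may be read without it */
};

static void queue_init(struct deferred_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->len = 0;
	q->lock_taken = 0;
}

static void item_init(struct item *it)
{
	atomic_init(&it->queued, false);
}

/* Put the item on the queue; always takes the lock. */
static void item_enqueue(struct deferred_queue *q, struct item *it)
{
	pthread_mutex_lock(&q->lock);
	q->lock_taken++;
	if (!atomic_load_explicit(&it->queued, memory_order_relaxed)) {
		atomic_store_explicit(&it->queued, true, memory_order_relaxed);
		q->len++;
	}
	pthread_mutex_unlock(&q->lock);
}

/* Free path: most items were never queued, so skip the lock for them. */
static void item_remove_on_free(struct deferred_queue *q, struct item *it)
{
	/* Lockless peek: cheap common case, no lock traffic at all. */
	if (!atomic_load_explicit(&it->queued, memory_order_relaxed))
		return;

	pthread_mutex_lock(&q->lock);
	q->lock_taken++;
	/* Recheck under the lock: another thread may have removed it. */
	if (atomic_load_explicit(&it->queued, memory_order_relaxed)) {
		assert(q->len > 0);
		atomic_store_explicit(&it->queued, false, memory_order_relaxed);
		q->len--;
	}
	pthread_mutex_unlock(&q->lock);
}
```

In this sketch, freeing an item that was never queued does not touch the lock,
which is where the contention reduction is expected to come from. The recheck
keeps the queue length consistent if the state changed between the peek and
the lock acquisition.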

Patch 2 tries to reduce lru lock contention by allowing large folios to be
added to the lru list in batches. The page_fault1 test of will-it-scale
showed a 20% reduction in lru lock contention.
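
As a rough illustration of the batching in patch 2, the sketch below tracks
both the number of used slots and the total number of base pages in a batch,
and reports the batch as full when either limit is reached. It is a userspace
model, not the code in include/linux/pagevec.h: struct batch, struct
fake_folio, BATCH_SLOTS and BATCH_MAX_PAGES are hypothetical names, and the
value 15 is an assumption standing in for PAGEVEC_SIZE.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SLOTS     15 /* assumed stand-in for PAGEVEC_SIZE */
#define BATCH_MAX_PAGES 15 /* assumed page budget, same value for now */

/* Hypothetical stand-in for a folio: only the page count matters here. */
struct fake_folio {
	unsigned int nr_pages;
};

struct batch {
	unsigned int nr;       /* used slots */
	unsigned int nr_pages; /* total base pages held by the batch */
	struct fake_folio *folios[BATCH_SLOTS];
};

/* Slots left, or 0 once either the slot or the page budget is exhausted. */
static unsigned int batch_space(const struct batch *b)
{
	if (b->nr >= BATCH_SLOTS || b->nr_pages >= BATCH_MAX_PAGES)
		return 0;
	return BATCH_SLOTS - b->nr;
}

/*
 * Add one folio. A return value of 0 tells the caller to flush the
 * batch, i.e. take the lock once and move every entry in one go.
 */
static unsigned int batch_add(struct batch *b, struct fake_folio *f)
{
	assert(b->nr < BATCH_SLOTS);
	b->folios[b->nr++] = f;
	b->nr_pages += f->nr_pages;
	return batch_space(b);
}

/* Called after a flush to start a new batch. */
static void batch_reset(struct batch *b)
{
	b->nr = 0;
	b->nr_pages = 0;
}
```

With 4-page folios in this model, the first three adds leave space and the
fourth one crosses the page budget, so the lock is taken once for four folios
instead of once per folio.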

The zone lock contention happens on the large folio free path and is related
to commit f26b3fa04611 ("mm/page_alloc: limit number of high-order pages on
PCP during bulk free"). It will not be addressed by this series.

[1]
https://lore.kernel.org/linux-mm/20230414130303.2345383-1-ryan.roberts@arm.com/

Changelog from v2:
  - Rebased to v6.3-rc7
  - Removed Tested-by: Ryan Roberts <ryan.roberts@arm.com> as the patches
    were updated after Ryan tested them.
  - Updated the perf data for the deferred queue lock and lru lock with
    v3.
  - Recheck whether the folio is on the deferred_list after taking the
    deferred queue lock, as Kirill suggested.

Changelog from v1:

For patch2:
  - Add Reported-by from Huang Ying, which I missed by mistake.
  - Fix a kernel panic. folio_batch_add() can be passed an entry which
    doesn't reference a folio directly:
    - For mlock usage, add a new interface with an extra nr_pages parameter.
      The caller passes nr_pages by referencing the folio directly.
    - For swap, shadow and dax entries passed as the folio parameter, treat
      nr_pages as 1.
    With the fix, stress testing can run for 12 hours without any issue;
    without it, the kernel panicked in around 3 minutes.
  - Update the lock contention info in the commit message.
  - Change the field name from pages_nr to nr_pages, as Ying suggested.
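
The panic fix above can be pictured with the following userspace sketch. It
assumes that an entry stored in a batch may be a tagged pointer or a plain
value rather than a real folio pointer, so the batch must never dereference
it to count pages. All names (struct fake_folio, struct batch, TAG_MASK,
tag_entry, untag_entry, batch_add_nr_pages) are hypothetical and only serve
this example.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH_SLOTS 15   /* assumed stand-in for PAGEVEC_SIZE */
#define TAG_MASK    0x3u /* hypothetical flag bits kept in the low bits */

/* Hypothetical stand-in for a folio. */
struct fake_folio {
	unsigned int nr_pages;
};

struct batch {
	unsigned int nr;
	unsigned int nr_pages;
	void *entries[BATCH_SLOTS]; /* may hold tagged or non-pointer values */
};

/* Encode flag bits into the low bits of an aligned folio pointer. */
static void *tag_entry(struct fake_folio *f, uintptr_t tag)
{
	return (void *)((uintptr_t)f | (tag & TAG_MASK));
}

/* Recover the real pointer; only valid for entries made by tag_entry(). */
static struct fake_folio *untag_entry(void *entry)
{
	return (struct fake_folio *)((uintptr_t)entry & ~(uintptr_t)TAG_MASK);
}

/*
 * The batch stores the entry as an opaque value and trusts the caller
 * for the page count. The caller still holds the real folio pointer, or
 * knows the entry is a special value that counts as one page.
 */
static void batch_add_nr_pages(struct batch *b, void *entry,
			       unsigned int nr_pages)
{
	assert(b->nr < BATCH_SLOTS);
	b->entries[b->nr++] = entry;
	b->nr_pages += nr_pages;
}
```

In this model, a caller holding a real folio passes its page count together
with the tagged entry, while a caller holding a special value passes 1, so the
accounting works without the batch ever interpreting the entry.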

For this version, we still use PAGEVEC_SIZE as the max nr_pages in fbatch. We
can revise it after we decide on the page order for anonymous large
folios.

Yin Fengwei (2):
  THP: avoid lock when check whether THP is in deferred list
  lru: allow large batched add large folio to lru list

 include/linux/pagevec.h | 46 ++++++++++++++++++++++++++++++++++++++---
 mm/huge_memory.c        | 17 ++++++++++-----
 mm/mlock.c              |  7 +++----
 mm/swap.c               |  3 +--
 4 files changed, 59 insertions(+), 14 deletions(-)

-- 
2.34.1