* [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
@ 2023-03-13 12:45 Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
` (5 more replies)
0 siblings, 6 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-13 12:45 UTC (permalink / raw)
To: linux-mm, akpm, willy, mike.kravetz, sidhartha.kumar,
naoya.horiguchi, jane.chu, david
Cc: fengwei.yin
This series brings batched rmap removal to try_to_unmap_one().
Removing the rmap for a whole range of pages in one call is
expected to perform better than removing the rmap page by page.
This series restructures try_to_unmap_one() from:
loop:
        clear and update PTE
        unmap one page
        goto loop
to:
loop:
        clear and update PTE
        goto loop
unmap the range of folio in one call
It is one step toward always mapping/unmapping the entire folio in
one call, which can simplify the folio mapcount handling by avoiding
per-page map/unmap accounting.
The changes are organized as:
Patch 1/2 move the hugetlb and normal page unmap paths to dedicated
functions to make the try_to_unmap_one() logic clearer and easier
to extend with batched rmap removal. To keep code review easy,
there is no functional change.
Patch 3 cleans up try_to_unmap_one_page() and removes some
duplicated function calls.
Patch 4 adds folio_remove_rmap_range(), which removes the rmap for
a range of pages in one batch.
Patch 5 makes try_to_unmap_one() remove the rmap in batches.
Functional testing was done with the v3 patchset in a qemu guest
with 4G memory:
- kernel mm selftests, to trigger vmscan() and eventually hit
try_to_unmap_one().
- hwpoison injection into a hugetlb page, to trigger the
try_to_unmap_one() path for hugetlb.
- 8 hours of stress testing: Firefox + kernel mm selftests +
kernel build.
To demonstrate the performance gain, MADV_PAGEOUT was changed to
not split large folios for the page cache, and a micro-benchmark
was created that does roughly the following:
#define FILESIZE (2 * 1024 * 1024)

volatile char cc;
unsigned long count = 0;
long pgsize = sysconf(_SC_PAGESIZE);

/* fd is an open descriptor for a FILESIZE-sized file on xfs */
char *c = mmap(NULL, FILESIZE, PROT_READ|PROT_WRITE,
               MAP_PRIVATE, fd, 0);
while (1) {            /* terminated externally after 1 second */
        unsigned long i;

        for (i = 0; i < FILESIZE; i += pgsize)
                cc = *(volatile char *)(c + i);
        madvise(c, FILESIZE, MADV_PAGEOUT);
        count++;
}
munmap(c, FILESIZE);
It was run with 96 instances + 96 files on an xfs file system for
1 second. The test platform was Ice Lake with 48C/96T + 192G memory.
The test result (iteration count) shows around a 7% improvement
(58865 -> 63247) with this patch series. perf shows the following:
Without this series:
18.26%--try_to_unmap_one
|
|--10.71%--page_remove_rmap
| |
| --9.81%--__mod_lruvec_page_state
| |
| |--1.36%--__mod_memcg_lruvec_state
| | |
| | --0.80%--cgroup_rstat_updated
| |
| --0.67%--__mod_lruvec_state
| |
| --0.59%--__mod_node_page_state
|
|--5.41%--ptep_clear_flush
| |
| --4.64%--flush_tlb_mm_range
| |
| --3.88%--flush_tlb_func
| |
| --3.56%--native_flush_tlb_one_user
|
|--0.75%--percpu_counter_add_batch
|
--0.53%--PageHeadHuge
With this series:
9.87%--try_to_unmap_one
|
|--7.14%--try_to_unmap_one_page.constprop.0.isra.0
| |
| |--5.21%--ptep_clear_flush
| | |
| | --4.36%--flush_tlb_mm_range
| | |
| | --3.54%--flush_tlb_func
| | |
| | --3.17%--native_flush_tlb_one_user
| |
| --0.82%--percpu_counter_add_batch
|
|--1.18%--folio_remove_rmap_and_update_count.part.0
| |
| --1.11%--folio_remove_rmap_range
| |
| --0.53%--__mod_lruvec_page_state
|
--0.57%--PageHeadHuge
As expected, the cost of __mod_lruvec_page_state is reduced significantly
with the batched folio_remove_rmap_range(). The page reclaim path should
see the same benefit.
This series is based on next-20230310.
Changes from v3:
- General
- Rebase to next-20230310
- Add performance testing result
- Patch1
- Fixed incorrect comments as Mike Kravetz pointed out
- Use huge_pte_dirty() as Mike Kravetz suggested
- Use true instead of folio_test_hugetlb() in
try_to_unmap_one_hugetlb() as it's hugetlb page
for sure as Mike Kravetz suggested
Changes from v2:
- General
- Rebase the patch to next-20230303
- Update cover letter about the preparation to unmap
the entire folio in one call
- No code change compared to v2, but fixed a patch-apply
conflict caused by the wrong patch order in v2.
Changes from v1:
- General
- Rebase the patch to next-20230228
- Patch1
- Removed the "if (PageHWPoison(page) && !(flags & TTU_HWPOISON))"
check as suggested by Mike Kravetz and HORIGUCHI NAOYA
- Removed the mlock_drain_local() call as suggested by Mike Kravetz
- Removed the comments about the mm counter change as suggested
by Mike Kravetz
Yin Fengwei (5):
rmap: move hugetlb try_to_unmap to dedicated function
rmap: move page unmap operation to dedicated function
rmap: cleanup exit path of try_to_unmap_one_page()
rmap: add folio_remove_rmap_range()
try_to_unmap_one: batched remove rmap, update folio refcount
include/linux/rmap.h | 5 +
mm/page_vma_mapped.c | 30 +++
mm/rmap.c | 623 +++++++++++++++++++++++++------------------
3 files changed, 398 insertions(+), 260 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
2023-03-13 12:45 [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
@ 2023-03-13 12:45 ` Yin Fengwei
2023-03-13 16:55 ` kernel test robot
` (2 more replies)
2023-03-13 12:45 ` [PATCH v4 2/5] rmap: move page unmap operation " Yin Fengwei
` (4 subsequent siblings)
5 siblings, 3 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-13 12:45 UTC (permalink / raw)
To: linux-mm, akpm, willy, mike.kravetz, sidhartha.kumar,
naoya.horiguchi, jane.chu, david
Cc: fengwei.yin
This prepares for batched rmap updates for large folios.
There is no need to handle hugetlb inside the loop. Just handle
hugetlb and bail out early.
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
---
mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 121 insertions(+), 79 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index ba901c416785..3a2e3ccb8031 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1441,6 +1441,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
munlock_vma_folio(folio, vma, compound);
}
+static bool try_to_unmap_one_hugetlb(struct folio *folio,
+ struct vm_area_struct *vma, struct mmu_notifier_range range,
+ struct page_vma_mapped_walk pvmw, unsigned long address,
+ enum ttu_flags flags)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ pte_t pteval;
+ bool ret = true, anon = folio_test_anon(folio);
+
+ /*
+ * The try_to_unmap() is only passed a hugetlb page
+ * in the case where the hugetlb page is poisoned.
+ */
+ VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
+ /*
+ * huge_pmd_unshare may unmap an entire PMD page.
+ * There is no way of knowing exactly which PMDs may
+ * be cached for this mm, so we must flush them all.
+ * start/end were already adjusted in caller
+ * (try_to_unmap_one) to cover this range.
+ */
+ flush_cache_range(vma, range.start, range.end);
+
+ /*
+ * To call huge_pmd_unshare, i_mmap_rwsem must be
+ * held in write mode. Caller needs to explicitly
+ * do this outside rmap routines.
+ *
+ * We also must hold hugetlb vma_lock in write mode.
+ * Lock order dictates acquiring vma_lock BEFORE
+ * i_mmap_rwsem. We can only try lock here and fail
+ * if unsuccessful.
+ */
+ if (!anon) {
+ VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+ if (!hugetlb_vma_trylock_write(vma)) {
+ ret = false;
+ goto out;
+ }
+ if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
+ hugetlb_vma_unlock_write(vma);
+ flush_tlb_range(vma,
+ range.start, range.end);
+ mmu_notifier_invalidate_range(mm,
+ range.start, range.end);
+ /*
+ * The ref count of the PMD page was
+ * dropped which is part of the way map
+ * counting is done for shared PMDs.
+ * Return 'true' here. When there is
+ * no other sharing, huge_pmd_unshare
+ * returns false and we will unmap the
+ * actual page and drop map count
+ * to zero.
+ */
+ goto out;
+ }
+ hugetlb_vma_unlock_write(vma);
+ }
+ pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+
+ /*
+ * Now the pte is cleared. If this pte was uffd-wp armed,
+ * we may want to replace a none pte with a marker pte if
+ * it's file-backed, so we don't lose the tracking info.
+ */
+ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+ /* Set the dirty flag on the folio now the pte is gone. */
+ if (huge_pte_dirty(pteval))
+ folio_mark_dirty(folio);
+
+ /* Update high watermark before we lower rss */
+ update_hiwater_rss(mm);
+
+ /* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
+ pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
+ set_huge_pte_at(mm, address, pvmw.pte, pteval);
+ hugetlb_count_sub(folio_nr_pages(folio), mm);
+
+ /*
+ * No need to call mmu_notifier_invalidate_range() it has be
+ * done above for all cases requiring it to happen under page
+ * table lock before mmu_notifier_invalidate_range_end()
+ *
+ * See Documentation/mm/mmu_notifier.rst
+ */
+ page_remove_rmap(&folio->page, vma, true);
+ /* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
+ * necessary to call mlock_drain_local().
+ */
+ folio_put(folio);
+
+out:
+ return ret;
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
@@ -1504,86 +1601,37 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
+ address = pvmw.address;
+ if (folio_test_hugetlb(folio)) {
+ ret = try_to_unmap_one_hugetlb(folio, vma, range,
+ pvmw, address, flags);
+
+ /* no need to loop for hugetlb */
+ page_vma_mapped_walk_done(&pvmw);
+ break;
+ }
+
subpage = folio_page(folio,
pte_pfn(*pvmw.pte) - folio_pfn(folio));
- address = pvmw.address;
anon_exclusive = folio_test_anon(folio) &&
PageAnonExclusive(subpage);
- if (folio_test_hugetlb(folio)) {
- bool anon = folio_test_anon(folio);
-
- /*
- * The try_to_unmap() is only passed a hugetlb page
- * in the case where the hugetlb page is poisoned.
- */
- VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+ flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+ /* Nuke the page table entry. */
+ if (should_defer_flush(mm, flags)) {
/*
- * huge_pmd_unshare may unmap an entire PMD page.
- * There is no way of knowing exactly which PMDs may
- * be cached for this mm, so we must flush them all.
- * start/end were already adjusted above to cover this
- * range.
+ * We clear the PTE but do not flush so potentially
+ * a remote CPU could still be writing to the folio.
+ * If the entry was previously clean then the
+ * architecture must guarantee that a clear->dirty
+ * transition on a cached TLB entry is written through
+ * and traps if the PTE is unmapped.
*/
- flush_cache_range(vma, range.start, range.end);
+ pteval = ptep_get_and_clear(mm, address, pvmw.pte);
- /*
- * To call huge_pmd_unshare, i_mmap_rwsem must be
- * held in write mode. Caller needs to explicitly
- * do this outside rmap routines.
- *
- * We also must hold hugetlb vma_lock in write mode.
- * Lock order dictates acquiring vma_lock BEFORE
- * i_mmap_rwsem. We can only try lock here and fail
- * if unsuccessful.
- */
- if (!anon) {
- VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
- if (!hugetlb_vma_trylock_write(vma)) {
- page_vma_mapped_walk_done(&pvmw);
- ret = false;
- break;
- }
- if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
- hugetlb_vma_unlock_write(vma);
- flush_tlb_range(vma,
- range.start, range.end);
- mmu_notifier_invalidate_range(mm,
- range.start, range.end);
- /*
- * The ref count of the PMD page was
- * dropped which is part of the way map
- * counting is done for shared PMDs.
- * Return 'true' here. When there is
- * no other sharing, huge_pmd_unshare
- * returns false and we will unmap the
- * actual page and drop map count
- * to zero.
- */
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
- hugetlb_vma_unlock_write(vma);
- }
- pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+ set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
} else {
- flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
- /* Nuke the page table entry. */
- if (should_defer_flush(mm, flags)) {
- /*
- * We clear the PTE but do not flush so potentially
- * a remote CPU could still be writing to the folio.
- * If the entry was previously clean then the
- * architecture must guarantee that a clear->dirty
- * transition on a cached TLB entry is written through
- * and traps if the PTE is unmapped.
- */
- pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
- set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
- } else {
- pteval = ptep_clear_flush(vma, address, pvmw.pte);
- }
+ pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
/*
@@ -1602,14 +1650,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
- if (folio_test_hugetlb(folio)) {
- hugetlb_count_sub(folio_nr_pages(folio), mm);
- set_huge_pte_at(mm, address, pvmw.pte, pteval);
- } else {
- dec_mm_counter(mm, mm_counter(&folio->page));
- set_pte_at(mm, address, pvmw.pte, pteval);
- }
-
+ dec_mm_counter(mm, mm_counter(&folio->page));
+ set_pte_at(mm, address, pvmw.pte, pteval);
} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
/*
* The guest indicated that the page content is of no
--
2.30.2
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v4 2/5] rmap: move page unmap operation to dedicated function
2023-03-13 12:45 [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
@ 2023-03-13 12:45 ` Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 3/5] rmap: cleanup exit path of try_to_unmap_one_page() Yin Fengwei
` (3 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-13 12:45 UTC (permalink / raw)
To: linux-mm, akpm, willy, mike.kravetz, sidhartha.kumar,
naoya.horiguchi, jane.chu, david
Cc: fengwei.yin
No functional change, just code reorganization.
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
mm/rmap.c | 369 ++++++++++++++++++++++++++++--------------------------
1 file changed, 194 insertions(+), 175 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 3a2e3ccb8031..23eda671447a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1538,17 +1538,204 @@ static bool try_to_unmap_one_hugetlb(struct folio *folio,
return ret;
}
+static bool try_to_unmap_one_page(struct folio *folio,
+ struct vm_area_struct *vma, struct mmu_notifier_range range,
+ struct page_vma_mapped_walk pvmw, unsigned long address,
+ enum ttu_flags flags)
+{
+ bool anon_exclusive, ret = true;
+ struct page *subpage;
+ struct mm_struct *mm = vma->vm_mm;
+ pte_t pteval;
+
+ subpage = folio_page(folio,
+ pte_pfn(*pvmw.pte) - folio_pfn(folio));
+ anon_exclusive = folio_test_anon(folio) &&
+ PageAnonExclusive(subpage);
+
+ flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+ /* Nuke the page table entry. */
+ if (should_defer_flush(mm, flags)) {
+ /*
+ * We clear the PTE but do not flush so potentially
+ * a remote CPU could still be writing to the folio.
+ * If the entry was previously clean then the
+ * architecture must guarantee that a clear->dirty
+ * transition on a cached TLB entry is written through
+ * and traps if the PTE is unmapped.
+ */
+ pteval = ptep_get_and_clear(mm, address, pvmw.pte);
+
+ set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+ } else {
+ pteval = ptep_clear_flush(vma, address, pvmw.pte);
+ }
+
+ /*
+ * Now the pte is cleared. If this pte was uffd-wp armed,
+ * we may want to replace a none pte with a marker pte if
+ * it's file-backed, so we don't lose the tracking info.
+ */
+ pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
+ /* Set the dirty flag on the folio now the pte is gone. */
+ if (pte_dirty(pteval))
+ folio_mark_dirty(folio);
+
+ /* Update high watermark before we lower rss */
+ update_hiwater_rss(mm);
+
+ if (PageHWPoison(subpage) && !(flags & TTU_HWPOISON)) {
+ pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
+ dec_mm_counter(mm, mm_counter(&folio->page));
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
+ /*
+ * The guest indicated that the page content is of no
+ * interest anymore. Simply discard the pte, vmscan
+ * will take care of the rest.
+ * A future reference will then fault in a new zero
+ * page. When userfaultfd is active, we must not drop
+ * this page though, as its main user (postcopy
+ * migration) will not expect userfaults on already
+ * copied pages.
+ */
+ dec_mm_counter(mm, mm_counter(&folio->page));
+ /* We have to invalidate as we cleared the pte */
+ mmu_notifier_invalidate_range(mm, address,
+ address + PAGE_SIZE);
+ } else if (folio_test_anon(folio)) {
+ swp_entry_t entry = { .val = page_private(subpage) };
+ pte_t swp_pte;
+ /*
+ * Store the swap location in the pte.
+ * See handle_pte_fault() ...
+ */
+ if (unlikely(folio_test_swapbacked(folio) !=
+ folio_test_swapcache(folio))) {
+ WARN_ON_ONCE(1);
+ ret = false;
+ /* We have to invalidate as we cleared the pte */
+ mmu_notifier_invalidate_range(mm, address,
+ address + PAGE_SIZE);
+ page_vma_mapped_walk_done(&pvmw);
+ goto discard;
+ }
+
+ /* MADV_FREE page check */
+ if (!folio_test_swapbacked(folio)) {
+ int ref_count, map_count;
+
+ /*
+ * Synchronize with gup_pte_range():
+ * - clear PTE; barrier; read refcount
+ * - inc refcount; barrier; read PTE
+ */
+ smp_mb();
+
+ ref_count = folio_ref_count(folio);
+ map_count = folio_mapcount(folio);
+
+ /*
+ * Order reads for page refcount and dirty flag
+ * (see comments in __remove_mapping()).
+ */
+ smp_rmb();
+
+ /*
+ * The only page refs must be one from isolation
+ * plus the rmap(s) (dropped by discard:).
+ */
+ if (ref_count == 1 + map_count &&
+ !folio_test_dirty(folio)) {
+ /* Invalidate as we cleared the pte */
+ mmu_notifier_invalidate_range(mm,
+ address, address + PAGE_SIZE);
+ dec_mm_counter(mm, MM_ANONPAGES);
+ goto discard;
+ }
+
+ /*
+ * If the folio was redirtied, it cannot be
+ * discarded. Remap the page to page table.
+ */
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ folio_set_swapbacked(folio);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ goto discard;
+ }
+
+ if (swap_duplicate(entry) < 0) {
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ goto discard;
+ }
+ if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+ swap_free(entry);
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ goto discard;
+ }
+
+ /* See page_try_share_anon_rmap(): clear PTE first. */
+ if (anon_exclusive &&
+ page_try_share_anon_rmap(subpage)) {
+ swap_free(entry);
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ goto discard;
+ }
+ if (list_empty(&mm->mmlist)) {
+ spin_lock(&mmlist_lock);
+ if (list_empty(&mm->mmlist))
+ list_add(&mm->mmlist, &init_mm.mmlist);
+ spin_unlock(&mmlist_lock);
+ }
+ dec_mm_counter(mm, MM_ANONPAGES);
+ inc_mm_counter(mm, MM_SWAPENTS);
+ swp_pte = swp_entry_to_pte(entry);
+ if (anon_exclusive)
+ swp_pte = pte_swp_mkexclusive(swp_pte);
+ if (pte_soft_dirty(pteval))
+ swp_pte = pte_swp_mksoft_dirty(swp_pte);
+ if (pte_uffd_wp(pteval))
+ swp_pte = pte_swp_mkuffd_wp(swp_pte);
+ set_pte_at(mm, address, pvmw.pte, swp_pte);
+ /* Invalidate as we cleared the pte */
+ mmu_notifier_invalidate_range(mm, address,
+ address + PAGE_SIZE);
+ } else {
+ /*
+ * This is a locked file-backed folio,
+ * so it cannot be removed from the page
+ * cache and replaced by a new folio before
+ * mmu_notifier_invalidate_range_end, so no
+ * concurrent thread might update its page table
+ * to point at a new folio while a device is
+ * still using this folio.
+ *
+ * See Documentation/mm/mmu_notifier.rst
+ */
+ dec_mm_counter(mm, mm_counter_file(&folio->page));
+ }
+
+discard:
+ return ret;
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, void *arg)
{
- struct mm_struct *mm = vma->vm_mm;
DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
- pte_t pteval;
struct page *subpage;
- bool anon_exclusive, ret = true;
+ bool ret = true;
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;
@@ -1613,179 +1800,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
subpage = folio_page(folio,
pte_pfn(*pvmw.pte) - folio_pfn(folio));
- anon_exclusive = folio_test_anon(folio) &&
- PageAnonExclusive(subpage);
-
- flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
- /* Nuke the page table entry. */
- if (should_defer_flush(mm, flags)) {
- /*
- * We clear the PTE but do not flush so potentially
- * a remote CPU could still be writing to the folio.
- * If the entry was previously clean then the
- * architecture must guarantee that a clear->dirty
- * transition on a cached TLB entry is written through
- * and traps if the PTE is unmapped.
- */
- pteval = ptep_get_and_clear(mm, address, pvmw.pte);
-
- set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
- } else {
- pteval = ptep_clear_flush(vma, address, pvmw.pte);
- }
-
- /*
- * Now the pte is cleared. If this pte was uffd-wp armed,
- * we may want to replace a none pte with a marker pte if
- * it's file-backed, so we don't lose the tracking info.
- */
- pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
-
- /* Set the dirty flag on the folio now the pte is gone. */
- if (pte_dirty(pteval))
- folio_mark_dirty(folio);
-
- /* Update high watermark before we lower rss */
- update_hiwater_rss(mm);
-
- if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
- pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
- dec_mm_counter(mm, mm_counter(&folio->page));
- set_pte_at(mm, address, pvmw.pte, pteval);
- } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
- /*
- * The guest indicated that the page content is of no
- * interest anymore. Simply discard the pte, vmscan
- * will take care of the rest.
- * A future reference will then fault in a new zero
- * page. When userfaultfd is active, we must not drop
- * this page though, as its main user (postcopy
- * migration) will not expect userfaults on already
- * copied pages.
- */
- dec_mm_counter(mm, mm_counter(&folio->page));
- /* We have to invalidate as we cleared the pte */
- mmu_notifier_invalidate_range(mm, address,
- address + PAGE_SIZE);
- } else if (folio_test_anon(folio)) {
- swp_entry_t entry = { .val = page_private(subpage) };
- pte_t swp_pte;
- /*
- * Store the swap location in the pte.
- * See handle_pte_fault() ...
- */
- if (unlikely(folio_test_swapbacked(folio) !=
- folio_test_swapcache(folio))) {
- WARN_ON_ONCE(1);
- ret = false;
- /* We have to invalidate as we cleared the pte */
- mmu_notifier_invalidate_range(mm, address,
- address + PAGE_SIZE);
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
-
- /* MADV_FREE page check */
- if (!folio_test_swapbacked(folio)) {
- int ref_count, map_count;
-
- /*
- * Synchronize with gup_pte_range():
- * - clear PTE; barrier; read refcount
- * - inc refcount; barrier; read PTE
- */
- smp_mb();
-
- ref_count = folio_ref_count(folio);
- map_count = folio_mapcount(folio);
-
- /*
- * Order reads for page refcount and dirty flag
- * (see comments in __remove_mapping()).
- */
- smp_rmb();
-
- /*
- * The only page refs must be one from isolation
- * plus the rmap(s) (dropped by discard:).
- */
- if (ref_count == 1 + map_count &&
- !folio_test_dirty(folio)) {
- /* Invalidate as we cleared the pte */
- mmu_notifier_invalidate_range(mm,
- address, address + PAGE_SIZE);
- dec_mm_counter(mm, MM_ANONPAGES);
- goto discard;
- }
-
- /*
- * If the folio was redirtied, it cannot be
- * discarded. Remap the page to page table.
- */
- set_pte_at(mm, address, pvmw.pte, pteval);
- folio_set_swapbacked(folio);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
-
- if (swap_duplicate(entry) < 0) {
- set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
- if (arch_unmap_one(mm, vma, address, pteval) < 0) {
- swap_free(entry);
- set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
+ ret = try_to_unmap_one_page(folio, vma,
+ range, pvmw, address, flags);
+ if (!ret)
+ break;
- /* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
- swap_free(entry);
- set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- break;
- }
- if (list_empty(&mm->mmlist)) {
- spin_lock(&mmlist_lock);
- if (list_empty(&mm->mmlist))
- list_add(&mm->mmlist, &init_mm.mmlist);
- spin_unlock(&mmlist_lock);
- }
- dec_mm_counter(mm, MM_ANONPAGES);
- inc_mm_counter(mm, MM_SWAPENTS);
- swp_pte = swp_entry_to_pte(entry);
- if (anon_exclusive)
- swp_pte = pte_swp_mkexclusive(swp_pte);
- if (pte_soft_dirty(pteval))
- swp_pte = pte_swp_mksoft_dirty(swp_pte);
- if (pte_uffd_wp(pteval))
- swp_pte = pte_swp_mkuffd_wp(swp_pte);
- set_pte_at(mm, address, pvmw.pte, swp_pte);
- /* Invalidate as we cleared the pte */
- mmu_notifier_invalidate_range(mm, address,
- address + PAGE_SIZE);
- } else {
- /*
- * This is a locked file-backed folio,
- * so it cannot be removed from the page
- * cache and replaced by a new folio before
- * mmu_notifier_invalidate_range_end, so no
- * concurrent thread might update its page table
- * to point at a new folio while a device is
- * still using this folio.
- *
- * See Documentation/mm/mmu_notifier.rst
- */
- dec_mm_counter(mm, mm_counter_file(&folio->page));
- }
-discard:
/*
* No need to call mmu_notifier_invalidate_range() it has be
* done above for all cases requiring it to happen under page
--
2.30.2
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v4 3/5] rmap: cleanup exit path of try_to_unmap_one_page()
2023-03-13 12:45 [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 2/5] rmap: move page unmap operation " Yin Fengwei
@ 2023-03-13 12:45 ` Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 4/5] rmap: add folio_remove_rmap_range() Yin Fengwei
` (2 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-13 12:45 UTC (permalink / raw)
To: linux-mm, akpm, willy, mike.kravetz, sidhartha.kumar,
naoya.horiguchi, jane.chu, david
Cc: fengwei.yin
Clean up the exit path of try_to_unmap_one_page() by removing
some duplicated code.
Move page_vma_mapped_walk_done() back to try_to_unmap_one().
Rename subpage to page, as a folio has no concept of subpage.
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
mm/rmap.c | 72 ++++++++++++++++++++++---------------------------------
1 file changed, 29 insertions(+), 43 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 23eda671447a..72fc8c559cd9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1543,15 +1543,13 @@ static bool try_to_unmap_one_page(struct folio *folio,
struct page_vma_mapped_walk pvmw, unsigned long address,
enum ttu_flags flags)
{
- bool anon_exclusive, ret = true;
- struct page *subpage;
+ bool anon_exclusive;
+ struct page *page;
struct mm_struct *mm = vma->vm_mm;
pte_t pteval;
- subpage = folio_page(folio,
- pte_pfn(*pvmw.pte) - folio_pfn(folio));
- anon_exclusive = folio_test_anon(folio) &&
- PageAnonExclusive(subpage);
+ page = folio_page(folio, pte_pfn(*pvmw.pte) - folio_pfn(folio));
+ anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
/* Nuke the page table entry. */
@@ -1579,15 +1577,14 @@ static bool try_to_unmap_one_page(struct folio *folio,
pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
/* Set the dirty flag on the folio now the pte is gone. */
- if (pte_dirty(pteval))
+ if (pte_dirty(pteval) && !folio_test_dirty(folio))
folio_mark_dirty(folio);
/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
- if (PageHWPoison(subpage) && !(flags & TTU_HWPOISON)) {
- pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
- dec_mm_counter(mm, mm_counter(&folio->page));
+ if (PageHWPoison(page) && !(flags & TTU_HWPOISON)) {
+ pteval = swp_entry_to_pte(make_hwpoison_entry(page));
set_pte_at(mm, address, pvmw.pte, pteval);
} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
/*
@@ -1600,12 +1597,11 @@ static bool try_to_unmap_one_page(struct folio *folio,
* migration) will not expect userfaults on already
* copied pages.
*/
- dec_mm_counter(mm, mm_counter(&folio->page));
/* We have to invalidate as we cleared the pte */
mmu_notifier_invalidate_range(mm, address,
address + PAGE_SIZE);
} else if (folio_test_anon(folio)) {
- swp_entry_t entry = { .val = page_private(subpage) };
+ swp_entry_t entry = { .val = page_private(page) };
pte_t swp_pte;
/*
* Store the swap location in the pte.
@@ -1614,12 +1610,10 @@ static bool try_to_unmap_one_page(struct folio *folio,
if (unlikely(folio_test_swapbacked(folio) !=
folio_test_swapcache(folio))) {
WARN_ON_ONCE(1);
- ret = false;
/* We have to invalidate as we cleared the pte */
mmu_notifier_invalidate_range(mm, address,
address + PAGE_SIZE);
- page_vma_mapped_walk_done(&pvmw);
- goto discard;
+ goto exit;
}
/* MADV_FREE page check */
@@ -1651,7 +1645,6 @@ static bool try_to_unmap_one_page(struct folio *folio,
/* Invalidate as we cleared the pte */
mmu_notifier_invalidate_range(mm,
address, address + PAGE_SIZE);
- dec_mm_counter(mm, MM_ANONPAGES);
goto discard;
}
@@ -1659,43 +1652,30 @@ static bool try_to_unmap_one_page(struct folio *folio,
* If the folio was redirtied, it cannot be
* discarded. Remap the page to page table.
*/
- set_pte_at(mm, address, pvmw.pte, pteval);
folio_set_swapbacked(folio);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- goto discard;
+ goto exit_restore_pte;
}
- if (swap_duplicate(entry) < 0) {
- set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- goto discard;
- }
+ if (swap_duplicate(entry) < 0)
+ goto exit_restore_pte;
+
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
swap_free(entry);
- set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- goto discard;
+ goto exit_restore_pte;
}
/* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ if (anon_exclusive && page_try_share_anon_rmap(page)) {
swap_free(entry);
- set_pte_at(mm, address, pvmw.pte, pteval);
- ret = false;
- page_vma_mapped_walk_done(&pvmw);
- goto discard;
+ goto exit_restore_pte;
}
+
if (list_empty(&mm->mmlist)) {
spin_lock(&mmlist_lock);
if (list_empty(&mm->mmlist))
list_add(&mm->mmlist, &init_mm.mmlist);
spin_unlock(&mmlist_lock);
}
- dec_mm_counter(mm, MM_ANONPAGES);
inc_mm_counter(mm, MM_SWAPENTS);
swp_pte = swp_entry_to_pte(entry);
if (anon_exclusive)
@@ -1706,8 +1686,7 @@ static bool try_to_unmap_one_page(struct folio *folio,
swp_pte = pte_swp_mkuffd_wp(swp_pte);
set_pte_at(mm, address, pvmw.pte, swp_pte);
/* Invalidate as we cleared the pte */
- mmu_notifier_invalidate_range(mm, address,
- address + PAGE_SIZE);
+ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE);
} else {
/*
* This is a locked file-backed folio,
@@ -1720,11 +1699,16 @@ static bool try_to_unmap_one_page(struct folio *folio,
*
* See Documentation/mm/mmu_notifier.rst
*/
- dec_mm_counter(mm, mm_counter_file(&folio->page));
}
discard:
- return ret;
+ dec_mm_counter(vma->vm_mm, mm_counter(&folio->page));
+ return true;
+
+exit_restore_pte:
+ set_pte_at(mm, address, pvmw.pte, pteval);
+exit:
+ return false;
}
/*
@@ -1802,8 +1786,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
pte_pfn(*pvmw.pte) - folio_pfn(folio));
ret = try_to_unmap_one_page(folio, vma,
range, pvmw, address, flags);
- if (!ret)
+ if (!ret) {
+ page_vma_mapped_walk_done(&pvmw);
break;
+ }
/*
* No need to call mmu_notifier_invalidate_range() it has be
@@ -1812,7 +1798,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*
* See Documentation/mm/mmu_notifier.rst
*/
- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
--
2.30.2
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v4 4/5] rmap: add folio_remove_rmap_range()
2023-03-13 12:45 [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
` (2 preceding siblings ...)
2023-03-13 12:45 ` [PATCH v4 3/5] rmap: cleanup exit path of try_to_unmap_one_page() Yin Fengwei
@ 2023-03-13 12:45 ` Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 5/5] try_to_unmap_one: batched remove rmap, update folio refcount Yin Fengwei
2023-03-13 18:49 ` [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Andrew Morton
5 siblings, 0 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-13 12:45 UTC (permalink / raw)
To: linux-mm, akpm, willy, mike.kravetz, sidhartha.kumar,
naoya.horiguchi, jane.chu, david
Cc: fengwei.yin
folio_remove_rmap_range() allows taking down the pte mappings of
a specific range of pages in a folio. Compared to
page_remove_rmap(), it batches the __lruvec_stat updates for
large folios.
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
include/linux/rmap.h | 4 +++
mm/rmap.c | 58 +++++++++++++++++++++++++++++++++-----------
2 files changed, 48 insertions(+), 14 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b87d01660412..d2569b42e21a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -200,6 +200,10 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *,
bool compound);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);
+void folio_remove_rmap_range(struct folio *, struct page *,
+ unsigned int nr_pages, struct vm_area_struct *,
+ bool compound);
+
void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
diff --git a/mm/rmap.c b/mm/rmap.c
index 72fc8c559cd9..bd5331dc9d44 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1355,23 +1355,25 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
}
/**
- * page_remove_rmap - take down pte mapping from a page
- * @page: page to remove mapping from
+ * folio_remove_rmap_range - take down pte mapping from a range of pages
+ * @folio: folio to remove mapping from
+ * @page: The first page to take the pte mapping down from
+ * @nr_pages: The number of pages whose pte mappings will be taken down
* @vma: the vm area from which the mapping is removed
* @compound: uncharge the page as compound or small page
*
* The caller needs to hold the pte lock.
*/
-void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
+void folio_remove_rmap_range(struct folio *folio, struct page *page,
+ unsigned int nr_pages, struct vm_area_struct *vma,
+ bool compound)
{
- struct folio *folio = page_folio(page);
atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool last;
+ int nr = 0, nr_pmdmapped = 0, last;
enum node_stat_item idx;
- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ VM_BUG_ON_FOLIO(compound && (nr_pages != folio_nr_pages(folio)), folio);
+ VM_BUG_ON_FOLIO(compound && (page != &folio->page), folio);
/* Hugetlb pages are not counted in NR_*MAPPED */
if (unlikely(folio_test_hugetlb(folio))) {
@@ -1382,12 +1384,16 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
/* Is page being unmapped by PTE? Is this its last map to be removed? */
if (likely(!compound)) {
- last = atomic_add_negative(-1, &page->_mapcount);
- nr = last;
- if (last && folio_test_large(folio)) {
- nr = atomic_dec_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
+ do {
+ last = atomic_add_negative(-1, &page->_mapcount);
+ if (last && folio_test_large(folio)) {
+ last = atomic_dec_return_relaxed(mapped);
+ last = (last < COMPOUND_MAPPED);
+ }
+
+ if (last)
+ nr++;
+ } while (page++, --nr_pages > 0);
} else if (folio_test_pmd_mappable(folio)) {
/* That test is redundant: it's for safety or to optimize out */
@@ -1441,6 +1447,30 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
munlock_vma_folio(folio, vma, compound);
}
+/**
+ * page_remove_rmap - take down pte mapping from a page
+ * @page: page to remove mapping from
+ * @vma: the vm area from which the mapping is removed
+ * @compound: uncharge the page as compound or small page
+ *
+ * The caller needs to hold the pte lock.
+ */
+void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
+ bool compound)
+{
+ struct folio *folio = page_folio(page);
+ unsigned int nr_pages;
+
+ VM_BUG_ON_FOLIO(compound && (page != &folio->page), folio);
+
+ if (likely(!compound))
+ nr_pages = 1;
+ else
+ nr_pages = folio_nr_pages(folio);
+
+ folio_remove_rmap_range(folio, page, nr_pages, vma, compound);
+}
+
static bool try_to_unmap_one_hugetlb(struct folio *folio,
struct vm_area_struct *vma, struct mmu_notifier_range range,
struct page_vma_mapped_walk pvmw, unsigned long address,
--
2.30.2
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v4 5/5] try_to_unmap_one: batched remove rmap, update folio refcount
2023-03-13 12:45 [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
` (3 preceding siblings ...)
2023-03-13 12:45 ` [PATCH v4 4/5] rmap: add folio_remove_rmap_range() Yin Fengwei
@ 2023-03-13 12:45 ` Yin Fengwei
2023-03-13 18:49 ` [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Andrew Morton
5 siblings, 0 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-13 12:45 UTC (permalink / raw)
To: linux-mm, akpm, willy, mike.kravetz, sidhartha.kumar,
naoya.horiguchi, jane.chu, david
Cc: fengwei.yin
If unmapping one page fails, or the vma walk will skip the next
pte, or the vma walk will end on the next pte, batch the rmap
removal and the folio refcount update.
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
include/linux/rmap.h | 1 +
mm/page_vma_mapped.c | 30 +++++++++++++++++++++++++++
mm/rmap.c | 48 ++++++++++++++++++++++++++++++++++----------
3 files changed, 68 insertions(+), 11 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d2569b42e21a..18193d1d5a8e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -424,6 +424,7 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
}
bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw);
+bool pvmw_walk_skip_or_end_on_next(struct page_vma_mapped_walk *pvmw);
/*
* Used by swapoff to help locate where page is expected in vma.
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 4e448cfbc6ef..d5d69d02f08b 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -291,6 +291,36 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return false;
}
+/**
+ * pvmw_walk_skip_or_end_on_next - check if next pte will be skipped or
+ * end the walk
+ * @pvmw: pointer to struct page_vma_mapped_walk.
+ *
+ * This function can only be called with the correct pte lock held.
+ */
+bool pvmw_walk_skip_or_end_on_next(struct page_vma_mapped_walk *pvmw)
+{
+ unsigned long address = pvmw->address + PAGE_SIZE;
+
+ if (address >= vma_address_end(pvmw))
+ return true;
+
+ if ((address & (PMD_SIZE - PAGE_SIZE)) == 0)
+ return true;
+
+ pvmw->pte++;
+ if (pte_none(*pvmw->pte))
+ return true;
+
+ if (!check_pte(pvmw)) {
+ pvmw->pte--;
+ return true;
+ }
+ pvmw->pte--;
+
+ return false;
+}
+
/**
* page_mapped_in_vma - check whether a page is really mapped in a VMA
* @page: the page to test
diff --git a/mm/rmap.c b/mm/rmap.c
index bd5331dc9d44..60314c76df59 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1741,6 +1741,26 @@ static bool try_to_unmap_one_page(struct folio *folio,
return false;
}
+static void folio_remove_rmap_and_update_count(struct folio *folio,
+ struct page *start, struct vm_area_struct *vma, int count)
+{
+ if (count == 0)
+ return;
+
+ /*
+ * No need to call mmu_notifier_invalidate_range() as it has been
+ * done above for all cases requiring it to happen under the page
+ * table lock before mmu_notifier_invalidate_range_end()
+ *
+ * See Documentation/mm/mmu_notifier.rst
+ */
+ folio_remove_rmap_range(folio, start, count, vma,
+ folio_test_hugetlb(folio));
+ if (vma->vm_flags & VM_LOCKED)
+ mlock_drain_local();
+ folio_ref_sub(folio, count);
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
@@ -1748,10 +1768,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, void *arg)
{
DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
- struct page *subpage;
+ struct page *start = NULL;
bool ret = true;
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;
+ int count = 0;
/*
* When racing against e.g. zap_pte_range() on another cpu,
@@ -1812,26 +1833,31 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
- subpage = folio_page(folio,
+ if (!start)
+ start = folio_page(folio,
pte_pfn(*pvmw.pte) - folio_pfn(folio));
ret = try_to_unmap_one_page(folio, vma,
range, pvmw, address, flags);
if (!ret) {
+ folio_remove_rmap_and_update_count(folio,
+ start, vma, count);
page_vma_mapped_walk_done(&pvmw);
break;
}
+ count++;
/*
- * No need to call mmu_notifier_invalidate_range() it has be
- * done above for all cases requiring it to happen under page
- * table lock before mmu_notifier_invalidate_range_end()
- *
- * See Documentation/mm/mmu_notifier.rst
+ * If the next pte will be skipped in page_vma_mapped_walk() or
+ * the walk will end at it, batch the rmap removal and the page
+ * refcount update. We can't do it after page_vma_mapped_walk()
+ * returns false because the pte lock will no longer be held.
*/
- page_remove_rmap(subpage, vma, false);
- if (vma->vm_flags & VM_LOCKED)
- mlock_drain_local();
- folio_put(folio);
+ if (pvmw_walk_skip_or_end_on_next(&pvmw)) {
+ folio_remove_rmap_and_update_count(folio,
+ start, vma, count);
+ count = 0;
+ start = NULL;
+ }
}
mmu_notifier_invalidate_range_end(&range);
--
2.30.2
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
2023-03-13 12:45 ` [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
@ 2023-03-13 16:55 ` kernel test robot
2023-03-13 17:06 ` kernel test robot
2023-03-13 17:37 ` kernel test robot
2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2023-03-13 16:55 UTC (permalink / raw)
To: Yin Fengwei; +Cc: llvm, oe-kbuild-all
Hi Yin,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230310]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Yin-Fengwei/rmap-move-hugetlb-try_to_unmap-to-dedicated-function/20230313-204656
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230313124526.1207490-2-fengwei.yin%40intel.com
patch subject: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
config: arm-buildonly-randconfig-r001-20230312 (https://download.01.org/0day-ci/archive/20230314/202303140016.q8dVJMP2-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 67409911353323ca5edf2049ef0df54132fa1ca7)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/intel-lab-lkp/linux/commit/c43d03caaff9b0b9a869fd2a9b8beed7bc833f2a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Yin-Fengwei/rmap-move-hugetlb-try_to_unmap-to-dedicated-function/20230313-204656
git checkout c43d03caaff9b0b9a869fd2a9b8beed7bc833f2a
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303140016.q8dVJMP2-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/rmap.c:1513:6: error: call to undeclared function 'huge_pte_dirty'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
if (huge_pte_dirty(pteval))
^
1 error generated.
vim +/huge_pte_dirty +1513 mm/rmap.c
1443
1444 static bool try_to_unmap_one_hugetlb(struct folio *folio,
1445 struct vm_area_struct *vma, struct mmu_notifier_range range,
1446 struct page_vma_mapped_walk pvmw, unsigned long address,
1447 enum ttu_flags flags)
1448 {
1449 struct mm_struct *mm = vma->vm_mm;
1450 pte_t pteval;
1451 bool ret = true, anon = folio_test_anon(folio);
1452
1453 /*
1454 * The try_to_unmap() is only passed a hugetlb page
1455 * in the case where the hugetlb page is poisoned.
1456 */
1457 VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
1458 /*
1459 * huge_pmd_unshare may unmap an entire PMD page.
1460 * There is no way of knowing exactly which PMDs may
1461 * be cached for this mm, so we must flush them all.
1462 * start/end were already adjusted in caller
1463 * (try_to_unmap_one) to cover this range.
1464 */
1465 flush_cache_range(vma, range.start, range.end);
1466
1467 /*
1468 * To call huge_pmd_unshare, i_mmap_rwsem must be
1469 * held in write mode. Caller needs to explicitly
1470 * do this outside rmap routines.
1471 *
1472 * We also must hold hugetlb vma_lock in write mode.
1473 * Lock order dictates acquiring vma_lock BEFORE
1474 * i_mmap_rwsem. We can only try lock here and fail
1475 * if unsuccessful.
1476 */
1477 if (!anon) {
1478 VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
1479 if (!hugetlb_vma_trylock_write(vma)) {
1480 ret = false;
1481 goto out;
1482 }
1483 if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
1484 hugetlb_vma_unlock_write(vma);
1485 flush_tlb_range(vma,
1486 range.start, range.end);
1487 mmu_notifier_invalidate_range(mm,
1488 range.start, range.end);
1489 /*
1490 * The ref count of the PMD page was
1491 * dropped which is part of the way map
1492 * counting is done for shared PMDs.
1493 * Return 'true' here. When there is
1494 * no other sharing, huge_pmd_unshare
1495 * returns false and we will unmap the
1496 * actual page and drop map count
1497 * to zero.
1498 */
1499 goto out;
1500 }
1501 hugetlb_vma_unlock_write(vma);
1502 }
1503 pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
1504
1505 /*
1506 * Now the pte is cleared. If this pte was uffd-wp armed,
1507 * we may want to replace a none pte with a marker pte if
1508 * it's file-backed, so we don't lose the tracking info.
1509 */
1510 pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
1511
1512 /* Set the dirty flag on the folio now the pte is gone. */
> 1513 if (huge_pte_dirty(pteval))
1514 folio_mark_dirty(folio);
1515
1516 /* Update high watermark before we lower rss */
1517 update_hiwater_rss(mm);
1518
1519 /* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
1520 pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
1521 set_huge_pte_at(mm, address, pvmw.pte, pteval);
1522 hugetlb_count_sub(folio_nr_pages(folio), mm);
1523
1524 /*
1525 * No need to call mmu_notifier_invalidate_range() it has be
1526 * done above for all cases requiring it to happen under page
1527 * table lock before mmu_notifier_invalidate_range_end()
1528 *
1529 * See Documentation/mm/mmu_notifier.rst
1530 */
1531 page_remove_rmap(&folio->page, vma, true);
1532 /* No VM_LOCKED set in vma->vm_flags for hugetlb. So not
1533 * necessary to call mlock_drain_local().
1534 */
1535 folio_put(folio);
1536
1537 out:
1538 return ret;
1539 }
1540
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
2023-03-13 12:45 ` [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
2023-03-13 16:55 ` kernel test robot
@ 2023-03-13 17:06 ` kernel test robot
2023-03-13 17:37 ` kernel test robot
2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2023-03-13 17:06 UTC (permalink / raw)
To: Yin Fengwei; +Cc: oe-kbuild-all
Hi Yin,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230310]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Yin-Fengwei/rmap-move-hugetlb-try_to_unmap-to-dedicated-function/20230313-204656
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230313124526.1207490-2-fengwei.yin%40intel.com
patch subject: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
config: riscv-randconfig-r042-20230312 (https://download.01.org/0day-ci/archive/20230314/202303140050.bbSbHSLn-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/c43d03caaff9b0b9a869fd2a9b8beed7bc833f2a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Yin-Fengwei/rmap-move-hugetlb-try_to_unmap-to-dedicated-function/20230313-204656
git checkout c43d03caaff9b0b9a869fd2a9b8beed7bc833f2a
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303140050.bbSbHSLn-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/rmap.c: In function 'try_to_unmap_one_hugetlb':
>> mm/rmap.c:1513:13: error: implicit declaration of function 'huge_pte_dirty' [-Werror=implicit-function-declaration]
1513 | if (huge_pte_dirty(pteval))
| ^~~~~~~~~~~~~~
cc1: some warnings being treated as errors
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
2023-03-13 12:45 ` [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
2023-03-13 16:55 ` kernel test robot
2023-03-13 17:06 ` kernel test robot
@ 2023-03-13 17:37 ` kernel test robot
2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2023-03-13 17:37 UTC (permalink / raw)
To: Yin Fengwei; +Cc: llvm, oe-kbuild-all
Hi Yin,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230310]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Yin-Fengwei/rmap-move-hugetlb-try_to_unmap-to-dedicated-function/20230313-204656
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230313124526.1207490-2-fengwei.yin%40intel.com
patch subject: [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function
config: x86_64-randconfig-a012-20230313 (https://download.01.org/0day-ci/archive/20230314/202303140156.ZvGUC2dc-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/c43d03caaff9b0b9a869fd2a9b8beed7bc833f2a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Yin-Fengwei/rmap-move-hugetlb-try_to_unmap-to-dedicated-function/20230313-204656
git checkout c43d03caaff9b0b9a869fd2a9b8beed7bc833f2a
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303140156.ZvGUC2dc-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/rmap.c:1513:6: error: implicit declaration of function 'huge_pte_dirty' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
if (huge_pte_dirty(pteval))
^
1 error generated.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-13 12:45 [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
` (4 preceding siblings ...)
2023-03-13 12:45 ` [PATCH v4 5/5] try_to_unmap_one: batched remove rmap, update folio refcount Yin Fengwei
@ 2023-03-13 18:49 ` Andrew Morton
2023-03-14 3:09 ` Yin Fengwei
5 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2023-03-13 18:49 UTC (permalink / raw)
To: Yin Fengwei
Cc: linux-mm, willy, mike.kravetz, sidhartha.kumar, naoya.horiguchi,
jane.chu, david
On Mon, 13 Mar 2023 20:45:21 +0800 Yin Fengwei <fengwei.yin@intel.com> wrote:
> This series is trying to bring the batched rmap removing to
> try_to_unmap_one(). It's expected that the batched rmap
> removing bring performance gain than remove rmap per page.
>
> This series reconstruct the try_to_unmap_one() from:
> loop:
> clear and update PTE
> unmap one page
> goto loop
> to:
> loop:
> clear and update PTE
> goto loop
> unmap the range of folio in one call
> It is one step to always map/unmap the entire folio in one call.
> Which can simplify the folio mapcount handling by avoid dealing
> with each page map/unmap.
>
> ...
>
> For performance gain demonstration, changed the MADV_PAGEOUT not
> to split the large folio for page cache and created a micro
> benchmark mainly as following:
Please remind me why it's necessary to patch the kernel to actually
performance test this? And why it's proving so hard to demonstrate
benefits in real-world workloads?
(Yes, this was touched on in earlier discussion, but I do think these
considerations should be spelled out in the [0/N] changelog).
Thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-13 18:49 ` [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Andrew Morton
@ 2023-03-14 3:09 ` Yin Fengwei
2023-03-14 9:16 ` David Hildenbrand
2023-03-20 13:47 ` Yin, Fengwei
0 siblings, 2 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-14 3:09 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, willy, mike.kravetz, sidhartha.kumar, naoya.horiguchi,
jane.chu, david
On 3/14/23 02:49, Andrew Morton wrote:
> On Mon, 13 Mar 2023 20:45:21 +0800 Yin Fengwei <fengwei.yin@intel.com> wrote:
>
>> This series is trying to bring the batched rmap removing to
>> try_to_unmap_one(). It's expected that the batched rmap
>> removing bring performance gain than remove rmap per page.
>>
>> This series reconstruct the try_to_unmap_one() from:
>> loop:
>> clear and update PTE
>> unmap one page
>> goto loop
>> to:
>> loop:
>> clear and update PTE
>> goto loop
>> unmap the range of folio in one call
>> It is one step to always map/unmap the entire folio in one call.
>> Which can simplify the folio mapcount handling by avoid dealing
>> with each page map/unmap.
>>
>> ...
>>
>> For performance gain demonstration, changed the MADV_PAGEOUT not
>> to split the large folio for page cache and created a micro
>> benchmark mainly as following:
>
> Please remind me why it's necessary to patch the kernel to actually
> performance test this? And why it's proving so hard to demonstrate
> benefits in real-world workloads?
>
> (Yes, this was touched on in earlier discussion, but I do think these
> considerations should be spelled out in the [0/N] changelog).
OK. What about adding the following to the cover letter:
"
The performance gain of this series can be demonstrated with large
folio reclaim. In the current kernel, the vmscan() path benefits
from the changes. But as far as I am aware, there is no
workload/benchmark that shows the exact performance gain for the
vmscan() path.
Another way to demonstrate the performance benefit is using
MADV_PAGEOUT, which also triggers page reclaim. The problem is that
MADV_PAGEOUT always splits the large folio because it's currently
not aware of large folios for the page cache. To show the
performance benefit, MADV_PAGEOUT was updated not to split the
large folio.
In the long term, with wider adoption of large folios in the kernel
(like large folios for anonymous pages), MADV_PAGEOUT needs to be
updated to handle large folios as a whole to avoid always splitting
them.
"
Regards
Yin, Fengwei
>
> Thanks.
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-14 3:09 ` Yin Fengwei
@ 2023-03-14 9:16 ` David Hildenbrand
2023-03-14 9:48 ` Matthew Wilcox
2023-03-20 13:47 ` Yin, Fengwei
1 sibling, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2023-03-14 9:16 UTC (permalink / raw)
To: Yin Fengwei, Andrew Morton
Cc: linux-mm, willy, mike.kravetz, sidhartha.kumar, naoya.horiguchi,
jane.chu
On 14.03.23 04:09, Yin Fengwei wrote:
> On 3/14/23 02:49, Andrew Morton wrote:
>> On Mon, 13 Mar 2023 20:45:21 +0800 Yin Fengwei <fengwei.yin@intel.com> wrote:
>>
>>> This series tries to bring batched rmap removal to
>>> try_to_unmap_one(). The batched removal is expected to bring a
>>> performance gain over removing the rmap page by page.
>>>
>>> This series restructures try_to_unmap_one() from:
>>> loop:
>>> clear and update PTE
>>> unmap one page
>>> goto loop
>>> to:
>>> loop:
>>> clear and update PTE
>>> goto loop
>>> unmap the folio range in one call
>>> It is one step toward always mapping/unmapping the entire folio in
>>> one call, which can simplify folio mapcount handling by avoiding
>>> per-page map/unmap handling.
>>>
>>> ...
>>>
>>> For the performance-gain demonstration, MADV_PAGEOUT was changed
>>> not to split large folios for the page cache, and a micro
>>> benchmark was created, mainly as follows:
>>
>> Please remind me why it's necessary to patch the kernel to actually
>> performance test this? And why it's proving so hard to demonstrate
>> benefits in real-world workloads?
>>
>> (Yes, this was touched on in earlier discussion, but I do think these
>> considerations should be spelled out in the [0/N] changelog).
> OK. What about adding the following to the cover letter:
> "
> The performance gain of this series can be demonstrated with large
> folio reclaim. In the current kernel, the vmscan() path benefits
> from the changes, but as far as I am aware there is no
> workload/benchmark that can show the exact performance gain for
> the vmscan() path.
>
> Another way to demonstrate the performance benefit is MADV_PAGEOUT,
> which also triggers page reclaim. The problem is that MADV_PAGEOUT
> currently always splits large folios because it is not aware of
> large folios in the page cache. To show the performance benefit,
> MADV_PAGEOUT is updated not to split large folios.
>
> In the long term, with wider adoption of large folios in the kernel
> (e.g. large folios for anonymous pages), MADV_PAGEOUT needs to be
> updated to handle large folios as a whole instead of always
> splitting them.
Just curious what the last sentence implies. Large folios are supposed
to be a transparent optimization. So why should we pageout all
surrounding subpages simply because a single subpage was requested to be
paged out? That might harm performance of some workloads ... more than
the actual split.
So it's not immediately obvious to me why "avoid splitting" is the
correct answer to the problem at hand.
--
Thanks,
David / dhildenb
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-14 9:16 ` David Hildenbrand
@ 2023-03-14 9:48 ` Matthew Wilcox
2023-03-14 9:50 ` David Hildenbrand
2023-03-14 14:50 ` Yin, Fengwei
0 siblings, 2 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-03-14 9:48 UTC (permalink / raw)
To: David Hildenbrand
Cc: Yin Fengwei, Andrew Morton, linux-mm, mike.kravetz,
sidhartha.kumar, naoya.horiguchi, jane.chu
On Tue, Mar 14, 2023 at 10:16:09AM +0100, David Hildenbrand wrote:
> On 14.03.23 04:09, Yin Fengwei wrote:
> > In the long term, with wider adoption of large folios in the kernel
> > (e.g. large folios for anonymous pages), MADV_PAGEOUT needs to be
> > updated to handle large folios as a whole instead of always
> > splitting them.
>
> Just curious what the last sentence implies. Large folios are supposed to be
> a transparent optimization. So why should we pageout all surrounding
> subpages simply because a single subpage was requested to be paged out? That
> might harm performance of some workloads ... more than the actual split.
>
> So it's not immediately obvious to me why "avoid splitting" is the correct
> answer to the problem at hand.
Even if your madvise() call says to pageout all pages covered by a
folio, the current code will split it. That's what needs to be fixed.
At least for anonymous pages, using large folios is an attempt to treat
all pages in a particular range the same way. If the user says to only
page out some of them, that's a big clue that these pages are different
from the other pages, and so we should split a folio where the madvise
call does not cover every page in the folio.
I'm less convinced that argument holds for page cache pages.
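The splitting policy described above boils down to a coverage check. A hypothetical sketch follows; the helper name and the folio representation are invented for illustration (the real decision is made in the kernel's mm/madvise.c along the pageout path):

```c
#include <stdbool.h>
#include <assert.h>

/* Model of the policy above: keep a large folio whole only when the
 * madvise() range covers every page in it; otherwise split it, since
 * the caller is treating the folio's pages differently.  All names
 * here are illustrative, not real kernel API. */
struct folio_range {
	unsigned long start_pfn;   /* first page of the folio */
	unsigned long nr_pages;    /* folio size in pages */
};

static bool should_split_folio(struct folio_range f,
			       unsigned long madv_start_pfn,
			       unsigned long madv_end_pfn)
{
	unsigned long f_end = f.start_pfn + f.nr_pages;

	/* split unless [madv_start, madv_end) fully covers the folio */
	return !(madv_start_pfn <= f.start_pfn && madv_end_pfn >= f_end);
}
```

For a 16-page folio at pfn 16, an madvise() range [0, 64) keeps it whole, while [20, 64) or [0, 24) would split it under this policy.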
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-14 9:48 ` Matthew Wilcox
@ 2023-03-14 9:50 ` David Hildenbrand
2023-03-14 14:50 ` Yin, Fengwei
1 sibling, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2023-03-14 9:50 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yin Fengwei, Andrew Morton, linux-mm, mike.kravetz,
sidhartha.kumar, naoya.horiguchi, jane.chu
On 14.03.23 10:48, Matthew Wilcox wrote:
> On Tue, Mar 14, 2023 at 10:16:09AM +0100, David Hildenbrand wrote:
>> On 14.03.23 04:09, Yin Fengwei wrote:
>>> In the long term, with wider adoption of large folios in the kernel
>>> (e.g. large folios for anonymous pages), MADV_PAGEOUT needs to be
>>> updated to handle large folios as a whole instead of always
>>> splitting them.
>>
>> Just curious what the last sentence implies. Large folios are supposed to be
>> a transparent optimization. So why should we pageout all surrounding
>> subpages simply because a single subpage was requested to be paged out? That
>> might harm performance of some workloads ... more than the actual split.
>>
>> So it's not immediately obvious to me why "avoid splitting" is the correct
>> answer to the problem at hand.
>
> Even if your madvise() call says to pageout all pages covered by a
> folio, the current code will split it. That's what needs to be fixed.
Agreed, if possible in the future (swap handling ...).
>
> At least for anonymous pages, using large folios is an attempt to treat
> all pages in a particular range the same way. If the user says to only
> page out some of them, that's a big clue that these pages are different
> from the other pages, and so we should split a folio where the madvise
> call does not cover every page in the folio.
Agreed.
--
Thanks,
David / dhildenb
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-14 9:48 ` Matthew Wilcox
2023-03-14 9:50 ` David Hildenbrand
@ 2023-03-14 14:50 ` Yin, Fengwei
2023-03-14 15:01 ` Matthew Wilcox
1 sibling, 1 reply; 20+ messages in thread
From: Yin, Fengwei @ 2023-03-14 14:50 UTC (permalink / raw)
To: Matthew Wilcox, David Hildenbrand
Cc: Andrew Morton, linux-mm, mike.kravetz, sidhartha.kumar,
naoya.horiguchi, jane.chu
On 3/14/2023 5:48 PM, Matthew Wilcox wrote:
> On Tue, Mar 14, 2023 at 10:16:09AM +0100, David Hildenbrand wrote:
>> On 14.03.23 04:09, Yin Fengwei wrote:
>>> In the long term, with wider adoption of large folios in the kernel
>>> (e.g. large folios for anonymous pages), MADV_PAGEOUT needs to be
>>> updated to handle large folios as a whole instead of always
>>> splitting them.
>>
>> Just curious what the last sentence implies. Large folios are supposed to be
>> a transparent optimization. So why should we pageout all surrounding
>> subpages simply because a single subpage was requested to be paged out? That
>> might harm performance of some workloads ... more than the actual split.
>>
>> So it's not immediately obvious to me why "avoid splitting" is the correct
>> answer to the problem at hand.
>
> Even if your madvise() call says to pageout all pages covered by a
> folio, the current code will split it. That's what needs to be fixed.
Yes. This is my understanding.
>
> At least for anonymous pages, using large folios is an attempt to treat
> all pages in a particular range the same way. If the user says to only
> page out some of them, that's a big clue that these pages are different
> from the other pages, and so we should split a folio where the madvise
> call does not cover every page in the folio.
Yes. This is my understanding also. :).
>
> I'm less convinced that argument holds for page cache pages.
Can you explain more about this? My understanding is that if we need
to reclaim the large folio for page cache, it's better to reclaim the
whole folio.
Regards
Yin, Fengwei
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-14 14:50 ` Yin, Fengwei
@ 2023-03-14 15:01 ` Matthew Wilcox
2023-03-15 2:17 ` Yin Fengwei
0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2023-03-14 15:01 UTC (permalink / raw)
To: Yin, Fengwei
Cc: David Hildenbrand, Andrew Morton, linux-mm, mike.kravetz,
sidhartha.kumar, naoya.horiguchi, jane.chu
On Tue, Mar 14, 2023 at 10:50:36PM +0800, Yin, Fengwei wrote:
> On 3/14/2023 5:48 PM, Matthew Wilcox wrote:
> > On Tue, Mar 14, 2023 at 10:16:09AM +0100, David Hildenbrand wrote:
> >> Just curious what the last sentence implies. Large folios are supposed to be
> >> a transparent optimization. So why should we pageout all surrounding
> >> subpages simply because a single subpage was requested to be paged out? That
> >> might harm performance of some workloads ... more than the actual split.
> >>
> >> So it's not immediately obvious to me why "avoid splitting" is the correct
> >> answer to the problem at hand.
> >
> > At least for anonymous pages, using large folios is an attempt to treat
> > all pages in a particular range the same way. If the user says to only
> > page out some of them, that's a big clue that these pages are different
> > from the other pages, and so we should split a folio where the madvise
> > call does not cover every page in the folio.
>
> Yes. This is my understanding also. :).
>
> > I'm less convinced that argument holds for page cache pages.
>
> Can you explain more about this? My understanding is that if we need
> to reclaim the large folio for page cache, it's better to reclaim the
> whole folio.
Pagecache is a shared resource. To determine how best to handle all
the memory used to cache a file (ie the correct folio size), ideally
we would take into account how all the users of a particular file are
using it. If we just listen to the most recent advice from one user,
we risk making a decision that's bad for potentially many other users.
Of course, we don't have any framework for deciding the correct folio size
used for pagecache yet. We have the initial guess based on readahead
and we have various paths that will split back to individual pages.
But it's something I know we'll want to do at some point.
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-14 15:01 ` Matthew Wilcox
@ 2023-03-15 2:17 ` Yin Fengwei
0 siblings, 0 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-15 2:17 UTC (permalink / raw)
To: Matthew Wilcox
Cc: David Hildenbrand, Andrew Morton, linux-mm, mike.kravetz,
sidhartha.kumar, naoya.horiguchi, jane.chu
On 3/14/23 23:01, Matthew Wilcox wrote:
> On Tue, Mar 14, 2023 at 10:50:36PM +0800, Yin, Fengwei wrote:
>> On 3/14/2023 5:48 PM, Matthew Wilcox wrote:
>>> On Tue, Mar 14, 2023 at 10:16:09AM +0100, David Hildenbrand wrote:
>>>> Just curious what the last sentence implies. Large folios are supposed to be
>>>> a transparent optimization. So why should we pageout all surrounding
>>>> subpages simply because a single subpage was requested to be paged out? That
>>>> might harm performance of some workloads ... more than the actual split.
>>>>
>>>> So it's not immediately obvious to me why "avoid splitting" is the correct
>>>> answer to the problem at hand.
>>>
>>> At least for anonymous pages, using large folios is an attempt to treat
>>> all pages in a particular range the same way. If the user says to only
>>> page out some of them, that's a big clue that these pages are different
>>> from the other pages, and so we should split a folio where the madvise
>>> call does not cover every page in the folio.
>>
>> Yes. This is my understanding also. :).
>>
>>> I'm less convinced that argument holds for page cache pages.
>>
>> Can you explain more about this? My understanding is that if we need
>> to reclaim the large folio for page cache, it's better to reclaim the
>> whole folio.
>
> Pagecache is a shared resource. To determine how best to handle all
> the memory used to cache a file (ie the correct folio size), ideally
> we would take into account how all the users of a particular file are
> using it. If we just listen to the most recent advice from one user,
> we risk making a decision that's bad for potentially many other users.
>
> Of course, we don't have any framework for deciding the correct folio size
> used for pagecache yet. We have the initial guess based on readahead
> and we have various paths that will split back to individual pages.
> But it's something I know we'll want to do at some point.
Thanks a lot for the detailed explanation.
Regards
Yin, Fengwei
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-14 3:09 ` Yin Fengwei
2023-03-14 9:16 ` David Hildenbrand
@ 2023-03-20 13:47 ` Yin, Fengwei
2023-03-21 14:17 ` David Hildenbrand
1 sibling, 1 reply; 20+ messages in thread
From: Yin, Fengwei @ 2023-03-20 13:47 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, willy, mike.kravetz, sidhartha.kumar, naoya.horiguchi,
jane.chu, david
Hi Andrew, David,
On 3/14/2023 11:09 AM, Yin Fengwei wrote:
> On 3/14/23 02:49, Andrew Morton wrote:
>> On Mon, 13 Mar 2023 20:45:21 +0800 Yin Fengwei <fengwei.yin@intel.com> wrote:
>>
>>> This series tries to bring batched rmap removal to
>>> try_to_unmap_one(). The batched removal is expected to bring a
>>> performance gain over removing the rmap page by page.
>>>
>>> This series restructures try_to_unmap_one() from:
>>> loop:
>>> clear and update PTE
>>> unmap one page
>>> goto loop
>>> to:
>>> loop:
>>> clear and update PTE
>>> goto loop
>>> unmap the folio range in one call
>>> It is one step toward always mapping/unmapping the entire folio in
>>> one call, which can simplify folio mapcount handling by avoiding
>>> per-page map/unmap handling.
>>>
>>> ...
>>>
>>> For the performance-gain demonstration, MADV_PAGEOUT was changed
>>> not to split large folios for the page cache, and a micro
>>> benchmark was created, mainly as follows:
>>
>> Please remind me why it's necessary to patch the kernel to actually
>> performance test this? And why it's proving so hard to demonstrate
>> benefits in real-world workloads?
>>
>> (Yes, this was touched on in earlier discussion, but I do think these
>> considerations should be spelled out in the [0/N] changelog).
> OK. What about adding the following to the cover letter:
> "
> The performance gain of this series can be demonstrated with large
> folio reclaim. In the current kernel, the vmscan() path benefits
> from the changes, but as far as I am aware there is no
> workload/benchmark that can show the exact performance gain for
> the vmscan() path.
>
> Another way to demonstrate the performance benefit is MADV_PAGEOUT,
> which also triggers page reclaim. The problem is that MADV_PAGEOUT
> currently always splits large folios because it is not aware of
> large folios in the page cache. To show the performance benefit,
> MADV_PAGEOUT is updated not to split large folios.
>
> In the long term, with wider adoption of large folios in the kernel
> (e.g. large folios for anonymous pages), MADV_PAGEOUT needs to be
> updated to handle large folios as a whole instead of always
> splitting them.
> "
I just want to check how I can move this work forward. Is it enough
to add the above message, or does some other work need to be done
first? Thanks.
Regards
Yin, Fengwei
>
>
> Regards
> Yin, Fengwei
>
>>
>> Thanks.
>
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-20 13:47 ` Yin, Fengwei
@ 2023-03-21 14:17 ` David Hildenbrand
2023-03-22 1:31 ` Yin Fengwei
0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2023-03-21 14:17 UTC (permalink / raw)
To: Yin, Fengwei, Andrew Morton
Cc: linux-mm, willy, mike.kravetz, sidhartha.kumar, naoya.horiguchi,
jane.chu
On 20.03.23 14:47, Yin, Fengwei wrote:
> Hi Andrew, David,
>
> On 3/14/2023 11:09 AM, Yin Fengwei wrote:
>> On 3/14/23 02:49, Andrew Morton wrote:
>>> On Mon, 13 Mar 2023 20:45:21 +0800 Yin Fengwei <fengwei.yin@intel.com> wrote:
>>>
>>>> This series tries to bring batched rmap removal to
>>>> try_to_unmap_one(). The batched removal is expected to bring a
>>>> performance gain over removing the rmap page by page.
>>>>
>>>> This series restructures try_to_unmap_one() from:
>>>> loop:
>>>> clear and update PTE
>>>> unmap one page
>>>> goto loop
>>>> to:
>>>> loop:
>>>> clear and update PTE
>>>> goto loop
>>>> unmap the folio range in one call
>>>> It is one step toward always mapping/unmapping the entire folio in
>>>> one call, which can simplify folio mapcount handling by avoiding
>>>> per-page map/unmap handling.
>>>>
>>>> ...
>>>>
>>>> For the performance-gain demonstration, MADV_PAGEOUT was changed
>>>> not to split large folios for the page cache, and a micro
>>>> benchmark was created, mainly as follows:
>>>
>>> Please remind me why it's necessary to patch the kernel to actually
>>> performance test this? And why it's proving so hard to demonstrate
>>> benefits in real-world workloads?
>>>
>>> (Yes, this was touched on in earlier discussion, but I do think these
>>> considerations should be spelled out in the [0/N] changelog).
>> OK. What about adding the following to the cover letter:
>> "
>> The performance gain of this series can be demonstrated with large
>> folio reclaim. In the current kernel, the vmscan() path benefits
>> from the changes, but as far as I am aware there is no
>> workload/benchmark that can show the exact performance gain for
>> the vmscan() path.
>>
>> Another way to demonstrate the performance benefit is MADV_PAGEOUT,
>> which also triggers page reclaim. The problem is that MADV_PAGEOUT
>> currently always splits large folios because it is not aware of
>> large folios in the page cache. To show the performance benefit,
>> MADV_PAGEOUT is updated not to split large folios.
>>
>> In the long term, with wider adoption of large folios in the kernel
>> (e.g. large folios for anonymous pages), MADV_PAGEOUT needs to be
>> updated to handle large folios as a whole instead of always
>> splitting them.
>> "
> I just want to check how I can move this work forward. Is it enough
> to add the above message, or does some other work need to be done
> first? Thanks.
I think Andrew can add that, no need to resend. But we should see more
review (I'm fairly busy ...).
--
Thanks,
David / dhildenb
* Re: [PATCH v4 0/5] batched remove rmap in try_to_unmap_one()
2023-03-21 14:17 ` David Hildenbrand
@ 2023-03-22 1:31 ` Yin Fengwei
0 siblings, 0 replies; 20+ messages in thread
From: Yin Fengwei @ 2023-03-22 1:31 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: linux-mm, willy, mike.kravetz, sidhartha.kumar, naoya.horiguchi,
jane.chu
On 3/21/23 22:17, David Hildenbrand wrote:
> On 20.03.23 14:47, Yin, Fengwei wrote:
>> Hi Andrew, David,
>>
>> On 3/14/2023 11:09 AM, Yin Fengwei wrote:
>>> On 3/14/23 02:49, Andrew Morton wrote:
>>>> On Mon, 13 Mar 2023 20:45:21 +0800 Yin Fengwei
>>>> <fengwei.yin@intel.com> wrote:
>>>>
>>>>> This series tries to bring batched rmap removal to
>>>>> try_to_unmap_one(). The batched removal is expected to bring a
>>>>> performance gain over removing the rmap page by page.
>>>>>
>>>>> This series restructures try_to_unmap_one() from:
>>>>> loop:
>>>>> clear and update PTE
>>>>> unmap one page
>>>>> goto loop
>>>>> to:
>>>>> loop:
>>>>> clear and update PTE
>>>>> goto loop
>>>>> unmap the folio range in one call
>>>>> It is one step toward always mapping/unmapping the entire folio in
>>>>> one call, which can simplify folio mapcount handling by avoiding
>>>>> per-page map/unmap handling.
>>>>>
>>>>> ...
>>>>>
>>>>> For the performance-gain demonstration, MADV_PAGEOUT was changed
>>>>> not to split large folios for the page cache, and a micro
>>>>> benchmark was created, mainly as follows:
>>>>
>>>> Please remind me why it's necessary to patch the kernel to actually
>>>> performance test this? And why it's proving so hard to demonstrate
>>>> benefits in real-world workloads?
>>>>
>>>> (Yes, this was touched on in earlier discussion, but I do think these
>>>> considerations should be spelled out in the [0/N] changelog).
>>> OK. What about adding the following to the cover letter:
>>> "
>>> The performance gain of this series can be demonstrated with large
>>> folio reclaim. In the current kernel, the vmscan() path benefits
>>> from the changes, but as far as I am aware there is no
>>> workload/benchmark that can show the exact performance gain for
>>> the vmscan() path.
>>>
>>> Another way to demonstrate the performance benefit is MADV_PAGEOUT,
>>> which also triggers page reclaim. The problem is that MADV_PAGEOUT
>>> currently always splits large folios because it is not aware of
>>> large folios in the page cache. To show the performance benefit,
>>> MADV_PAGEOUT is updated not to split large folios.
>>>
>>> In the long term, with wider adoption of large folios in the kernel
>>> (e.g. large folios for anonymous pages), MADV_PAGEOUT needs to be
>>> updated to handle large folios as a whole instead of always
>>> splitting them.
>>> "
>> I just want to check how I can move this work forward. Is it enough
>> to add the above message, or does some other work need to be done
>> first? Thanks.
>
> I think Andrew can add that, no need to resend. But we should see more
> review (I'm fairly busy ...).
OK.
Regards
Yin, Fengwei
>
end of thread, other threads:[~2023-03-22 1:35 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-13 12:45 [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 1/5] rmap: move hugetlb try_to_unmap to dedicated function Yin Fengwei
2023-03-13 16:55 ` kernel test robot
2023-03-13 17:06 ` kernel test robot
2023-03-13 17:37 ` kernel test robot
2023-03-13 12:45 ` [PATCH v4 2/5] rmap: move page unmap operation " Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 3/5] rmap: cleanup exit path of try_to_unmap_one_page() Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 4/5] rmap:addd folio_remove_rmap_range() Yin Fengwei
2023-03-13 12:45 ` [PATCH v4 5/5] try_to_unmap_one: batched remove rmap, update folio refcount Yin Fengwei
2023-03-13 18:49 ` [PATCH v4 0/5] batched remove rmap in try_to_unmap_one() Andrew Morton
2023-03-14 3:09 ` Yin Fengwei
2023-03-14 9:16 ` David Hildenbrand
2023-03-14 9:48 ` Matthew Wilcox
2023-03-14 9:50 ` David Hildenbrand
2023-03-14 14:50 ` Yin, Fengwei
2023-03-14 15:01 ` Matthew Wilcox
2023-03-15 2:17 ` Yin Fengwei
2023-03-20 13:47 ` Yin, Fengwei
2023-03-21 14:17 ` David Hildenbrand
2023-03-22 1:31 ` Yin Fengwei