* [PATCH 1/2] mm: introduce page reference manipulation functions
@ 2016-02-15 3:04 js1304
2016-02-15 3:04 ` [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation js1304
From: js1304 @ 2016-02-15 3:04 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Nazarewicz, Minchan Kim, Mel Gorman, Vlastimil Babka,
Kirill A. Shutemov, Sergey Senozhatsky, Steven Rostedt, linux-mm,
linux-kernel, linux-api, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
The success of a CMA allocation largely depends on the success of migration,
and the key factor there is the page reference count. Until now, the page
reference count has been manipulated by calling atomic functions directly,
so we cannot track who manipulates it, or where, and it is therefore hard
to find the actual reason for a CMA allocation failure. CMA allocation
should be guaranteed to succeed, so finding the offending place is really
important.

This patch converts the call sites that manipulate the page reference count
to use the newly introduced wrapper functions. This is a preparation step
for adding a tracepoint to each page reference manipulation function. With
that facility, we can easily find the reason for a CMA allocation failure.
There is no functional change in this patch.
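In short, the conversions map the open-coded atomics onto the new wrappers
as follows (a summary derived from the diff below; see the new
include/linux/page_ref.h for the authoritative definitions):

	atomic_set(&page->_count, v)              -> set_page_count(page, v)
	atomic_add(nr, &page->_count)             -> page_ref_add(page, nr)
	atomic_inc(&page->_count)                 -> page_ref_inc(page)
	atomic_dec_and_test(&page->_count)        -> page_ref_dec_and_test(page)
	atomic_inc_not_zero(&page->_count)        -> page_ref_add_unless(page, 1, 0)
	atomic_cmpxchg(&page->_count, n, 0) == n  -> page_ref_freeze(page, n)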
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
arch/mips/mm/gup.c | 2 +-
arch/powerpc/mm/mmu_context_hash64.c | 3 +-
arch/powerpc/mm/pgtable_64.c | 2 +-
arch/x86/mm/gup.c | 2 +-
drivers/block/aoe/aoecmd.c | 4 +-
drivers/net/ethernet/freescale/gianfar.c | 2 +-
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 7 +--
drivers/net/ethernet/sun/niu.c | 2 +-
include/linux/mm.h | 21 ++-----
include/linux/page_ref.h | 76 +++++++++++++++++++++++
include/linux/pagemap.h | 19 +-----
mm/huge_memory.c | 4 +-
mm/internal.h | 5 --
mm/memory_hotplug.c | 4 +-
mm/migrate.c | 10 +--
mm/page_alloc.c | 6 +-
mm/vmscan.c | 6 +-
21 files changed, 113 insertions(+), 70 deletions(-)
create mode 100644 include/linux/page_ref.h
diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 1afd87c..6cdffc7 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -64,7 +64,7 @@ static inline void get_head_page_multiple(struct page *page, int nr)
{
VM_BUG_ON(page != compound_head(page));
VM_BUG_ON(page_count(page) == 0);
- atomic_add(nr, &page->_count);
+ page_ref_add(page, nr);
SetPageReferenced(page);
}
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index 4e4efbc..9ca6fe1 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -118,8 +118,7 @@ static void destroy_pagetable_page(struct mm_struct *mm)
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- count = atomic_sub_return(PTE_FRAG_NR - count, &page->_count);
- if (!count) {
+ if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
pgtable_page_dtor(page);
free_hot_cold_page(page, 0);
}
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 593341f..1550e8c 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -403,7 +403,7 @@ static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
* count.
*/
if (likely(!mm->context.pte_frag)) {
- atomic_set(&page->_count, PTE_FRAG_NR);
+ set_page_count(page, PTE_FRAG_NR);
mm->context.pte_frag = ret + PTE_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 6d5eb59..492beee 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -131,7 +131,7 @@ static inline void get_head_page_multiple(struct page *page, int nr)
{
VM_BUG_ON_PAGE(page != compound_head(page), page);
VM_BUG_ON_PAGE(page_count(page) == 0, page);
- atomic_add(nr, &page->_count);
+ page_ref_add(page, nr);
SetPageReferenced(page);
}
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index d048d20..437b3a8 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -875,7 +875,7 @@ bio_pageinc(struct bio *bio)
* compound pages is no longer allowed by the kernel.
*/
page = compound_head(bv.bv_page);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
}
}
@@ -888,7 +888,7 @@ bio_pagedec(struct bio *bio)
bio_for_each_segment(bv, bio, iter) {
page = compound_head(bv.bv_page);
- atomic_dec(&page->_count);
+ page_ref_dec(page);
}
}
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 2aa7b40..3b8d45a 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2942,7 +2942,7 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus,
/* change offset to the other half */
rxb->page_offset ^= GFAR_RXB_TRUESIZE;
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index b243c3c..b4547eb 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -243,7 +243,7 @@ static bool fm10k_can_reuse_rx_page(struct fm10k_rx_buffer *rx_buffer,
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 31e5f39..5b4ad1a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6630,7 +6630,7 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0c701b8..b37c2ff 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1942,7 +1942,7 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 3558f01..0ea14c0 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -837,7 +837,7 @@ add_tail_frag:
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 41440b2..04758f1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -82,8 +82,7 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
/* Not doing get_page() for each frag is a big win
* on asymetric workloads. Note we can not use atomic_set().
*/
- atomic_add(page_alloc->page_size / frag_info->frag_stride - 1,
- &page->_count);
+ page_ref_add(page, page_alloc->page_size / frag_info->frag_stride - 1);
return 0;
}
@@ -127,7 +126,7 @@ out:
dma_unmap_page(priv->ddev, page_alloc[i].dma,
page_alloc[i].page_size, PCI_DMA_FROMDEVICE);
page = page_alloc[i].page;
- atomic_set(&page->_count, 1);
+ set_page_count(page, 1);
put_page(page);
}
}
@@ -177,7 +176,7 @@ out:
dma_unmap_page(priv->ddev, page_alloc->dma,
page_alloc->page_size, PCI_DMA_FROMDEVICE);
page = page_alloc->page;
- atomic_set(&page->_count, 1);
+ set_page_count(page, 1);
put_page(page);
page_alloc->page = NULL;
}
diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
index ab6051a..9cc4564 100644
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -3341,7 +3341,7 @@ static int niu_rbr_add_page(struct niu *np, struct rx_ring_info *rp,
niu_hash_page(rp, page, addr);
if (rp->rbr_blocks_per_page > 1)
- atomic_add(rp->rbr_blocks_per_page - 1, &page->_count);
+ page_ref_add(page, rp->rbr_blocks_per_page - 1);
for (i = 0; i < rp->rbr_blocks_per_page; i++) {
__le32 *rbr = &rp->rbr[start_index + i];
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a0ad7af..a0e4c10 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -22,6 +22,7 @@
#include <linux/resource.h>
#include <linux/page_ext.h>
#include <linux/err.h>
+#include <linux/page_ref.h>
struct mempolicy;
struct anon_vma;
@@ -365,7 +366,7 @@ static inline int pmd_devmap(pmd_t pmd)
static inline int put_page_testzero(struct page *page)
{
VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0, page);
- return atomic_dec_and_test(&page->_count);
+ return page_ref_dec_and_test(page);
}
/*
@@ -376,7 +377,7 @@ static inline int put_page_testzero(struct page *page)
*/
static inline int get_page_unless_zero(struct page *page)
{
- return atomic_inc_not_zero(&page->_count);
+ return page_ref_add_unless(page, 1, 0);
}
extern int page_is_ram(unsigned long pfn);
@@ -464,11 +465,6 @@ static inline int total_mapcount(struct page *page)
}
#endif
-static inline int page_count(struct page *page)
-{
- return atomic_read(&compound_head(page)->_count);
-}
-
static inline struct page *virt_to_head_page(const void *x)
{
struct page *page = virt_to_page(x);
@@ -476,15 +472,6 @@ static inline struct page *virt_to_head_page(const void *x)
return compound_head(page);
}
-/*
- * Setup the page count before being freed into the page allocator for
- * the first time (boot or memory hotplug)
- */
-static inline void init_page_count(struct page *page)
-{
- atomic_set(&page->_count, 1);
-}
-
void __put_page(struct page *page);
void put_pages_list(struct list_head *pages);
@@ -695,7 +682,7 @@ static inline void get_page(struct page *page)
* requires to already have an elevated page->_count.
*/
VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
if (unlikely(is_zone_device_page(page)))
get_zone_device_page(page);
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
new file mode 100644
index 0000000..534249c
--- /dev/null
+++ b/include/linux/page_ref.h
@@ -0,0 +1,76 @@
+#include <linux/atomic.h>
+#include <linux/mm_types.h>
+#include <linux/page-flags.h>
+
+static inline int page_count(struct page *page)
+{
+ return atomic_read(&compound_head(page)->_count);
+}
+
+static inline void set_page_count(struct page *page, int v)
+{
+ atomic_set(&page->_count, v);
+}
+
+/*
+ * Setup the page count before being freed into the page allocator for
+ * the first time (boot or memory hotplug)
+ */
+static inline void init_page_count(struct page *page)
+{
+ set_page_count(page, 1);
+}
+
+static inline void page_ref_add(struct page *page, int nr)
+{
+ atomic_add(nr, &page->_count);
+}
+
+static inline void page_ref_sub(struct page *page, int nr)
+{
+ atomic_sub(nr, &page->_count);
+}
+
+static inline void page_ref_inc(struct page *page)
+{
+ atomic_inc(&page->_count);
+}
+
+static inline void page_ref_dec(struct page *page)
+{
+ atomic_dec(&page->_count);
+}
+
+static inline int page_ref_sub_and_test(struct page *page, int nr)
+{
+ return atomic_sub_and_test(nr, &page->_count);
+}
+
+static inline int page_ref_dec_and_test(struct page *page)
+{
+ return atomic_dec_and_test(&page->_count);
+}
+
+static inline int page_ref_dec_return(struct page *page)
+{
+ return atomic_dec_return(&page->_count);
+}
+
+static inline int page_ref_add_unless(struct page *page, int nr, int u)
+{
+ return atomic_add_unless(&page->_count, nr, u);
+}
+
+static inline int page_ref_freeze(struct page *page, int count)
+{
+ return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
+}
+
+static inline void page_ref_unfreeze(struct page *page, int count)
+{
+ VM_BUG_ON_PAGE(page_count(page) != 0, page);
+ VM_BUG_ON(count == 0);
+
+ atomic_set(&page->_count, count);
+}
+
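As a usage note (not part of the patch): the freeze/unfreeze pair supports
the pattern used by migration and reclaim, as in this minimal sketch, where
example_replace_page() is hypothetical and expected_count stands for the
references the caller accounts for:

	/* Sketch: atomically take exclusive ownership of a page's refcount. */
	static int example_replace_page(struct page *page, int expected_count)
	{
		if (!page_ref_freeze(page, expected_count))
			return -EAGAIN;	/* an unexpected reference exists */

		/*
		 * The refcount is now 0, so concurrent get_page_unless_zero()
		 * callers fail and the page's state can be updated safely.
		 */

		page_ref_unfreeze(page, expected_count);
		return 0;
	}

(mm/migrate.c below unfreezes to expected_count - 1 because the pagecache
reference is transferred to the new page.)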
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 183b15e..1ebd65c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -165,7 +165,7 @@ static inline int page_cache_get_speculative(struct page *page)
* SMP requires.
*/
VM_BUG_ON_PAGE(page_count(page) == 0, page);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
#else
if (unlikely(!get_page_unless_zero(page))) {
@@ -194,10 +194,10 @@ static inline int page_cache_add_speculative(struct page *page, int count)
VM_BUG_ON(!in_atomic());
# endif
VM_BUG_ON_PAGE(page_count(page) == 0, page);
- atomic_add(count, &page->_count);
+ page_ref_add(page, count);
#else
- if (unlikely(!atomic_add_unless(&page->_count, count, 0)))
+ if (unlikely(!page_ref_add_unless(page, count, 0)))
return 0;
#endif
VM_BUG_ON_PAGE(PageCompound(page) && page != compound_head(page), page);
@@ -205,19 +205,6 @@ static inline int page_cache_add_speculative(struct page *page, int count)
return 1;
}
-static inline int page_freeze_refs(struct page *page, int count)
-{
- return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
-}
-
-static inline void page_unfreeze_refs(struct page *page, int count)
-{
- VM_BUG_ON_PAGE(page_count(page) != 0, page);
- VM_BUG_ON(count == 0);
-
- atomic_set(&page->_count, count);
-}
-
#ifdef CONFIG_NUMA
extern struct page *__page_cache_alloc(gfp_t gfp);
#else
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3cda32c..75f777a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2931,7 +2931,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
page = pmd_page(*pmd);
VM_BUG_ON_PAGE(!page_count(page), page);
- atomic_add(HPAGE_PMD_NR - 1, &page->_count);
+ page_ref_add(page, HPAGE_PMD_NR - 1);
write = pmd_write(*pmd);
young = pmd_young(*pmd);
dirty = pmd_dirty(*pmd);
@@ -3313,7 +3313,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
* atomic_set() here would be safe on all archs (and not only on x86),
* it's safer to use atomic_inc().
*/
- atomic_inc(&page_tail->_count);
+ page_ref_inc(page_tail);
page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
page_tail->flags |= (head->flags &
diff --git a/mm/internal.h b/mm/internal.h
index f9153e5..72bbce3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -47,11 +47,6 @@ void unmap_page_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
struct zap_details *details);
-static inline void set_page_count(struct page *page, int v)
-{
- atomic_set(&page->_count, v);
-}
-
extern int __do_page_cache_readahead(struct address_space *mapping,
struct file *filp, pgoff_t offset, unsigned long nr_to_read,
unsigned long lookahead_size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 92402f8..3554b91 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -167,7 +167,7 @@ void get_page_bootmem(unsigned long info, struct page *page,
page->lru.next = (struct list_head *) type;
SetPagePrivate(page);
set_page_private(page, info);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
}
void put_page_bootmem(struct page *page)
@@ -178,7 +178,7 @@ void put_page_bootmem(struct page *page)
BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE);
- if (atomic_dec_return(&page->_count) == 1) {
+ if (page_ref_dec_return(page) == 1) {
ClearPagePrivate(page);
set_page_private(page, 0);
INIT_LIST_HEAD(&page->lru);
diff --git a/mm/migrate.c b/mm/migrate.c
index 90cbf7c6..2dd15c0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -349,7 +349,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
return -EAGAIN;
}
- if (!page_freeze_refs(page, expected_count)) {
+ if (!page_ref_freeze(page, expected_count)) {
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
}
@@ -363,7 +363,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
*/
if (mode == MIGRATE_ASYNC && head &&
!buffer_migrate_lock_buffers(head, mode)) {
- page_unfreeze_refs(page, expected_count);
+ page_ref_unfreeze(page, expected_count);
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
}
@@ -397,7 +397,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
* to one less reference.
* We know this isn't the last reference.
*/
- page_unfreeze_refs(page, expected_count - 1);
+ page_ref_unfreeze(page, expected_count - 1);
spin_unlock(&mapping->tree_lock);
/* Leave irq disabled to prevent preemption while updating stats */
@@ -451,7 +451,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
return -EAGAIN;
}
- if (!page_freeze_refs(page, expected_count)) {
+ if (!page_ref_freeze(page, expected_count)) {
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
}
@@ -463,7 +463,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
radix_tree_replace_slot(pslot, newpage);
- page_unfreeze_refs(page, expected_count - 1);
+ page_ref_unfreeze(page, expected_count - 1);
spin_unlock_irq(&mapping->tree_lock);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00118fe..6ef7bb0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3587,7 +3587,7 @@ refill:
/* Even if we own the page, we do not use atomic_set().
* This would break get_page_unless_zero() users.
*/
- atomic_add(size - 1, &page->_count);
+ page_ref_add(page, size - 1);
/* reset page count bias and offset to start of new frag */
nc->pfmemalloc = page_is_pfmemalloc(page);
@@ -3599,7 +3599,7 @@ refill:
if (unlikely(offset < 0)) {
page = virt_to_page(nc->va);
- if (!atomic_sub_and_test(nc->pagecnt_bias, &page->_count))
+ if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
goto refill;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
@@ -3607,7 +3607,7 @@ refill:
size = nc->size;
#endif
/* OK, page count is 0, we can safely set it */
- atomic_set(&page->_count, size);
+ set_page_count(page, size);
/* reset page count bias and offset to start of new frag */
nc->pagecnt_bias = size;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 86eb214..39e90e2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -638,11 +638,11 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
* Note that if SetPageDirty is always performed via set_page_dirty,
* and thus under tree_lock, then this ordering is not required.
*/
- if (!page_freeze_refs(page, 2))
+ if (!page_ref_freeze(page, 2))
goto cannot_free;
/* note: atomic_cmpxchg in page_freeze_refs provides the smp_rmb */
if (unlikely(PageDirty(page))) {
- page_unfreeze_refs(page, 2);
+ page_ref_unfreeze(page, 2);
goto cannot_free;
}
@@ -704,7 +704,7 @@ int remove_mapping(struct address_space *mapping, struct page *page)
* drops the pagecache ref for us without requiring another
* atomic operation.
*/
- page_unfreeze_refs(page, 1);
+ page_ref_unfreeze(page, 1);
return 1;
}
return 0;
--
1.9.1
* [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-15 3:04 [PATCH 1/2] mm: introduce page reference manipulation functions js1304
@ 2016-02-15 3:04 ` js1304
2016-02-15 5:08 ` Sergey Senozhatsky
` (2 more replies)
From: js1304 @ 2016-02-15 3:04 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Nazarewicz, Minchan Kim, Mel Gorman, Vlastimil Babka,
Kirill A. Shutemov, Sergey Senozhatsky, Steven Rostedt, linux-mm,
linux-kernel, linux-api, Joonsoo Kim
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
CMA allocation should be guaranteed to succeed by definition but,
unfortunately, it sometimes fails. The problem is hard to track down
because it is related to page reference manipulation, and we have no
facility to analyze it.

This patch adds tracepoints to track down page reference manipulation.
With them, we can find the exact reason for a failure and fix the problem.
The following is an example of the tracepoint output.
<...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
<...>-9018 [004] 92.678378: kernel_stack:
=> get_page_from_freelist (ffffffff81176659)
=> __alloc_pages_nodemask (ffffffff81176d22)
=> alloc_pages_vma (ffffffff811bf675)
=> handle_mm_fault (ffffffff8119e693)
=> __do_page_fault (ffffffff810631ea)
=> trace_do_page_fault (ffffffff81063543)
=> do_async_page_fault (ffffffff8105c40a)
=> async_page_fault (ffffffff817581d8)
[snip]
<...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
[snip]
...
...
<...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
[snip]
<...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
=> release_pages (ffffffff8117c9e4)
=> free_pages_and_swap_cache (ffffffff811b0697)
=> tlb_flush_mmu_free (ffffffff81199616)
=> tlb_finish_mmu (ffffffff8119a62c)
=> exit_mmap (ffffffff811a53f7)
=> mmput (ffffffff81073f47)
=> do_exit (ffffffff810794e9)
=> do_group_exit (ffffffff81079def)
=> SyS_exit_group (ffffffff81079e74)
=> entry_SYSCALL_64_fastpath (ffffffff817560b6)
This output shows that the problem comes from the exit path. In the exit
path, to improve performance, pages are not freed immediately; they are
gathered and processed in batches. During this process, migration is not
possible, so the CMA allocation fails. This problem would be hard to find
without this page reference tracepoint facility.

Enabling this feature bloats the kernel text by about 30 KB in my
configuration.
text data bss dec hex filename
12127327 2243616 1507328 15878271 f2487f vmlinux_disabled
12157208 2258880 1507328 15923416 f2f8d8 vmlinux_enabled
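For reference, output like the above can be captured with ftrace roughly as
follows (assuming debugfs is mounted at /sys/kernel/debug; using the
stacktrace event trigger is one way, assumed here, to obtain the
kernel_stack dumps shown above):

	# cd /sys/kernel/debug/tracing
	# echo 1 > events/page_ref/enable
	# echo stacktrace > events/page_ref/page_ref_mod/trigger
	# cat trace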
v2:
o Use a static key per tracepoint to avoid function call overhead
when the tracepoints are disabled.
o Print human-readable page flags through show_page_flags().
o Add more description to Kconfig.debug.
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/page_ref.h | 90 +++++++++++++++++++++++++--
include/trace/events/page_ref.h | 133 ++++++++++++++++++++++++++++++++++++++++
mm/Kconfig.debug | 13 ++++
mm/Makefile | 1 +
mm/debug_page_ref.c | 53 ++++++++++++++++
5 files changed, 285 insertions(+), 5 deletions(-)
create mode 100644 include/trace/events/page_ref.h
create mode 100644 mm/debug_page_ref.c
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 534249c..fd6d9a5 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -1,6 +1,54 @@
#include <linux/atomic.h>
#include <linux/mm_types.h>
#include <linux/page-flags.h>
+#include <linux/tracepoint-defs.h>
+
+extern struct tracepoint __tracepoint_page_ref_set;
+extern struct tracepoint __tracepoint_page_ref_mod;
+extern struct tracepoint __tracepoint_page_ref_mod_and_test;
+extern struct tracepoint __tracepoint_page_ref_mod_and_return;
+extern struct tracepoint __tracepoint_page_ref_mod_unless;
+extern struct tracepoint __tracepoint_page_ref_freeze;
+extern struct tracepoint __tracepoint_page_ref_unfreeze;
+
+#ifdef CONFIG_DEBUG_PAGE_REF
+#define page_ref_tracepoint_active(t) static_key_false(&(t).key)
+
+extern void __page_ref_set(struct page *page, int v);
+extern void __page_ref_mod(struct page *page, int v);
+extern void __page_ref_mod_and_test(struct page *page, int v, int ret);
+extern void __page_ref_mod_and_return(struct page *page, int v, int ret);
+extern void __page_ref_mod_unless(struct page *page, int v, int u);
+extern void __page_ref_freeze(struct page *page, int v, int ret);
+extern void __page_ref_unfreeze(struct page *page, int v);
+
+#else
+
+#define page_ref_tracepoint_active(t) false
+
+static inline void __page_ref_set(struct page *page, int v)
+{
+}
+static inline void __page_ref_mod(struct page *page, int v)
+{
+}
+static inline void __page_ref_mod_and_test(struct page *page, int v, int ret)
+{
+}
+static inline void __page_ref_mod_and_return(struct page *page, int v, int ret)
+{
+}
+static inline void __page_ref_mod_unless(struct page *page, int v, int u)
+{
+}
+static inline void __page_ref_freeze(struct page *page, int v, int ret)
+{
+}
+static inline void __page_ref_unfreeze(struct page *page, int v)
+{
+}
+
+#endif
static inline int page_count(struct page *page)
{
@@ -10,6 +58,8 @@ static inline int page_count(struct page *page)
static inline void set_page_count(struct page *page, int v)
{
atomic_set(&page->_count, v);
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_set))
+ __page_ref_set(page, v);
}
/*
@@ -24,46 +74,74 @@ static inline void init_page_count(struct page *page)
static inline void page_ref_add(struct page *page, int nr)
{
atomic_add(nr, &page->_count);
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
+ __page_ref_mod(page, nr);
}
static inline void page_ref_sub(struct page *page, int nr)
{
atomic_sub(nr, &page->_count);
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
+ __page_ref_mod(page, -nr);
}
static inline void page_ref_inc(struct page *page)
{
atomic_inc(&page->_count);
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
+ __page_ref_mod(page, 1);
}
static inline void page_ref_dec(struct page *page)
{
atomic_dec(&page->_count);
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
+ __page_ref_mod(page, -1);
}
static inline int page_ref_sub_and_test(struct page *page, int nr)
{
- return atomic_sub_and_test(nr, &page->_count);
+ int ret = atomic_sub_and_test(nr, &page->_count);
+
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_test))
+ __page_ref_mod_and_test(page, -nr, ret);
+ return ret;
}
static inline int page_ref_dec_and_test(struct page *page)
{
- return atomic_dec_and_test(&page->_count);
+ int ret = atomic_dec_and_test(&page->_count);
+
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_test))
+ __page_ref_mod_and_test(page, -1, ret);
+ return ret;
}
static inline int page_ref_dec_return(struct page *page)
{
- return atomic_dec_return(&page->_count);
+ int ret = atomic_dec_return(&page->_count);
+
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_return))
+ __page_ref_mod_and_return(page, -1, ret);
+ return ret;
}
static inline int page_ref_add_unless(struct page *page, int nr, int u)
{
- return atomic_add_unless(&page->_count, nr, u);
+ int ret = atomic_add_unless(&page->_count, nr, u);
+
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_unless))
+ __page_ref_mod_unless(page, nr, ret);
+ return ret;
}
static inline int page_ref_freeze(struct page *page, int count)
{
- return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
+ int ret = likely(atomic_cmpxchg(&page->_count, count, 0) == count);
+
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_freeze))
+ __page_ref_freeze(page, count, ret);
+ return ret;
}
static inline void page_ref_unfreeze(struct page *page, int count)
@@ -72,5 +150,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)
VM_BUG_ON(count == 0);
atomic_set(&page->_count, count);
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze))
+ __page_ref_unfreeze(page, count);
}
diff --git a/include/trace/events/page_ref.h b/include/trace/events/page_ref.h
new file mode 100644
index 0000000..9fb920a
--- /dev/null
+++ b/include/trace/events/page_ref.h
@@ -0,0 +1,133 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM page_ref
+
+#if !defined(_TRACE_PAGE_REF_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGE_REF_H
+
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+#include <trace/events/mmflags.h>
+
+DECLARE_EVENT_CLASS(page_ref_mod_template,
+
+ TP_PROTO(struct page *page, int v),
+
+ TP_ARGS(page, v),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, pfn)
+ __field(unsigned long, flags)
+ __field(int, count)
+ __field(int, mapcount)
+ __field(void *, mapping)
+ __field(int, mt)
+ __field(int, val)
+ ),
+
+ TP_fast_assign(
+ __entry->pfn = page_to_pfn(page);
+ __entry->flags = page->flags;
+ __entry->count = atomic_read(&page->_count);
+ __entry->mapcount = page_mapcount(page);
+ __entry->mapping = page->mapping;
+ __entry->mt = get_pageblock_migratetype(page);
+ __entry->val = v;
+ ),
+
+ TP_printk("pfn=0x%lx flags=%s count=%d mapcount=%d mapping=%p mt=%d val=%d",
+ __entry->pfn,
+ show_page_flags(__entry->flags & ((1UL << NR_PAGEFLAGS) - 1)),
+ __entry->count,
+ __entry->mapcount, __entry->mapping, __entry->mt,
+ __entry->val)
+);
+
+DEFINE_EVENT(page_ref_mod_template, page_ref_set,
+
+ TP_PROTO(struct page *page, int v),
+
+ TP_ARGS(page, v)
+);
+
+DEFINE_EVENT(page_ref_mod_template, page_ref_mod,
+
+ TP_PROTO(struct page *page, int v),
+
+ TP_ARGS(page, v)
+);
+
+DECLARE_EVENT_CLASS(page_ref_mod_and_test_template,
+
+ TP_PROTO(struct page *page, int v, int ret),
+
+ TP_ARGS(page, v, ret),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, pfn)
+ __field(unsigned long, flags)
+ __field(int, count)
+ __field(int, mapcount)
+ __field(void *, mapping)
+ __field(int, mt)
+ __field(int, val)
+ __field(int, ret)
+ ),
+
+ TP_fast_assign(
+ __entry->pfn = page_to_pfn(page);
+ __entry->flags = page->flags;
+ __entry->count = atomic_read(&page->_count);
+ __entry->mapcount = page_mapcount(page);
+ __entry->mapping = page->mapping;
+ __entry->mt = get_pageblock_migratetype(page);
+ __entry->val = v;
+ __entry->ret = ret;
+ ),
+
+ TP_printk("pfn=0x%lx flags=%s count=%d mapcount=%d mapping=%p mt=%d val=%d ret=%d",
+ __entry->pfn,
+ show_page_flags(__entry->flags & ((1UL << NR_PAGEFLAGS) - 1)),
+ __entry->count,
+ __entry->mapcount, __entry->mapping, __entry->mt,
+ __entry->val, __entry->ret)
+);
+
+DEFINE_EVENT(page_ref_mod_and_test_template, page_ref_mod_and_test,
+
+ TP_PROTO(struct page *page, int v, int ret),
+
+ TP_ARGS(page, v, ret)
+);
+
+DEFINE_EVENT(page_ref_mod_and_test_template, page_ref_mod_and_return,
+
+ TP_PROTO(struct page *page, int v, int ret),
+
+ TP_ARGS(page, v, ret)
+);
+
+DEFINE_EVENT(page_ref_mod_and_test_template, page_ref_mod_unless,
+
+ TP_PROTO(struct page *page, int v, int ret),
+
+ TP_ARGS(page, v, ret)
+);
+
+DEFINE_EVENT(page_ref_mod_and_test_template, page_ref_freeze,
+
+ TP_PROTO(struct page *page, int v, int ret),
+
+ TP_ARGS(page, v, ret)
+);
+
+DEFINE_EVENT(page_ref_mod_template, page_ref_unfreeze,
+
+ TP_PROTO(struct page *page, int v),
+
+ TP_ARGS(page, v)
+);
+
+#endif /* _TRACE_PAGE_REF_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index ee57dea..0596495 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -75,3 +75,16 @@ config PAGE_POISONING_ZERO
allocation.
If unsure, say N
+ bool
+
+config DEBUG_PAGE_REF
+ bool "Enable tracepoint to track down page reference manipulation"
+ depends on DEBUG_KERNEL
+ ---help---
+ This feature adds tracepoints for tracking down page reference
+ manipulation. The tracking is useful for diagnosing functional
+ failures due to migration failures caused by page reference
+ mismatches. Be careful about turning this feature on because it
+ can bloat the kernel text; in my configuration it adds about 30 KB.
+ Even so, there is no runtime performance overhead while the
+ tracepoints are disabled, thanks to jump labels.
diff --git a/mm/Makefile b/mm/Makefile
index fb1a794..4f0f135 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -99,3 +99,4 @@ obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
+obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
diff --git a/mm/debug_page_ref.c b/mm/debug_page_ref.c
new file mode 100644
index 0000000..87e60e8
--- /dev/null
+++ b/mm/debug_page_ref.c
@@ -0,0 +1,53 @@
+#include <linux/tracepoint.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/page_ref.h>
+
+void __page_ref_set(struct page *page, int v)
+{
+ trace_page_ref_set(page, v);
+}
+EXPORT_SYMBOL(__page_ref_set);
+EXPORT_TRACEPOINT_SYMBOL(page_ref_set);
+
+void __page_ref_mod(struct page *page, int v)
+{
+ trace_page_ref_mod(page, v);
+}
+EXPORT_SYMBOL(__page_ref_mod);
+EXPORT_TRACEPOINT_SYMBOL(page_ref_mod);
+
+void __page_ref_mod_and_test(struct page *page, int v, int ret)
+{
+ trace_page_ref_mod_and_test(page, v, ret);
+}
+EXPORT_SYMBOL(__page_ref_mod_and_test);
+EXPORT_TRACEPOINT_SYMBOL(page_ref_mod_and_test);
+
+void __page_ref_mod_and_return(struct page *page, int v, int ret)
+{
+ trace_page_ref_mod_and_return(page, v, ret);
+}
+EXPORT_SYMBOL(__page_ref_mod_and_return);
+EXPORT_TRACEPOINT_SYMBOL(page_ref_mod_and_return);
+
+void __page_ref_mod_unless(struct page *page, int v, int u)
+{
+ trace_page_ref_mod_unless(page, v, u);
+}
+EXPORT_SYMBOL(__page_ref_mod_unless);
+EXPORT_TRACEPOINT_SYMBOL(page_ref_mod_unless);
+
+void __page_ref_freeze(struct page *page, int v, int ret)
+{
+ trace_page_ref_freeze(page, v, ret);
+}
+EXPORT_SYMBOL(__page_ref_freeze);
+EXPORT_TRACEPOINT_SYMBOL(page_ref_freeze);
+
+void __page_ref_unfreeze(struct page *page, int v)
+{
+ trace_page_ref_unfreeze(page, v);
+}
+EXPORT_SYMBOL(__page_ref_unfreeze);
+EXPORT_TRACEPOINT_SYMBOL(page_ref_unfreeze);
--
1.9.1
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-15 3:04 ` [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation js1304
@ 2016-02-15 5:08 ` Sergey Senozhatsky
2016-02-15 5:28 ` Sergey Senozhatsky
2016-02-15 16:07 ` Steven Rostedt
2016-02-18 14:29 ` Steven Rostedt
From: Sergey Senozhatsky @ 2016-02-15 5:08 UTC (permalink / raw)
To: js1304
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, Sergey Senozhatsky,
Steven Rostedt, linux-mm, linux-kernel, linux-api, Joonsoo Kim
Hello Joonsoo,
On (02/15/16 12:04), js1304@gmail.com wrote:
[..]
> <...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
> <...>-9018 [004] 92.678378: kernel_stack:
> => get_page_from_freelist (ffffffff81176659)
> => __alloc_pages_nodemask (ffffffff81176d22)
> => alloc_pages_vma (ffffffff811bf675)
> => handle_mm_fault (ffffffff8119e693)
> => __do_page_fault (ffffffff810631ea)
> => trace_do_page_fault (ffffffff81063543)
> => do_async_page_fault (ffffffff8105c40a)
> => async_page_fault (ffffffff817581d8)
> [snip]
> <...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
> [snip]
[..]
> o Print human-readable page flags through show_page_flags().
not even a nitpick, just a note: the examples don't use show_page_flags().
[..]
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 534249c..fd6d9a5 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -1,6 +1,54 @@
> #include <linux/atomic.h>
> #include <linux/mm_types.h>
> #include <linux/page-flags.h>
will this compile with !CONFIG_TRACEPOINTS config?
+#ifdef CONFIG_TRACEPOINTS
#include <linux/tracepoint-defs.h>
extern struct tracepoint __tracepoint_page_ref_set;
extern struct tracepoint __tracepoint_page_ref_mod;
extern struct tracepoint __tracepoint_page_ref_mod_and_test;
extern struct tracepoint __tracepoint_page_ref_mod_and_return;
extern struct tracepoint __tracepoint_page_ref_mod_unless;
extern struct tracepoint __tracepoint_page_ref_freeze;
extern struct tracepoint __tracepoint_page_ref_unfreeze;
#ifdef CONFIG_DEBUG_PAGE_REF
#define page_ref_tracepoint_active(t) static_key_false(&(t).key)
extern void __page_ref_set(struct page *page, int v);
extern void __page_ref_mod(struct page *page, int v);
extern void __page_ref_mod_and_test(struct page *page, int v, int ret);
extern void __page_ref_mod_and_return(struct page *page, int v, int ret);
extern void __page_ref_mod_unless(struct page *page, int v, int u);
extern void __page_ref_freeze(struct page *page, int v, int ret);
extern void __page_ref_unfreeze(struct page *page, int v);
#else
#define page_ref_tracepoint_active(t) false
static inline void __page_ref_set(struct page *page, int v)
{
}
static inline void __page_ref_mod(struct page *page, int v)
{
}
static inline void __page_ref_mod_and_test(struct page *page, int v, int ret)
{
}
static inline void __page_ref_mod_and_return(struct page *page, int v, int ret)
{
}
static inline void __page_ref_mod_unless(struct page *page, int v, int u)
{
}
static inline void __page_ref_freeze(struct page *page, int v, int ret)
{
}
static inline void __page_ref_unfreeze(struct page *page, int v)
{
}
#endif /* CONFIG_DEBUG_PAGE_REF */
+#endif /* CONFIG_TRACEPOINTS */
-ss
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-15 5:08 ` Sergey Senozhatsky
@ 2016-02-15 5:28 ` Sergey Senozhatsky
2016-02-15 14:18 ` Joonsoo Kim
From: Sergey Senozhatsky @ 2016-02-15 5:28 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: js1304, Andrew Morton, Michal Nazarewicz, Minchan Kim,
Mel Gorman, Vlastimil Babka, Kirill A. Shutemov, Steven Rostedt,
linux-mm, linux-kernel, linux-api, Joonsoo Kim
On (02/15/16 14:08), Sergey Senozhatsky wrote:
>
> will this compile with !CONFIG_TRACEPOINTS config?
>
uh.. sorry, that was composed in the email client. It seems the correct way to do it is:
+#if defined CONFIG_DEBUG_PAGE_REF && defined CONFIG_TRACEPOINTS
#include <linux/tracepoint-defs.h>
#define page_ref_tracepoint_active(t) static_key_false(&(t).key)
extern struct tracepoint __tracepoint_page_ref_set;
...
extern void __page_ref_set(struct page *page, int v);
...
#else
#define page_ref_tracepoint_active(t) false
static inline void __page_ref_set(struct page *page, int v)
{
}
...
#endif
or add a dependency of PAGE_REF on CONFIG_TRACEPOINTS in Kconfig.
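i.e. something like this (a sketch of that alternative):

	config DEBUG_PAGE_REF
		bool "Enable tracepoint to track down page reference manipulation"
		depends on DEBUG_KERNEL
		depends on TRACEPOINTS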
-ss
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-15 5:28 ` Sergey Senozhatsky
@ 2016-02-15 14:18 ` Joonsoo Kim
From: Joonsoo Kim @ 2016-02-15 14:18 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, Steven Rostedt,
Linux Memory Management List, LKML, linux-api, Joonsoo Kim
2016-02-15 14:28 GMT+09:00 Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com>:
> On (02/15/16 14:08), Sergey Senozhatsky wrote:
>>
>> will this compile with !CONFIG_TRACEPOINTS config?
>>
Yes, it compiles fine even with !CONFIG_TRACEPOINTS.
> uh.. sorry, was composed in email client. seems the correct way to do it is
>
> +#if defined CONFIG_DEBUG_PAGE_REF && defined CONFIG_TRACEPOINTS
>
> #include <linux/tracepoint-defs.h>
>
> #define page_ref_tracepoint_active(t) static_key_false(&(t).key)
>
> extern struct tracepoint __tracepoint_page_ref_set;
> ...
>
> extern void __page_ref_set(struct page *page, int v);
> ...
>
> #else
>
> #define page_ref_tracepoint_active(t) false
>
> static inline void __page_ref_set(struct page *page, int v)
> {
> }
> ...
>
> #endif
>
>
>
> or add a dependency of PAGE_REF on CONFIG_TRACEPOINTS in Kconfig.
Thanks for catching it.
I will add "depends on TRACEPOINTS" to Kconfig, because
this feature is meaningless without CONFIG_TRACEPOINTS.
Thanks.
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-15 3:04 ` [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation js1304
2016-02-15 5:08 ` Sergey Senozhatsky
@ 2016-02-15 16:07 ` Steven Rostedt
2016-02-16 0:47 ` Joonsoo Kim
2016-02-18 14:29 ` Steven Rostedt
From: Steven Rostedt @ 2016-02-15 16:07 UTC (permalink / raw)
To: js1304
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, Sergey Senozhatsky,
linux-mm, linux-kernel, linux-api, Joonsoo Kim
On Mon, 15 Feb 2016 12:04:50 +0900
js1304@gmail.com wrote:
> [..]
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 534249c..fd6d9a5 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -1,6 +1,54 @@
> #include <linux/atomic.h>
> #include <linux/mm_types.h>
> #include <linux/page-flags.h>
> +#include <linux/tracepoint-defs.h>
> +
> +extern struct tracepoint __tracepoint_page_ref_set;
> +extern struct tracepoint __tracepoint_page_ref_mod;
> +extern struct tracepoint __tracepoint_page_ref_mod_and_test;
> +extern struct tracepoint __tracepoint_page_ref_mod_and_return;
> +extern struct tracepoint __tracepoint_page_ref_mod_unless;
> +extern struct tracepoint __tracepoint_page_ref_freeze;
> +extern struct tracepoint __tracepoint_page_ref_unfreeze;
> +
> +#ifdef CONFIG_DEBUG_PAGE_REF
> +#define page_ref_tracepoint_active(t) static_key_false(&(t).key)
Please don't open code this. Use the following instead:
trace_page_ref_set_enabled()
trace_page_ref_mod_enabled()
trace_page_ref_mod_and_test_enabled()
trace_page_ref_mod_and_return_enabled()
trace_page_ref_mod_unless_enabled()
trace_page_ref_freeze_enabled()
trace_page_ref_unfreeze_enabled()
They return true when CONFIG_TRACEPOINTS is configured in and the
tracepoint is enabled, and false otherwise.
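For example, something like this (a sketch; these _enabled() helpers are
generated for each tracepoint in include/linux/tracepoint.h):

	static inline void page_ref_inc(struct page *page)
	{
		atomic_inc(&page->_count);
		if (trace_page_ref_mod_enabled())
			__page_ref_mod(page, 1);
	}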
-- Steve
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-15 16:07 ` Steven Rostedt
@ 2016-02-16 0:47 ` Joonsoo Kim
2016-02-16 1:16 ` Steven Rostedt
From: Joonsoo Kim @ 2016-02-16 0:47 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, Sergey Senozhatsky,
linux-mm, linux-kernel, linux-api
On Mon, Feb 15, 2016 at 11:07:41AM -0500, Steven Rostedt wrote:
> On Mon, 15 Feb 2016 12:04:50 +0900
> js1304@gmail.com wrote:
>
> > [..]
> > diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> > index 534249c..fd6d9a5 100644
> > --- a/include/linux/page_ref.h
> > +++ b/include/linux/page_ref.h
> > @@ -1,6 +1,54 @@
> > #include <linux/atomic.h>
> > #include <linux/mm_types.h>
> > #include <linux/page-flags.h>
> > +#include <linux/tracepoint-defs.h>
> > +
> > +extern struct tracepoint __tracepoint_page_ref_set;
> > +extern struct tracepoint __tracepoint_page_ref_mod;
> > +extern struct tracepoint __tracepoint_page_ref_mod_and_test;
> > +extern struct tracepoint __tracepoint_page_ref_mod_and_return;
> > +extern struct tracepoint __tracepoint_page_ref_mod_unless;
> > +extern struct tracepoint __tracepoint_page_ref_freeze;
> > +extern struct tracepoint __tracepoint_page_ref_unfreeze;
> > +
> > +#ifdef CONFIG_DEBUG_PAGE_REF
> > +#define page_ref_tracepoint_active(t) static_key_false(&(t).key)
>
> Please don't open code this. Use the following instead:
>
> trace_page_ref_set_enabled()
> trace_page_ref_mod_enabled()
> trace_page_ref_mod_and_test_enabled()
> trace_page_ref_mod_and_return_enabled()
> trace_page_ref_mod_unless_enabled()
> trace_page_ref_freeze_enabled()
> trace_page_ref_unfreeze_enabled()
>
> They return true when CONFIG_TRACEPOINTS is configured in and the
> tracepoint is enabled, and false otherwise.
This implementation is what you proposed before. Please refer to the
link and source below.

https://lkml.org/lkml/2015/12/9/699
arch/x86/include/asm/msr.h

There is a header file dependency problem between mm.h and tracepoint.h:
page_ref.h needs to be included from mm.h, and tracepoint.h cannot
be included in that case.
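Roughly, the cycle looks like this (a sketch; the exact include chain
depends on configuration):

	linux/mm.h
	  -> linux/page_ref.h		(added by this series)
	    -> linux/tracepoint.h	(needed for the _enabled() helpers)
	      -> ...			(eventually circles back to mm.h)

linux/tracepoint-defs.h only declares struct tracepoint and its static
key, so page_ref.h can include it without creating the cycle.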
Thanks.
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-16 0:47 ` Joonsoo Kim
@ 2016-02-16 1:16 ` Steven Rostedt
2016-02-18 7:46 ` Joonsoo Kim
From: Steven Rostedt @ 2016-02-16 1:16 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, Sergey Senozhatsky,
linux-mm, linux-kernel, linux-api
On Tue, 16 Feb 2016 09:47:20 +0900
Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > They return true when CONFIG_TRACEPOINTS is configured in and the
> > tracepoint is enabled, and false otherwise.
>
> This implementation is what you proposed before. Please refer below
> link and source.
>
> https://lkml.org/lkml/2015/12/9/699
> arch/x86/include/asm/msr.h
That was a year ago, how am I supposed to remember ;-)
>
> There is header file dependency problem between mm.h and tracepoint.h.
> page_ref.h should be included in mm.h and tracepoint.h cannot
> be included in this case.
Ah, OK, I forgot about that. I'll take another look at it.
A lot has happened since then; it's all a fog to me.
-- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-16 1:16 ` Steven Rostedt
@ 2016-02-18 7:46 ` Joonsoo Kim
2016-02-18 14:20 ` Steven Rostedt
From: Joonsoo Kim @ 2016-02-18 7:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: Joonsoo Kim, Andrew Morton, Michal Nazarewicz, Minchan Kim,
Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
Sergey Senozhatsky, Linux Memory Management List, LKML,
linux-api
2016-02-16 10:16 GMT+09:00 Steven Rostedt <rostedt@goodmis.org>:
> On Tue, 16 Feb 2016 09:47:20 +0900
> Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>
>> > They return true when CONFIG_TRACEPOINTS is configured in and the
>> > tracepoint is enabled, and false otherwise.
>>
>> This implementation is what you proposed before. Please refer below
>> link and source.
>>
>> https://lkml.org/lkml/2015/12/9/699
>> arch/x86/include/asm/msr.h
>
> That was a year ago, how am I suppose to remember ;-)
I think you are smart enough to remember. :)
I will add it to the commit description in the next spin.
>>
>> There is header file dependency problem between mm.h and tracepoint.h.
>> page_ref.h should be included in mm.h and tracepoint.h cannot
>> be included in this case.
>
> Ah, OK, I forgot about that. I'll take another look at it again.
>
> A lot happened since then, that's all a fog to me.
Okay. Please let me know the result of that second look.
Thanks.
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-18 7:46 ` Joonsoo Kim
@ 2016-02-18 14:20 ` Steven Rostedt
From: Steven Rostedt @ 2016-02-18 14:20 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Joonsoo Kim, Andrew Morton, Michal Nazarewicz, Minchan Kim,
Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
Sergey Senozhatsky, Linux Memory Management List, LKML,
linux-api
On Thu, 18 Feb 2016 16:46:08 +0900
Joonsoo Kim <js1304@gmail.com> wrote:
> 2016-02-16 10:16 GMT+09:00 Steven Rostedt <rostedt@goodmis.org>:
> > On Tue, 16 Feb 2016 09:47:20 +0900
> > Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> >
> >> > They return true when CONFIG_TRACEPOINTS is configured in and the
> >> > tracepoint is enabled, and false otherwise.
> >>
> >> This implementation is what you proposed before. Please refer below
> >> link and source.
> >>
> >> https://lkml.org/lkml/2015/12/9/699
> >> arch/x86/include/asm/msr.h
> >
> > That was a year ago, how am I suppose to remember ;-)
>
> I think you are smart enough to remember. :)
> I will add it on commit description on next spin.
>
>
Better yet, add it to the code. I'll reply to the patch.
-- Steve
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-15 3:04 ` [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation js1304
2016-02-15 5:08 ` Sergey Senozhatsky
2016-02-15 16:07 ` Steven Rostedt
@ 2016-02-18 14:29 ` Steven Rostedt
2016-02-19 0:34 ` Sergey Senozhatsky
2016-02-19 1:20 ` Joonsoo Kim
From: Steven Rostedt @ 2016-02-18 14:29 UTC (permalink / raw)
To: js1304
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, Sergey Senozhatsky,
linux-mm, linux-kernel, linux-api, Joonsoo Kim
On Mon, 15 Feb 2016 12:04:50 +0900
js1304@gmail.com wrote:
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 534249c..fd6d9a5 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -1,6 +1,54 @@
> #include <linux/atomic.h>
> #include <linux/mm_types.h>
> #include <linux/page-flags.h>
> +#include <linux/tracepoint-defs.h>
> +
> +extern struct tracepoint __tracepoint_page_ref_set;
> +extern struct tracepoint __tracepoint_page_ref_mod;
> +extern struct tracepoint __tracepoint_page_ref_mod_and_test;
> +extern struct tracepoint __tracepoint_page_ref_mod_and_return;
> +extern struct tracepoint __tracepoint_page_ref_mod_unless;
> +extern struct tracepoint __tracepoint_page_ref_freeze;
> +extern struct tracepoint __tracepoint_page_ref_unfreeze;
> +
> +#ifdef CONFIG_DEBUG_PAGE_REF
Please add a comment here. Something to the effect of:
/*
* Ideally we would want to use the trace_<tracepoint>_enabled() helper
* functions. But due to include header file issues, that is not
* feasible. Instead we have to open code the static key functions.
*
* See trace_##name##_enabled(void) in include/linux/tracepoint.h
*/
I may have to work on something that lets these helpers be defined in
headers. I have some ideas on how to do that. But for now, this
solution is fine.
-- Steve
> +#define page_ref_tracepoint_active(t) static_key_false(&(t).key)
> +
> +extern void __page_ref_set(struct page *page, int v);
> +extern void __page_ref_mod(struct page *page, int v);
> +extern void __page_ref_mod_and_test(struct page *page, int v, int ret);
> +extern void __page_ref_mod_and_return(struct page *page, int v, int ret);
> +extern void __page_ref_mod_unless(struct page *page, int v, int u);
> +extern void __page_ref_freeze(struct page *page, int v, int ret);
> +extern void __page_ref_unfreeze(struct page *page, int v);
> +
> +#else
> +
> +#define page_ref_tracepoint_active(t) false
> +
> +static inline void __page_ref_set(struct page *page, int v)
> +{
> +}
> +static inline void __page_ref_mod(struct page *page, int v)
> +{
> +}
> +static inline void __page_ref_mod_and_test(struct page *page, int v, int ret)
> +{
> +}
> +static inline void __page_ref_mod_and_return(struct page *page, int v, int ret)
> +{
> +}
> +static inline void __page_ref_mod_unless(struct page *page, int v, int u)
> +{
> +}
> +static inline void __page_ref_freeze(struct page *page, int v, int ret)
> +{
> +}
> +static inline void __page_ref_unfreeze(struct page *page, int v)
> +{
> +}
> +
> +#endif
>
>
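For readers following Steve's point above: the trace_<tracepoint>_enabled() helper he refers to is generated per event by include/linux/tracepoint.h, and it boils down to the same static-key test that page_ref.h has to open-code. A rough sketch for the page_ref_set event (editorial illustration, not a hunk from the patch; the generated helper may differ slightly in detail):

	/* Roughly what the tracepoint machinery generates in include/linux/tracepoint.h: */
	static inline bool trace_page_ref_set_enabled(void)
	{
		return static_key_false(&__tracepoint_page_ref_set.key);
	}

	/* Open-coded equivalent used by the patch in include/linux/page_ref.h,
	 * which cannot include <linux/tracepoint.h>:
	 */
	#define page_ref_tracepoint_active(t)	static_key_false(&(t).key)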
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-18 14:29 ` Steven Rostedt
@ 2016-02-19 0:34 ` Sergey Senozhatsky
2016-02-19 1:39 ` Joonsoo Kim
2016-02-19 1:20 ` Joonsoo Kim
1 sibling, 1 reply; 22+ messages in thread
From: Sergey Senozhatsky @ 2016-02-19 0:34 UTC (permalink / raw)
To: Steven Rostedt
Cc: js1304, Andrew Morton, Michal Nazarewicz, Minchan Kim,
Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
Sergey Senozhatsky, linux-mm, linux-kernel, linux-api,
Joonsoo Kim
On (02/18/16 09:29), Steven Rostedt wrote:
> > diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> > index 534249c..fd6d9a5 100644
> > --- a/include/linux/page_ref.h
> > +++ b/include/linux/page_ref.h
> > @@ -1,6 +1,54 @@
> > #include <linux/atomic.h>
> > #include <linux/mm_types.h>
> > #include <linux/page-flags.h>
> > +#include <linux/tracepoint-defs.h>
> > +
> > +extern struct tracepoint __tracepoint_page_ref_set;
> > +extern struct tracepoint __tracepoint_page_ref_mod;
> > +extern struct tracepoint __tracepoint_page_ref_mod_and_test;
> > +extern struct tracepoint __tracepoint_page_ref_mod_and_return;
> > +extern struct tracepoint __tracepoint_page_ref_mod_unless;
> > +extern struct tracepoint __tracepoint_page_ref_freeze;
> > +extern struct tracepoint __tracepoint_page_ref_unfreeze;
> > +
> > +#ifdef CONFIG_DEBUG_PAGE_REF
>
> Please add a comment here. Something to the effect of:
>
> /*
> * Ideally we would want to use the trace_<tracepoint>_enabled() helper
> * functions. But due to include header file issues, that is not
> * feasible. Instead we have to open code the static key functions.
> *
> * See trace_##name##_enabled(void) in include/linux/tracepoint.h
> */
>
not sure if it's worth mentioning in the comment, but the other
concern here is the performance impact of an extra function call,
I believe. otherwise, Joonsoo would just do:
in include/linux/page_ref.h
static inline void set_page_count(struct page *page, int v)
{
atomic_set(&page->_count, v);
__page_ref_set(page, v);
}
...
and in mm/debug_page_ref.c
void __page_ref_set(struct page *page, int v)
{
if (trace_page_ref_set_enabled())
trace_page_ref_set(page, v);
}
EXPORT_SYMBOL(__page_ref_set);
EXPORT_TRACEPOINT_SYMBOL(page_ref_set);
...
-ss
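To make the performance point concrete: in v2 the inline wrappers test the static key first and only take the out-of-line call when the tracepoint is actually enabled, whereas the form above pays for the extra function call on every operation. A sketch of the v2 shape, pieced together from the hunks quoted in this thread (illustrative only, not a verbatim excerpt from the patch):

	/* include/linux/page_ref.h */
	static inline void set_page_count(struct page *page, int v)
	{
		atomic_set(&page->_count, v);
		if (page_ref_tracepoint_active(__tracepoint_page_ref_set))
			__page_ref_set(page, v);	/* out-of-line only while tracing */
	}

	/* mm/debug_page_ref.c (built only with CONFIG_DEBUG_PAGE_REF) */
	void __page_ref_set(struct page *page, int v)
	{
		trace_page_ref_set(page, v);
	}
	EXPORT_SYMBOL(__page_ref_set);
	EXPORT_TRACEPOINT_SYMBOL(page_ref_set);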
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-19 0:34 ` Sergey Senozhatsky
@ 2016-02-19 1:39 ` Joonsoo Kim
2016-02-19 1:46 ` Steven Rostedt
2016-02-19 2:15 ` Sergey Senozhatsky
0 siblings, 2 replies; 22+ messages in thread
From: Joonsoo Kim @ 2016-02-19 1:39 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Andrew Morton, Michal Nazarewicz, Minchan Kim,
Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
Linux Memory Management List, LKML, linux-api, Joonsoo Kim
2016-02-19 9:34 GMT+09:00 Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com>:
> On (02/18/16 09:29), Steven Rostedt wrote:
>> > diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
>> > index 534249c..fd6d9a5 100644
>> > --- a/include/linux/page_ref.h
>> > +++ b/include/linux/page_ref.h
>> > @@ -1,6 +1,54 @@
>> > #include <linux/atomic.h>
>> > #include <linux/mm_types.h>
>> > #include <linux/page-flags.h>
>> > +#include <linux/tracepoint-defs.h>
>> > +
>> > +extern struct tracepoint __tracepoint_page_ref_set;
>> > +extern struct tracepoint __tracepoint_page_ref_mod;
>> > +extern struct tracepoint __tracepoint_page_ref_mod_and_test;
>> > +extern struct tracepoint __tracepoint_page_ref_mod_and_return;
>> > +extern struct tracepoint __tracepoint_page_ref_mod_unless;
>> > +extern struct tracepoint __tracepoint_page_ref_freeze;
>> > +extern struct tracepoint __tracepoint_page_ref_unfreeze;
>> > +
>> > +#ifdef CONFIG_DEBUG_PAGE_REF
>>
>> Please add a comment here. Something to the effect of:
>>
>> /*
>> * Ideally we would want to use the trace_<tracepoint>_enabled() helper
>> * functions. But due to include header file issues, that is not
>> * feasible. Instead we have to open code the static key functions.
>> *
>> * See trace_##name##_enabled(void) in include/linux/tracepoint.h
>> */
>>
>
> not sure if it's worth mentioning in the comment, but the other
> concern here is the performance impact of an extra function call,
> I believe. otherwise, Joonsoo would just do:
It's a very natural thing, so I'm not sure it is worth mentioning.
> in include/linux/page_ref.h
>
> static inline void set_page_count(struct page *page, int v)
> {
> atomic_set(&page->_count, v);
> __page_ref_set(page, v);
> }
> ...
>
>
>
> and in mm/debug_page_ref.c
>
> void __page_ref_set(struct page *page, int v)
> {
> if (trace_page_ref_set_enabled())
> trace_page_ref_set(page, v);
> }
> EXPORT_SYMBOL(__page_ref_set);
> EXPORT_TRACEPOINT_SYMBOL(page_ref_set);
It is what I did in v1.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-19 1:39 ` Joonsoo Kim
@ 2016-02-19 1:46 ` Steven Rostedt
2016-02-19 2:15 ` Sergey Senozhatsky
1 sibling, 0 replies; 22+ messages in thread
From: Steven Rostedt @ 2016-02-19 1:46 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Sergey Senozhatsky, Andrew Morton, Michal Nazarewicz,
Minchan Kim, Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
Linux Memory Management List, LKML, linux-api, Joonsoo Kim
On Fri, 19 Feb 2016 10:39:10 +0900
Joonsoo Kim <js1304@gmail.com> wrote:
> > not sure if it's worth mentioning in the comment, but the other
> > concern here is the performance impact of an extra function call,
> > I believe. otherwise, Joonsoo would just do:
>
> It's a very natural thing, so I'm not sure it is worth mentioning.
>
Agreed.
-- Steve
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-19 1:39 ` Joonsoo Kim
2016-02-19 1:46 ` Steven Rostedt
@ 2016-02-19 2:15 ` Sergey Senozhatsky
1 sibling, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2016-02-19 2:15 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Sergey Senozhatsky, Steven Rostedt, Andrew Morton,
Michal Nazarewicz, Minchan Kim, Mel Gorman, Vlastimil Babka,
Kirill A. Shutemov, Linux Memory Management List, LKML,
linux-api, Joonsoo Kim
On (02/19/16 10:39), Joonsoo Kim wrote:
[..]
> > not sure if it's worth mentioning in the comment, but the other
> > concern here is the performance impact of an extra function call,
> > I believe. otherwise, Joonsoo would just do:
>
> It's a very natural thing, so I'm not sure it is worth mentioning.
agree.
> > and in mm/debug_page_ref.c
> >
> > void __page_ref_set(struct page *page, int v)
> > {
> > if (trace_page_ref_set_enabled())
> > trace_page_ref_set(page, v);
> > }
> > EXPORT_SYMBOL(__page_ref_set);
> > EXPORT_TRACEPOINT_SYMBOL(page_ref_set);
>
> It is what I did in v1.
ah... indeed. well, "That was a year ago, how am I supposed to remember"
-ss
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
2016-02-18 14:29 ` Steven Rostedt
2016-02-19 0:34 ` Sergey Senozhatsky
@ 2016-02-19 1:20 ` Joonsoo Kim
1 sibling, 0 replies; 22+ messages in thread
From: Joonsoo Kim @ 2016-02-19 1:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, Sergey Senozhatsky,
Linux Memory Management List, LKML, linux-api, Joonsoo Kim
2016-02-18 23:29 GMT+09:00 Steven Rostedt <rostedt@goodmis.org>:
> On Mon, 15 Feb 2016 12:04:50 +0900
> js1304@gmail.com wrote:
>
>
>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
>> index 534249c..fd6d9a5 100644
>> --- a/include/linux/page_ref.h
>> +++ b/include/linux/page_ref.h
>> @@ -1,6 +1,54 @@
>> #include <linux/atomic.h>
>> #include <linux/mm_types.h>
>> #include <linux/page-flags.h>
>> +#include <linux/tracepoint-defs.h>
>> +
>> +extern struct tracepoint __tracepoint_page_ref_set;
>> +extern struct tracepoint __tracepoint_page_ref_mod;
>> +extern struct tracepoint __tracepoint_page_ref_mod_and_test;
>> +extern struct tracepoint __tracepoint_page_ref_mod_and_return;
>> +extern struct tracepoint __tracepoint_page_ref_mod_unless;
>> +extern struct tracepoint __tracepoint_page_ref_freeze;
>> +extern struct tracepoint __tracepoint_page_ref_unfreeze;
>> +
>> +#ifdef CONFIG_DEBUG_PAGE_REF
>
> Please add a comment here. Something to the effect of:
Okay!
> /*
> * Ideally we would want to use the trace_<tracepoint>_enabled() helper
> * functions. But due to include header file issues, that is not
> * feasible. Instead we have to open code the static key functions.
> *
> * See trace_##name##_enabled(void) in include/linux/tracepoint.h
> */
>
> I may have to work on something that lets these helpers be defined in
> headers. I have some ideas on how to do that. But for now, this
> solution is fine.
Okay.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/2] mm: introduce page reference manipulation functions
@ 2015-11-09 7:23 Joonsoo Kim
2015-11-09 7:53 ` Sergey Senozhatsky
2015-11-10 15:58 ` Michal Nazarewicz
0 siblings, 2 replies; 22+ messages in thread
From: Joonsoo Kim @ 2015-11-09 7:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Nazarewicz, Minchan Kim, Mel Gorman, Vlastimil Babka,
Kirill A. Shutemov, linux-mm, linux-kernel, linux-api,
Joonsoo Kim
The success of CMA allocation largely depends on the success of migration,
and the key factor in migration is the page reference count. Until now, the
page reference has been manipulated by calling atomic functions directly,
so we cannot track who manipulates it and where. That makes it hard to find
the actual reason for a CMA allocation failure. CMA allocation should be
guaranteed to succeed, so finding the offending place is really important.
In this patch, call sites that manipulate the page reference are converted
to the newly introduced wrapper functions. This is a preparation step for
adding a tracepoint to each page reference manipulation function. With that
facility, we can easily find the reason for a CMA allocation failure. There
is no functional change in this patch.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
arch/mips/mm/gup.c | 2 +-
arch/powerpc/mm/mmu_context_hash64.c | 3 +-
arch/powerpc/mm/pgtable_64.c | 2 +-
arch/x86/mm/gup.c | 2 +-
drivers/block/aoe/aoecmd.c | 4 +-
drivers/net/ethernet/freescale/gianfar.c | 2 +-
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 7 +--
drivers/net/ethernet/sun/niu.c | 2 +-
include/linux/mm.h | 21 ++-----
include/linux/page_ref.h | 76 +++++++++++++++++++++++
include/linux/pagemap.h | 19 +-----
mm/huge_memory.c | 6 +-
mm/internal.h | 5 --
mm/memory_hotplug.c | 4 +-
mm/migrate.c | 10 +--
mm/page_alloc.c | 6 +-
mm/vmscan.c | 6 +-
21 files changed, 114 insertions(+), 71 deletions(-)
create mode 100644 include/linux/page_ref.h
diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 1afd87c..6cdffc7 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -64,7 +64,7 @@ static inline void get_head_page_multiple(struct page *page, int nr)
{
VM_BUG_ON(page != compound_head(page));
VM_BUG_ON(page_count(page) == 0);
- atomic_add(nr, &page->_count);
+ page_ref_add(page, nr);
SetPageReferenced(page);
}
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index 4e4efbc..9ca6fe1 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -118,8 +118,7 @@ static void destroy_pagetable_page(struct mm_struct *mm)
/* drop all the pending references */
count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
/* We allow PTE_FRAG_NR fragments from a PTE page */
- count = atomic_sub_return(PTE_FRAG_NR - count, &page->_count);
- if (!count) {
+ if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
pgtable_page_dtor(page);
free_hot_cold_page(page, 0);
}
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 422c59a..b1cab84 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -403,7 +403,7 @@ static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
* count.
*/
if (likely(!mm->context.pte_frag)) {
- atomic_set(&page->_count, PTE_FRAG_NR);
+ set_page_count(page, PTE_FRAG_NR);
mm->context.pte_frag = ret + PTE_FRAG_SIZE;
}
spin_unlock(&mm->page_table_lock);
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index f8cb3e8..ba002ae 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -110,7 +110,7 @@ static inline void get_head_page_multiple(struct page *page, int nr)
{
VM_BUG_ON_PAGE(page != compound_head(page), page);
VM_BUG_ON_PAGE(page_count(page) == 0, page);
- atomic_add(nr, &page->_count);
+ page_ref_add(page, nr);
SetPageReferenced(page);
}
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index ad80c85..2c8dd1b 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -875,7 +875,7 @@ bio_pageinc(struct bio *bio)
* compound pages is no longer allowed by the kernel.
*/
page = compound_head(bv.bv_page);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
}
}
@@ -888,7 +888,7 @@ bio_pagedec(struct bio *bio)
bio_for_each_segment(bv, bio, iter) {
page = compound_head(bv.bv_page);
- atomic_dec(&page->_count);
+ page_ref_dec(page);
}
}
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 3e6b9b4..4511015 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2943,7 +2943,7 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus,
/* change offset to the other half */
rxb->page_offset ^= GFAR_RXB_TRUESIZE;
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e76a44c..3766a0d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -245,7 +245,7 @@ static bool fm10k_can_reuse_rx_page(struct fm10k_rx_buffer *rx_buffer,
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ea7b098..9a7d492 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6622,7 +6622,7 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 47395ff..489b1c5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1941,7 +1941,7 @@ static bool ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 592ff23..424a159 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -835,7 +835,7 @@ add_tail_frag:
/* Even if we own the page, we are not allowed to use atomic_set()
* This would break get_page_unless_zero() users.
*/
- atomic_inc(&page->_count);
+ page_ref_inc(page);
return true;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index e7a5000..98f4536 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -82,8 +82,7 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
/* Not doing get_page() for each frag is a big win
* on asymetric workloads. Note we can not use atomic_set().
*/
- atomic_add(page_alloc->page_size / frag_info->frag_stride - 1,
- &page->_count);
+ page_ref_add(page, page_alloc->page_size / frag_info->frag_stride - 1);
return 0;
}
@@ -127,7 +126,7 @@ out:
dma_unmap_page(priv->ddev, page_alloc[i].dma,
page_alloc[i].page_size, PCI_DMA_FROMDEVICE);
page = page_alloc[i].page;
- atomic_set(&page->_count, 1);
+ set_page_count(page, 1);
put_page(page);
}
}
@@ -177,7 +176,7 @@ out:
dma_unmap_page(priv->ddev, page_alloc->dma,
page_alloc->page_size, PCI_DMA_FROMDEVICE);
page = page_alloc->page;
- atomic_set(&page->_count, 1);
+ set_page_count(page, 1);
put_page(page);
page_alloc->page = NULL;
}
diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
index ab6051a..9cc4564 100644
--- a/drivers/net/ethernet/sun/niu.c
+++ b/drivers/net/ethernet/sun/niu.c
@@ -3341,7 +3341,7 @@ static int niu_rbr_add_page(struct niu *np, struct rx_ring_info *rp,
niu_hash_page(rp, page, addr);
if (rp->rbr_blocks_per_page > 1)
- atomic_add(rp->rbr_blocks_per_page - 1, &page->_count);
+ page_ref_add(page, rp->rbr_blocks_per_page - 1);
for (i = 0; i < rp->rbr_blocks_per_page; i++) {
__le32 *rbr = &rp->rbr[start_index + i];
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 30ef3b5..d562638 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -21,6 +21,7 @@
#include <linux/resource.h>
#include <linux/page_ext.h>
#include <linux/err.h>
+#include <linux/page_ref.h>
struct mempolicy;
struct anon_vma;
@@ -340,7 +341,7 @@ struct inode;
static inline int put_page_testzero(struct page *page)
{
VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0, page);
- return atomic_dec_and_test(&page->_count);
+ return page_ref_dec_and_test(page);
}
/*
@@ -351,7 +352,7 @@ static inline int put_page_testzero(struct page *page)
*/
static inline int get_page_unless_zero(struct page *page)
{
- return atomic_inc_not_zero(&page->_count);
+ return page_ref_add_unless(page, 1, 0);
}
extern int page_is_ram(unsigned long pfn);
@@ -433,11 +434,6 @@ static inline int page_mapcount(struct page *page)
return ret;
}
-static inline int page_count(struct page *page)
-{
- return atomic_read(&compound_head(page)->_count);
-}
-
static inline void get_page(struct page *page)
{
page = compound_head(page);
@@ -446,7 +442,7 @@ static inline void get_page(struct page *page)
* requires to already have an elevated page->_count.
*/
VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
}
static inline struct page *virt_to_head_page(const void *x)
@@ -456,15 +452,6 @@ static inline struct page *virt_to_head_page(const void *x)
return compound_head(page);
}
-/*
- * Setup the page count before being freed into the page allocator for
- * the first time (boot or memory hotplug)
- */
-static inline void init_page_count(struct page *page)
-{
- atomic_set(&page->_count, 1);
-}
-
void __put_page(struct page *page);
static inline void put_page(struct page *page)
diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
new file mode 100644
index 0000000..534249c
--- /dev/null
+++ b/include/linux/page_ref.h
@@ -0,0 +1,76 @@
+#include <linux/atomic.h>
+#include <linux/mm_types.h>
+#include <linux/page-flags.h>
+
+static inline int page_count(struct page *page)
+{
+ return atomic_read(&compound_head(page)->_count);
+}
+
+static inline void set_page_count(struct page *page, int v)
+{
+ atomic_set(&page->_count, v);
+}
+
+/*
+ * Setup the page count before being freed into the page allocator for
+ * the first time (boot or memory hotplug)
+ */
+static inline void init_page_count(struct page *page)
+{
+ set_page_count(page, 1);
+}
+
+static inline void page_ref_add(struct page *page, int nr)
+{
+ atomic_add(nr, &page->_count);
+}
+
+static inline void page_ref_sub(struct page *page, int nr)
+{
+ atomic_sub(nr, &page->_count);
+}
+
+static inline void page_ref_inc(struct page *page)
+{
+ atomic_inc(&page->_count);
+}
+
+static inline void page_ref_dec(struct page *page)
+{
+ atomic_dec(&page->_count);
+}
+
+static inline int page_ref_sub_and_test(struct page *page, int nr)
+{
+ return atomic_sub_and_test(nr, &page->_count);
+}
+
+static inline int page_ref_dec_and_test(struct page *page)
+{
+ return atomic_dec_and_test(&page->_count);
+}
+
+static inline int page_ref_dec_return(struct page *page)
+{
+ return atomic_dec_return(&page->_count);
+}
+
+static inline int page_ref_add_unless(struct page *page, int nr, int u)
+{
+ return atomic_add_unless(&page->_count, nr, u);
+}
+
+static inline int page_ref_freeze(struct page *page, int count)
+{
+ return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
+}
+
+static inline void page_ref_unfreeze(struct page *page, int count)
+{
+ VM_BUG_ON_PAGE(page_count(page) != 0, page);
+ VM_BUG_ON(count == 0);
+
+ atomic_set(&page->_count, count);
+}
+
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4d08b6c..67ce560 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -165,7 +165,7 @@ static inline int page_cache_get_speculative(struct page *page)
* SMP requires.
*/
VM_BUG_ON_PAGE(page_count(page) == 0, page);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
#else
if (unlikely(!get_page_unless_zero(page))) {
@@ -194,10 +194,10 @@ static inline int page_cache_add_speculative(struct page *page, int count)
VM_BUG_ON(!in_atomic());
# endif
VM_BUG_ON_PAGE(page_count(page) == 0, page);
- atomic_add(count, &page->_count);
+ page_ref_add(page, count);
#else
- if (unlikely(!atomic_add_unless(&page->_count, count, 0)))
+ if (unlikely(!page_ref_add_unless(page, count, 0)))
return 0;
#endif
VM_BUG_ON_PAGE(PageCompound(page) && page != compound_head(page), page);
@@ -205,19 +205,6 @@ static inline int page_cache_add_speculative(struct page *page, int count)
return 1;
}
-static inline int page_freeze_refs(struct page *page, int count)
-{
- return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
-}
-
-static inline void page_unfreeze_refs(struct page *page, int count)
-{
- VM_BUG_ON_PAGE(page_count(page) != 0, page);
- VM_BUG_ON(count == 0);
-
- atomic_set(&page->_count, count);
-}
-
#ifdef CONFIG_NUMA
extern struct page *__page_cache_alloc(gfp_t gfp);
#else
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4b3420a..a189f27 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2881,7 +2881,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
page = pmd_page(*pmd);
VM_BUG_ON_PAGE(!page_count(page), page);
- atomic_add(HPAGE_PMD_NR - 1, &page->_count);
+ page_ref_add(page, HPAGE_PMD_NR - 1);
write = pmd_write(*pmd);
young = pmd_young(*pmd);
@@ -3197,7 +3197,7 @@ static int __split_huge_page_tail(struct page *head, int tail,
* atomic_set() here would be safe on all archs (and not only on x86),
* it's safer to use atomic_add().
*/
- atomic_add(mapcount + 1, &page_tail->_count);
+ page_ref_add(page_tail, mapcount + 1);
/* after clearing PageTail the gup refcount can be released */
smp_mb__after_atomic();
@@ -3256,7 +3256,7 @@ static void __split_huge_page(struct page *page, struct list_head *list)
tail_mapcount = 0;
for (i = HPAGE_PMD_NR - 1; i >= 1; i--)
tail_mapcount += __split_huge_page_tail(head, i, lruvec, list);
- atomic_sub(tail_mapcount, &head->_count);
+ page_ref_sub(head, tail_mapcount);
ClearPageCompound(head);
spin_unlock_irq(&zone->lru_lock);
diff --git a/mm/internal.h b/mm/internal.h
index dbe0436..f7be635 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -41,11 +41,6 @@ extern int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
unsigned long floor, unsigned long ceiling);
-static inline void set_page_count(struct page *page, int v)
-{
- atomic_set(&page->_count, v);
-}
-
extern int __do_page_cache_readahead(struct address_space *mapping,
struct file *filp, pgoff_t offset, unsigned long nr_to_read,
unsigned long lookahead_size);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 67d488a..a598c1c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -161,7 +161,7 @@ void get_page_bootmem(unsigned long info, struct page *page,
page->lru.next = (struct list_head *) type;
SetPagePrivate(page);
set_page_private(page, info);
- atomic_inc(&page->_count);
+ page_ref_inc(page);
}
void put_page_bootmem(struct page *page)
@@ -172,7 +172,7 @@ void put_page_bootmem(struct page *page)
BUG_ON(type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE);
- if (atomic_dec_return(&page->_count) == 1) {
+ if (page_ref_dec_return(page) == 1) {
ClearPagePrivate(page);
set_page_private(page, 0);
INIT_LIST_HEAD(&page->lru);
diff --git a/mm/migrate.c b/mm/migrate.c
index 1ae0113..b48a9b1 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -349,7 +349,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
return -EAGAIN;
}
- if (!page_freeze_refs(page, expected_count)) {
+ if (!page_ref_freeze(page, expected_count)) {
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
}
@@ -363,7 +363,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
*/
if (mode == MIGRATE_ASYNC && head &&
!buffer_migrate_lock_buffers(head, mode)) {
- page_unfreeze_refs(page, expected_count);
+ page_ref_unfreeze(page, expected_count);
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
}
@@ -398,7 +398,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
* to one less reference.
* We know this isn't the last reference.
*/
- page_unfreeze_refs(page, expected_count - 1);
+ page_ref_unfreeze(page, expected_count - 1);
spin_unlock(&mapping->tree_lock);
/* Leave irq disabled to prevent preemption while updating stats */
@@ -452,7 +452,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
return -EAGAIN;
}
- if (!page_freeze_refs(page, expected_count)) {
+ if (!page_ref_freeze(page, expected_count)) {
spin_unlock_irq(&mapping->tree_lock);
return -EAGAIN;
}
@@ -464,7 +464,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
radix_tree_replace_slot(pslot, newpage);
- page_unfreeze_refs(page, expected_count - 1);
+ page_ref_unfreeze(page, expected_count - 1);
spin_unlock_irq(&mapping->tree_lock);
return MIGRATEPAGE_SUCCESS;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e78d78f..40c07db 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3388,7 +3388,7 @@ refill:
/* Even if we own the page, we do not use atomic_set().
* This would break get_page_unless_zero() users.
*/
- atomic_add(size - 1, &page->_count);
+ page_ref_add(page, size - 1);
/* reset page count bias and offset to start of new frag */
nc->pfmemalloc = page_is_pfmemalloc(page);
@@ -3400,7 +3400,7 @@ refill:
if (unlikely(offset < 0)) {
page = virt_to_page(nc->va);
- if (!atomic_sub_and_test(nc->pagecnt_bias, &page->_count))
+ if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
goto refill;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
@@ -3408,7 +3408,7 @@ refill:
size = nc->size;
#endif
/* OK, page count is 0, we can safely set it */
- atomic_set(&page->_count, size);
+ set_page_count(page, size);
/* reset page count bias and offset to start of new frag */
nc->pagecnt_bias = size;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9b52ecf..d9ccf02 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -642,11 +642,11 @@ static int __remove_mapping(struct address_space *mapping, struct page *page,
* Note that if SetPageDirty is always performed via set_page_dirty,
* and thus under tree_lock, then this ordering is not required.
*/
- if (!page_freeze_refs(page, 2))
+ if (!page_ref_freeze(page, 2))
goto cannot_free;
/* note: atomic_cmpxchg in page_freeze_refs provides the smp_rmb */
if (unlikely(PageDirty(page))) {
- page_unfreeze_refs(page, 2);
+ page_ref_unfreeze(page, 2);
goto cannot_free;
}
@@ -705,7 +705,7 @@ int remove_mapping(struct address_space *mapping, struct page *page)
* drops the pagecache ref for us without requiring another
* atomic operation.
*/
- page_unfreeze_refs(page, 1);
+ page_ref_unfreeze(page, 1);
return 1;
}
return 0;
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm: introduce page reference manipulation functions
2015-11-09 7:23 [PATCH 1/2] mm: introduce page reference manipulation functions Joonsoo Kim
@ 2015-11-09 7:53 ` Sergey Senozhatsky
2015-11-09 8:00 ` Joonsoo Kim
2015-11-10 15:58 ` Michal Nazarewicz
1 sibling, 1 reply; 22+ messages in thread
From: Sergey Senozhatsky @ 2015-11-09 7:53 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov, linux-mm, linux-kernel,
linux-api, Joonsoo Kim
Hi,
On (11/09/15 16:23), Joonsoo Kim wrote:
[..]
> +static inline int page_count(struct page *page)
> +{
> + return atomic_read(&compound_head(page)->_count);
> +}
> +
> +static inline void set_page_count(struct page *page, int v)
> +{
> + atomic_set(&page->_count, v);
> +}
> +
> +/*
> + * Setup the page count before being freed into the page allocator for
> + * the first time (boot or memory hotplug)
> + */
> +static inline void init_page_count(struct page *page)
> +{
> + set_page_count(page, 1);
> +}
> +
> +static inline void page_ref_add(struct page *page, int nr)
> +{
> + atomic_add(nr, &page->_count);
> +}
Since the page_ref_FOO wrappers operate on page->_count and there
are already page_count()/set_page_count()/etc., maybe name the new
wrappers in the page_count_FOO() manner?
> +static inline void page_ref_sub(struct page *page, int nr)
for example, page_count_sub(), etc.
-ss
> +{
> + atomic_sub(nr, &page->_count);
> +}
> +
> +static inline void page_ref_inc(struct page *page)
> +{
> + atomic_inc(&page->_count);
> +}
> +
> +static inline void page_ref_dec(struct page *page)
> +{
> + atomic_dec(&page->_count);
> +}
> +
> +static inline int page_ref_sub_and_test(struct page *page, int nr)
> +{
> + return atomic_sub_and_test(nr, &page->_count);
> +}
> +
> +static inline int page_ref_dec_and_test(struct page *page)
> +{
> + return atomic_dec_and_test(&page->_count);
> +}
> +
> +static inline int page_ref_dec_return(struct page *page)
> +{
> + return atomic_dec_return(&page->_count);
> +}
> +
> +static inline int page_ref_add_unless(struct page *page, int nr, int u)
> +{
> + return atomic_add_unless(&page->_count, nr, u);
> +}
> +
> +static inline int page_ref_freeze(struct page *page, int count)
> +{
> + return likely(atomic_cmpxchg(&page->_count, count, 0) == count);
> +}
> +
> +static inline void page_ref_unfreeze(struct page *page, int count)
> +{
> + VM_BUG_ON_PAGE(page_count(page) != 0, page);
> + VM_BUG_ON(count == 0);
> +
> + atomic_set(&page->_count, count);
> +}
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm: introduce page reference manipulation functions
2015-11-09 7:53 ` Sergey Senozhatsky
@ 2015-11-09 8:00 ` Joonsoo Kim
2015-11-09 11:45 ` Kirill A. Shutemov
0 siblings, 1 reply; 22+ messages in thread
From: Joonsoo Kim @ 2015-11-09 8:00 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Andrew Morton, Michal Nazarewicz, Minchan Kim, Mel Gorman,
Vlastimil Babka, Kirill A. Shutemov,
Linux Memory Management List, LKML, linux-api, Joonsoo Kim
2015-11-09 16:53 GMT+09:00 Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com>:
> Hi,
>
> On (11/09/15 16:23), Joonsoo Kim wrote:
> [..]
>> +static inline int page_count(struct page *page)
>> +{
>> + return atomic_read(&compound_head(page)->_count);
>> +}
>> +
>> +static inline void set_page_count(struct page *page, int v)
>> +{
>> + atomic_set(&page->_count, v);
>> +}
>> +
>> +/*
>> + * Setup the page count before being freed into the page allocator for
>> + * the first time (boot or memory hotplug)
>> + */
>> +static inline void init_page_count(struct page *page)
>> +{
>> + set_page_count(page, 1);
>> +}
>> +
>> +static inline void page_ref_add(struct page *page, int nr)
>> +{
>> + atomic_add(nr, &page->_count);
>> +}
>
> Since the page_ref_FOO wrappers operate on page->_count and there
> are already page_count()/set_page_count()/etc., maybe name the new
> wrappers in the page_count_FOO() manner?
Hello,
I used the page_count_ naming before but changed my mind.
I think that ref is more relevant to this operation.
Perhaps it'd be better to change page_count()/set_page_count()
to page_ref()/set_page_ref().
FYI, some functions such as page_(un)freeze_refs already use ref. :)
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm: introduce page reference manipulation functions
2015-11-09 8:00 ` Joonsoo Kim
@ 2015-11-09 11:45 ` Kirill A. Shutemov
2015-11-10 0:28 ` Joonsoo Kim
0 siblings, 1 reply; 22+ messages in thread
From: Kirill A. Shutemov @ 2015-11-09 11:45 UTC (permalink / raw)
To: Joonsoo Kim
Cc: Sergey Senozhatsky, Andrew Morton, Michal Nazarewicz,
Minchan Kim, Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
Linux Memory Management List, LKML, linux-api, Joonsoo Kim
On Mon, Nov 09, 2015 at 05:00:32PM +0900, Joonsoo Kim wrote:
> 2015-11-09 16:53 GMT+09:00 Sergey Senozhatsky
> <sergey.senozhatsky.work@gmail.com>:
> > Hi,
> >
> > On (11/09/15 16:23), Joonsoo Kim wrote:
> > [..]
> >> +static inline int page_count(struct page *page)
> >> +{
> >> + return atomic_read(&compound_head(page)->_count);
> >> +}
> >> +
> >> +static inline void set_page_count(struct page *page, int v)
> >> +{
> >> + atomic_set(&page->_count, v);
> >> +}
> >> +
> >> +/*
> >> + * Setup the page count before being freed into the page allocator for
> >> + * the first time (boot or memory hotplug)
> >> + */
> >> +static inline void init_page_count(struct page *page)
> >> +{
> >> + set_page_count(page, 1);
> >> +}
> >> +
> >> +static inline void page_ref_add(struct page *page, int nr)
> >> +{
> >> + atomic_add(nr, &page->_count);
> >> +}
> >
> > Since the page_ref_FOO wrappers operate on page->_count and there
> > are already page_count()/set_page_count()/etc., maybe name the new
> > wrappers in the page_count_FOO() manner?
>
> Hello,
>
> I used the page_count_ naming before but changed my mind.
> I think that ref is more relevant to this operation.
> Perhaps it'd be better to change page_count()/set_page_count()
> to page_ref()/set_page_ref().
What about get_page() vs. page_cache_get() and put_page() vs.
page_cache_release()? Two different helpers for the same thing has been
annoying me for some time (plus PAGE_SIZE vs. PAGE_CACHE_SIZE, etc.).
If you want a coherent API you might want to make them consistent too.
> FYI, some functions such as page_(un)freeze_refs already use ref. :)
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm: introduce page reference manipulation functions
2015-11-09 11:45 ` Kirill A. Shutemov
@ 2015-11-10 0:28 ` Joonsoo Kim
0 siblings, 0 replies; 22+ messages in thread
From: Joonsoo Kim @ 2015-11-10 0:28 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Sergey Senozhatsky, Andrew Morton, Michal Nazarewicz,
Minchan Kim, Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
Linux Memory Management List, LKML, linux-api
On Mon, Nov 09, 2015 at 01:45:37PM +0200, Kirill A. Shutemov wrote:
> On Mon, Nov 09, 2015 at 05:00:32PM +0900, Joonsoo Kim wrote:
> > 2015-11-09 16:53 GMT+09:00 Sergey Senozhatsky
> > <sergey.senozhatsky.work@gmail.com>:
> > > Hi,
> > >
> > > On (11/09/15 16:23), Joonsoo Kim wrote:
> > > [..]
> > >> +static inline int page_count(struct page *page)
> > >> +{
> > >> + return atomic_read(&compound_head(page)->_count);
> > >> +}
> > >> +
> > >> +static inline void set_page_count(struct page *page, int v)
> > >> +{
> > >> + atomic_set(&page->_count, v);
> > >> +}
> > >> +
> > >> +/*
> > >> + * Setup the page count before being freed into the page allocator for
> > >> + * the first time (boot or memory hotplug)
> > >> + */
> > >> +static inline void init_page_count(struct page *page)
> > >> +{
> > >> + set_page_count(page, 1);
> > >> +}
> > >> +
> > >> +static inline void page_ref_add(struct page *page, int nr)
> > >> +{
> > >> + atomic_add(nr, &page->_count);
> > >> +}
> > >
> > > Since the page_ref_FOO wrappers operate on page->_count and there
> > > are already page_count()/set_page_count()/etc., maybe name the new
> > > wrappers in the page_count_FOO() manner?
> >
> > Hello,
> >
> > I used the page_count_ naming before but changed my mind.
> > I think that ref is more relevant to this operation.
> > Perhaps it'd be better to change page_count()/set_page_count()
> > to page_ref()/set_page_ref().
>
> What about get_page() vs. page_cache_get() and put_page() vs.
> page_cache_release()? Two different helpers for the same thing has been
> annoying me for some time (plus PAGE_SIZE vs. PAGE_CACHE_SIZE, etc.).
>
> If you want a coherent API you might want to make them consistent too.
In fact, consistent naming is outside the scope of this patchset. :)
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/2] mm: introduce page reference manipulation functions
2015-11-09 7:23 [PATCH 1/2] mm: introduce page reference manipulation functions Joonsoo Kim
2015-11-09 7:53 ` Sergey Senozhatsky
@ 2015-11-10 15:58 ` Michal Nazarewicz
1 sibling, 0 replies; 22+ messages in thread
From: Michal Nazarewicz @ 2015-11-10 15:58 UTC (permalink / raw)
To: Joonsoo Kim, Andrew Morton
Cc: Minchan Kim, Mel Gorman, Vlastimil Babka, Kirill A. Shutemov,
linux-mm, linux-kernel, linux-api, Joonsoo Kim
On Mon, Nov 09 2015, Joonsoo Kim wrote:
> The success of CMA allocation largely depends on the success of migration,
> and the key factor in migration is the page reference count. Until now, the
> page reference has been manipulated by calling atomic functions directly,
> so we cannot track who manipulates it and where. That makes it hard to find
> the actual reason for a CMA allocation failure. CMA allocation should be
> guaranteed to succeed, so finding the offending place is really important.
>
> In this patch, call sites that manipulate the page reference are converted
> to the newly introduced wrapper functions. This is a preparation step for
> adding a tracepoint to each page reference manipulation function. With that
> facility, we can easily find the reason for a CMA allocation failure. There
> is no functional change in this patch.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
> ---
> arch/mips/mm/gup.c | 2 +-
> arch/powerpc/mm/mmu_context_hash64.c | 3 +-
> arch/powerpc/mm/pgtable_64.c | 2 +-
> arch/x86/mm/gup.c | 2 +-
> drivers/block/aoe/aoecmd.c | 4 +-
> drivers/net/ethernet/freescale/gianfar.c | 2 +-
> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
> drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 7 +--
> drivers/net/ethernet/sun/niu.c | 2 +-
> include/linux/mm.h | 21 ++-----
> include/linux/page_ref.h | 76 +++++++++++++++++++++++
> include/linux/pagemap.h | 19 +-----
> mm/huge_memory.c | 6 +-
> mm/internal.h | 5 --
> mm/memory_hotplug.c | 4 +-
> mm/migrate.c | 10 +--
> mm/page_alloc.c | 6 +-
> mm/vmscan.c | 6 +-
> 21 files changed, 114 insertions(+), 71 deletions(-)
> create mode 100644 include/linux/page_ref.h
>
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, ミハウ “mina86” ナザレヴイツ (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>-----ooO--(_)--Ooo--
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2016-02-19 2:14 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-15 3:04 [PATCH 1/2] mm: introduce page reference manipulation functions js1304
2016-02-15 3:04 ` [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation js1304
2016-02-15 5:08 ` Sergey Senozhatsky
2016-02-15 5:28 ` Sergey Senozhatsky
2016-02-15 14:18 ` Joonsoo Kim
2016-02-15 16:07 ` Steven Rostedt
2016-02-16 0:47 ` Joonsoo Kim
2016-02-16 1:16 ` Steven Rostedt
2016-02-18 7:46 ` Joonsoo Kim
2016-02-18 14:20 ` Steven Rostedt
2016-02-18 14:29 ` Steven Rostedt
2016-02-19 0:34 ` Sergey Senozhatsky
2016-02-19 1:39 ` Joonsoo Kim
2016-02-19 1:46 ` Steven Rostedt
2016-02-19 2:15 ` Sergey Senozhatsky
2016-02-19 1:20 ` Joonsoo Kim
-- strict thread matches above, loose matches on Subject: below --
2015-11-09 7:23 [PATCH 1/2] mm: introduce page reference manipulation functions Joonsoo Kim
2015-11-09 7:53 ` Sergey Senozhatsky
2015-11-09 8:00 ` Joonsoo Kim
2015-11-09 11:45 ` Kirill A. Shutemov
2015-11-10 0:28 ` Joonsoo Kim
2015-11-10 15:58 ` Michal Nazarewicz