* [PATCH 0/9] split page table lock for PMD tables
@ 2013-09-13 13:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
Alex Thorlton noticed that some massively threaded workloads perform
poorly when THP is enabled. This patchset fixes that by introducing a
split page table lock for PMD tables. hugetlbfs is not covered yet.
This patchset is based on work by Naoya Horiguchi.
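As a rough sketch of the idea (illustrative only, not taken verbatim
from the patches; huge_pmd_lock() is the helper introduced in patch 3):

	/* Before: every huge-pmd operation in an mm serializes on
	 * the single per-mm lock. */
	spin_lock(&mm->page_table_lock);
	/* ... read or update one pmd entry ... */
	spin_unlock(&mm->page_table_lock);

	/* After: take only the lock that covers the PMD table in
	 * question, so threads touching different tables no longer
	 * contend. */
	spinlock_t *ptl = huge_pmd_lock(mm, pmd);
	/* ... read or update one pmd entry ... */
	spin_unlock(ptl);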
Benchmark (from Alex): ftp://shell.sgi.com/collect/appsx_test/pthread_test.tar.gz
THP off:
--------
Performance counter stats for './thp_pthread -C 0 -m 0 -c 80 -b 100g' (5 runs):
1738259.808012 task-clock # 47.571 CPUs utilized ( +- 9.49% )
147,359 context-switches # 0.085 K/sec ( +- 9.67% )
14 cpu-migrations # 0.000 K/sec ( +- 13.25% )
24,410,139 page-faults # 0.014 M/sec ( +- 0.00% )
4,149,037,526,252 cycles # 2.387 GHz ( +- 9.50% )
3,649,839,735,027 stalled-cycles-frontend # 87.97% frontend cycles idle ( +- 6.60% )
2,455,558,969,567 stalled-cycles-backend # 59.18% backend cycles idle ( +- 22.92% )
1,434,961,518,604 instructions # 0.35 insns per cycle
# 2.54 stalled cycles per insn ( +- 92.86% )
241,472,020,951 branches # 138.916 M/sec ( +- 91.72% )
84,022,172 branch-misses # 0.03% of all branches ( +- 3.16% )
36.540185552 seconds time elapsed ( +- 18.36% )
THP on, no patchset:
--------------------
Performance counter stats for './thp_pthread -C 0 -m 0 -c 80 -b 100g' (5 runs):
2528378.966949 task-clock # 50.715 CPUs utilized ( +- 11.86% )
214,063 context-switches # 0.085 K/sec ( +- 11.94% )
19 cpu-migrations # 0.000 K/sec ( +- 22.72% )
49,226 page-faults # 0.019 K/sec ( +- 0.33% )
6,034,640,598,498 cycles # 2.387 GHz ( +- 11.91% )
5,685,933,794,081 stalled-cycles-frontend # 94.22% frontend cycles idle ( +- 7.67% )
4,414,381,393,353 stalled-cycles-backend # 73.15% backend cycles idle ( +- 2.09% )
952,086,804,776 instructions # 0.16 insns per cycle
# 5.97 stalled cycles per insn ( +- 89.59% )
166,191,211,974 branches # 65.730 M/sec ( +- 85.52% )
33,341,022 branch-misses # 0.02% of all branches ( +- 3.90% )
49.854741504 seconds time elapsed ( +- 14.76% )
THP on, with patchset:
----------------------
echo always > /sys/kernel/mm/transparent_hugepage/enabled
Performance counter stats for './thp_pthread -C 0 -m 0 -c 80 -b 100g' (5 runs):
1538763.343568 task-clock # 45.386 CPUs utilized ( +- 7.21% )
130,469 context-switches # 0.085 K/sec ( +- 7.32% )
14 cpu-migrations # 0.000 K/sec ( +- 23.58% )
49,299 page-faults # 0.032 K/sec ( +- 0.15% )
3,666,748,502,650 cycles # 2.383 GHz ( +- 7.25% )
3,330,488,035,212 stalled-cycles-frontend # 90.83% frontend cycles idle ( +- 4.70% )
2,383,357,073,990 stalled-cycles-backend # 65.00% backend cycles idle ( +- 16.06% )
935,504,610,528 instructions # 0.26 insns per cycle
# 3.56 stalled cycles per insn ( +- 91.16% )
161,466,689,532 branches # 104.933 M/sec ( +- 87.67% )
22,602,225 branch-misses # 0.01% of all branches ( +- 6.43% )
33.903917543 seconds time elapsed ( +- 12.57% )
Kirill A. Shutemov (9):
mm: rename SPLIT_PTLOCKS to SPLIT_PTE_PTLOCKS
mm: convert mm->nr_ptes to atomic_t
mm: introduce api for split page table lock for PMD level
mm, thp: change pmd_trans_huge_lock() to return taken lock
mm, thp: move ptl taking inside page_check_address_pmd()
mm, thp: do not access mm->pmd_huge_pte directly
mm: convert the rest to new page table lock api
mm: implement split page table lock for PMD level
x86, mm: enable split page table lock for PMD level
arch/arm/mm/fault-armv.c | 6 +-
arch/s390/mm/pgtable.c | 12 +--
arch/sparc/mm/tlb.c | 12 +--
arch/um/defconfig | 2 +-
arch/x86/Kconfig | 4 +
arch/x86/include/asm/pgalloc.h | 8 +-
arch/x86/xen/mmu.c | 6 +-
arch/xtensa/configs/iss_defconfig | 2 +-
arch/xtensa/configs/s6105_defconfig | 2 +-
fs/proc/task_mmu.c | 15 +--
include/linux/huge_mm.h | 17 +--
include/linux/mm.h | 51 ++++++++-
include/linux/mm_types.h | 15 ++-
kernel/fork.c | 6 +-
mm/Kconfig | 12 ++-
mm/huge_memory.c | 201 +++++++++++++++++++++---------------
mm/memcontrol.c | 10 +-
mm/memory.c | 21 ++--
mm/migrate.c | 7 +-
mm/mmap.c | 3 +-
mm/mprotect.c | 4 +-
mm/oom_kill.c | 6 +-
mm/pgtable-generic.c | 16 +--
mm/rmap.c | 13 +--
24 files changed, 280 insertions(+), 171 deletions(-)
--
1.8.4.rc3
* [PATCH 1/9] mm: rename SPLIT_PTLOCKS to SPLIT_PTE_PTLOCKS
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
We're going to introduce a split page table lock for the PMD level.
Let's rename the existing split ptlock for the PTE level to avoid
confusion.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/arm/mm/fault-armv.c | 6 +++---
arch/um/defconfig | 2 +-
arch/x86/xen/mmu.c | 6 +++---
arch/xtensa/configs/iss_defconfig | 2 +-
arch/xtensa/configs/s6105_defconfig | 2 +-
include/linux/mm.h | 6 +++---
include/linux/mm_types.h | 8 ++++----
mm/Kconfig | 2 +-
8 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2a5907b..ff379ac 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -65,7 +65,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
return ret;
}
-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
/*
* If we are using split PTE locks, then we need to take the page
* lock here. Otherwise we are using shared mm->page_table_lock
@@ -84,10 +84,10 @@ static inline void do_pte_unlock(spinlock_t *ptl)
{
spin_unlock(ptl);
}
-#else /* !USE_SPLIT_PTLOCKS */
+#else /* !USE_SPLIT_PTE_PTLOCKS */
static inline void do_pte_lock(spinlock_t *ptl) {}
static inline void do_pte_unlock(spinlock_t *ptl) {}
-#endif /* USE_SPLIT_PTLOCKS */
+#endif /* USE_SPLIT_PTE_PTLOCKS */
static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
unsigned long pfn)
diff --git a/arch/um/defconfig b/arch/um/defconfig
index 08107a7..6b0a10f 100644
--- a/arch/um/defconfig
+++ b/arch/um/defconfig
@@ -82,7 +82,7 @@ CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
-CONFIG_SPLIT_PTLOCK_CPUS=4
+CONFIG_SPLIT_PTE_PTLOCK_CPUS=4
# CONFIG_COMPACTION is not set
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=0
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fdc3ba2..455c873 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -796,7 +796,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct mm_struct *mm)
{
spinlock_t *ptl = NULL;
-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
ptl = __pte_lockptr(page);
spin_lock_nest_lock(ptl, &mm->page_table_lock);
#endif
@@ -1637,7 +1637,7 @@ static inline void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn,
__set_pfn_prot(pfn, PAGE_KERNEL_RO);
- if (level == PT_PTE && USE_SPLIT_PTLOCKS)
+ if (level == PT_PTE && USE_SPLIT_PTE_PTLOCKS)
__pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);
xen_mc_issue(PARAVIRT_LAZY_MMU);
@@ -1671,7 +1671,7 @@ static inline void xen_release_ptpage(unsigned long pfn, unsigned level)
if (!PageHighMem(page)) {
xen_mc_batch();
- if (level == PT_PTE && USE_SPLIT_PTLOCKS)
+ if (level == PT_PTE && USE_SPLIT_PTE_PTLOCKS)
__pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, pfn);
__set_pfn_prot(pfn, PAGE_KERNEL);
diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig
index 77c52f8..54cc946 100644
--- a/arch/xtensa/configs/iss_defconfig
+++ b/arch/xtensa/configs/iss_defconfig
@@ -174,7 +174,7 @@ CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
-CONFIG_SPLIT_PTLOCK_CPUS=4
+CONFIG_SPLIT_PTE_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
diff --git a/arch/xtensa/configs/s6105_defconfig b/arch/xtensa/configs/s6105_defconfig
index 4799c6a..d802f11 100644
--- a/arch/xtensa/configs/s6105_defconfig
+++ b/arch/xtensa/configs/s6105_defconfig
@@ -138,7 +138,7 @@ CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
-CONFIG_SPLIT_PTLOCK_CPUS=4
+CONFIG_SPLIT_PTE_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_VIRT_TO_BUS=y
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8b6e55e..6cf8ddb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1232,7 +1232,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
}
#endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
/*
* We tuck a spinlock to guard each pagetable page into its struct page,
* at page->private, with BUILD_BUG_ON to make sure that this will not
@@ -1245,14 +1245,14 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
} while (0)
#define pte_lock_deinit(page) ((page)->mapping = NULL)
#define pte_lockptr(mm, pmd) ({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
-#else /* !USE_SPLIT_PTLOCKS */
+#else /* !USE_SPLIT_PTE_PTLOCKS */
/*
* We use mm->page_table_lock to guard all pagetable pages of the mm.
*/
#define pte_lock_init(page) do {} while (0)
#define pte_lock_deinit(page) do {} while (0)
#define pte_lockptr(mm, pmd) ({(void)(pmd); &(mm)->page_table_lock;})
-#endif /* USE_SPLIT_PTLOCKS */
+#endif /* USE_SPLIT_PTE_PTLOCKS */
static inline void pgtable_page_ctor(struct page *page)
{
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index faf4b7c..fe0a4bb 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -23,7 +23,7 @@
struct address_space;
-#define USE_SPLIT_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
+#define USE_SPLIT_PTE_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTE_PTLOCK_CPUS)
/*
* Each physical page in the system has a struct page associated with
@@ -141,7 +141,7 @@ struct page {
* indicates order in the buddy
* system if PG_buddy is set.
*/
-#if USE_SPLIT_PTLOCKS
+#if USE_SPLIT_PTE_PTLOCKS
spinlock_t ptl;
#endif
struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
@@ -309,14 +309,14 @@ enum {
NR_MM_COUNTERS
};
-#if USE_SPLIT_PTLOCKS && defined(CONFIG_MMU)
+#if USE_SPLIT_PTE_PTLOCKS && defined(CONFIG_MMU)
#define SPLIT_RSS_COUNTING
/* per-thread cached information, */
struct task_rss_stat {
int events; /* for synchronization threshold */
int count[NR_MM_COUNTERS];
};
-#endif /* USE_SPLIT_PTLOCKS */
+#endif /* USE_SPLIT_PTE_PTLOCKS */
struct mm_rss_stat {
atomic_long_t count[NR_MM_COUNTERS];
diff --git a/mm/Kconfig b/mm/Kconfig
index 026771a..1977a33 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -207,7 +207,7 @@ config PAGEFLAGS_EXTENDED
# PA-RISC 7xxx's spinlock_t would enlarge struct page from 32 to 44 bytes.
# DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC spinlock_t also enlarge struct page.
#
-config SPLIT_PTLOCK_CPUS
+config SPLIT_PTE_PTLOCK_CPUS
int
default "999999" if ARM && !CPU_CACHE_VIPT
default "999999" if PARISC && !PA20
--
1.8.4.rc3
* Re: [PATCH 1/9] mm: rename SPLIT_PTLOCKS to SPLIT_PTE_PTLOCKS
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 15:20 ` Dave Hansen
0 siblings, 0 replies; 77+ messages in thread
From: Dave Hansen @ 2013-09-13 15:20 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi,
Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Jones, David Howells, Frederic Weisbecker,
Johannes Weiner, Kees Cook, Mel Gorman, Michael Kerrisk,
Oleg Nesterov, Peter Zijlstra, Rik van Riel, Robin Holt,
Sedat Dilek, Srikar Dronamraju, Thomas Gleixner, linux-kernel,
linux-mm
On 09/13/2013 06:06 AM, Kirill A. Shutemov wrote:
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -207,7 +207,7 @@ config PAGEFLAGS_EXTENDED
> # PA-RISC 7xxx's spinlock_t would enlarge struct page from 32 to 44 bytes.
> # DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC spinlock_t also enlarge struct page.
> #
> -config SPLIT_PTLOCK_CPUS
> +config SPLIT_PTE_PTLOCK_CPUS
> int
> default "999999" if ARM && !CPU_CACHE_VIPT
> default "999999" if PARISC && !PA20
If someone has a config where this is set to some non-default value,
won't changing the name cause this to revert to the defaults?
I don't know how big of a deal it is to other folks, but you can always
do this:
config SPLIT_PTE_PTLOCK_CPUS
int
default SPLIT_PTLOCK_CPUS if SPLIT_PTLOCK_CPUS
* [PATCH 2/9] mm: convert mm->nr_ptes to atomic_t
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
With a split page table lock for the PMD level we can't hold
mm->page_table_lock while updating nr_ptes.
Let's convert it to atomic_t to avoid races.
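As an illustration (sketch only, not from the diff below): two threads
that each hold only their own per-PMD-table lock may update the counter
concurrently, so the plain increment can lose updates:

	mm->nr_ptes++;             /* read-modify-write; racy without
	                            * the per-mm lock */
	atomic_inc(&mm->nr_ptes);  /* safe without mm->page_table_lock */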
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
fs/proc/task_mmu.c | 2 +-
include/linux/mm_types.h | 2 +-
kernel/fork.c | 2 +-
mm/huge_memory.c | 10 +++++-----
mm/memory.c | 4 ++--
mm/mmap.c | 3 ++-
mm/oom_kill.c | 6 +++---
7 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7366e9d..d45d423 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -62,7 +62,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
total_rss << (PAGE_SHIFT-10),
data << (PAGE_SHIFT-10),
mm->stack_vm << (PAGE_SHIFT-10), text, lib,
- (PTRS_PER_PTE*sizeof(pte_t)*mm->nr_ptes) >> 10,
+ (PTRS_PER_PTE*sizeof(pte_t)*atomic_read(&mm->nr_ptes)) >> 10,
swap << (PAGE_SHIFT-10));
}
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fe0a4bb..1c64730 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -338,6 +338,7 @@ struct mm_struct {
pgd_t * pgd;
atomic_t mm_users; /* How many users with user space? */
atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
+ atomic_t nr_ptes; /* Page table pages */
int map_count; /* number of VMAs */
spinlock_t page_table_lock; /* Protects page tables and some counters */
@@ -359,7 +360,6 @@ struct mm_struct {
unsigned long exec_vm; /* VM_EXEC & ~VM_WRITE */
unsigned long stack_vm; /* VM_GROWSUP/DOWN */
unsigned long def_flags;
- unsigned long nr_ptes; /* Page table pages */
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
diff --git a/kernel/fork.c b/kernel/fork.c
index 81ccb4f..4c8b986 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -532,7 +532,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
mm->flags = (current->mm) ?
(current->mm->flags & MMF_INIT_MASK) : default_dump_filter;
mm->core_state = NULL;
- mm->nr_ptes = 0;
+ atomic_set(&mm->nr_ptes, 0);
memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
spin_lock_init(&mm->page_table_lock);
mm_init_aio(mm);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7489884..bbd41a2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -737,7 +737,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, haddr, pmd, entry);
add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
- mm->nr_ptes++;
+ atomic_inc(&mm->nr_ptes);
spin_unlock(&mm->page_table_lock);
}
@@ -778,7 +778,7 @@ static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
entry = pmd_mkhuge(entry);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, haddr, pmd, entry);
- mm->nr_ptes++;
+ atomic_inc(&mm->nr_ptes);
return true;
}
@@ -903,7 +903,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmd = pmd_mkold(pmd_wrprotect(pmd));
pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
set_pmd_at(dst_mm, addr, dst_pmd, pmd);
- dst_mm->nr_ptes++;
+ atomic_inc(&dst_mm->nr_ptes);
ret = 0;
out_unlock:
@@ -1358,7 +1358,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
if (is_huge_zero_pmd(orig_pmd)) {
- tlb->mm->nr_ptes--;
+ atomic_dec(&tlb->mm->nr_ptes);
spin_unlock(&tlb->mm->page_table_lock);
put_huge_zero_page();
} else {
@@ -1367,7 +1367,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
VM_BUG_ON(page_mapcount(page) < 0);
add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
VM_BUG_ON(!PageHead(page));
- tlb->mm->nr_ptes--;
+ atomic_dec(&tlb->mm->nr_ptes);
spin_unlock(&tlb->mm->page_table_lock);
tlb_remove_page(tlb, page);
}
diff --git a/mm/memory.c b/mm/memory.c
index ca00039..1046396 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -382,7 +382,7 @@ static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
pgtable_t token = pmd_pgtable(*pmd);
pmd_clear(pmd);
pte_free_tlb(tlb, token, addr);
- tlb->mm->nr_ptes--;
+ atomic_dec(&tlb->mm->nr_ptes);
}
static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
@@ -575,7 +575,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
spin_lock(&mm->page_table_lock);
wait_split_huge_page = 0;
if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
- mm->nr_ptes++;
+ atomic_inc(&mm->nr_ptes);
pmd_populate(mm, pmd, new);
new = NULL;
} else if (unlikely(pmd_trans_splitting(*pmd)))
diff --git a/mm/mmap.c b/mm/mmap.c
index 9d54851..1d0efbc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2726,7 +2726,8 @@ void exit_mmap(struct mm_struct *mm)
}
vm_unacct_memory(nr_accounted);
- WARN_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
+ WARN_ON(atomic_read(&mm->nr_ptes) >
+ (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
}
/* Insert vm structure into process list sorted by address
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 314e9d2..7ab394e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -161,7 +161,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
* The baseline for the badness score is the proportion of RAM that each
* task's rss, pagetable and swap space use.
*/
- points = get_mm_rss(p->mm) + p->mm->nr_ptes +
+ points = get_mm_rss(p->mm) + atomic_read(&p->mm->nr_ptes) +
get_mm_counter(p->mm, MM_SWAPENTS);
task_unlock(p);
@@ -364,10 +364,10 @@ static void dump_tasks(const struct mem_cgroup *memcg, const nodemask_t *nodemas
continue;
}
- pr_info("[%5d] %5d %5d %8lu %8lu %7lu %8lu %5hd %s\n",
+ pr_info("[%5d] %5d %5d %8lu %8lu %7d %8lu %5hd %s\n",
task->pid, from_kuid(&init_user_ns, task_uid(task)),
task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
- task->mm->nr_ptes,
+ atomic_read(&task->mm->nr_ptes),
get_mm_counter(task->mm, MM_SWAPENTS),
task->signal->oom_score_adj, task->comm);
task_unlock(task);
--
1.8.4.rc3
* [PATCH 3/9] mm: introduce api for split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
Basic API, backed by mm->page_table_lock for now. The actual
implementation will be added later.
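A minimal caller sketch (illustrative, not part of the diff below;
patch 4 converts the real callers to this pattern):

	spinlock_t *ptl = huge_pmd_lock(mm, pmd);  /* returns the lock taken */
	if (pmd_trans_huge(*pmd)) {
		/* operate on the huge pmd under the right lock */
	}
	spin_unlock(ptl);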
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6cf8ddb..d4361e7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1294,6 +1294,19 @@ static inline void pgtable_page_dtor(struct page *page)
((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
NULL: pte_offset_kernel(pmd, address))
+static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return &mm->page_table_lock;
+}
+
+
+static inline spinlock_t *huge_pmd_lock(struct mm_struct *mm, pmd_t *pmd)
+{
+ spinlock_t *ptl = huge_pmd_lockptr(mm, pmd);
+ spin_lock(ptl);
+ return ptl;
+}
+
extern void free_area_init(unsigned long * zones_size);
extern void free_area_init_node(int nid, unsigned long * zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
--
1.8.4.rc3
* Re: [PATCH 3/9] mm: introduce api for split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:19 ` Peter Zijlstra
0 siblings, 0 replies; 77+ messages in thread
From: Peter Zijlstra @ 2013-09-13 13:19 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi,
Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Rik van Riel, Robin Holt,
Sedat Dilek, Srikar Dronamraju, Thomas Gleixner, linux-kernel,
linux-mm
On Fri, Sep 13, 2013 at 04:06:10PM +0300, Kirill A. Shutemov wrote:
> Basic api, backed by mm->page_table_lock for now. Actual implementation
> will be added later.
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> include/linux/mm.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6cf8ddb..d4361e7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1294,6 +1294,19 @@ static inline void pgtable_page_dtor(struct page *page)
> ((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
> NULL: pte_offset_kernel(pmd, address))
>
> +static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
> +{
> + return &mm->page_table_lock;
> +}
> +
> +
> +static inline spinlock_t *huge_pmd_lock(struct mm_struct *mm, pmd_t *pmd)
> +{
> + spinlock_t *ptl = huge_pmd_lockptr(mm, pmd);
> + spin_lock(ptl);
> + return ptl;
> +}
Why not call the thing pmd_lock()? The pmd bit differentiates it from
pte_lock() enough IMO.
* Re: [PATCH 3/9] mm: introduce api for split page table lock for PMD level
2013-09-13 13:19 ` Peter Zijlstra
@ 2013-09-13 14:22 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 14:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kirill A. Shutemov, Alex Thorlton, Ingo Molnar, Andrew Morton,
Naoya Horiguchi, Eric W . Biederman, Paul E . McKenney, Al Viro,
Andi Kleen, Andrea Arcangeli, Dave Hansen, Dave Jones,
David Howells, Frederic Weisbecker, Johannes Weiner, Kees Cook,
Mel Gorman, Michael Kerrisk, Oleg Nesterov, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm
Peter Zijlstra wrote:
> On Fri, Sep 13, 2013 at 04:06:10PM +0300, Kirill A. Shutemov wrote:
> > Basic api, backed by mm->page_table_lock for now. Actual implementation
> > will be added later.
> >
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> > include/linux/mm.h | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 6cf8ddb..d4361e7 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1294,6 +1294,19 @@ static inline void pgtable_page_dtor(struct page *page)
> > ((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
> > NULL: pte_offset_kernel(pmd, address))
> >
> > +static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
> > +{
> > + return &mm->page_table_lock;
> > +}
> > +
> > +
> > +static inline spinlock_t *huge_pmd_lock(struct mm_struct *mm, pmd_t *pmd)
> > +{
> > + spinlock_t *ptl = huge_pmd_lockptr(mm, pmd);
> > + spin_lock(ptl);
> > + return ptl;
> > +}
>
> Why not call the thing pmd_lock()? The pmd bit differentiates it from
> pte_lock() enough IMO.
Okay, will rename it.
--
Kirill A. Shutemov
* [PATCH 4/9] mm, thp: change pmd_trans_huge_lock() to return taken lock
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
With split ptlock it's important to know which lock pmd_trans_huge_lock()
took. This patch adds one more parameter to the function to return the
lock.
In most places migration to the new API is trivial.
The exception is move_huge_pmd(): we need to take two locks if the pmd
tables are different.
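The two-lock case, simplified from the move_huge_pmd() hunk below:

	/* old_ptl was taken via __pmd_trans_huge_lock(old_pmd, vma, &old_ptl) */
	new_ptl = huge_pmd_lockptr(mm, new_pmd);
	if (new_ptl != old_ptl)
		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
	/* ... move the entry from old_pmd to new_pmd ... */
	if (new_ptl != old_ptl)
		spin_unlock(new_ptl);
	spin_unlock(old_ptl);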
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
fs/proc/task_mmu.c | 13 +++++++------
include/linux/huge_mm.h | 14 +++++++-------
mm/huge_memory.c | 40 +++++++++++++++++++++++++++-------------
mm/memcontrol.c | 10 +++++-----
4 files changed, 46 insertions(+), 31 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index d45d423..bbf7420 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -505,9 +505,9 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
pte_t *pte;
spinlock_t *ptl;
- if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
smaps_pte_entry(*(pte_t *)pmd, addr, HPAGE_PMD_SIZE, walk);
- spin_unlock(&walk->mm->page_table_lock);
+ spin_unlock(ptl);
mss->anonymous_thp += HPAGE_PMD_SIZE;
return 0;
}
@@ -993,13 +993,14 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
{
struct vm_area_struct *vma;
struct pagemapread *pm = walk->private;
+ spinlock_t *ptl;
pte_t *pte;
int err = 0;
pagemap_entry_t pme = make_pme(PM_NOT_PRESENT(pm->v2));
/* find the first VMA at or above 'addr' */
vma = find_vma(walk->mm, addr);
- if (vma && pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (vma && pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
int pmd_flags2;
if ((vma->vm_flags & VM_SOFTDIRTY) || pmd_soft_dirty(*pmd))
@@ -1017,7 +1018,7 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
if (err)
break;
}
- spin_unlock(&walk->mm->page_table_lock);
+ spin_unlock(ptl);
return err;
}
@@ -1319,7 +1320,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
md = walk->private;
- if (pmd_trans_huge_lock(pmd, md->vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, md->vma, &ptl) == 1) {
pte_t huge_pte = *(pte_t *)pmd;
struct page *page;
@@ -1327,7 +1328,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
if (page)
gather_stats(page, md, pte_dirty(huge_pte),
HPAGE_PMD_SIZE/PAGE_SIZE);
- spin_unlock(&walk->mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 3935428..4aca0d8 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -129,15 +129,15 @@ extern void __vma_adjust_trans_huge(struct vm_area_struct *vma,
unsigned long start,
unsigned long end,
long adjust_next);
-extern int __pmd_trans_huge_lock(pmd_t *pmd,
- struct vm_area_struct *vma);
+extern int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl);
/* mmap_sem must be held on entry */
-static inline int pmd_trans_huge_lock(pmd_t *pmd,
- struct vm_area_struct *vma)
+static inline int pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl)
{
VM_BUG_ON(!rwsem_is_locked(&vma->vm_mm->mmap_sem));
if (pmd_trans_huge(*pmd))
- return __pmd_trans_huge_lock(pmd, vma);
+ return __pmd_trans_huge_lock(pmd, vma, ptl);
else
return 0;
}
@@ -215,8 +215,8 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
long adjust_next)
{
}
-static inline int pmd_trans_huge_lock(pmd_t *pmd,
- struct vm_area_struct *vma)
+static inline int pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl)
{
return 0;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bbd41a2..acf5b4d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1342,9 +1342,10 @@ out_unlock:
int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr)
{
+ spinlock_t *ptl;
int ret = 0;
- if (__pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
struct page *page;
pgtable_t pgtable;
pmd_t orig_pmd;
@@ -1359,7 +1360,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
if (is_huge_zero_pmd(orig_pmd)) {
atomic_dec(&tlb->mm->nr_ptes);
- spin_unlock(&tlb->mm->page_table_lock);
+ spin_unlock(ptl);
put_huge_zero_page();
} else {
page = pmd_page(orig_pmd);
@@ -1368,7 +1369,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
VM_BUG_ON(!PageHead(page));
atomic_dec(&tlb->mm->nr_ptes);
- spin_unlock(&tlb->mm->page_table_lock);
+ spin_unlock(ptl);
tlb_remove_page(tlb, page);
}
pte_free(tlb->mm, pgtable);
@@ -1381,14 +1382,15 @@ int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end,
unsigned char *vec)
{
+ spinlock_t *ptl;
int ret = 0;
- if (__pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
/*
* All logical pages in the range are present
* if backed by a huge page.
*/
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
memset(vec, 1, (end - addr) >> PAGE_SHIFT);
ret = 1;
}
@@ -1401,6 +1403,7 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long old_end,
pmd_t *old_pmd, pmd_t *new_pmd)
{
+ spinlock_t *old_ptl, *new_ptl;
int ret = 0;
pmd_t pmd;
@@ -1421,12 +1424,21 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
goto out;
}
- ret = __pmd_trans_huge_lock(old_pmd, vma);
+ /*
+ * We don't have to worry about the ordering of src and dst
+ * ptlocks because exclusive mmap_sem prevents deadlock.
+ */
+ ret = __pmd_trans_huge_lock(old_pmd, vma, &old_ptl);
if (ret == 1) {
+ new_ptl = huge_pmd_lockptr(mm, new_pmd);
+ if (new_ptl != old_ptl)
+ spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
VM_BUG_ON(!pmd_none(*new_pmd));
set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
- spin_unlock(&mm->page_table_lock);
+ if (new_ptl != old_ptl)
+ spin_unlock(new_ptl);
+ spin_unlock(old_ptl);
}
out:
return ret;
@@ -1436,9 +1448,10 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, pgprot_t newprot, int prot_numa)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
int ret = 0;
- if (__pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (__pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
pmd_t entry;
entry = pmdp_get_and_clear(mm, addr, pmd);
if (!prot_numa) {
@@ -1454,7 +1467,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
}
}
set_pmd_at(mm, addr, pmd, entry);
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
ret = 1;
}
@@ -1468,12 +1481,13 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
* Note that if it returns 1, this routine returns without unlocking page
* table locks. So callers must unlock them.
*/
-int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
+int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
+ spinlock_t **ptl)
{
- spin_lock(&vma->vm_mm->page_table_lock);
+ *ptl = huge_pmd_lock(vma->vm_mm, pmd);
if (likely(pmd_trans_huge(*pmd))) {
if (unlikely(pmd_trans_splitting(*pmd))) {
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(*ptl);
wait_split_huge_page(vma->anon_vma, pmd);
return -1;
} else {
@@ -1482,7 +1496,7 @@ int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
return 1;
}
}
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(*ptl);
return 0;
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d5ff3ce..5f35b2a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6376,10 +6376,10 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
pte_t *pte;
spinlock_t *ptl;
- if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
mc.precharge += HPAGE_PMD_NR;
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}
@@ -6568,9 +6568,9 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
* to be unlocked in __split_huge_page_splitting(), where the main
* part of thp split is not executed yet.
*/
- if (pmd_trans_huge_lock(pmd, vma) == 1) {
+ if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
if (mc.precharge < HPAGE_PMD_NR) {
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}
target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
@@ -6587,7 +6587,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
}
put_page(page);
}
- spin_unlock(&vma->vm_mm->page_table_lock);
+ spin_unlock(ptl);
return 0;
}
--
1.8.4.rc3
^ permalink raw reply related [flat|nested] 77+ messages in thread
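As an aside, the heart of the move_huge_pmd() change above is the double-ptlock pattern sketched below. This is an illustration using the identifiers from the diff, not a verbatim excerpt (soft-dirty handling omitted): with split PMD ptlocks the source and destination page tables may be covered by different locks, and lock ordering is a non-issue because the exclusive mmap_sem excludes any other task that could take both.
ret = __pmd_trans_huge_lock(old_pmd, vma, &old_ptl);   /* takes old_ptl */
if (ret == 1) {
        new_ptl = huge_pmd_lockptr(mm, new_pmd);
        if (new_ptl != old_ptl)          /* distinct page tables */
                spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
        pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
        set_pmd_at(mm, new_addr, new_pmd, pmd);
        if (new_ptl != old_ptl)
                spin_unlock(new_ptl);
        spin_unlock(old_ptl);
}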
* [PATCH 5/9] mm, thp: move ptl taking inside page_check_address_pmd()
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
-1 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
With a split page table lock we can't know which lock we need to take
before we have found the relevant pmd.
Let's move the lock taking inside the function.
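For illustration, the calling convention after this change looks roughly as follows — a sketch condensed from the rmap.c hunk below, with the error paths trimmed:
spinlock_t *ptl;
pmd_t *pmd;

pmd = page_check_address_pmd(page, mm, address,
                             PAGE_CHECK_ADDRESS_PMD_FLAG, &ptl);
if (!pmd)
        return;         /* page not mapped here; no lock is held */
/* pmd was found: the matching page table lock is now held via ptl */
if (pmdp_clear_flush_young_notify(vma, address, pmd))
        referenced++;
spin_unlock(ptl);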
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/huge_mm.h | 3 ++-
mm/huge_memory.c | 43 +++++++++++++++++++++++++++----------------
mm/rmap.c | 13 +++++--------
3 files changed, 34 insertions(+), 25 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4aca0d8..91672e2 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -54,7 +54,8 @@ enum page_check_address_pmd_flag {
extern pmd_t *page_check_address_pmd(struct page *page,
struct mm_struct *mm,
unsigned long address,
- enum page_check_address_pmd_flag flag);
+ enum page_check_address_pmd_flag flag,
+ spinlock_t **ptl);
#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
#define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index acf5b4d..4b58a01 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1500,23 +1500,33 @@ int __pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma,
return 0;
}
+/*
+ * This function returns whether a given @page is mapped onto the @address
+ * in the virtual space of @mm.
+ *
+ * When it's true, this function returns the pmd with the page table lock
+ * held, and passes the lock back to the caller via @ptl.
+ * When it's false, it returns NULL without holding the page table lock.
+ */
pmd_t *page_check_address_pmd(struct page *page,
struct mm_struct *mm,
unsigned long address,
- enum page_check_address_pmd_flag flag)
+ enum page_check_address_pmd_flag flag,
+ spinlock_t **ptl)
{
- pmd_t *pmd, *ret = NULL;
+ pmd_t *pmd;
if (address & ~HPAGE_PMD_MASK)
- goto out;
+ return NULL;
pmd = mm_find_pmd(mm, address);
if (!pmd)
- goto out;
+ return NULL;
+ *ptl = huge_pmd_lock(mm, pmd);
if (pmd_none(*pmd))
- goto out;
+ goto unlock;
if (pmd_page(*pmd) != page)
- goto out;
+ goto unlock;
/*
* split_vma() may create temporary aliased mappings. There is
* no risk as long as all huge pmd are found and have their
@@ -1526,14 +1536,15 @@ pmd_t *page_check_address_pmd(struct page *page,
*/
if (flag == PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG &&
pmd_trans_splitting(*pmd))
- goto out;
+ goto unlock;
if (pmd_trans_huge(*pmd)) {
VM_BUG_ON(flag == PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG &&
!pmd_trans_splitting(*pmd));
- ret = pmd;
+ return pmd;
}
-out:
- return ret;
+unlock:
+ spin_unlock(*ptl);
+ return NULL;
}
static int __split_huge_page_splitting(struct page *page,
@@ -1541,6 +1552,7 @@ static int __split_huge_page_splitting(struct page *page,
unsigned long address)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
pmd_t *pmd;
int ret = 0;
/* For mmu_notifiers */
@@ -1548,9 +1560,8 @@ static int __split_huge_page_splitting(struct page *page,
const unsigned long mmun_end = address + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
pmd = page_check_address_pmd(page, mm, address,
- PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG);
+ PAGE_CHECK_ADDRESS_PMD_NOTSPLITTING_FLAG, &ptl);
if (pmd) {
/*
* We can't temporarily set the pmd to null in order
@@ -1561,8 +1572,8 @@ static int __split_huge_page_splitting(struct page *page,
*/
pmdp_splitting_flush(vma, address, pmd);
ret = 1;
+ spin_unlock(ptl);
}
- spin_unlock(&mm->page_table_lock);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
return ret;
@@ -1693,14 +1704,14 @@ static int __split_huge_page_map(struct page *page,
unsigned long address)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
pmd_t *pmd, _pmd;
int ret = 0, i;
pgtable_t pgtable;
unsigned long haddr;
- spin_lock(&mm->page_table_lock);
pmd = page_check_address_pmd(page, mm, address,
- PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
+ PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG, &ptl);
if (pmd) {
pgtable = pgtable_trans_huge_withdraw(mm, pmd);
pmd_populate(mm, &_pmd, pgtable);
@@ -1755,8 +1766,8 @@ static int __split_huge_page_map(struct page *page,
pmdp_invalidate(vma, address, pmd);
pmd_populate(mm, pmd, pgtable);
ret = 1;
+ spin_unlock(ptl);
}
- spin_unlock(&mm->page_table_lock);
return ret;
}
diff --git a/mm/rmap.c b/mm/rmap.c
index fd3ee7a..b59d741 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -665,25 +665,23 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
unsigned long *vm_flags)
{
struct mm_struct *mm = vma->vm_mm;
+ spinlock_t *ptl;
int referenced = 0;
if (unlikely(PageTransHuge(page))) {
pmd_t *pmd;
- spin_lock(&mm->page_table_lock);
/*
* rmap might return false positives; we must filter
* these out using page_check_address_pmd().
*/
pmd = page_check_address_pmd(page, mm, address,
- PAGE_CHECK_ADDRESS_PMD_FLAG);
- if (!pmd) {
- spin_unlock(&mm->page_table_lock);
+ PAGE_CHECK_ADDRESS_PMD_FLAG, &ptl);
+ if (!pmd)
goto out;
- }
if (vma->vm_flags & VM_LOCKED) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
*mapcount = 0; /* break early from loop */
*vm_flags |= VM_LOCKED;
goto out;
@@ -692,10 +690,9 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
/* go ahead even if the pmd is pmd_trans_splitting() */
if (pmdp_clear_flush_young_notify(vma, address, pmd))
referenced++;
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
} else {
pte_t *pte;
- spinlock_t *ptl;
/*
* rmap might return false positives; we must filter
--
1.8.4.rc3
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH 6/9] mm, thp: do not access mm->pmd_huge_pte directly
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
-1 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
Currently, mm->pmd_huge_pte is protected by the page table lock. That
will not work with a split lock: we need a per-pmd pmd_huge_pte for
proper access serialization.
For now, let's just introduce a wrapper to access mm->pmd_huge_pte.
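A sketch of the idea: the wrapper introduced here still resolves to the per-mm field, but it gives every call site a pmd argument so the storage can later be moved without touching the callers again. The second definition below is a hypothetical illustration of that later step, not code from this series:
/* this patch: same storage, new spelling */
#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)

/*
 * hypothetical follow-up with split PMD locks: per-page-table-page
 * storage behind the same interface, e.g.
 * #define pmd_huge_pte(mm, pmd) (virt_to_page(pmd)->pmd_huge_pte)
 */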
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/s390/mm/pgtable.c | 12 ++++++------
arch/sparc/mm/tlb.c | 12 ++++++------
include/linux/mm.h | 1 +
mm/pgtable-generic.c | 12 ++++++------
4 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index de8cbc3..c463e5c 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1225,11 +1225,11 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
assert_spin_locked(&mm->page_table_lock);
/* FIFO */
- if (!mm->pmd_huge_pte)
+ if (!pmd_huge_pte(mm, pmdp))
INIT_LIST_HEAD(lh);
else
- list_add(lh, (struct list_head *) mm->pmd_huge_pte);
- mm->pmd_huge_pte = pgtable;
+ list_add(lh, (struct list_head *) pmd_huge_pte(mm, pmdp));
+ pmd_huge_pte(mm, pmdp) = pgtable;
}
pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
@@ -1241,12 +1241,12 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
assert_spin_locked(&mm->page_table_lock);
/* FIFO */
- pgtable = mm->pmd_huge_pte;
+ pgtable = pmd_huge_pte(mm, pmdp);
lh = (struct list_head *) pgtable;
if (list_empty(lh))
- mm->pmd_huge_pte = NULL;
+ pmd_huge_pte(mm, pmdp) = NULL;
else {
- mm->pmd_huge_pte = (pgtable_t) lh->next;
+ pmd_huge_pte(mm, pmdp) = (pgtable_t) lh->next;
list_del(lh);
}
ptep = (pte_t *) pgtable;
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 7a91f28..656cc46 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -196,11 +196,11 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
assert_spin_locked(&mm->page_table_lock);
/* FIFO */
- if (!mm->pmd_huge_pte)
+ if (!pmd_huge_pte(mm, pmdp))
INIT_LIST_HEAD(lh);
else
- list_add(lh, (struct list_head *) mm->pmd_huge_pte);
- mm->pmd_huge_pte = pgtable;
+ list_add(lh, (struct list_head *) pmd_huge_pte(mm, pmdp));
+ pmd_huge_pte(mm, pmdp) = pgtable;
}
pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
@@ -211,12 +211,12 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
assert_spin_locked(&mm->page_table_lock);
/* FIFO */
- pgtable = mm->pmd_huge_pte;
+ pgtable = pmd_huge_pte(mm, pmdp);
lh = (struct list_head *) pgtable;
if (list_empty(lh))
- mm->pmd_huge_pte = NULL;
+ pmd_huge_pte(mm, pmdp) = NULL;
else {
- mm->pmd_huge_pte = (pgtable_t) lh->next;
+ pmd_huge_pte(mm, pmdp) = (pgtable_t) lh->next;
list_del(lh);
}
pte_val(pgtable[0]) = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d4361e7..d2f8a50 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1299,6 +1299,7 @@ static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
return &mm->page_table_lock;
}
+#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
static inline spinlock_t *huge_pmd_lock(struct mm_struct *mm, pmd_t *pmd)
{
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 3929a40..41fee3e 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -154,11 +154,11 @@ void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
assert_spin_locked(&mm->page_table_lock);
/* FIFO */
- if (!mm->pmd_huge_pte)
+ if (!pmd_huge_pte(mm, pmdp))
INIT_LIST_HEAD(&pgtable->lru);
else
- list_add(&pgtable->lru, &mm->pmd_huge_pte->lru);
- mm->pmd_huge_pte = pgtable;
+ list_add(&pgtable->lru, &pmd_huge_pte(mm, pmdp)->lru);
+ pmd_huge_pte(mm, pmdp) = pgtable;
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#endif
@@ -173,11 +173,11 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
assert_spin_locked(&mm->page_table_lock);
/* FIFO */
- pgtable = mm->pmd_huge_pte;
+ pgtable = pmd_huge_pte(mm, pmdp);
if (list_empty(&pgtable->lru))
- mm->pmd_huge_pte = NULL;
+ pmd_huge_pte(mm, pmdp) = NULL;
else {
- mm->pmd_huge_pte = list_entry(pgtable->lru.next,
+ pmd_huge_pte(mm, pmdp) = list_entry(pgtable->lru.next,
struct page, lru);
list_del(&pgtable->lru);
}
--
1.8.4.rc3
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH 7/9] mm: convert the rest to new page table lock api
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
-1 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
Only trivial cases are left. Let's convert them all at once.
hugetlbfs is not covered for now.
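The conversion is mechanical. As a sketch, every remaining site changes along these lines (identifiers as used throughout the series; the example mirrors the mprotect.c hunk below):
/* before */
spin_lock(&mm->page_table_lock);
set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
spin_unlock(&mm->page_table_lock);

/* after */
spinlock_t *ptl = huge_pmd_lock(mm, pmd);
set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
spin_unlock(ptl);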
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
mm/huge_memory.c | 108 ++++++++++++++++++++++++++++-----------------------
mm/memory.c | 17 ++++----
mm/migrate.c | 7 ++--
mm/mprotect.c | 4 +-
mm/pgtable-generic.c | 4 +-
5 files changed, 77 insertions(+), 63 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4b58a01..e728d74 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -709,6 +709,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
struct page *page)
{
pgtable_t pgtable;
+ spinlock_t *ptl;
VM_BUG_ON(!PageCompound(page));
pgtable = pte_alloc_one(mm, haddr);
@@ -723,9 +724,9 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
*/
__SetPageUptodate(page);
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_none(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mem_cgroup_uncharge_page(page);
put_page(page);
pte_free(mm, pgtable);
@@ -738,7 +739,7 @@ static int __do_huge_pmd_anonymous_page(struct mm_struct *mm,
set_pmd_at(mm, haddr, pmd, entry);
add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
atomic_inc(&mm->nr_ptes);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
}
return 0;
@@ -766,6 +767,7 @@ static inline struct page *alloc_hugepage(int defrag)
}
#endif
+/* Caller must hold page table lock. */
static bool set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd,
struct page *zero_page)
@@ -797,6 +799,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
return VM_FAULT_OOM;
if (!(flags & FAULT_FLAG_WRITE) &&
transparent_hugepage_use_zero_page()) {
+ spinlock_t *ptl;
pgtable_t pgtable;
struct page *zero_page;
bool set;
@@ -809,10 +812,10 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
count_vm_event(THP_FAULT_FALLBACK);
return VM_FAULT_FALLBACK;
}
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
set = set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
zero_page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (!set) {
pte_free(mm, pgtable);
put_huge_zero_page();
@@ -845,6 +848,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
struct vm_area_struct *vma)
{
+ spinlock_t *dst_ptl, *src_ptl;
struct page *src_page;
pmd_t pmd;
pgtable_t pgtable;
@@ -855,8 +859,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
if (unlikely(!pgtable))
goto out;
- spin_lock(&dst_mm->page_table_lock);
- spin_lock_nested(&src_mm->page_table_lock, SINGLE_DEPTH_NESTING);
+ dst_ptl = huge_pmd_lock(dst_mm, dst_pmd);
+ src_ptl = huge_pmd_lockptr(src_mm, src_pmd);
+ spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
ret = -EAGAIN;
pmd = *src_pmd;
@@ -865,7 +870,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
goto out_unlock;
}
/*
- * mm->page_table_lock is enough to be sure that huge zero pmd is not
+ * When page table lock is held, the huge zero pmd should not be
* under splitting since we don't split the page itself, only pmd to
* a page table.
*/
@@ -886,8 +891,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}
if (unlikely(pmd_trans_splitting(pmd))) {
/* split huge page running from under us */
- spin_unlock(&src_mm->page_table_lock);
- spin_unlock(&dst_mm->page_table_lock);
+ spin_unlock(src_ptl);
+ spin_unlock(dst_ptl);
pte_free(dst_mm, pgtable);
wait_split_huge_page(vma->anon_vma, src_pmd); /* src_vma */
@@ -907,8 +912,8 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
ret = 0;
out_unlock:
- spin_unlock(&src_mm->page_table_lock);
- spin_unlock(&dst_mm->page_table_lock);
+ spin_unlock(src_ptl);
+ spin_unlock(dst_ptl);
out:
return ret;
}
@@ -919,10 +924,11 @@ void huge_pmd_set_accessed(struct mm_struct *mm,
pmd_t *pmd, pmd_t orig_pmd,
int dirty)
{
+ spinlock_t *ptl;
pmd_t entry;
unsigned long haddr;
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto unlock;
@@ -932,13 +938,14 @@ void huge_pmd_set_accessed(struct mm_struct *mm,
update_mmu_cache_pmd(vma, address, pmd);
unlock:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
}
static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, pmd_t orig_pmd, unsigned long haddr)
{
+ spinlock_t *ptl;
pgtable_t pgtable;
pmd_t _pmd;
struct page *page;
@@ -965,7 +972,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_free_page;
@@ -992,7 +999,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
}
smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
put_huge_zero_page();
inc_mm_counter(mm, MM_ANONPAGES);
@@ -1002,7 +1009,7 @@ static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
out:
return ret;
out_free_page:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
mem_cgroup_uncharge_page(page);
put_page(page);
@@ -1016,6 +1023,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
struct page *page,
unsigned long haddr)
{
+ spinlock_t *ptl;
pgtable_t pgtable;
pmd_t _pmd;
int ret = 0, i;
@@ -1062,7 +1070,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_free_pages;
VM_BUG_ON(!PageHead(page));
@@ -1088,7 +1096,7 @@ static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
smp_wmb(); /* make pte visible before pmd */
pmd_populate(mm, pmd, pgtable);
page_remove_rmap(page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
@@ -1099,7 +1107,7 @@ out:
return ret;
out_free_pages:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
mem_cgroup_uncharge_start();
for (i = 0; i < HPAGE_PMD_NR; i++) {
@@ -1114,17 +1122,19 @@ out_free_pages:
int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd, pmd_t orig_pmd)
{
+ spinlock_t *ptl;
int ret = 0;
struct page *page = NULL, *new_page;
unsigned long haddr;
unsigned long mmun_start; /* For mmu_notifiers */
unsigned long mmun_end; /* For mmu_notifiers */
+ ptl = huge_pmd_lockptr(mm, pmd);
VM_BUG_ON(!vma->anon_vma);
haddr = address & HPAGE_PMD_MASK;
if (is_huge_zero_pmd(orig_pmd))
goto alloc;
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_unlock;
@@ -1140,7 +1150,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
goto out_unlock;
}
get_page(page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
alloc:
if (transparent_hugepage_enabled(vma) &&
!transparent_hugepage_debug_cow())
@@ -1187,11 +1197,11 @@ alloc:
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (page)
put_page(page);
if (unlikely(!pmd_same(*pmd, orig_pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mem_cgroup_uncharge_page(new_page);
put_page(new_page);
goto out_mn;
@@ -1213,13 +1223,13 @@ alloc:
}
ret |= VM_FAULT_WRITE;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
out_mn:
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
out:
return ret;
out_unlock:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
return ret;
}
@@ -1231,7 +1241,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;
- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(huge_pmd_lockptr(mm, pmd));
if (flags & FOLL_WRITE && !pmd_write(*pmd))
goto out;
@@ -1278,13 +1288,14 @@ out:
int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t pmd, pmd_t *pmdp)
{
+ spinlock_t *ptl;
struct page *page;
unsigned long haddr = addr & HPAGE_PMD_MASK;
int target_nid;
int current_nid = -1;
bool migrated;
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmdp);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;
@@ -1302,17 +1313,17 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
}
/* Acquire the page lock to serialise THP migrations */
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
lock_page(page);
/* Confirm the PTE did not change while locked */
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(pmd, *pmdp))) {
unlock_page(page);
put_page(page);
goto out_unlock;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
/* Migrate the THP to the requested node */
migrated = migrate_misplaced_transhuge_page(mm, vma,
@@ -1324,7 +1335,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
check_same:
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;
clear_pmdnuma:
@@ -1333,7 +1344,7 @@ clear_pmdnuma:
VM_BUG_ON(pmd_numa(*pmdp));
update_mmu_cache_pmd(vma, addr, pmdp);
out_unlock:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (current_nid != -1)
task_numa_fault(current_nid, HPAGE_PMD_NR, false);
return 0;
@@ -2282,7 +2293,7 @@ static void collapse_huge_page(struct mm_struct *mm,
pte_t *pte;
pgtable_t pgtable;
struct page *new_page;
- spinlock_t *ptl;
+ spinlock_t *pmd_ptl, *pte_ptl;
int isolated;
unsigned long hstart, hend;
unsigned long mmun_start; /* For mmu_notifiers */
@@ -2325,12 +2336,12 @@ static void collapse_huge_page(struct mm_struct *mm,
anon_vma_lock_write(vma->anon_vma);
pte = pte_offset_map(pmd, address);
- ptl = pte_lockptr(mm, pmd);
+ pte_ptl = pte_lockptr(mm, pmd);
mmun_start = address;
mmun_end = address + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock); /* probably unnecessary */
+ pmd_ptl = huge_pmd_lock(mm, pmd); /* probably unnecessary */
/*
* After this gup_fast can't run anymore. This also removes
* any huge TLB entry from the CPU so we won't allow
@@ -2338,16 +2349,16 @@ static void collapse_huge_page(struct mm_struct *mm,
* to avoid the risk of CPU bugs in that area.
*/
_pmd = pmdp_clear_flush(vma, address, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
- spin_lock(ptl);
+ spin_lock(pte_ptl);
isolated = __collapse_huge_page_isolate(vma, address, pte);
- spin_unlock(ptl);
+ spin_unlock(pte_ptl);
if (unlikely(!isolated)) {
pte_unmap(pte);
- spin_lock(&mm->page_table_lock);
+ spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
/*
* We can only use set_pmd_at when establishing
@@ -2355,7 +2366,7 @@ static void collapse_huge_page(struct mm_struct *mm,
* points to regular pagetables. Use pmd_populate for that
*/
pmd_populate(mm, pmd, pmd_pgtable(_pmd));
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
anon_vma_unlock_write(vma->anon_vma);
goto out;
}
@@ -2366,7 +2377,7 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
anon_vma_unlock_write(vma->anon_vma);
- __collapse_huge_page_copy(pte, new_page, vma, address, ptl);
+ __collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
pte_unmap(pte);
__SetPageUptodate(new_page);
pgtable = pmd_pgtable(_pmd);
@@ -2381,13 +2392,13 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
smp_wmb();
- spin_lock(&mm->page_table_lock);
+ spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
*hpage = NULL;
@@ -2712,6 +2723,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd)
{
+ spinlock_t *ptl;
struct page *page;
struct mm_struct *mm = vma->vm_mm;
unsigned long haddr = address & HPAGE_PMD_MASK;
@@ -2723,22 +2735,22 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
mmun_start = haddr;
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_trans_huge(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
return;
}
if (is_huge_zero_pmd(*pmd)) {
__split_huge_zero_page_pmd(vma, haddr, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
return;
}
page = pmd_page(*pmd);
VM_BUG_ON(!page_count(page));
get_page(page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
split_huge_page(page);
diff --git a/mm/memory.c b/mm/memory.c
index 1046396..a0ed1d5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -552,6 +552,7 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long address)
{
+ spinlock_t *ptl;
pgtable_t new = pte_alloc_one(mm, address);
int wait_split_huge_page;
if (!new)
@@ -572,7 +573,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
*/
smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
wait_split_huge_page = 0;
if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
atomic_inc(&mm->nr_ptes);
@@ -580,7 +581,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
new = NULL;
} else if (unlikely(pmd_trans_splitting(*pmd)))
wait_split_huge_page = 1;
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (new)
pte_free(mm, new);
if (wait_split_huge_page)
@@ -1516,20 +1517,20 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
split_huge_page_pmd(vma, address, pmd);
goto split_fallthrough;
}
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (likely(pmd_trans_huge(*pmd))) {
if (unlikely(pmd_trans_splitting(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
wait_split_huge_page(vma->anon_vma, pmd);
} else {
page = follow_trans_huge_pmd(vma, address,
pmd, flags);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
*page_mask = HPAGE_PMD_NR - 1;
goto out;
}
} else
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
/* fall through */
}
split_fallthrough:
@@ -3602,13 +3603,13 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
bool numa = false;
int local_nid = numa_node_id();
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
pmd = *pmdp;
if (pmd_numa(pmd)) {
set_pmd_at(mm, _addr, pmdp, pmd_mknonnuma(pmd));
numa = true;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (!numa)
return 0;
diff --git a/mm/migrate.c b/mm/migrate.c
index b7ded7e..32eff0c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1653,6 +1653,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
unsigned long address,
struct page *page, int node)
{
+ spinlock_t *ptl;
unsigned long haddr = address & HPAGE_PMD_MASK;
pg_data_t *pgdat = NODE_DATA(node);
int isolated = 0;
@@ -1699,9 +1700,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
WARN_ON(PageLRU(new_page));
/* Recheck the target PMD */
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, entry))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
/* Reverse changes made by migrate_page_copy() */
if (TestClearPageActive(new_page))
@@ -1746,7 +1747,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
* before it's fully transferred to the new page.
*/
mem_cgroup_end_migration(memcg, page, new_page, true);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
unlock_page(new_page);
unlock_page(page);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 94722a4..885cd78 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -116,9 +116,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
pmd_t *pmd)
{
- spin_lock(&mm->page_table_lock);
+ spinlock_t *ptl = huge_pmd_lock(mm, pmd);
set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
}
#else
static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 41fee3e..80587a5 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -151,7 +151,7 @@ void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable)
{
- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(huge_pmd_lockptr(mm, pmdp));
/* FIFO */
if (!pmd_huge_pte(mm, pmdp))
@@ -170,7 +170,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
{
pgtable_t pgtable;
- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(huge_pmd_lockptr(mm, pmdp));
/* FIFO */
pgtable = pmd_huge_pte(mm, pmdp);
--
1.8.4.rc3
^ permalink raw reply related [flat|nested] 77+ messages in thread
int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t pmd, pmd_t *pmdp)
{
+ spinlock_t *ptl;
struct page *page;
unsigned long haddr = addr & HPAGE_PMD_MASK;
int target_nid;
int current_nid = -1;
bool migrated;
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmdp);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;
@@ -1302,17 +1313,17 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
}
/* Acquire the page lock to serialise THP migrations */
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
lock_page(page);
/* Confirm the PTE did not change while locked */
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(pmd, *pmdp))) {
unlock_page(page);
put_page(page);
goto out_unlock;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
/* Migrate the THP to the requested node */
migrated = migrate_misplaced_transhuge_page(mm, vma,
@@ -1324,7 +1335,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
return 0;
check_same:
- spin_lock(&mm->page_table_lock);
+ spin_lock(ptl);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;
clear_pmdnuma:
@@ -1333,7 +1344,7 @@ clear_pmdnuma:
VM_BUG_ON(pmd_numa(*pmdp));
update_mmu_cache_pmd(vma, addr, pmdp);
out_unlock:
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (current_nid != -1)
task_numa_fault(current_nid, HPAGE_PMD_NR, false);
return 0;
@@ -2282,7 +2293,7 @@ static void collapse_huge_page(struct mm_struct *mm,
pte_t *pte;
pgtable_t pgtable;
struct page *new_page;
- spinlock_t *ptl;
+ spinlock_t *pmd_ptl, *pte_ptl;
int isolated;
unsigned long hstart, hend;
unsigned long mmun_start; /* For mmu_notifiers */
@@ -2325,12 +2336,12 @@ static void collapse_huge_page(struct mm_struct *mm,
anon_vma_lock_write(vma->anon_vma);
pte = pte_offset_map(pmd, address);
- ptl = pte_lockptr(mm, pmd);
+ pte_ptl = pte_lockptr(mm, pmd);
mmun_start = address;
mmun_end = address + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock); /* probably unnecessary */
+ pmd_ptl = huge_pmd_lock(mm, pmd); /* probably unnecessary */
/*
* After this gup_fast can't run anymore. This also removes
* any huge TLB entry from the CPU so we won't allow
@@ -2338,16 +2349,16 @@ static void collapse_huge_page(struct mm_struct *mm,
* to avoid the risk of CPU bugs in that area.
*/
_pmd = pmdp_clear_flush(vma, address, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
- spin_lock(ptl);
+ spin_lock(pte_ptl);
isolated = __collapse_huge_page_isolate(vma, address, pte);
- spin_unlock(ptl);
+ spin_unlock(pte_ptl);
if (unlikely(!isolated)) {
pte_unmap(pte);
- spin_lock(&mm->page_table_lock);
+ spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
/*
* We can only use set_pmd_at when establishing
@@ -2355,7 +2366,7 @@ static void collapse_huge_page(struct mm_struct *mm,
* points to regular pagetables. Use pmd_populate for that
*/
pmd_populate(mm, pmd, pmd_pgtable(_pmd));
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
anon_vma_unlock_write(vma->anon_vma);
goto out;
}
@@ -2366,7 +2377,7 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
anon_vma_unlock_write(vma->anon_vma);
- __collapse_huge_page_copy(pte, new_page, vma, address, ptl);
+ __collapse_huge_page_copy(pte, new_page, vma, address, pte_ptl);
pte_unmap(pte);
__SetPageUptodate(new_page);
pgtable = pmd_pgtable(_pmd);
@@ -2381,13 +2392,13 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
smp_wmb();
- spin_lock(&mm->page_table_lock);
+ spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(pmd_ptl);
*hpage = NULL;
@@ -2712,6 +2723,7 @@ static void __split_huge_zero_page_pmd(struct vm_area_struct *vma,
void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd)
{
+ spinlock_t *ptl;
struct page *page;
struct mm_struct *mm = vma->vm_mm;
unsigned long haddr = address & HPAGE_PMD_MASK;
@@ -2723,22 +2735,22 @@ void __split_huge_page_pmd(struct vm_area_struct *vma, unsigned long address,
mmun_start = haddr;
mmun_end = haddr + HPAGE_PMD_SIZE;
mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_trans_huge(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
return;
}
if (is_huge_zero_pmd(*pmd)) {
__split_huge_zero_page_pmd(vma, haddr, pmd);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
return;
}
page = pmd_page(*pmd);
VM_BUG_ON(!page_count(page));
get_page(page);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
split_huge_page(page);
diff --git a/mm/memory.c b/mm/memory.c
index 1046396..a0ed1d5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -552,6 +552,7 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long address)
{
+ spinlock_t *ptl;
pgtable_t new = pte_alloc_one(mm, address);
int wait_split_huge_page;
if (!new)
@@ -572,7 +573,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
*/
smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
wait_split_huge_page = 0;
if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
atomic_inc(&mm->nr_ptes);
@@ -580,7 +581,7 @@ int __pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
new = NULL;
} else if (unlikely(pmd_trans_splitting(*pmd)))
wait_split_huge_page = 1;
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (new)
pte_free(mm, new);
if (wait_split_huge_page)
@@ -1516,20 +1517,20 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
split_huge_page_pmd(vma, address, pmd);
goto split_fallthrough;
}
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (likely(pmd_trans_huge(*pmd))) {
if (unlikely(pmd_trans_splitting(*pmd))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
wait_split_huge_page(vma->anon_vma, pmd);
} else {
page = follow_trans_huge_pmd(vma, address,
pmd, flags);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
*page_mask = HPAGE_PMD_NR - 1;
goto out;
}
} else
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
/* fall through */
}
split_fallthrough:
@@ -3602,13 +3603,13 @@ static int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
bool numa = false;
int local_nid = numa_node_id();
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
pmd = *pmdp;
if (pmd_numa(pmd)) {
set_pmd_at(mm, _addr, pmdp, pmd_mknonnuma(pmd));
numa = true;
}
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
if (!numa)
return 0;
diff --git a/mm/migrate.c b/mm/migrate.c
index b7ded7e..32eff0c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1653,6 +1653,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
unsigned long address,
struct page *page, int node)
{
+ spinlock_t *ptl;
unsigned long haddr = address & HPAGE_PMD_MASK;
pg_data_t *pgdat = NODE_DATA(node);
int isolated = 0;
@@ -1699,9 +1700,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
WARN_ON(PageLRU(new_page));
/* Recheck the target PMD */
- spin_lock(&mm->page_table_lock);
+ ptl = huge_pmd_lock(mm, pmd);
if (unlikely(!pmd_same(*pmd, entry))) {
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
/* Reverse changes made by migrate_page_copy() */
if (TestClearPageActive(new_page))
@@ -1746,7 +1747,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
* before it's fully transferred to the new page.
*/
mem_cgroup_end_migration(memcg, page, new_page, true);
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
unlock_page(new_page);
unlock_page(page);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 94722a4..885cd78 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -116,9 +116,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
pmd_t *pmd)
{
- spin_lock(&mm->page_table_lock);
+ spinlock_t *ptl = huge_pmd_lock(mm, pmd);
set_pmd_at(mm, addr & PMD_MASK, pmd, pmd_mknuma(*pmd));
- spin_unlock(&mm->page_table_lock);
+ spin_unlock(ptl);
}
#else
static inline void change_pmd_protnuma(struct mm_struct *mm, unsigned long addr,
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 41fee3e..80587a5 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -151,7 +151,7 @@ void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pgtable)
{
- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(huge_pmd_lockptr(mm, pmdp));
/* FIFO */
if (!pmd_huge_pte(mm, pmdp))
@@ -170,7 +170,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
{
pgtable_t pgtable;
- assert_spin_locked(&mm->page_table_lock);
+ assert_spin_locked(huge_pmd_lockptr(mm, pmdp));
/* FIFO */
pgtable = pmd_huge_pte(mm, pmdp);
--
1.8.4.rc3
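For illustration, the conversion in this patch is one mechanical pattern applied to
every huge-pmd path; a minimal sketch of a hypothetical caller before and after
(frob_huge_pmd_* are made-up names; the actual pmd manipulation is elided):

/* Before: every huge pmd update serializes on the mm-wide lock. */
static void frob_huge_pmd_old(struct mm_struct *mm, pmd_t *pmd)
{
        spin_lock(&mm->page_table_lock);
        /* ... inspect or update *pmd ... */
        spin_unlock(&mm->page_table_lock);
}

/*
 * After: the lock is resolved per pmd table, so updates to
 * independent tables in the same mm no longer contend.
 */
static void frob_huge_pmd_new(struct mm_struct *mm, pmd_t *pmd)
{
        spinlock_t *ptl = huge_pmd_lock(mm, pmd); /* lockptr lookup + spin_lock() */
        /* ... inspect or update *pmd ... */
        spin_unlock(ptl);
}

When USE_SPLIT_PMD_PTLOCKS is off, huge_pmd_lockptr() falls back to
&mm->page_table_lock (see the next patch), so both variants take the same lock
on small machines.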
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
The basic idea is the same as with PTE level: the lock is embedded into
struct page of table's page.
Split pmd page table lock only makes sense on big machines.
Let's say >= 32 CPUs for now.
We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
take mm->page_table_lock anymore. Let's reuse page->lru of table's page
for that.
hugetlbfs hasn't been converted to split locking yet: disable split locking if
hugetlbfs is enabled.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
include/linux/mm.h | 31 +++++++++++++++++++++++++++++++
include/linux/mm_types.h | 5 +++++
kernel/fork.c | 4 ++--
mm/Kconfig | 10 ++++++++++
4 files changed, 48 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d2f8a50..5b3922d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1294,13 +1294,44 @@ static inline void pgtable_page_dtor(struct page *page)
((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
NULL: pte_offset_kernel(pmd, address))
+#if USE_SPLIT_PMD_PTLOCKS
+
+static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
+{
+ return &virt_to_page(pmd)->ptl;
+}
+
+static inline void pgtable_pmd_page_ctor(struct page *page)
+{
+ spin_lock_init(&page->ptl);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ page->pmd_huge_pte = NULL;
+#endif
+}
+
+static inline void pgtable_pmd_page_dtor(struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ VM_BUG_ON(page->pmd_huge_pte);
+#endif
+}
+
+#define pmd_huge_pte(mm, pmd) (virt_to_page(pmd)->pmd_huge_pte)
+
+#else
+
static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
return &mm->page_table_lock;
}
+static inline void pgtable_pmd_page_ctor(struct page *page) {}
+static inline void pgtable_pmd_page_dtor(struct page *page) {}
+
#define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
+#endif
+
static inline spinlock_t *huge_pmd_lock(struct mm_struct *mm, pmd_t *pmd)
{
spinlock_t *ptl = huge_pmd_lockptr(mm, pmd);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1c64730..5706ddf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -24,6 +24,8 @@
struct address_space;
#define USE_SPLIT_PTE_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTE_PTLOCK_CPUS)
+#define USE_SPLIT_PMD_PTLOCKS (USE_SPLIT_PTE_PTLOCKS && \
+ NR_CPUS >= CONFIG_SPLIT_PMD_PTLOCK_CPUS)
/*
* Each physical page in the system has a struct page associated with
@@ -130,6 +132,9 @@ struct page {
struct list_head list; /* slobs list of pages */
struct slab *slab_page; /* slab fields */
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
+ pgtable_t pmd_huge_pte; /* protected by page->ptl */
+#endif
};
/* Remainder is not double word aligned */
diff --git a/kernel/fork.c b/kernel/fork.c
index 4c8b986..1670af7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -560,7 +560,7 @@ static void check_mm(struct mm_struct *mm)
"mm:%p idx:%d val:%ld\n", mm, i, x);
}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
VM_BUG_ON(mm->pmd_huge_pte);
#endif
}
@@ -814,7 +814,7 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
memcpy(mm, oldmm, sizeof(*mm));
mm_init_cpumask(mm);
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
mm->pmd_huge_pte = NULL;
#endif
#ifdef CONFIG_NUMA_BALANCING
diff --git a/mm/Kconfig b/mm/Kconfig
index 1977a33..ab32eda 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -214,6 +214,16 @@ config SPLIT_PTE_PTLOCK_CPUS
default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
default "4"
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+ boolean
+
+config SPLIT_PMD_PTLOCK_CPUS
+ int
+ # hugetlb hasn't converted to split locking yet
+ default "999999" if HUGETLB_PAGE
+ default "32" if ARCH_ENABLE_SPLIT_PMD_PTLOCK
+ default "999999"
+
#
# support for memory balloon compaction
config BALLOON_COMPACTION
--
1.8.4.rc3
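A usage note on the two helpers: huge_pmd_lock() resolves and takes the lock in
one go, while huge_pmd_lockptr() only resolves it, for callers that need
non-trivial lock ordering. A sketch of the nested case, modelled on the
copy_huge_pmd() conversion earlier in the series (copy_one_huge_pmd is a
hypothetical name; surrounding logic elided):

static int copy_one_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
                             pmd_t *dst_pmd, pmd_t *src_pmd)
{
        spinlock_t *dst_ptl, *src_ptl;

        dst_ptl = huge_pmd_lock(dst_mm, dst_pmd);        /* resolve + lock */
        src_ptl = huge_pmd_lockptr(src_mm, src_pmd);     /* resolve only... */
        spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); /* ...so lockdep accepts the nesting */
        /* ... copy *src_pmd into *dst_pmd ... */
        spin_unlock(src_ptl);
        spin_unlock(dst_ptl);
        return 0;
}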
^ permalink raw reply related [flat|nested] 77+ messages in thread
* Re: [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:24 ` Peter Zijlstra
0 siblings, 0 replies; 77+ messages in thread
From: Peter Zijlstra @ 2013-09-13 13:24 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi,
Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Rik van Riel, Robin Holt,
Sedat Dilek, Srikar Dronamraju, Thomas Gleixner, linux-kernel,
linux-mm
On Fri, Sep 13, 2013 at 04:06:15PM +0300, Kirill A. Shutemov wrote:
> The basic idea is the same as with PTE level: the lock is embedded into
> struct page of table's page.
>
> Split pmd page table lock only makes sense on big machines.
> Let's say >= 32 CPUs for now.
Why is this? Couldn't I generate the same amount of contention on PMD
level as I can on PTE level in the THP case?
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 13:24 ` Peter Zijlstra
@ 2013-09-13 14:25 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 14:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kirill A. Shutemov, Alex Thorlton, Ingo Molnar, Andrew Morton,
Naoya Horiguchi, Eric W . Biederman, Paul E . McKenney, Al Viro,
Andi Kleen, Andrea Arcangeli, Dave Hansen, Dave Jones,
David Howells, Frederic Weisbecker, Johannes Weiner, Kees Cook,
Mel Gorman, Michael Kerrisk, Oleg Nesterov, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm
Peter Zijlstra wrote:
> On Fri, Sep 13, 2013 at 04:06:15PM +0300, Kirill A. Shutemov wrote:
> > The basic idea is the same as with PTE level: the lock is embedded into
> > struct page of table's page.
> >
> > Split pmd page table lock only makes sense on big machines.
> > Let's say >= 32 CPUs for now.
>
> Why is this? Couldn't I generate the same amount of contention on PMD
> level as I can on PTE level in the THP case?
Hm. You are right. You just need more memory for that.
Do you want it to be "4" too?
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 14:25 ` Kirill A. Shutemov
@ 2013-09-13 14:52 ` Peter Zijlstra
0 siblings, 0 replies; 77+ messages in thread
From: Peter Zijlstra @ 2013-09-13 14:52 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi,
Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Rik van Riel, Robin Holt,
Sedat Dilek, Srikar Dronamraju, Thomas Gleixner, linux-kernel,
linux-mm
On Fri, Sep 13, 2013 at 05:25:13PM +0300, Kirill A. Shutemov wrote:
> Peter Zijlstra wrote:
> > On Fri, Sep 13, 2013 at 04:06:15PM +0300, Kirill A. Shutemov wrote:
> > > The basic idea is the same as with PTE level: the lock is embedded into
> > > struct page of table's page.
> > >
> > > Split pmd page table lock only makes sense on big machines.
> > > Let's say >= 32 CPUs for now.
> >
> > Why is this? Couldn't I generate the same amount of contention on PMD
> > level as I can on PTE level in the THP case?
>
> Hm. You are right. You just need more memory for that.
> Do you want it to be "4" too?
Well, I would drop your patch-1 and use the same config var.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:36 ` Peter Zijlstra
0 siblings, 0 replies; 77+ messages in thread
From: Peter Zijlstra @ 2013-09-13 13:36 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi,
Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Rik van Riel, Robin Holt,
Sedat Dilek, Srikar Dronamraju, Thomas Gleixner, linux-kernel,
linux-mm
On Fri, Sep 13, 2013 at 04:06:15PM +0300, Kirill A. Shutemov wrote:
> +#if USE_SPLIT_PMD_PTLOCKS
> +
> +static inline void pgtable_pmd_page_ctor(struct page *page)
> +{
> + spin_lock_init(&page->ptl);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + page->pmd_huge_pte = NULL;
> +#endif
> +}
> +
> +static inline void pgtable_pmd_page_dtor(struct page *page)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + VM_BUG_ON(page->pmd_huge_pte);
> +#endif
> +}
> +
> +#define pmd_huge_pte(mm, pmd) (virt_to_page(pmd)->pmd_huge_pte)
> +
> +#else
So on -rt we have the problem that spinlock_t is rather huge (it's an
rtmutex), so instead of blowing up the pageframe like that we treat
page->ptl as a pointer and allocate the spinlock separately.
Since allocations can fail, the above ctor path gets 'interesting'.
It would be good if new code could assume the ctor can fail, so we
don't have to replicate that horror-show.
---
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri, 3 Jul 2009 08:44:54 -0500
Subject: mm: shrink the page frame to !-rt size
The below is a boot-tested hack to shrink the page frame size back to
normal.
Should be a net win, since there should be many fewer PTE-pages than
page-frames.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
include/linux/mm.h | 46 +++++++++++++++++++++++++++++++++++++++-------
include/linux/mm_types.h | 4 ++++
mm/memory.c | 32 ++++++++++++++++++++++++++++++++
3 files changed, 75 insertions(+), 7 deletions(-)
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1241,27 +1241,59 @@ static inline pmd_t *pmd_alloc(struct mm
* overflow into the next struct page (as it might with DEBUG_SPINLOCK).
* When freeing, reset page->mapping so free_pages_check won't complain.
*/
+#ifndef CONFIG_PREEMPT_RT_FULL
+
#define __pte_lockptr(page) &((page)->ptl)
-#define pte_lock_init(_page) do { \
- spin_lock_init(__pte_lockptr(_page)); \
-} while (0)
+
+static inline struct page *pte_lock_init(struct page *page)
+{
+ spin_lock_init(__pte_lockptr(page));
+ return page;
+}
+
#define pte_lock_deinit(page) ((page)->mapping = NULL)
+
+#else /* !PREEMPT_RT_FULL */
+
+/*
+ * On PREEMPT_RT_FULL the spinlock_t's are too large to embed in the
+ * page frame, hence it only has a pointer and we need to dynamically
+ * allocate the lock when we allocate PTE-pages.
+ *
+ * This is an overall win, since only a small fraction of the pages
+ * will be PTE pages under normal circumstances.
+ */
+
+#define __pte_lockptr(page) ((page)->ptl)
+
+extern struct page *pte_lock_init(struct page *page);
+extern void pte_lock_deinit(struct page *page);
+
+#endif /* PREEMPT_RT_FULL */
+
#define pte_lockptr(mm, pmd) ({(void)(mm); __pte_lockptr(pmd_page(*(pmd)));})
#else /* !USE_SPLIT_PTLOCKS */
/*
* We use mm->page_table_lock to guard all pagetable pages of the mm.
*/
-#define pte_lock_init(page) do {} while (0)
+static inline struct page *pte_lock_init(struct page *page) { return page; }
#define pte_lock_deinit(page) do {} while (0)
#define pte_lockptr(mm, pmd) ({(void)(pmd); &(mm)->page_table_lock;})
#endif /* USE_SPLIT_PTLOCKS */
-static inline void pgtable_page_ctor(struct page *page)
+static inline struct page *__pgtable_page_ctor(struct page *page)
{
- pte_lock_init(page);
- inc_zone_page_state(page, NR_PAGETABLE);
+ page = pte_lock_init(page);
+ if (page)
+ inc_zone_page_state(page, NR_PAGETABLE);
+ return page;
}
+#define pgtable_page_ctor(page) \
+do { \
+ page = __pgtable_page_ctor(page); \
+} while (0)
+
static inline void pgtable_page_dtor(struct page *page)
{
pte_lock_deinit(page);
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -142,7 +142,11 @@ struct page {
* system if PG_buddy is set.
*/
#if USE_SPLIT_PTLOCKS
+# ifndef CONFIG_PREEMPT_RT_FULL
spinlock_t ptl;
+# else
+ spinlock_t *ptl;
+# endif
#endif
struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
struct page *first_page; /* Compound tail pages */
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4328,3 +4328,35 @@ void copy_user_huge_page(struct page *ds
}
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
+
+#if defined(CONFIG_PREEMPT_RT_FULL) && (USE_SPLIT_PTLOCKS > 0)
+/*
+ * Heinous hack, relies on the caller doing something like:
+ *
+ * pte = alloc_pages(PGALLOC_GFP, 0);
+ * if (pte)
+ * pgtable_page_ctor(pte);
+ * return pte;
+ *
+ * This ensures we release the page and return NULL when the
+ * lock allocation fails.
+ */
+struct page *pte_lock_init(struct page *page)
+{
+ page->ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
+ if (page->ptl) {
+ spin_lock_init(__pte_lockptr(page));
+ } else {
+ __free_page(page);
+ page = NULL;
+ }
+ return page;
+}
+
+void pte_lock_deinit(struct page *page)
+{
+ kfree(page->ptl);
+ page->mapping = NULL;
+}
+
+#endif
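For the PMD-level ctor that would mean the same shape; a rough sketch of a
failure-capable pgtable_pmd_page_ctor() under that constraint (hypothetical
rework, not part of the posted series; assumes page->ptl becomes a pointer on
-rt as in the hack above):

static inline bool pgtable_pmd_page_ctor(struct page *page)
{
#ifdef CONFIG_PREEMPT_RT_FULL
        page->ptl = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
        if (!page->ptl)
                return false;   /* caller must free the page and bail out */
        spin_lock_init(page->ptl);
#else
        spin_lock_init(&page->ptl);
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
        page->pmd_huge_pte = NULL;
#endif
        return true;
}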
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 13:36 ` Peter Zijlstra
@ 2013-09-13 14:25 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 14:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Kirill A. Shutemov, Alex Thorlton, Ingo Molnar, Andrew Morton,
Naoya Horiguchi, Eric W . Biederman, Paul E . McKenney, Al Viro,
Andi Kleen, Andrea Arcangeli, Dave Hansen, Dave Jones,
David Howells, Frederic Weisbecker, Johannes Weiner, Kees Cook,
Mel Gorman, Michael Kerrisk, Oleg Nesterov, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm
Peter Zijlstra wrote:
> On Fri, Sep 13, 2013 at 04:06:15PM +0300, Kirill A. Shutemov wrote:
> > +#if USE_SPLIT_PMD_PTLOCKS
> > +
> > +static inline void pgtable_pmd_page_ctor(struct page *page)
> > +{
> > + spin_lock_init(&page->ptl);
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > + page->pmd_huge_pte = NULL;
> > +#endif
> > +}
> > +
> > +static inline void pgtable_pmd_page_dtor(struct page *page)
> > +{
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > + VM_BUG_ON(page->pmd_huge_pte);
> > +#endif
> > +}
> > +
> > +#define pmd_huge_pte(mm, pmd) (virt_to_page(pmd)->pmd_huge_pte)
> > +
> > +#else
>
> So on -rt we have the problem that spinlock_t is rather huge (its a
> rtmutex) so instead of blowing up the pageframe like that we treat
> page->pte as a pointer and allocate the spinlock.
>
> Since allocations could fail the above ctor path gets 'interesting'.
>
> It would be good if new code could assume the ctor could fail so we
> don't have to replicate that horror-show.
Okay, I'll rework this.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 15:45 ` Naoya Horiguchi
0 siblings, 0 replies; 77+ messages in thread
From: Naoya Horiguchi @ 2013-09-13 15:45 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Alex Thorlton, Ingo Molnar, Andrew Morton, Eric W . Biederman,
Paul E . McKenney, Al Viro, Andi Kleen, Andrea Arcangeli,
Dave Hansen, Dave Jones, David Howells, Frederic Weisbecker,
Johannes Weiner, Kees Cook, Mel Gorman, Michael Kerrisk,
Oleg Nesterov, Peter Zijlstra, Rik van Riel, Robin Holt,
Sedat Dilek, Srikar Dronamraju, Thomas Gleixner, linux-kernel,
linux-mm
On Fri, Sep 13, 2013 at 04:06:15PM +0300, Kirill A. Shutemov wrote:
> The basic idea is the same as with PTE level: the lock is embedded into
> struct page of table's page.
>
> Split pmd page table lock only makes sense on big machines.
> Let's say >= 32 CPUs for now.
>
> We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
> take mm->page_table_lock anymore. Let's reuse page->lru of table's page
> for that.
Looks nice.
> hugetlbfs hasn't been converted to split locking yet: disable split locking if
> hugetlbfs is enabled.
I don't think that we have to disable when hugetlbfs is enabled,
because hugetlbfs code doesn't use huge_pmd_lockptr() or huge_pmd_lock().
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> include/linux/mm.h | 31 +++++++++++++++++++++++++++++++
> include/linux/mm_types.h | 5 +++++
> kernel/fork.c | 4 ++--
> mm/Kconfig | 10 ++++++++++
> 4 files changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d2f8a50..5b3922d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1294,13 +1294,44 @@ static inline void pgtable_page_dtor(struct page *page)
> ((unlikely(pmd_none(*(pmd))) && __pte_alloc_kernel(pmd, address))? \
> NULL: pte_offset_kernel(pmd, address))
>
> +#if USE_SPLIT_PMD_PTLOCKS
> +
> +static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
> +{
> + return &virt_to_page(pmd)->ptl;
> +}
> +
> +static inline void pgtable_pmd_page_ctor(struct page *page)
> +{
> + spin_lock_init(&page->ptl);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + page->pmd_huge_pte = NULL;
> +#endif
> +}
> +
> +static inline void pgtable_pmd_page_dtor(struct page *page)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + VM_BUG_ON(page->pmd_huge_pte);
> +#endif
> +}
> +
> +#define pmd_huge_pte(mm, pmd) (virt_to_page(pmd)->pmd_huge_pte)
>
> +
> +#else
> +
> static inline spinlock_t *huge_pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
> {
> return &mm->page_table_lock;
> }
>
> +static inline void pgtable_pmd_page_ctor(struct page *page) {}
> +static inline void pgtable_pmd_page_dtor(struct page *page) {}
> +
> #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
>
> +#endif
> +
> static inline spinlock_t *huge_pmd_lock(struct mm_struct *mm, pmd_t *pmd)
> {
> spinlock_t *ptl = huge_pmd_lockptr(mm, pmd);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 1c64730..5706ddf 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -24,6 +24,8 @@
> struct address_space;
>
> #define USE_SPLIT_PTE_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTE_PTLOCK_CPUS)
> +#define USE_SPLIT_PMD_PTLOCKS (USE_SPLIT_PTE_PTLOCKS && \
> + NR_CPUS >= CONFIG_SPLIT_PMD_PTLOCK_CPUS)
>
> /*
> * Each physical page in the system has a struct page associated with
> @@ -130,6 +132,9 @@ struct page {
>
> struct list_head list; /* slobs list of pages */
> struct slab *slab_page; /* slab fields */
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
> + pgtable_t pmd_huge_pte; /* protected by page->ptl */
> +#endif
> };
>
> /* Remainder is not double word aligned */
Can we remove pmd_huge_pte from mm_struct when USE_SPLIT_PMD_PTLOCKS is true?
Thanks,
Naoya Horiguchi
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH 8/9] mm: implement split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 19:57 ` Dave Hansen
0 siblings, 0 replies; 77+ messages in thread
From: Dave Hansen @ 2013-09-13 19:57 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi,
Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Jones, David Howells, Frederic Weisbecker,
Johannes Weiner, Kees Cook, Mel Gorman, Michael Kerrisk,
Oleg Nesterov, Peter Zijlstra, Rik van Riel, Robin Holt,
Sedat Dilek, Srikar Dronamraju, Thomas Gleixner, linux-kernel,
linux-mm
On 09/13/2013 06:06 AM, Kirill A. Shutemov wrote:
> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> + boolean
> +
> +config SPLIT_PMD_PTLOCK_CPUS
> + int
> + # hugetlb hasn't converted to split locking yet
> + default "999999" if HUGETLB_PAGE
> + default "32" if ARCH_ENABLE_SPLIT_PMD_PTLOCK
> + default "999999"
Is there a reason we should have separate config knobs for this from
SPLIT_PTLOCK_CPUS? Seems a bit silly.
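One way to fold them together would be to key the PMD case off the existing PTE
knob plus the arch opt-in, e.g. (a sketch of that direction, not code from this
series):

#define USE_SPLIT_PMD_PTLOCKS   (USE_SPLIT_PTE_PTLOCKS && \
                                 IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))

which would drop SPLIT_PMD_PTLOCK_CPUS entirely; hugetlb would then have to keep
taking the mm-wide lock explicitly rather than relying on the config knob.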
^ permalink raw reply [flat|nested] 77+ messages in thread
* [PATCH 9/9] x86, mm: enable split page table lock for PMD level
2013-09-13 13:06 ` Kirill A. Shutemov
@ 2013-09-13 13:06 ` Kirill A. Shutemov
0 siblings, 0 replies; 77+ messages in thread
From: Kirill A. Shutemov @ 2013-09-13 13:06 UTC (permalink / raw)
To: Alex Thorlton, Ingo Molnar, Andrew Morton, Naoya Horiguchi
Cc: Eric W . Biederman, Paul E . McKenney, Al Viro, Andi Kleen,
Andrea Arcangeli, Dave Hansen, Dave Jones, David Howells,
Frederic Weisbecker, Johannes Weiner, Kees Cook, Mel Gorman,
Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Rik van Riel,
Robin Holt, Sedat Dilek, Srikar Dronamraju, Thomas Gleixner,
linux-kernel, linux-mm, Kirill A. Shutemov
Enable PMD split page table lock for X86_64 and PAE.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/Kconfig | 4 ++++
arch/x86/include/asm/pgalloc.h | 8 +++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 30c40f0..6a5cf6a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1880,6 +1880,10 @@ config USE_PERCPU_NUMA_NODE_ID
def_bool y
depends on NUMA
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+ def_bool y
+ depends on X86_64 || X86_PAE
+
menu "Power management and ACPI options"
config ARCH_HIBERNATION_HEADER
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index b4389a4..f2daea1 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -80,12 +80,18 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
#if PAGETABLE_LEVELS > 2
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
- return (pmd_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT);
+ struct page *page;
+ page = alloc_pages(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO, 0);
+ if (!page)
+ return NULL;
+ pgtable_pmd_page_ctor(page);
+ return (pmd_t *)page_address(page);
}
static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
{
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
+ pgtable_pmd_page_dtor(virt_to_page(pmd));
free_page((unsigned long)pmd);
}
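For context, a sketch of what the ctor/dtor pair introduced in patch 8/9
is expected to do on such a page (the function names are from this
series; the bodies below are illustrative, assuming the spinlock is
embedded in struct page):

static inline void pgtable_pmd_page_ctor(struct page *page)
{
#if USE_SPLIT_PMD_PTLOCKS
	spin_lock_init(&page->ptl);	/* per-table PMD lock */
	page->pmd_huge_pte = NULL;	/* nothing deposited yet */
#endif
}

static inline void pgtable_pmd_page_dtor(struct page *page)
{
#if USE_SPLIT_PMD_PTLOCKS
	/* any deposited page table must have been withdrawn by now */
	VM_BUG_ON(page->pmd_huge_pte);
#endif
}

With that, pmd_alloc_one()/pmd_free() above pair allocation with lock
initialization, so each PMD table carries its own lock from birth.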
--
1.8.4.rc3
^ permalink raw reply related [flat|nested] 77+ messages in thread