* [PATCH 0/3] Add 16GB hugepage support
From: Nitin Gupta @ 2017-07-13 21:53 UTC (permalink / raw)
  To: sparclinux

The SPARC architecture supports 16G hugepages, but the kernel did not
support them. This patch series adds that support and also cleans up
some page table walk/alloc functions.

Patch 1/3: get_user_pages() and friends are used for direct IO. These
  functions were not aware of hugepages at the PUD level and would try
  to continue walking page tables beyond the PUD level. Since 16G
  hugepages have page tables allocated only down to the PUD level,
  such a walk would result in invalid memory accesses. This patch adds
  the PUD huge page case to these functions.

Patch 2/3: Core changes needed to add 16G hugepage support: To map a
  single 16G hugepage, two PUD entries are used. Each PUD entry maps
  an 8G portion of the 16G page. This page table encoding scheme is
  the same as that used for hugepages at the PMD level (8M, 256M and
  2G pages), where each PMD entry points successively to an 8M region
  within the page.  No page table entries below the PUD level are
  allocated for a 16G hugepage, since they are not required.
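
  As a worked example (editor's sketch, not from the patch; sparc64
  has PAGE_SHIFT = 13, so PMD_SHIFT = 23 maps 8M per entry and
  PUD_SHIFT = 33 maps 8G per entry), the entry count per huge page
  falls out of the sizes directly:

	/* nptes as computed by set_huge_pte_at() in patch 2 */
	unsigned long size = 1UL << HPAGE_16GB_SHIFT;	/* 16G */
	unsigned int nptes = size >> PUD_SHIFT;		/* == 2 PUD entries */

	size  = 1UL << HPAGE_2GB_SHIFT;			/* 2G */
	nptes = size >> PMD_SHIFT;			/* == 256 PMD entries */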

  TSB entries for a 16G page are created at every 4M boundary, since
  these pages use the HUGE_TSB, which is configured with a 4M page
  size.  When walking page tables (on a TSB miss), bits [32:22] are
  transferred from the vaddr into the PUD entry to resolve addresses
  at 4M granularity. The resolved address mapping is then stored in
  the HUGE_TSB.
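
  A minimal C sketch of that bit transfer (editor's illustration; the
  mask matches the 0x1ffc00000UL constant used by update_mmu_cache()
  in patch 2):

	#define VA_4M_MASK	0x1ffc00000UL	/* vaddr bits [32:22] */

	/* Mix the 4M-slice bits of the vaddr into the 16G page's TTE
	 * so the 4M-page-size HUGE_TSB can hold one entry per slice.
	 */
	static unsigned long tte_for_4m_slice(unsigned long tte,
					      unsigned long vaddr)
	{
		return (tte & ~VA_4M_MASK) | (vaddr & VA_4M_MASK);
	}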

Patch 3/3: Patch 2 added the huge PUD case to the page table walk and
  alloc functions. This new case further increased nesting in these
  functions and made them harder to follow. This patch flattens these
  functions for better readability.

Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: sparclinux@vger.kernel.org

Nitin Gupta (3):
  sparc64: Support huge PUD case in get_user_pages
  sparc64: Add 16GB hugepage support
  sparc64: Cleanup hugepage table walk functions

 arch/sparc/include/asm/page_64.h    |   3 +-
 arch/sparc/include/asm/pgtable_64.h |  20 ++++++-
 arch/sparc/include/asm/tsb.h        |  30 +++++++++++
 arch/sparc/kernel/tsb.S             |   2 +-
 arch/sparc/mm/gup.c                 |  47 ++++++++++++++++-
 arch/sparc/mm/hugetlbpage.c         | 102 +++++++++++++++++++++---------------
 arch/sparc/mm/init_64.c             |  41 ++++++++++++---
 7 files changed, 191 insertions(+), 54 deletions(-)

-- 
2.9.2


* [PATCH 1/3] sparc64: Support huge PUD case in get_user_pages
From: Nitin Gupta @ 2017-07-13 21:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: Nitin Gupta, Kirill A. Shutemov, Tom Hromatka, Michal Hocko,
	Ingo Molnar, Lorenzo Stoakes, Jan Kara, sparclinux, linux-kernel

get_user_pages() is used to do direct IO. It already
handles the case where the address range is backed
by PMD huge pages. This patch adds handling for ranges
backed by PUD huge pages.
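
For context, a sketch of the direct IO style caller that exercises
this path (editor's illustration; pin_user_buffer() is a hypothetical
helper around the get_user_pages_fast() signature of this era):

	#include <linux/mm.h>

	static int pin_user_buffer(unsigned long uaddr, int nr_pages,
				   struct page **pages)
	{
		/* Fast GUP walks gup_pud_range()/gup_pmd_range(); it must
		 * now recognize a huge PUD instead of descending past it.
		 */
		int got = get_user_pages_fast(uaddr, nr_pages, 1 /* write */,
					      pages);

		if (got < nr_pages) {
			while (got > 0)
				put_page(pages[--got]);
			return -EFAULT;
		}
		return 0;
	}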

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
 arch/sparc/include/asm/pgtable_64.h | 15 ++++++++++--
 arch/sparc/mm/gup.c                 | 47 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 6fbd931..2579f5a 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -687,6 +687,8 @@ static inline unsigned long pmd_write(pmd_t pmd)
 	return pte_write(pte);
 }
 
+#define pud_write(pud)	pte_write(__pte(pud_val(pud)))
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline unsigned long pmd_dirty(pmd_t pmd)
 {
@@ -823,9 +825,18 @@ static inline unsigned long __pmd_page(pmd_t pmd)
 
 	return ((unsigned long) __va(pfn << PAGE_SHIFT));
 }
+
+static inline unsigned long pud_page_vaddr(pud_t pud)
+{
+	pte_t pte = __pte(pud_val(pud));
+	unsigned long pfn;
+
+	pfn = pte_pfn(pte);
+
+	return ((unsigned long) __va(pfn << PAGE_SHIFT));
+}
+
 #define pmd_page(pmd) 			virt_to_page((void *)__pmd_page(pmd))
-#define pud_page_vaddr(pud)		\
-	((unsigned long) __va(pud_val(pud)))
 #define pud_page(pud) 			virt_to_page((void *)pud_page_vaddr(pud))
 #define pmd_clear(pmdp)			(pmd_val(*(pmdp)) = 0UL)
 #define pud_present(pud)		(pud_val(pud) != 0U)
diff --git a/arch/sparc/mm/gup.c b/arch/sparc/mm/gup.c
index f80cfc6..d777594 100644
--- a/arch/sparc/mm/gup.c
+++ b/arch/sparc/mm/gup.c
@@ -103,6 +103,47 @@ static int gup_huge_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr,
 	return 1;
 }
 
+static int gup_huge_pud(pud_t *pudp, pud_t pud, unsigned long addr,
+			unsigned long end, int write, struct page **pages,
+			int *nr)
+{
+	struct page *head, *page;
+	int refs;
+
+	if (!(pud_val(pud) & _PAGE_VALID))
+		return 0;
+
+	if (write && !pud_write(pud))
+		return 0;
+
+	refs = 0;
+	head = pud_page(pud);
+	page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	if (PageTail(head))
+		head = compound_head(head);
+	do {
+		VM_BUG_ON(compound_head(page) != head);
+		pages[*nr] = page;
+		(*nr)++;
+		page++;
+		refs++;
+	} while (addr += PAGE_SIZE, addr != end);
+
+	if (!page_cache_add_speculative(head, refs)) {
+		*nr -= refs;
+		return 0;
+	}
+
+	if (unlikely(pud_val(pud) != pud_val(*pudp))) {
+		*nr -= refs;
+		while (refs--)
+			put_page(head);
+		return 0;
+	}
+
+	return 1;
+}
+
 static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 		int write, struct page **pages, int *nr)
 {
@@ -141,7 +182,11 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
 		next = pud_addr_end(addr, end);
 		if (pud_none(pud))
 			return 0;
-		if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+		if (unlikely(pud_large(pud))) {
+			if (!gup_huge_pud(pudp, pud, addr, next,
+					  write, pages, nr))
+				return 0;
+		} else if (!gup_pmd_range(pud, addr, next, write, pages, nr))
 			return 0;
 	} while (pudp++, addr = next, addr != end);
 
-- 
2.9.2


* [PATCH 2/3] sparc64: Add 16GB hugepage support
From: Nitin Gupta @ 2017-07-13 21:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: Nitin Gupta, Mike Kravetz, Kirill A. Shutemov, Tom Hromatka,
	Michal Hocko, Ingo Molnar, Andrew Morton, Steve Capper,
	Hugh Dickins, Punit Agrawal, bob picco, Pavel Tatashin,
	Steven Sistare, Paul Gortmaker, Thomas Tai, Atish Patra,
	sparclinux, linux-kernel

Add support for the 16GB hugepage size. To use this page size,
pass kernel parameters such as:

default_hugepagesz=16G hugepagesz=16G hugepages=10

Testing:

Tested with the stream benchmark which allocates 48G of
arrays backed by 16G hugepages and does RW operation on
them in parallel.
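
A minimal userspace sketch (editor's illustration, assuming the
log2-based MAP_HUGE_* flag encoding, where 16G encodes as 34) of
mapping one such page once the parameters above are set:

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MAP_HUGE_SHIFT
	#define MAP_HUGE_SHIFT	26
	#endif
	#define MAP_HUGE_16GB	(34U << MAP_HUGE_SHIFT)	/* log2(16G) == 34 */

	int main(void)
	{
		size_t len = 1UL << 34;	/* one 16G hugepage */
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
			       MAP_HUGE_16GB, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		memset(p, 0, len);	/* fault in and touch the page */
		munmap(p, len);
		return 0;
	}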

Orabug: 25362942

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
 arch/sparc/include/asm/page_64.h    |  3 +-
 arch/sparc/include/asm/pgtable_64.h |  5 +++
 arch/sparc/include/asm/tsb.h        | 30 +++++++++++++++
 arch/sparc/kernel/tsb.S             |  2 +-
 arch/sparc/mm/hugetlbpage.c         | 74 ++++++++++++++++++++++++++-----------
 arch/sparc/mm/init_64.c             | 41 ++++++++++++++++----
 6 files changed, 125 insertions(+), 30 deletions(-)

diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index 5961b2d..8ee1f97 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -17,6 +17,7 @@
 
 #define HPAGE_SHIFT		23
 #define REAL_HPAGE_SHIFT	22
+#define HPAGE_16GB_SHIFT	34
 #define HPAGE_2GB_SHIFT		31
 #define HPAGE_256MB_SHIFT	28
 #define HPAGE_64K_SHIFT		16
@@ -28,7 +29,7 @@
 #define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
 #define REAL_HPAGE_PER_HPAGE	(_AC(1,UL) << (HPAGE_SHIFT - REAL_HPAGE_SHIFT))
-#define HUGE_MAX_HSTATE		4
+#define HUGE_MAX_HSTATE		5
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2579f5a..4fefe37 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -414,6 +414,11 @@ static inline bool is_hugetlb_pmd(pmd_t pmd)
 	return !!(pmd_val(pmd) & _PAGE_PMD_HUGE);
 }
 
+static inline bool is_hugetlb_pud(pud_t pud)
+{
+	return !!(pud_val(pud) & _PAGE_PUD_HUGE);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline pmd_t pmd_mkhuge(pmd_t pmd)
 {
diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
index 32258e0..7b240a3 100644
--- a/arch/sparc/include/asm/tsb.h
+++ b/arch/sparc/include/asm/tsb.h
@@ -195,6 +195,35 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
 	 nop; \
 699:
 
+	/* PUD has been loaded into REG1, interpret the value, seeing
+	 * if it is a HUGE PUD or a normal one.  If it is not valid
+	 * then jump to FAIL_LABEL.  If it is a HUGE PUD, and it
+	 * translates to a valid PTE, branch to PTE_LABEL.
+	 *
+	 * We have to propagate bits [32:22] from the virtual address
+	 * to resolve at 4M granularity.
+	 */
+#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
+	brz,pn		REG1, FAIL_LABEL;		\
+	 sethi		%uhi(_PAGE_PUD_HUGE), REG2;	\
+	sllx		REG2, 32, REG2;			\
+	andcc		REG1, REG2, %g0;		\
+	be,pt		%xcc, 700f;			\
+	 sethi		%hi(0x1ffc0000), REG2;		\
+	sllx		REG2, 1, REG2;			\
+	brgez,pn	REG1, FAIL_LABEL;		\
+	 andn		REG1, REG2, REG1;		\
+	and		VADDR, REG2, REG2;		\
+	brlz,pt		REG1, PTE_LABEL;		\
+	 or		REG1, REG2, REG1;		\
+700:
+#else
+#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
+	brz,pn		REG1, FAIL_LABEL; \
+	 nop;
+#endif
+
 	/* PMD has been loaded into REG1, interpret the value, seeing
 	 * if it is a HUGE PMD or a normal one.  If it is not valid
 	 * then jump to FAIL_LABEL.  If it is a HUGE PMD, and it
@@ -242,6 +271,7 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
 	srlx		REG2, 64 - PAGE_SHIFT, REG2; \
 	andn		REG2, 0x7, REG2; \
 	ldxa		[REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
+	USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, 800f) \
 	brz,pn		REG1, FAIL_LABEL; \
 	 sllx		VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
 	srlx		REG2, 64 - PAGE_SHIFT, REG2; \
diff --git a/arch/sparc/kernel/tsb.S b/arch/sparc/kernel/tsb.S
index 07c0df9..5f42ac0 100644
--- a/arch/sparc/kernel/tsb.S
+++ b/arch/sparc/kernel/tsb.S
@@ -117,7 +117,7 @@ tsb_miss_page_table_walk_sun4v_fastpath:
 	/* Valid PTE is now in %g5.  */
 
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-	sethi		%uhi(_PAGE_PMD_HUGE), %g7
+	sethi		%uhi(_PAGE_PMD_HUGE | _PAGE_PUD_HUGE), %g7
 	sllx		%g7, 32, %g7
 
 	andcc		%g5, %g7, %g0
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 28ee8d8..7acb84d 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -143,6 +143,10 @@ static pte_t sun4v_hugepage_shift_to_tte(pte_t entry, unsigned int shift)
 	pte_val(entry) = pte_val(entry) & ~_PAGE_SZALL_4V;
 
 	switch (shift) {
+	case HPAGE_16GB_SHIFT:
+		hugepage_size = _PAGE_SZ16GB_4V;
+		pte_val(entry) |= _PAGE_PUD_HUGE;
+		break;
 	case HPAGE_2GB_SHIFT:
 		hugepage_size = _PAGE_SZ2GB_4V;
 		pte_val(entry) |= _PAGE_PMD_HUGE;
@@ -187,6 +191,9 @@ static unsigned int sun4v_huge_tte_to_shift(pte_t entry)
 	unsigned int shift;
 
 	switch (tte_szbits) {
+	case _PAGE_SZ16GB_4V:
+		shift = HPAGE_16GB_SHIFT;
+		break;
 	case _PAGE_SZ2GB_4V:
 		shift = HPAGE_2GB_SHIFT;
 		break;
@@ -263,7 +270,12 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 
 	pgd = pgd_offset(mm, addr);
 	pud = pud_alloc(mm, pgd, addr);
-	if (pud) {
+	if (!pud)
+		return NULL;
+
+	if (sz >= PUD_SIZE)
+		pte = (pte_t *)pud;
+	else {
 		pmd = pmd_alloc(mm, pud, addr);
 		if (!pmd)
 			return NULL;
@@ -289,12 +301,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	if (!pgd_none(*pgd)) {
 		pud = pud_offset(pgd, addr);
 		if (!pud_none(*pud)) {
-			pmd = pmd_offset(pud, addr);
-			if (!pmd_none(*pmd)) {
-				if (is_hugetlb_pmd(*pmd))
-					pte = (pte_t *)pmd;
-				else
-					pte = pte_offset_map(pmd, addr);
+			if (is_hugetlb_pud(*pud))
+				pte = (pte_t *)pud;
+			else {
+				pmd = pmd_offset(pud, addr);
+				if (!pmd_none(*pmd)) {
+					if (is_hugetlb_pmd(*pmd))
+						pte = (pte_t *)pmd;
+					else
+						pte = pte_offset_map(pmd, addr);
+				}
 			}
 		}
 	}
@@ -305,12 +321,20 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 		     pte_t *ptep, pte_t entry)
 {
-	unsigned int i, nptes, orig_shift, shift;
-	unsigned long size;
+	unsigned int nptes, orig_shift, shift;
+	unsigned long i, size;
 	pte_t orig;
 
 	size = huge_tte_to_size(entry);
-	shift = size >= HPAGE_SIZE ? PMD_SHIFT : PAGE_SHIFT;
+
+	shift = PAGE_SHIFT;
+	if (size >= PUD_SIZE)
+		shift = PUD_SHIFT;
+	else if (size >= PMD_SIZE)
+		shift = PMD_SHIFT;
+	else
+		shift = PAGE_SHIFT;
+
 	nptes = size >> shift;
 
 	if (!pte_present(*ptep) && pte_present(entry))
@@ -333,19 +357,23 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep)
 {
-	unsigned int i, nptes, hugepage_shift;
+	unsigned int i, nptes, orig_shift, shift;
 	unsigned long size;
 	pte_t entry;
 
 	entry = *ptep;
 	size = huge_tte_to_size(entry);
-	if (size >= HPAGE_SIZE)
-		nptes = size >> PMD_SHIFT;
+
+	shift = PAGE_SHIFT;
+	if (size >= PUD_SIZE)
+		shift = PUD_SHIFT;
+	else if (size >= PMD_SIZE)
+		shift = PMD_SHIFT;
 	else
-		nptes = size >> PAGE_SHIFT;
+		shift = PAGE_SHIFT;
 
-	hugepage_shift = pte_none(entry) ? PAGE_SHIFT :
-		huge_tte_to_shift(entry);
+	nptes = size >> shift;
+	orig_shift = pte_none(entry) ? PAGE_SHIFT : huge_tte_to_shift(entry);
 
 	if (pte_present(entry))
 		mm->context.hugetlb_pte_count -= nptes;
@@ -354,11 +382,11 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	for (i = 0; i < nptes; i++)
 		ptep[i] = __pte(0UL);
 
-	maybe_tlb_batch_add(mm, addr, ptep, entry, 0, hugepage_shift);
+	maybe_tlb_batch_add(mm, addr, ptep, entry, 0, orig_shift);
 	/* An HPAGE_SIZE'ed page is composed of two REAL_HPAGE_SIZE'ed pages */
 	if (size == HPAGE_SIZE)
 		maybe_tlb_batch_add(mm, addr + REAL_HPAGE_SIZE, ptep, entry, 0,
-				    hugepage_shift);
+				    orig_shift);
 
 	return entry;
 }
@@ -371,7 +399,8 @@ int pmd_huge(pmd_t pmd)
 
 int pud_huge(pud_t pud)
 {
-	return 0;
+	return !pud_none(pud) &&
+		(pud_val(pud) & (_PAGE_VALID|_PAGE_PUD_HUGE)) != _PAGE_VALID;
 }
 
 static void hugetlb_free_pte_range(struct mmu_gather *tlb, pmd_t *pmd,
@@ -435,8 +464,11 @@ static void hugetlb_free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
 		next = pud_addr_end(addr, end);
 		if (pud_none_or_clear_bad(pud))
 			continue;
-		hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
-				       ceiling);
+		if (is_hugetlb_pud(*pud))
+			pud_clear(pud);
+		else
+			hugetlb_free_pmd_range(tlb, pud, addr, next, floor,
+					       ceiling);
 	} while (pud++, addr = next, addr != end);
 
 	start &= PGDIR_MASK;
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 3c40ebd..cc8d0d4 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -337,6 +337,10 @@ static int __init setup_hugepagesz(char *string)
 	hugepage_shift = ilog2(hugepage_size);
 
 	switch (hugepage_shift) {
+	case HPAGE_16GB_SHIFT:
+		hv_pgsz_mask = HV_PGSZ_MASK_16GB;
+		hv_pgsz_idx = HV_PGSZ_IDX_16GB;
+		break;
 	case HPAGE_2GB_SHIFT:
 		hv_pgsz_mask = HV_PGSZ_MASK_2GB;
 		hv_pgsz_idx = HV_PGSZ_IDX_2GB;
@@ -377,6 +381,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 {
 	struct mm_struct *mm;
 	unsigned long flags;
+	bool is_huge_tsb;
 	pte_t pte = *ptep;
 
 	if (tlb_type != hypervisor) {
@@ -394,15 +399,37 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 
 	spin_lock_irqsave(&mm->context.lock, flags);
 
+	is_huge_tsb = false;
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-	if ((mm->context.hugetlb_pte_count || mm->context.thp_pte_count) &&
-	    is_hugetlb_pmd(__pmd(pte_val(pte)))) {
-		/* We are fabricating 8MB pages using 4MB real hw pages.  */
-		pte_val(pte) |= (address & (1UL << REAL_HPAGE_SHIFT));
-		__update_mmu_tsb_insert(mm, MM_TSB_HUGE, REAL_HPAGE_SHIFT,
-					address, pte_val(pte));
-	} else
+	if (mm->context.hugetlb_pte_count || mm->context.thp_pte_count) {
+		unsigned long hugepage_size = PAGE_SIZE;
+
+		if (is_vm_hugetlb_page(vma))
+			hugepage_size = huge_page_size(hstate_vma(vma));
+
+		if (hugepage_size >= PUD_SIZE) {
+			unsigned long mask = 0x1ffc00000UL;
+
+			/* Transfer bits [32:22] from address to resolve
+			 * at 4M granularity.
+			 */
+			pte_val(pte) &= ~mask;
+			pte_val(pte) |= (address & mask);
+		} else if (hugepage_size >= PMD_SIZE) {
+			/* We are fabricating 8MB pages using 4MB
+			 * real hw pages.
+			 */
+			pte_val(pte) |= (address & (1UL << REAL_HPAGE_SHIFT));
+		}
+
+		if (hugepage_size >= PMD_SIZE) {
+			__update_mmu_tsb_insert(mm, MM_TSB_HUGE,
+				REAL_HPAGE_SHIFT, address, pte_val(pte));
+			is_huge_tsb = true;
+		}
+	}
 #endif
+	if (!is_huge_tsb)
 		__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
 					address, pte_val(pte));
 
-- 
2.9.2


* [PATCH 3/3] sparc64: Cleanup hugepage table walk functions
From: Nitin Gupta @ 2017-07-13 21:53 UTC (permalink / raw)
  To: David S. Miller
  Cc: Nitin Gupta, Andrew Morton, Steve Capper, Hugh Dickins,
	Mike Kravetz, Punit Agrawal, Ingo Molnar, sparclinux,
	linux-kernel

Flatten out nested code structure in huge_pte_offset()
and huge_pte_alloc().

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
---
 arch/sparc/mm/hugetlbpage.c | 54 +++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 34 deletions(-)

diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 7acb84d..bcd8cdb 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -266,27 +266,19 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
-	pte_t *pte = NULL;
 
 	pgd = pgd_offset(mm, addr);
 	pud = pud_alloc(mm, pgd, addr);
 	if (!pud)
 		return NULL;
-
 	if (sz >= PUD_SIZE)
-		pte = (pte_t *)pud;
-	else {
-		pmd = pmd_alloc(mm, pud, addr);
-		if (!pmd)
-			return NULL;
-
-		if (sz >= PMD_SIZE)
-			pte = (pte_t *)pmd;
-		else
-			pte = pte_alloc_map(mm, pmd, addr);
-	}
-
-	return pte;
+		return (pte_t *)pud;
+	pmd = pmd_alloc(mm, pud, addr);
+	if (!pmd)
+		return NULL;
+	if (sz >= PMD_SIZE)
+		return (pte_t *)pmd;
+	return pte_alloc_map(mm, pmd, addr);
 }
 
 pte_t *huge_pte_offset(struct mm_struct *mm,
@@ -295,27 +287,21 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
-	pte_t *pte = NULL;
 
 	pgd = pgd_offset(mm, addr);
-	if (!pgd_none(*pgd)) {
-		pud = pud_offset(pgd, addr);
-		if (!pud_none(*pud)) {
-			if (is_hugetlb_pud(*pud))
-				pte = (pte_t *)pud;
-			else {
-				pmd = pmd_offset(pud, addr);
-				if (!pmd_none(*pmd)) {
-					if (is_hugetlb_pmd(*pmd))
-						pte = (pte_t *)pmd;
-					else
-						pte = pte_offset_map(pmd, addr);
-				}
-			}
-		}
-	}
-
-	return pte;
+	if (pgd_none(*pgd))
+		return NULL;
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		return NULL;
+	if (is_hugetlb_pud(*pud))
+		return (pte_t *)pud;
+	pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd))
+		return NULL;
+	if (is_hugetlb_pmd(*pmd))
+		return (pte_t *)pmd;
+	return pte_offset_map(pmd, addr);
 }
 
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-- 
2.9.2


* Re: [PATCH 2/3] sparc64: Add 16GB hugepage support
From: David Miller @ 2017-07-20 20:04 UTC (permalink / raw)
  To: nitin.m.gupta
  Cc: mike.kravetz, kirill.shutemov, tom.hromatka, mhocko, mingo, akpm,
	steve.capper, hughd, punit.agrawal, bob.picco, pasha.tatashin,
	steven.sistare, paul.gortmaker, thomas.tai, atish.patra,
	sparclinux, linux-kernel

From: Nitin Gupta <nitin.m.gupta@oracle.com>
Date: Thu, 13 Jul 2017 14:53:24 -0700

> Testing:
> 
> Tested with the stream benchmark which allocates 48G of
> arrays backed by 16G hugepages and does RW operation on
> them in parallel.

It would be great if we started adding tests under
tools/testing/selftests so that other people can recreate
your tests/benchmarks.

> diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
> index 32258e0..7b240a3 100644
> --- a/arch/sparc/include/asm/tsb.h
> +++ b/arch/sparc/include/asm/tsb.h
> @@ -195,6 +195,35 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
>  	 nop; \
>  699:
>  
> +	/* PUD has been loaded into REG1, interpret the value, seeing
> +	 * if it is a HUGE PUD or a normal one.  If it is not valid
> +	 * then jump to FAIL_LABEL.  If it is a HUGE PUD, and it
> +	 * translates to a valid PTE, branch to PTE_LABEL.
> +	 *
> +	 * We have to propagate bits [32:22] from the virtual address
> +	 * to resolve at 4M granularity.
> +	 */
> +#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
> +#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
> +	brz,pn		REG1, FAIL_LABEL;		\
> +	 sethi		%uhi(_PAGE_PUD_HUGE), REG2;	\
> +	sllx		REG2, 32, REG2;			\
> +	andcc		REG1, REG2, %g0;		\
> +	be,pt		%xcc, 700f;			\
> +	 sethi		%hi(0x1ffc0000), REG2;		\
> +	sllx		REG2, 1, REG2;			\
> +	brgez,pn	REG1, FAIL_LABEL;		\
> +	 andn		REG1, REG2, REG1;		\
> +	and		VADDR, REG2, REG2;		\
> +	brlz,pt		REG1, PTE_LABEL;		\
> +	 or		REG1, REG2, REG1;		\
> +700:
> +#else
> +#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
> +	brz,pn		REG1, FAIL_LABEL; \
> +	 nop;
> +#endif
> +
>  	/* PMD has been loaded into REG1, interpret the value, seeing
>  	 * if it is a HUGE PMD or a normal one.  If it is not valid
>  	 * then jump to FAIL_LABEL.  If it is a HUGE PMD, and it
> @@ -242,6 +271,7 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
>  	srlx		REG2, 64 - PAGE_SHIFT, REG2; \
>  	andn		REG2, 0x7, REG2; \
>  	ldxa		[REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
> +	USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, 800f) \
>  	brz,pn		REG1, FAIL_LABEL; \
>  	 sllx		VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
>  	srlx		REG2, 64 - PAGE_SHIFT, REG2; \

This macro is getting way out of control, every TLB/TSB miss is
going to invoke this sequence of code.

Yes, it's just a two cycle constant load, a test modifying the
condition codes, and an easy to predict branch.

But every machine will eat this overhead, even if they don't use
hugepages or don't set the 16GB knob.

I think we can do better, using code patching or similar.

Once the knob is set, you can know for sure that this code path
will never actually be taken.


* Re: [PATCH 2/3] sparc64: Add 16GB hugepage support
From: Nitin Gupta @ 2017-07-26 18:35 UTC (permalink / raw)
  To: David Miller
  Cc: mike.kravetz, kirill.shutemov, tom.hromatka, mhocko, mingo, akpm,
	steve.capper, hughd, punit.agrawal, bob.picco, pasha.tatashin,
	steven.sistare, paul.gortmaker, thomas.tai, atish.patra,
	sparclinux, linux-kernel



On 07/20/2017 01:04 PM, David Miller wrote:
> From: Nitin Gupta <nitin.m.gupta@oracle.com>
> Date: Thu, 13 Jul 2017 14:53:24 -0700
> 
>> Testing:
>>
>> Tested with the stream benchmark which allocates 48G of
>> arrays backed by 16G hugepages and does RW operation on
>> them in parallel.
> 
> It would be great if we started adding tests under
> tools/testing/selftests so that other people can recreate
> your tests/benchmarks.
> 

Yes, I would like to add the stream benchmark to selftests too.
I will check if our internal version of stream can be released.


>> diff --git a/arch/sparc/include/asm/tsb.h b/arch/sparc/include/asm/tsb.h
>> index 32258e0..7b240a3 100644
>> --- a/arch/sparc/include/asm/tsb.h
>> +++ b/arch/sparc/include/asm/tsb.h
>> @@ -195,6 +195,35 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
>>  	 nop; \
>>  699:
>>  
>> +	/* PUD has been loaded into REG1, interpret the value, seeing
>> +	 * if it is a HUGE PUD or a normal one.  If it is not valid
>> +	 * then jump to FAIL_LABEL.  If it is a HUGE PUD, and it
>> +	 * translates to a valid PTE, branch to PTE_LABEL.
>> +	 *
>> +	 * We have to propagate bits [32:22] from the virtual address
>> +	 * to resolve at 4M granularity.
>> +	 */
>> +#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
>> +#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
>> +	brz,pn		REG1, FAIL_LABEL;		\
>> +	 sethi		%uhi(_PAGE_PUD_HUGE), REG2;	\
>> +	sllx		REG2, 32, REG2;			\
>> +	andcc		REG1, REG2, %g0;		\
>> +	be,pt		%xcc, 700f;			\
>> +	 sethi		%hi(0x1ffc0000), REG2;		\
>> +	sllx		REG2, 1, REG2;			\
>> +	brgez,pn	REG1, FAIL_LABEL;		\
>> +	 andn		REG1, REG2, REG1;		\
>> +	and		VADDR, REG2, REG2;		\
>> +	brlz,pt		REG1, PTE_LABEL;		\
>> +	 or		REG1, REG2, REG1;		\
>> +700:
>> +#else
>> +#define USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, PTE_LABEL) \
>> +	brz,pn		REG1, FAIL_LABEL; \
>> +	 nop;
>> +#endif
>> +
>>  	/* PMD has been loaded into REG1, interpret the value, seeing
>>  	 * if it is a HUGE PMD or a normal one.  If it is not valid
>>  	 * then jump to FAIL_LABEL.  If it is a HUGE PMD, and it
>> @@ -242,6 +271,7 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
>>  	srlx		REG2, 64 - PAGE_SHIFT, REG2; \
>>  	andn		REG2, 0x7, REG2; \
>>  	ldxa		[REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
>> +	USER_PGTABLE_CHECK_PUD_HUGE(VADDR, REG1, REG2, FAIL_LABEL, 800f) \
>>  	brz,pn		REG1, FAIL_LABEL; \
>>  	 sllx		VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
>>  	srlx		REG2, 64 - PAGE_SHIFT, REG2; \
> 
> This macro is getting way out of control, every TLB/TSB miss is
> going to invoke this sequence of code.
> 
> Yes, it's just a two cycle constant load, a test modifying the
> condition codes, and an easy to predict branch.
> 
> But every machine will eat this overhead, even if they don't use
> hugepages or don't set the 16GB knob.
> 
> I think we can do better, using code patching or similar.
> 
> Once the knob is set, you can know for sure that this code path
> will never actually be taken.

The simplest way I can think of is to add CONFIG_SPARC_16GB_HUGEPAGE
and exclude the PUD check when it is not enabled.  Would this be okay?

Thanks,
Nitin


* Re: [PATCH 2/3] sparc64: Add 16GB hugepage support
From: David Miller @ 2017-07-26 20:10 UTC (permalink / raw)
  To: nitin.m.gupta
  Cc: mike.kravetz, kirill.shutemov, tom.hromatka, mhocko, mingo, akpm,
	steve.capper, hughd, punit.agrawal, bob.picco, pasha.tatashin,
	steven.sistare, paul.gortmaker, thomas.tai, atish.patra,
	sparclinux, linux-kernel

From: Nitin Gupta <nitin.m.gupta@oracle.com>
Date: Wed, 26 Jul 2017 11:35:28 -0700

> 
> 
> On 07/20/2017 01:04 PM, David Miller wrote:
>> From: Nitin Gupta <nitin.m.gupta@oracle.com>
>> Date: Thu, 13 Jul 2017 14:53:24 -0700
>> 
>>> Testing:
>>>
>>> Tested with the stream benchmark which allocates 48G of
>>> arrays backed by 16G hugepages and does RW operation on
>>> them in parallel.
>> 
>> It would be great if we started adding tests under
>> tools/testing/selftests so that other people can recreate
>> your tests/benchmarks.
>> 
> 
> Yes, I would like to add the stream benchmark to selftests too.
> I will check if our internal version of stream can be released.

That would be great.

>> This macro is getting way out of control, every TLB/TSB miss is
>> going to invoke this sequence of code.
>> 
>> Yes, it's just a two cycle constant load, a test modifying the
>> condition codes, and an easy to predict branch.
>> 
>> But every machine will eat this overhead, even if they don't use
>> hugepages or don't set the 16GB knob.
>> 
>> I think we can do better, using code patching or similar.
>> 
>> Once the knob is set, you can know for sure that this code path
>> will never actually be taken.
> 
> The simplest way I can think of is to add CONFIG_SPARC_16GB_HUGEPAGE
> and exclude the PUD check when it is not enabled.  Would this be okay?

I am saying above to do a run-time code patch.

Kconfig knobs are completely pointless in this kind of situation,
since every distribution is going to turn the thing on; essentially
all real users eat the overhead if you do it the Kconfig way.

So do a run-time code patch instead, thank you.
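
One shape such a run-time patch could take (editor's sketch modeled
on the existing __tsb_phys_patch tables in tsb.h; all names here are
hypothetical): record every PUD-check site in its own section, have
the site branch around the check by default, and rewrite each site
only when 16GB pages are actually enabled:

	struct pud_huge_patch_entry {
		unsigned int	addr;	/* address of the patch site */
		unsigned int	insn;	/* instruction to install    */
	};
	extern struct pud_huge_patch_entry __pud_huge_patch,
					   __pud_huge_patch_end;

	static void __init patch_pud_huge_sites(void)
	{
		struct pud_huge_patch_entry *p;

		for (p = &__pud_huge_patch; p < &__pud_huge_patch_end; p++) {
			unsigned long site = p->addr;

			*(unsigned int *)site = p->insn;
			__asm__ __volatile__("flush	%0" : : "r" (site));
		}
	}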


* Re: [PATCH 2/3] sparc64: Add 16GB hugepage support
From: Xose Vazquez Perez @ 2017-08-12  0:28 UTC (permalink / raw)
  To: sparclinux

Nitin Gupta wrote:

> On 07/20/2017 01:04 PM, David Miller wrote:
>> From: Nitin Gupta <nitin.m.gupta@oracle.com>
>> Date: Thu, 13 Jul 2017 14:53:24 -0700
>> 
>>> Testing:
>>>
>>> Tested with the stream benchmark which allocates 48G of
>>> arrays backed by 16G hugepages and does RW operation on
>>> them in parallel.
>> 
>> It would be great if we started adding tests under
>> tools/testing/selftests so that other people can recreate
>> your tests/benchmarks.
>> 
> 
> Yes, I would like to add the stream benchmark to selftests too.
> I will check if our internal version of stream can be released.

STREAM's $HOME is at: https://www.cs.virginia.edu/stream/
C and Fortran implementations: https://www.cs.virginia.edu/stream/FTP/Code/


* Re: [PATCH 2/3] sparc64: Add 16GB hugepage support
From: Nitin Gupta @ 2017-08-12  2:50 UTC (permalink / raw)
  To: sparclinux

On 8/11/17 5:28 PM, Xose Vazquez Perez wrote:
> Nitin Gupta wrote:
> 
>> On 07/20/2017 01:04 PM, David Miller wrote:
>>> From: Nitin Gupta <nitin.m.gupta@oracle.com>
>>> Date: Thu, 13 Jul 2017 14:53:24 -0700
>>>
>>>> Testing:
>>>>
>>>> Tested with the stream benchmark which allocates 48G of
>>>> arrays backed by 16G hugepages and does RW operation on
>>>> them in parallel.
>>>
>>> It would be great if we started adding tests under
>>> tools/testing/selftests so that other people can recreate
>>> your tests/benchmarks.
>>>
>>
>> Yes, I would like to add the stream benchmark to selftests too.
>> I will check if our internal version of stream can be released.
> 
> STREAM's $HOME is at: https://www.cs.virginia.edu/stream/
> C and Fortran implementations: https://www.cs.virginia.edu/stream/FTP/Code/

Looking at:
https://www.cs.virginia.edu/stream/FTP/Code/stream.c

I see that the arrays are statically allocated. To be useful for
hugepage testing, it would need to be converted to use malloc/mmap
for these array allocations.  This is a small, easy change.  However,
I'm not sure whether the publication restrictions (see Item 3 in the
source header) would be acceptable for the Linux kernel, which is a
call that I can't make.
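
A sketch of that conversion (editor's illustration; STREAM_ARRAY_SIZE
is stream.c's own compile-time array-size knob):

	#define _GNU_SOURCE
	#include <sys/mman.h>

	/* was: static double a[STREAM_ARRAY_SIZE], b[...], c[...]; */
	static double *a, *b, *c;

	static double *alloc_huge(size_t nelems)
	{
		void *p = mmap(NULL, nelems * sizeof(double),
			       PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
			       -1, 0);
		return p == MAP_FAILED ? NULL : p;
	}

	/* in main(): a = alloc_huge(STREAM_ARRAY_SIZE); etc. */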

Nitin


