linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	"David S . Miller" <davem@davemloft.net>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 4.4 48/65] sparc64 mm: Fix more TSB sizing issues
Date: Thu, 25 Oct 2018 10:16:48 -0400	[thread overview]
Message-ID: <20181025141705.213937-48-sashal@kernel.org> (raw)
In-Reply-To: <20181025141705.213937-1-sashal@kernel.org>

From: Mike Kravetz <mike.kravetz@oracle.com>

[ Upstream commit 1e953d846ac015fbfcf09c857e8f893924cb629c ]

Commit af1b1a9b36b8 ("sparc64 mm: Fix base TSB sizing when hugetlb
pages are used") addressed the difference between hugetlb and THP
pages when computing TSB sizes.  The following additional issues
were also discovered while working with the code.

In order to save memory, THP makes use of a huge zero page.  This huge
zero page does not count against a task's RSS, but it does consume TSB
entries.  This is similar to hugetlb pages.  Therefore, count huge
zero page entries in hugetlb_pte_count.

Accounting of THP pages is done in the routine set_pmd_at().
Unfortunately, this does not catch the case where a THP page is split.
To handle this case, decrement the count in pmdp_invalidate().
pmdp_invalidate is only called when splitting a THP.  However, 'sanity
checks' are added in case it is ever called for other purposes.

A more general issue exists with HPAGE_SIZE accounting.
hugetlb_pte_count tracks the number of HPAGE_SIZE (8M) pages.  This
value is used to size the TSB for HPAGE_SIZE pages.  However,
each HPAGE_SIZE page consists of two REAL_HPAGE_SIZE (4M) pages.
The TSB contains an entry for each REAL_HPAGE_SIZE page.  Therefore,
the number of REAL_HPAGE_SIZE pages should be used to size the huge
page TSB.  A new compile time constant REAL_HPAGE_PER_HPAGE is used
to multiply hugetlb_pte_count before sizing the TSB.

Changes from V1
- Fixed build issue if hugetlb or THP not configured

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/sparc/include/asm/page_64.h |  1 +
 arch/sparc/mm/fault_64.c         |  1 +
 arch/sparc/mm/tlb.c              | 35 ++++++++++++++++++++++++++++----
 arch/sparc/mm/tsb.c              | 18 ++++++++++------
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/arch/sparc/include/asm/page_64.h b/arch/sparc/include/asm/page_64.h
index 8c2a8c937540..c1263fc390db 100644
--- a/arch/sparc/include/asm/page_64.h
+++ b/arch/sparc/include/asm/page_64.h
@@ -25,6 +25,7 @@
 #define HPAGE_MASK		(~(HPAGE_SIZE - 1UL))
 #define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
 #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
+#define REAL_HPAGE_PER_HPAGE	(_AC(1,UL) << (HPAGE_SHIFT - REAL_HPAGE_SHIFT))
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index e15f33715103..b01ec72522cb 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -487,6 +487,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
 		tsb_grow(mm, MM_TSB_BASE, mm_rss);
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 	mm_rss = mm->context.hugetlb_pte_count + mm->context.thp_pte_count;
+	mm_rss *= REAL_HPAGE_PER_HPAGE;
 	if (unlikely(mm_rss >
 		     mm->context.tsb_block[MM_TSB_HUGE].tsb_rss_limit)) {
 		if (mm->context.tsb_block[MM_TSB_HUGE].tsb)
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 3659d37b4d81..c56a195c9071 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -174,10 +174,25 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 		return;
 
 	if ((pmd_val(pmd) ^ pmd_val(orig)) & _PAGE_PMD_HUGE) {
-		if (pmd_val(pmd) & _PAGE_PMD_HUGE)
-			mm->context.thp_pte_count++;
-		else
-			mm->context.thp_pte_count--;
+		/*
+		 * Note that this routine only sets pmds for THP pages.
+		 * Hugetlb pages are handled elsewhere.  We need to check
+		 * for huge zero page.  Huge zero pages are like hugetlb
+		 * pages in that there is no RSS, but there is the need
+		 * for TSB entries.  So, huge zero page counts go into
+		 * hugetlb_pte_count.
+		 */
+		if (pmd_val(pmd) & _PAGE_PMD_HUGE) {
+			if (is_huge_zero_page(pmd_page(pmd)))
+				mm->context.hugetlb_pte_count++;
+			else
+				mm->context.thp_pte_count++;
+		} else {
+			if (is_huge_zero_page(pmd_page(orig)))
+				mm->context.hugetlb_pte_count--;
+			else
+				mm->context.thp_pte_count--;
+		}
 
 		/* Do not try to allocate the TSB hash table if we
 		 * don't have one already.  We have various locks held
@@ -204,6 +219,9 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+/*
+ * This routine is only called when splitting a THP
+ */
 void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 		     pmd_t *pmdp)
 {
@@ -213,6 +231,15 @@ void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 
 	set_pmd_at(vma->vm_mm, address, pmdp, entry);
 	flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+
+	/*
+	 * set_pmd_at() will not be called in a way to decrement
+	 * thp_pte_count when splitting a THP, so do it now.
+	 * Sanity check pmd before doing the actual decrement.
+	 */
+	if ((pmd_val(entry) & _PAGE_PMD_HUGE) &&
+	    !is_huge_zero_page(pmd_page(entry)))
+		(vma->vm_mm)->context.thp_pte_count--;
 }
 
 void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
diff --git a/arch/sparc/mm/tsb.c b/arch/sparc/mm/tsb.c
index 266411291634..84cd593117a6 100644
--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -489,8 +489,10 @@ void tsb_grow(struct mm_struct *mm, unsigned long tsb_index, unsigned long rss)
 
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
+	unsigned long mm_rss = get_mm_rss(mm);
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-	unsigned long total_huge_pte_count;
+	unsigned long saved_hugetlb_pte_count;
+	unsigned long saved_thp_pte_count;
 #endif
 	unsigned int i;
 
@@ -503,10 +505,12 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 	 * will re-increment the counters as the parent PTEs are
 	 * copied into the child address space.
 	 */
-	total_huge_pte_count = mm->context.hugetlb_pte_count +
-			 mm->context.thp_pte_count;
+	saved_hugetlb_pte_count = mm->context.hugetlb_pte_count;
+	saved_thp_pte_count = mm->context.thp_pte_count;
 	mm->context.hugetlb_pte_count = 0;
 	mm->context.thp_pte_count = 0;
+
+	mm_rss -= saved_thp_pte_count * (HPAGE_SIZE / PAGE_SIZE);
 #endif
 
 	/* copy_mm() copies over the parent's mm_struct before calling
@@ -519,11 +523,13 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 	/* If this is fork, inherit the parent's TSB size.  We would
 	 * grow it to that size on the first page fault anyways.
 	 */
-	tsb_grow(mm, MM_TSB_BASE, get_mm_rss(mm));
+	tsb_grow(mm, MM_TSB_BASE, mm_rss);
 
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
-	if (unlikely(total_huge_pte_count))
-		tsb_grow(mm, MM_TSB_HUGE, total_huge_pte_count);
+	if (unlikely(saved_hugetlb_pte_count + saved_thp_pte_count))
+		tsb_grow(mm, MM_TSB_HUGE,
+			 (saved_hugetlb_pte_count + saved_thp_pte_count) *
+			 REAL_HPAGE_PER_HPAGE);
 #endif
 
 	if (unlikely(!mm->context.tsb_block[MM_TSB_BASE].tsb))
-- 
2.17.1


  parent reply	other threads:[~2018-10-25 14:32 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-25 14:16 [PATCH AUTOSEL 4.4 01/65] KEYS: put keyring if install_session_keyring_to_cred() fails Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 02/65] ipv6: suppress sparse warnings in IP6_ECN_set_ce() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 03/65] net: drop write-only stack variable Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 04/65] ser_gigaset: use container_of() instead of detour Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 05/65] tracing: Skip more functions when doing stack tracing of events Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 06/65] ARM: dts: apq8064: add ahci ports-implemented mask Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 07/65] x86/mm/pat: Prevent hang during boot when mapping pages Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 08/65] btrfs: cleaner_kthread() doesn't need explicit freeze Sasha Levin
2018-10-25 15:07   ` David Sterba
2018-10-25 20:07     ` Sasha Levin
2018-10-26  6:58       ` Jiri Kosina
2018-10-26 10:57         ` Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 09/65] radix-tree: fix radix_tree_iter_retry() for tagged iterators Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 10/65] af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 11/65] net/mlx4_en: Resolve dividing by zero in 32-bit system Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 12/65] ipv6: orphan skbs in reassembly unit Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 13/65] um: Avoid longjmp/setjmp symbol clashes with libpthread.a Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 14/65] sched/cgroup: Fix cgroup entity load tracking tear-down Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 15/65] btrfs: don't create or leak aliased root while cleaning up orphans Sasha Levin
2018-10-25 15:12   ` David Sterba
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 16/65] thermal: allow spear-thermal driver to be a module Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 17/65] thermal: allow u8500-thermal " Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 18/65] tpm: fix: return rc when devm_add_action() fails Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 19/65] x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 20/65] aacraid: Start adapter after updating number of MSIX vectors Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 21/65] perf/core: Don't leak event in the syscall error path Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 22/65] [media] usbvision: revert commit 588afcc1 Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 23/65] MIPS: Fix FCSR Cause bit handling for correct SIGFPE issue Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 24/65] ASoC: ak4613: Enable cache usage to fix crashes on resume Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 25/65] ASoC: wm8940: " Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 26/65] CIFS: handle guest access errors to Windows shares Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 27/65] arm64: Fix potential race with hardware DBM in ptep_set_access_flags() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 28/65] xfrm: Clear sk_dst_cache when applying per-socket policy Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 29/65] scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 30/65] sparc/pci: Refactor dev_archdata initialization into pci_init_dev_archdata Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 31/65] sch_red: update backlog as well Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 32/65] usb-storage: fix bogus hardware error messages for ATA pass-thru devices Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 33/65] bpf: generally move prog destruction to RCU deferral Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 34/65] drm/nouveau/fbcon: fix oops without fbdev emulation Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 35/65] fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 36/65] ixgbevf: Fix handling of NAPI budget when multiple queues are enabled per vector Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 37/65] net/mlx5e: Fix LRO modify Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 38/65] net/mlx5e: Correctly handle RSS indirection table when changing number of channels Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 39/65] ixgbe: fix RSS limit for X550 Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 40/65] ixgbe: Correct X550EM_x revision check Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 41/65] ALSA: timer: Fix zero-division by continue of uninitialized instance Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 42/65] vti6: flush x-netns xfrm cache when vti interface is removed Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 43/65] gro: Allow tunnel stacking in the case of FOU/GUE Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 44/65] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 45/65] l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 46/65] tty: serial: sprd: fix error return code in sprd_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 47/65] video: fbdev: pxa3xx_gcu: fix error return code in pxa3xx_gcu_probe() Sasha Levin
2018-10-25 14:16 ` Sasha Levin [this message]
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 49/65] gpu: host1x: fix error return code in host1x_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 50/65] sparc64: Fix exception handling in UltraSPARC-III memcpy Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 51/65] gpio: msic: fix error return code in platform_msic_gpio_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 52/65] usb: imx21-hcd: fix error return code in imx21_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 53/65] usb: ehci-omap: fix error return code in ehci_hcd_omap_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 54/65] usb: dwc3: omap: fix error return code in dwc3_omap_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 55/65] spi/bcm63xx-hspi: fix error return code in bcm63xx_hsspi_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 56/65] MIPS: Handle non word sized instructions when examining frame Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 57/65] spi/bcm63xx: fix error return code in bcm63xx_spi_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 58/65] spi: xlp: fix error return code in xlp_spi_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 59/65] ASoC: spear: fix error return code in spdif_in_probe() Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 60/65] PM / devfreq: tegra: fix error return code in tegra_devfreq_probe() Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 61/65] bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 62/65] scsi: aacraid: Fix typo in blink status Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 63/65] MIPS: microMIPS: Fix decoding of swsp16 instruction Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 64/65] igb: Remove superfluous reset to PHY and page 0 selection Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 65/65] MIPS: DEC: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181025141705.213937-48-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).