linux-kernel.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Saravanan D <saravanand@fb.com>
Cc: x86@kernel.org, dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, corbet@lwn.net,
	linux-kernel@vger.kernel.org, kernel-team@fb.com,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	Song Liu <songliubraving@fb.com>
Subject: Re: [PATCH V4] x86/mm: Tracking linear mapping split events
Date: Thu, 28 Jan 2021 04:51:53 +0000
Message-ID: <20210128045153.GW308988@casper.infradead.org>
In-Reply-To: <20210128043547.1560435-1-saravanand@fb.com>

You forgot to cc linux-mm.  Adding.  Also I think you should be cc'ing
Song.

On Wed, Jan 27, 2021 at 08:35:47PM -0800, Saravanan D wrote:
> To help with debugging the sluggishness caused by TLB miss/reload,
> we introduce monotonic lifetime hugepage split event counts since
> system state: SYSTEM_RUNNING to be displayed as part of
> /proc/vmstat in x86 servers
> 
> The lifetime split event information will be displayed at the bottom of
> /proc/vmstat
> ....
> swap_ra 0
> swap_ra_hit 0
> direct_map_level2_splits 94
> direct_map_level3_splits 4
> nr_unstable 0
> ....
> 
> One of the many lasting (as we don't coalesce back) sources for huge page
> splits is tracing as the granular page attribute/permission changes would
> force the kernel to split code segments mapped to huge pages to smaller
> ones thereby increasing the probability of TLB miss/reload even after
> tracing has been stopped.

Are you talking about kernel text here or application text?

In either case, I don't know why you're saying we don't coalesce
back after tracing is disabled.  I was under the impression we did
(either actively in the case of the kernel or via khugepaged for
user text).
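
One way to check this empirically would be to snapshot the DirectMap
lines of /proc/meminfo before tracing, while tracing, and after tracing
is disabled, and see whether the 2M/1G totals recover.  A minimal
userspace sketch follows; it is not part of the patch, and the struct
and function names (direct_map, read_direct_map) are made up for
illustration.  It assumes the x86 "DirectMap4k/2M/1G: N kB" line format
quoted in the documentation below.

```c
#include <stdio.h>
#include <string.h>

/* Direct map totals in kB, as reported by /proc/meminfo on x86. */
struct direct_map {
	unsigned long long kb_4k;
	unsigned long long kb_2m;
	unsigned long long kb_1g;
};

/*
 * Parse the DirectMap* lines out of a /proc/meminfo-style stream.
 * Fields that are absent are left at zero.
 * Returns the number of DirectMap fields that were found.
 */
static int read_direct_map(FILE *fp, struct direct_map *dm)
{
	char line[256];
	int found = 0;

	memset(dm, 0, sizeof(*dm));
	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "DirectMap4k: %llu kB", &dm->kb_4k) == 1 ||
		    sscanf(line, "DirectMap2M: %llu kB", &dm->kb_2m) == 1 ||
		    sscanf(line, "DirectMap1G: %llu kB", &dm->kb_1g) == 1)
			found++;
	}
	return found;
}
```

Calling read_direct_map() on fopen("/proc/meminfo", "r") at each of the
three points and comparing kb_2m and kb_1g would show whether the large
mappings come back.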

> Documentation regarding linear mapping split events added to admin-guide
> as requested in V3 of the patch.
> 
> Signed-off-by: Saravanan D <saravanand@fb.com>
> ---
>  .../admin-guide/mm/direct_mapping_splits.rst  | 59 +++++++++++++++++++
>  Documentation/admin-guide/mm/index.rst        |  1 +
>  arch/x86/mm/pat/set_memory.c                  | 13 ++++
>  include/linux/vm_event_item.h                 |  4 ++
>  mm/vmstat.c                                   |  4 ++
>  5 files changed, 81 insertions(+)
>  create mode 100644 Documentation/admin-guide/mm/direct_mapping_splits.rst
> 
> diff --git a/Documentation/admin-guide/mm/direct_mapping_splits.rst b/Documentation/admin-guide/mm/direct_mapping_splits.rst
> new file mode 100644
> index 000000000000..298751391deb
> --- /dev/null
> +++ b/Documentation/admin-guide/mm/direct_mapping_splits.rst
> @@ -0,0 +1,59 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Direct Mapping Splits
> +=====================
> +
> +The kernel maps all of physical memory in linear/direct mapped pages, where
> +translation of a virtual kernel address to a physical address is achieved
> +through a simple subtraction of an offset. CPUs cache these translations
> +in fast caches called TLBs. CPU architectures like x86 allow direct
> +mapping large portions of memory into hugepages (2M, 1G, etc.) at
> +various page table levels.
> +
> +Maintaining huge direct mapped pages greatly reduces TLB miss pressure.
> +The splintering of huge direct pages into smaller ones does result in
> +a measurable performance hit caused by frequent TLB miss and reloads.
> +
> +One of the many lasting (as we don't coalesce back) sources for huge page
> +splits is tracing as the granular page attribute/permission changes would
> +force the kernel to split code segments mapped to hugepages to smaller
> +ones thus increasing the probability of TLB miss/reloads even after
> +tracing has been stopped.
> +
> +On x86 systems, we can track the splitting of huge direct mapped pages
> +through lifetime event counters in ``/proc/vmstat``
> +
> +	direct_map_level2_splits xxx
> +	direct_map_level3_splits yyy
> +
> +where:
> +
> +direct_map_level2_splits
> +	are 2M/4M hugepage split events
> +direct_map_level3_splits
> +	are 1G hugepage split events
> +
> +The distribution of direct mapped system memory in various page sizes
> +post splits can be viewed through ``/proc/meminfo`` whose output
> +will include the following lines depending upon supporting CPU
> +architecture
> +
> +	DirectMap4k:    xxxxx kB
> +	DirectMap2M:    yyyyy kB
> +	DirectMap1G:    zzzzz kB
> +
> +where:
> +
> +DirectMap4k
> +	is the total amount of direct mapped memory (in kB)
> +	accessed through 4k pages
> +DirectMap2M
> +	is the total amount of direct mapped memory (in kB)
> +	accessed through 2M pages
> +DirectMap1G
> +	is the total amount of direct mapped memory (in kB)
> +	accessed through 1G pages
> +
> +
> +-- Saravanan D, Jan 27, 2021
> diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
> index 4b14d8b50e9e..9439780f3f07 100644
> --- a/Documentation/admin-guide/mm/index.rst
> +++ b/Documentation/admin-guide/mm/index.rst
> @@ -38,3 +38,4 @@ the Linux memory management.
>     soft-dirty
>     transhuge
>     userfaultfd
> +   direct_mapping_splits
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 16f878c26667..767cade53bdc 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -16,6 +16,8 @@
>  #include <linux/pci.h>
>  #include <linux/vmalloc.h>
>  #include <linux/libnvdimm.h>
> +#include <linux/vmstat.h>
> +#include <linux/kernel.h>
>  
>  #include <asm/e820/api.h>
>  #include <asm/processor.h>
> @@ -85,12 +87,23 @@ void update_page_count(int level, unsigned long pages)
>  	spin_unlock(&pgd_lock);
>  }
>  
> +void update_split_page_event_count(int level)
> +{
> +	if (system_state == SYSTEM_RUNNING) {
> +		if (level == PG_LEVEL_2M)
> +			count_vm_event(DIRECT_MAP_LEVEL2_SPLIT);
> +		else if (level == PG_LEVEL_1G)
> +			count_vm_event(DIRECT_MAP_LEVEL3_SPLIT);
> +	}
> +}
> +
>  static void split_page_count(int level)
>  {
>  	if (direct_pages_count[level] == 0)
>  		return;
>  
>  	direct_pages_count[level]--;
> +	update_split_page_event_count(level);
>  	direct_pages_count[level - 1] += PTRS_PER_PTE;
>  }
>  
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 18e75974d4e3..7c06c2bdc33b 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -120,6 +120,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  #ifdef CONFIG_SWAP
>  		SWAP_RA,
>  		SWAP_RA_HIT,
> +#endif
> +#ifdef CONFIG_X86
> +		DIRECT_MAP_LEVEL2_SPLIT,
> +		DIRECT_MAP_LEVEL3_SPLIT,
>  #endif
>  		NR_VM_EVENT_ITEMS
>  };
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index f8942160fc95..a43ac4ac98a2 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1350,6 +1350,10 @@ const char * const vmstat_text[] = {
>  	"swap_ra",
>  	"swap_ra_hit",
>  #endif
> +#ifdef CONFIG_X86
> +	"direct_map_level2_splits",
> +	"direct_map_level3_splits",
> +#endif
>  #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
>  };
>  #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
> -- 
> 2.24.1
> 
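
For reference, the new counters could be consumed from userspace
roughly as follows.  This is only a sketch under the assumption that
the patch is applied and /proc/vmstat keeps its usual "name value"
per-line format; read_vmstat_counter is a hypothetical helper, not an
existing API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" pair in a /proc/vmstat-style stream.
 * Returns 0 and stores the value on success, -1 if the counter is
 * absent (for example on a kernel without the patch, or non-x86).
 */
static int read_vmstat_counter(FILE *fp, const char *name,
			       unsigned long long *value)
{
	char key[64];
	unsigned long long v;

	rewind(fp);	/* allow repeated lookups on the same stream */
	while (fscanf(fp, "%63s %llu", key, &v) == 2) {
		if (strcmp(key, name) == 0) {
			*value = v;
			return 0;
		}
	}
	return -1;
}
```

With fp = fopen("/proc/vmstat", "r"), looking up
"direct_map_level2_splits" and "direct_map_level3_splits" before and
after a workload gives the number of split events in between, since
the counters only ever increase.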
