linux-kernel.vger.kernel.org archive mirror
* [PATCH V6] x86/mm: Tracking linear mapping split events
@ 2021-02-18 23:57 Saravanan D
  2021-03-01 22:43 ` Tejun Heo
  2021-03-06  0:57 ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Saravanan D @ 2021-02-18 23:57 UTC
  To: akpm, mingo, x86
  Cc: dave.hansen, tj, hannes, linux-kernel, kernel-team, Saravanan D

To help debug sluggishness caused by TLB misses/reloads, introduce
monotonic counters for direct-mapped hugepage split events. The counters
cover splits since the system reached SYSTEM_RUNNING and are displayed
in /proc/vmstat on x86 servers.

The lifetime split event counts are displayed near the bottom of
/proc/vmstat:
....
swap_ra 0
swap_ra_hit 0
direct_map_level2_splits 94
direct_map_level3_splits 4
nr_unstable 0
....

One of the many lasting sources of direct hugepage splits is kernel
tracing (kprobes, tracepoints).

Note that the kernel's code segment [512 MB] points to the same
physical addresses that have already been mapped in the kernel's
direct mapping range.

Source: Documentation/x86/x86_64/mm.rst

When kernel tracing is enabled, the kernel has to modify the
attributes/permissions of the text segment hugepages that are direct
mapped, causing them to split.

The kernel's direct-mapped hugepages do not coalesce after a split; the
split mappings remain in place for the remainder of the system's
lifetime.

An example of direct-map splits when dynamic kernel tracing is turned
on:
....
cat /proc/vmstat | grep -i direct_map_level
direct_map_level2_splits 784
direct_map_level3_splits 12
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[pid, comm] = count(); }'
cat /proc/vmstat | grep -i direct_map_level
direct_map_level2_splits 789
direct_map_level3_splits 12
....

Signed-off-by: Saravanan D <saravanand@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
This patch has been acked and can be routed through either x86 or -mm.
Please let me know if anything else is needed. Thanks.
---
 arch/x86/mm/pat/set_memory.c  | 8 ++++++++
 include/linux/vm_event_item.h | 4 ++++
 mm/vmstat.c                   | 4 ++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..a7b3c5f1d316 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -16,6 +16,8 @@
 #include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/libnvdimm.h>
+#include <linux/vmstat.h>
+#include <linux/kernel.h>
 
 #include <asm/e820/api.h>
 #include <asm/processor.h>
@@ -91,6 +93,12 @@ static void split_page_count(int level)
 		return;
 
 	direct_pages_count[level]--;
+	if (system_state == SYSTEM_RUNNING) {
+		if (level == PG_LEVEL_2M)
+			count_vm_event(DIRECT_MAP_LEVEL2_SPLIT);
+		else if (level == PG_LEVEL_1G)
+			count_vm_event(DIRECT_MAP_LEVEL3_SPLIT);
+	}
 	direct_pages_count[level - 1] += PTRS_PER_PTE;
 }
 
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 18e75974d4e3..7c06c2bdc33b 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -120,6 +120,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_SWAP
 		SWAP_RA,
 		SWAP_RA_HIT,
+#endif
+#ifdef CONFIG_X86
+		DIRECT_MAP_LEVEL2_SPLIT,
+		DIRECT_MAP_LEVEL3_SPLIT,
 #endif
 		NR_VM_EVENT_ITEMS
 };
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f8942160fc95..a43ac4ac98a2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1350,6 +1350,10 @@ const char * const vmstat_text[] = {
 	"swap_ra",
 	"swap_ra_hit",
 #endif
+#ifdef CONFIG_X86
+	"direct_map_level2_splits",
+	"direct_map_level3_splits",
+#endif
 #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */
 };
 #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
-- 
2.24.1
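
The accounting added to split_page_count() above can be modeled in
userspace as follows. This is a minimal sketch, not kernel code: struct
split_model, model_split_page_count(), and the running flag are
hypothetical stand-ins for the kernel's direct_pages_count[] array,
count_vm_event(), and the system_state == SYSTEM_RUNNING check. The enum
values and PTRS_PER_PTE = 512 are assumptions chosen to match x86-64
with 4K base pages.

```c
#include <assert.h>

/* Assumed to mirror the x86 page-level ordering; illustration only. */
enum pg_level {
	PG_LEVEL_NONE,
	PG_LEVEL_4K,
	PG_LEVEL_2M,
	PG_LEVEL_1G,
	PG_LEVEL_NUM
};

/* Assumption: 512 entries per page table, as on x86-64. */
#define PTRS_PER_PTE 512

/* Hypothetical userspace stand-in for the kernel's global state. */
struct split_model {
	unsigned long direct_pages_count[PG_LEVEL_NUM];
	unsigned long level2_splits;	/* models DIRECT_MAP_LEVEL2_SPLIT */
	unsigned long level3_splits;	/* models DIRECT_MAP_LEVEL3_SPLIT */
	int running;			/* models system_state == SYSTEM_RUNNING */
};

/* Account for one direct-map page at 'level' being split one level down. */
static void model_split_page_count(struct split_model *m, int level)
{
	/* Only 2M and 1G mappings can be split further. */
	assert(level > PG_LEVEL_4K && level < PG_LEVEL_NUM);

	if (m->direct_pages_count[level] == 0)
		return;

	m->direct_pages_count[level]--;
	/* Splits during early boot are deliberately not counted. */
	if (m->running) {
		if (level == PG_LEVEL_2M)
			m->level2_splits++;
		else if (level == PG_LEVEL_1G)
			m->level3_splits++;
	}
	/* One large page becomes PTRS_PER_PTE pages of the next size down. */
	m->direct_pages_count[level - 1] += PTRS_PER_PTE;
}
```

In this model, splitting one 1G page adds 512 2M pages and bumps the
level3 counter. Splitting one of those 2M pages then adds 512 4K pages
and bumps the level2 counter. The event counters only ever increase,
which is what makes them usable as lifetime totals.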



* Re: [PATCH V6] x86/mm: Tracking linear mapping split events
  2021-02-18 23:57 [PATCH V6] x86/mm: Tracking linear mapping split events Saravanan D
@ 2021-03-01 22:43 ` Tejun Heo
  2021-03-06  0:57 ` Andrew Morton
  1 sibling, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2021-03-01 22:43 UTC
  To: Saravanan D
  Cc: akpm, mingo, x86, dave.hansen, hannes, linux-kernel, kernel-team

Hello,

On Thu, Feb 18, 2021 at 03:57:44PM -0800, Saravanan D wrote:
> To help with debugging the sluggishness caused by TLB miss/reload,
> we introduce monotonic hugepage [direct mapped] split event counts since
> system state: SYSTEM_RUNNING to be displayed as part of
> /proc/vmstat in x86 servers
...
> Signed-off-by: Saravanan D <saravanand@fb.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

Andrew, do you mind picking this one up? It has enough acks and can go
through either mm or x86 tree.

Thank you.

-- 
tejun


* Re: [PATCH V6] x86/mm: Tracking linear mapping split events
  2021-02-18 23:57 [PATCH V6] x86/mm: Tracking linear mapping split events Saravanan D
  2021-03-01 22:43 ` Tejun Heo
@ 2021-03-06  0:57 ` Andrew Morton
  2021-03-08 15:06   ` Johannes Weiner
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2021-03-06  0:57 UTC
  To: Saravanan D
  Cc: mingo, x86, dave.hansen, tj, hannes, linux-kernel, kernel-team

On Thu, 18 Feb 2021 15:57:44 -0800 Saravanan D <saravanand@fb.com> wrote:

> To help with debugging the sluggishness caused by TLB miss/reload,
> we introduce monotonic hugepage [direct mapped] split event counts since
> system state: SYSTEM_RUNNING to be displayed as part of
> /proc/vmstat in x86 servers
>
> ...
>
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -120,6 +120,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  #ifdef CONFIG_SWAP
>  		SWAP_RA,
>  		SWAP_RA_HIT,
> +#endif
> +#ifdef CONFIG_X86
> +		DIRECT_MAP_LEVEL2_SPLIT,
> +		DIRECT_MAP_LEVEL3_SPLIT,
>  #endif
>  		NR_VM_EVENT_ITEMS
>  };

This is the first appearance of arch-specific fields in /proc/vmstat.

I don't really see a problem with this - vmstat is basically a dumping
ground of random developer stuff.  But was this the best place in which
to present this data?


* Re: [PATCH V6] x86/mm: Tracking linear mapping split events
  2021-03-06  0:57 ` Andrew Morton
@ 2021-03-08 15:06   ` Johannes Weiner
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2021-03-08 15:06 UTC
  To: Andrew Morton
  Cc: Saravanan D, mingo, x86, dave.hansen, tj, linux-kernel, kernel-team

On Fri, Mar 05, 2021 at 04:57:15PM -0800, Andrew Morton wrote:
> On Thu, 18 Feb 2021 15:57:44 -0800 Saravanan D <saravanand@fb.com> wrote:
> 
> > To help with debugging the sluggishness caused by TLB miss/reload,
> > we introduce monotonic hugepage [direct mapped] split event counts since
> > system state: SYSTEM_RUNNING to be displayed as part of
> > /proc/vmstat in x86 servers
> >
> > ...
> >
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -120,6 +120,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  #ifdef CONFIG_SWAP
> >  		SWAP_RA,
> >  		SWAP_RA_HIT,
> > +#endif
> > +#ifdef CONFIG_X86
> > +		DIRECT_MAP_LEVEL2_SPLIT,
> > +		DIRECT_MAP_LEVEL3_SPLIT,
> >  #endif
> >  		NR_VM_EVENT_ITEMS
> >  };
> 
> This is the first appearance of arch-specific fields in /proc/vmstat.
> 
> I don't really see a problem with this - vmstat is basically a dumping
> ground of random developer stuff.  But was this the best place in which
> to present this data?

IMO it's a big plus for discoverability.

One of the first things I tend to do when triaging mysterious memory
issues is going to /proc/vmstat and seeing if anything looks abnormal.
There is value in making that file comprehensive for all things that
could indicate memory-related pathologies.

The impetus for adding these is a real-world TLB regression caused by
kprobes chewing up the direct mapping that took longer to debug than
necessary. We have the /proc/meminfo lines on the DirectMap, but those
are more useful when you already have a theory - they simply don't
make problems immediately stand out the same way.
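
As an illustration of the triage step described above, the two new
counters can be pulled out of /proc/vmstat programmatically. The
following is a minimal sketch assuming the "name value" line format
shown in the patch description; struct direct_map_splits and
parse_direct_map_splits() are hypothetical names and not part of any
kernel or libc interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the two counters added by this patch. */
struct direct_map_splits {
	unsigned long long level2;	/* direct_map_level2_splits */
	unsigned long long level3;	/* direct_map_level3_splits */
};

/*
 * Scan a stream of "name value" pairs, as found in /proc/vmstat, and
 * pick out the two split counters. Returns 0 if both were found, -1
 * otherwise (for example on a kernel without the patch or on a
 * non-x86 build).
 */
static int parse_direct_map_splits(FILE *fp, struct direct_map_splits *out)
{
	char key[128];
	unsigned long long val;
	int found = 0;

	out->level2 = 0;
	out->level3 = 0;

	while (fscanf(fp, "%127s %llu", key, &val) == 2) {
		if (strcmp(key, "direct_map_level2_splits") == 0) {
			out->level2 = val;
			found++;
		} else if (strcmp(key, "direct_map_level3_splits") == 0) {
			out->level3 = val;
			found++;
		}
	}

	return found == 2 ? 0 : -1;
}
```

A caller would pass a stream obtained from fopen("/proc/vmstat", "r").
Taking one snapshot before and one after enabling a tracer and
subtracting the two gives the number of splits caused in between, as in
the 784 to 789 example from the patch description.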


* Re: [PATCH V6] x86/mm: Tracking linear mapping split events
       [not found] ` <20210128233430.1460964-1-saravanand@fb.com>
  2021-01-28 23:41   ` [PATCH V6] " Tejun Heo
  2021-01-29 19:27   ` Johannes Weiner
@ 2021-02-08 23:30   ` Dave Hansen
  2 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2021-02-08 23:30 UTC
  To: Saravanan D, x86, dave.hansen, luto, peterz, willy
  Cc: linux-kernel, kernel-team, linux-mm, songliubraving, tj, hannes

On 1/28/21 3:34 PM, Saravanan D wrote:
> 
> One of the many lasting sources of direct hugepage splits is kernel
> tracing (kprobes, tracepoints).
> 
> Note that the kernel's code segment [512 MB] points to the same
> physical addresses that have been already mapped in the kernel's
> direct mapping range.

Looks fine to me:

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>



* Re: [PATCH V6] x86/mm: Tracking linear mapping split events
  2021-01-29 19:27   ` Johannes Weiner
@ 2021-02-08 23:17     ` Saravanan D
  0 siblings, 0 replies; 8+ messages in thread
From: Saravanan D @ 2021-02-08 23:17 UTC
  To: Johannes Weiner, tj, x86
  Cc: dave.hansen, luto, peterz, willy, linux-kernel, kernel-team,
	linux-mm, songliubraving, tj

Hi all,

So far I have received two acks for the V6 version of my patch:

> Acked-by: Tejun Heo <tj@kernel.org>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Are there any further objections?

Thanks,
Saravanan D


* Re: [PATCH V6] x86/mm: Tracking linear mapping split events
       [not found] ` <20210128233430.1460964-1-saravanand@fb.com>
  2021-01-28 23:41   ` [PATCH V6] " Tejun Heo
@ 2021-01-29 19:27   ` Johannes Weiner
  2021-02-08 23:17     ` Saravanan D
  2021-02-08 23:30   ` Dave Hansen
  2 siblings, 1 reply; 8+ messages in thread
From: Johannes Weiner @ 2021-01-29 19:27 UTC
  To: Saravanan D
  Cc: x86, dave.hansen, luto, peterz, willy, linux-kernel, kernel-team,
	linux-mm, songliubraving, tj

On Thu, Jan 28, 2021 at 03:34:30PM -0800, Saravanan D wrote:
> To help with debugging the sluggishness caused by TLB miss/reload,
> we introduce monotonic hugepage [direct mapped] split event counts since
> system state: SYSTEM_RUNNING to be displayed as part of
> /proc/vmstat in x86 servers
> 
> The lifetime split event information will be displayed at the bottom of
> /proc/vmstat
> ....
> swap_ra 0
> swap_ra_hit 0
> direct_map_level2_splits 94
> direct_map_level3_splits 4
> nr_unstable 0
> ....
> 
> One of the many lasting sources of direct hugepage splits is kernel
> tracing (kprobes, tracepoints).
> 
> Note that the kernel's code segment [512 MB] points to the same
> physical addresses that have been already mapped in the kernel's
> direct mapping range.
> 
> Source : Documentation/x86/x86_64/mm.rst
> 
> When we enable kernel tracing, the kernel has to modify
> attributes/permissions
> of the text segment hugepages that are direct mapped causing them to
> split.
> 
> Kernel's direct mapped hugepages do not coalesce back after split and
> remain in place for the remainder of the lifetime.
> 
> An instance of direct page splits when we turn on
> dynamic kernel tracing
> ....
> cat /proc/vmstat | grep -i direct_map_level
> direct_map_level2_splits 784
> direct_map_level3_splits 12
> bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ [pid, comm] =
> count(); }'
> cat /proc/vmstat | grep -i
> direct_map_level
> direct_map_level2_splits 789
> direct_map_level3_splits 12
> ....
> 
> Signed-off-by: Saravanan D <saravanand@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH V6] x86/mm: Tracking linear mapping split events
       [not found] ` <20210128233430.1460964-1-saravanand@fb.com>
@ 2021-01-28 23:41   ` Tejun Heo
  2021-01-29 19:27   ` Johannes Weiner
  2021-02-08 23:30   ` Dave Hansen
  2 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2021-01-28 23:41 UTC
  To: Saravanan D
  Cc: x86, dave.hansen, luto, peterz, willy, linux-kernel, kernel-team,
	linux-mm, songliubraving, hannes

On Thu, Jan 28, 2021 at 03:34:30PM -0800, Saravanan D wrote:
> To help with debugging the sluggishness caused by TLB miss/reload,
> we introduce monotonic hugepage [direct mapped] split event counts since
> system state: SYSTEM_RUNNING to be displayed as part of
> /proc/vmstat in x86 servers
...
> Signed-off-by: Saravanan D <saravanand@fb.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


Thread overview: 8+ messages
2021-02-18 23:57 [PATCH V6] x86/mm: Tracking linear mapping split events Saravanan D
2021-03-01 22:43 ` Tejun Heo
2021-03-06  0:57 ` Andrew Morton
2021-03-08 15:06   ` Johannes Weiner
  -- strict thread matches above, loose matches on Subject: below --
2021-01-28 21:20 [PATCH V5] " Saravanan D
     [not found] ` <20210128233430.1460964-1-saravanand@fb.com>
2021-01-28 23:41   ` [PATCH V6] " Tejun Heo
2021-01-29 19:27   ` Johannes Weiner
2021-02-08 23:17     ` Saravanan D
2021-02-08 23:30   ` Dave Hansen
