linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: introduce numa_zero_pfn
@ 2012-12-12 17:03 Joonsoo Kim
  2012-12-12 18:09 ` Kirill A. Shutemov
  2012-12-12 20:12 ` Christoph Lameter
  0 siblings, 2 replies; 5+ messages in thread
From: Joonsoo Kim @ 2012-12-12 17:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Joonsoo Kim

Currently, we use just *one* zero page regardless of the user process's node.
When a user process reads the zero page, the CPU must first load it into
its cache. If the CPU's node is not the same as the zero page's node, the
load takes longer. If we create a zero page for each node and use them
appropriately, we can reduce this overhead.

This patch implements the basic infrastructure for numa_zero_pfn.
It is disabled by default, because it does not provide page coloring and
some architectures use page coloring for the zero page.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/Kconfig b/mm/Kconfig
index a3f8ddd..de0ab65 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -412,3 +412,8 @@ config FRONTSWAP
 	  and swap data is stored as normal on the matching swap device.
 
 	  If unsure, say Y to enable frontswap.
+
+config NUMA_ZERO_PFN
+	bool "Enable NUMA-aware zero page handling"
+	depends on NUMA
+	default n
diff --git a/mm/memory.c b/mm/memory.c
index 221fc9f..e7d3969 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -112,12 +112,43 @@ __setup("norandmaps", disable_randmaps);
 unsigned long zero_pfn __read_mostly;
 unsigned long highest_memmap_pfn __read_mostly;
 
+#ifdef CONFIG_NUMA_ZERO_PFN
+unsigned long node_to_zero_pfn[MAX_NUMNODES] __read_mostly;
+
+/* Should be called after zero_pfn initialization */
+static void __init init_numa_zero_pfn(void)
+{
+	unsigned int node;
+
+	if (nr_node_ids == 1)
+		return;
+
+	for_each_node_state(node, N_POSSIBLE) {
+		node_to_zero_pfn[node] = zero_pfn;
+	}
+
+	for_each_node_state(node, N_HIGH_MEMORY) {
+		struct page *page;
+		page = alloc_pages_exact_node(node,
+				GFP_HIGHUSER | __GFP_ZERO, 0);
+		if (!page)
+			continue;
+
+		node_to_zero_pfn[node] = page_to_pfn(page);
+	}
+}
+#else
+static inline void __init init_numa_zero_pfn(void) {}
+#endif
+
 /*
  * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init()
  */
 static int __init init_zero_pfn(void)
 {
 	zero_pfn = page_to_pfn(ZERO_PAGE(0));
+	init_numa_zero_pfn();
+
 	return 0;
 }
 core_initcall(init_zero_pfn);
@@ -717,6 +748,24 @@ static inline bool is_cow_mapping(vm_flags_t flags)
 	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
 }
 
+#ifdef CONFIG_NUMA_ZERO_PFN
+static inline int is_numa_zero_pfn(unsigned long pfn)
+{
+	return zero_pfn == pfn || node_to_zero_pfn[pfn_to_nid(pfn)] == pfn;
+}
+
+static inline unsigned long my_numa_zero_pfn(unsigned long addr)
+{
+	if (nr_node_ids == 1)
+		return zero_pfn;
+
+	return node_to_zero_pfn[numa_node_id()];
+}
+
+#define is_zero_pfn is_numa_zero_pfn
+#define my_zero_pfn my_numa_zero_pfn
+#endif /* CONFIG_NUMA_ZERO_PFN */
+
 #ifndef is_zero_pfn
 static inline int is_zero_pfn(unsigned long pfn)
 {
-- 
1.7.9.5
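
For context, the is_zero_pfn()/my_zero_pfn() helpers the patch overrides
above are consumed by the anonymous read-fault path. A condensed excerpt
of that path, trimmed from mm/memory.c of this era to the zero-page
branch (shown for illustration only, not part of the patch):

	/* do_anonymous_page(): a read fault on an untouched anonymous
	 * page is backed by the shared zero page rather than a fresh
	 * allocation.  With CONFIG_NUMA_ZERO_PFN, my_zero_pfn() would
	 * now return the zero page of the faulting CPU's node. */
	if (!(flags & FAULT_FLAG_WRITE)) {
		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
						vma->vm_page_prot));
		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
		if (!pte_none(*page_table))
			goto unlock;
		goto setpte;	/* install the zero-page mapping */
	}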


* Re: [PATCH] mm: introduce numa_zero_pfn
  2012-12-12 17:03 [PATCH] mm: introduce numa_zero_pfn Joonsoo Kim
@ 2012-12-12 18:09 ` Kirill A. Shutemov
  2012-12-12 20:12 ` Christoph Lameter
  1 sibling, 0 replies; 5+ messages in thread
From: Kirill A. Shutemov @ 2012-12-12 18:09 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm

On Thu, Dec 13, 2012 at 02:03:39AM +0900, Joonsoo Kim wrote:
> Currently, we use just *one* zero page regardless of the user process's node.
> When a user process reads the zero page, the CPU must first load it into
> its cache. If the CPU's node is not the same as the zero page's node, the
> load takes longer. If we create a zero page for each node and use them
> appropriately, we can reduce this overhead.
> 
> This patch implements the basic infrastructure for numa_zero_pfn.
> It is disabled by default, because it does not provide page coloring and
> some architectures use page coloring for the zero page.

Do you have benchmark numbers?

-- 
 Kirill A. Shutemov


* Re: [PATCH] mm: introduce numa_zero_pfn
  2012-12-12 17:03 [PATCH] mm: introduce numa_zero_pfn Joonsoo Kim
  2012-12-12 18:09 ` Kirill A. Shutemov
@ 2012-12-12 20:12 ` Christoph Lameter
  2012-12-12 20:15   ` Andi Kleen
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Lameter @ 2012-12-12 20:12 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, andi, Kirill A. Shutemov

On Thu, 13 Dec 2012, Joonsoo Kim wrote:

> Currently, we use just *one* zero page regardless of the user process's node.
> When a user process reads the zero page, the CPU must first load it into
> its cache. If the CPU's node is not the same as the zero page's node, the
> load takes longer. If we create a zero page for each node and use them
> appropriately, we can reduce this overhead.

Are you sure about the loading taking a long time?

I would expect a processor to fetch the zero page's cachelines from the
L3 cache of another socket, avoiding memory transactions altogether. The
zero page is likely in use somewhere, so typically no memory accesses
are needed at all.

Fetching from another socket's L3 cache is faster than fetching from
local memory.


* Re: [PATCH] mm: introduce numa_zero_pfn
  2012-12-12 20:12 ` Christoph Lameter
@ 2012-12-12 20:15   ` Andi Kleen
  2012-12-17 13:58     ` JoonSoo Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2012-12-12 20:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Joonsoo Kim, Andrew Morton, linux-kernel, linux-mm, andi,
	Kirill A. Shutemov

> I would expect a processor to fetch the zero page's cachelines from the
> L3 cache of another socket, avoiding memory transactions altogether. The
> zero page is likely in use somewhere, so typically no memory accesses
> are needed at all.

It depends on how effectively the workload uses the caches. If something
is hogging the L3 cache, then even shareable cache lines may need to be
refetched regularly.

But if your workload spends a significant part of its time reading
read-only data from the zero page, there is something wrong with the
workload.

I would do some data profiling first to really prove that this is the case.

-Andi
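
One hypothetical way to gather such numbers from userspace: pin the
process to each node in turn and time read faults on untouched anonymous
memory, which the kernel backs with the zero page. A minimal sketch,
assuming libnuma is available (link with -lnuma); the mapping size and
all names below are illustrative only:

	#define _GNU_SOURCE
	#include <numa.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <time.h>

	#define MAP_SZ	(256UL << 20)	/* 256 MiB, never written */

	/* Touch one byte per cache line; return elapsed seconds. */
	static double timed_read(const volatile char *p, size_t len)
	{
		struct timespec a, b;
		unsigned long sum = 0;
		size_t i;

		clock_gettime(CLOCK_MONOTONIC, &a);
		for (i = 0; i < len; i += 64)
			sum += p[i];
		clock_gettime(CLOCK_MONOTONIC, &b);
		(void)sum;
		return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
	}

	int main(void)
	{
		int node;

		if (numa_available() < 0)
			return 1;
		for (node = 0; node <= numa_max_node(); node++) {
			char *p;

			/* Run on this node's CPUs only. */
			if (numa_run_on_node(node))
				continue;
			/* Read-only faults on untouched anonymous memory
			 * all resolve to the shared zero page. */
			p = mmap(NULL, MAP_SZ, PROT_READ,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			if (p == MAP_FAILED)
				return 1;
			printf("node %d: %.3f s\n", node,
			       timed_read(p, MAP_SZ));
			munmap(p, MAP_SZ);
		}
		return 0;
	}

Comparing the per-node times against the node that owns the zero page
would show whether cross-node loads actually dominate here.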

* Re: [PATCH] mm: introduce numa_zero_pfn
  2012-12-12 20:15   ` Andi Kleen
@ 2012-12-17 13:58     ` JoonSoo Kim
  0 siblings, 0 replies; 5+ messages in thread
From: JoonSoo Kim @ 2012-12-17 13:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Lameter, Andrew Morton, linux-kernel, linux-mm,
	Kirill A. Shutemov

2012/12/13 Andi Kleen <andi@firstfloor.org>:
>> I would expect a processor to fetch the zero page's cachelines from the
>> L3 cache of another socket, avoiding memory transactions altogether. The
>> zero page is likely in use somewhere, so typically no memory accesses
>> are needed at all.
>
> It depends on how effectively the workload uses the caches. If something
> is hogging the L3 cache, then even shareable cache lines may need to be
> refetched regularly.
>
> But if your workload spends a significant part of its time reading
> read-only data from the zero page, there is something wrong with the
> workload.
>
> I would do some data profiling first to really prove that this is the case.

Okay.
I didn't know about this L3 cache behavior before.
Now I think I need to do some data profiling!
Thanks for the comments.
