linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
@ 2014-05-08  9:28 Madhavan Srinivasan
  2014-05-08  9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-08  9:28 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen, Madhavan Srinivasan

Kirill A. Shutemov with 8c6e50b029 commit introduced
vm_ops->map_pages() for mapping easy accessible pages around
fault address in hope to reduce number of minor page faults.

This patch creates infrastructure to modify the FAULT_AROUND_ORDER
value using mm/Kconfig. This will enable architecture maintainers
to decide on suitable FAULT_AROUND_ORDER value based on
performance data for that architecture. First patch also defaults
FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
out the performance numbers for powerpc (platform pseries) and
initialize the fault around order variable for pseries platform of
powerpc.

V4 Changes:
  Replaced the BUILD_BUG_ON with VM_BUG_ON.
  Moved fault_around_pages() and fault_around_mask() functions outside of
   #ifdef CONFIG_DEBUG_FS.

V3 Changes:
  Replaced FAULT_AROUND_ORDER macro to a variable to support arch's that
   supports sub platforms.
  Made changes in commit messages.

V2 Changes:
  Created Kconfig parameter for FAULT_AROUND_ORDER
  Added check in do_read_fault to handle FAULT_AROUND_ORDER value of 0
  Made changes in commit messages.

Madhavan Srinivasan (2):
  mm: move FAULT_AROUND_ORDER to arch/
  powerpc/pseries: init fault_around_order for pseries

 arch/powerpc/platforms/pseries/pseries.h |    2 ++
 arch/powerpc/platforms/pseries/setup.c   |    5 +++++
 mm/Kconfig                               |    8 ++++++++
 mm/memory.c                              |   25 ++++++-------------------
 4 files changed, 21 insertions(+), 19 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/
  2014-05-08  9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
@ 2014-05-08  9:28 ` Madhavan Srinivasan
  2014-05-08  9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
  2014-05-15  8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
  2 siblings, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-08  9:28 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen, Madhavan Srinivasan

Kirill A. Shutemov with 8c6e50b029 commit introduced
vm_ops->map_pages() for mapping easy accessible pages around
fault address in hope to reduce number of minor page faults.

This patch creates infrastructure to modify the FAULT_AROUND_ORDER
value using mm/Kconfig. This will enable architecture maintainers
to decide on suitable FAULT_AROUND_ORDER value based on
performance data for that architecture. Patch also defaults
FAULT_AROUND_ORDER Kconfig element to 4.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 mm/Kconfig  |    8 ++++++++
 mm/memory.c |   25 ++++++-------------------
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ebe5880..c7fc4f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -176,6 +176,14 @@ config MOVABLE_NODE
 config HAVE_BOOTMEM_INFO_NODE
 	def_bool n
 
+#
+# Fault around order is a control knob to decide the fault around pages.
+# Default value is set to 4 , but the arch can override it as desired.
+#
+config FAULT_AROUND_ORDER
+	int
+	default	4
+
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"
diff --git a/mm/memory.c b/mm/memory.c
index 037b812..e3931ef 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3402,11 +3402,9 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	update_mmu_cache(vma, address, pte);
 }
 
-#define FAULT_AROUND_ORDER 4
+unsigned int fault_around_order __read_mostly = CONFIG_FAULT_AROUND_ORDER;
 
 #ifdef CONFIG_DEBUG_FS
-static unsigned int fault_around_order = FAULT_AROUND_ORDER;
-
 static int fault_around_order_get(void *data, u64 *val)
 {
 	*val = fault_around_order;
@@ -3415,7 +3413,6 @@ static int fault_around_order_get(void *data, u64 *val)
 
 static int fault_around_order_set(void *data, u64 val)
 {
-	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
 	if (1UL << val > PTRS_PER_PTE)
 		return -EINVAL;
 	fault_around_order = val;
@@ -3435,31 +3432,21 @@ static int __init fault_around_debugfs(void)
 	return 0;
 }
 late_initcall(fault_around_debugfs);
+#endif
 
 static inline unsigned long fault_around_pages(void)
 {
-	return 1UL << fault_around_order;
-}
-
-static inline unsigned long fault_around_mask(void)
-{
-	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
-}
-#else
-static inline unsigned long fault_around_pages(void)
-{
 	unsigned long nr_pages;
 
-	nr_pages = 1UL << FAULT_AROUND_ORDER;
-	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
+	nr_pages = 1UL << fault_around_order;
+	VM_BUG_ON(nr_pages > PTRS_PER_PTE);
 	return nr_pages;
 }
 
 static inline unsigned long fault_around_mask(void)
 {
-	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
+	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
 }
-#endif
 
 static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
 		pte_t *pte, pgoff_t pgoff, unsigned int flags)
@@ -3515,7 +3502,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages) {
+	if ((vma->vm_ops->map_pages) && fault_around_order) {
 		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 		do_fault_around(vma, address, pte, pgoff, flags);
 		if (!pte_same(*pte, orig_pte))
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-05-08  9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
  2014-05-08  9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
@ 2014-05-08  9:28 ` Madhavan Srinivasan
  2014-05-20  7:28   ` Andrew Morton
  2014-05-15  8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
  2 siblings, 1 reply; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-08  9:28 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen, Madhavan Srinivasan

Performance data for different FAULT_AROUND_ORDER values from 4 socket
Power7 system (128 Threads and 128GB memory). perf stat with repeat of 5
is used to get the stddev values. Test ran in v3.14 kernel (Baseline) and
v3.15-rc1 for different fault around order values. %change here is calculated
in this method ((new value - baseline)/baseline). And negative %change says
its a drop in time.

FAULT_AROUND_ORDER      Baseline        1               3               4               5               8

Linux build (make -j64)
minor-faults            47,437,359      35,279,286      25,425,347      23,461,275      22,002,189      21,435,836
times in seconds        347.302528420   344.061588460   340.974022391   348.193508116   348.673900158   350.986543618
 stddev for time        ( +-  1.50% )   ( +-  0.73% )   ( +-  1.13% )   ( +-  1.01% )   ( +-  1.89% )   ( +-  1.55% )
 %chg time to baseline                  -0.9%           -1.8%           0.2%            0.39%           1.06%

Linux rebuild (make -j64)
minor-faults            941,552         718,319         486,625         440,124         410,510         397,416
times in seconds        30.569834718    31.219637539    31.319370649    31.434285472    31.972367174    31.443043580
 stddev for time        ( +-  1.07% )   ( +-  0.13% )   ( +-  0.43% )   ( +-  0.18% )   ( +-  0.95% )   ( +-  0.58% )
 %chg time to baseline                  2.1%            2.4%            2.8%            4.58%           2.85%

Binutils build (make all -j64 )
minor-faults            474,821         371,380         269,463         247,715         235,255         228,337
times in seconds        53.882492432    53.584289348    53.882773216    53.755816431    53.607824348    53.423759642
 stddev for time        ( +-  0.08% )   ( +-  0.56% )   ( +-  0.17% )   ( +-  0.11% )   ( +-  0.60% )   ( +-  0.69% )
 %chg time to baseline                  -0.55%          0.0%            -0.23%          -0.51%          -0.85%

Two synthetic tests: access every word in file in sequential/random order.

Sequential access 16GiB file
FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
1 thread
       minor-faults     263,148         131,166         32,908          16,514          8,260           1,093
       times in seconds 53.091138345    53.113191672    53.188776177    53.233017218    53.206841347    53.429979442
       stddev for time  ( +-  0.06% )   ( +-  0.07% )   ( +-  0.08% )   ( +-  0.09% )   ( +-  0.03% )   ( +-  0.03% )
       %chg time to baseline            0.04%           0.18%           0.26%           0.21%           0.63%
8 threads
       minor-faults     2,097,267       1,048,753       262,237         131,397         65,621          8,274
       times in seconds 55.173790028    54.591880790    54.824623287    54.802162211    54.969680503    54.790387715
       stddev for time  ( +-  0.78% )   ( +-  0.09% )   ( +-  0.08% )   ( +-  0.07% )   ( +-  0.28% )   ( +-  0.05% )
       %chg time to baseline            -1.05%          -0.63%          -0.67%          -0.36%          -0.69%
32 threads
       minor-faults     8,388,751       4,195,621       1,049,664       525,461         262,535         32,924
       times in seconds 60.431573046    60.669110744    60.485336388    60.697789706    60.077959564    60.588855032
       stddev for time  ( +-  0.44% )   ( +-  0.27% )   ( +-  0.46% )   ( +-  0.67% )   ( +-  0.31% )   ( +-  0.49% )
       %chg time to baseline            0.39%           0.08%           0.44%           -0.58%          0.25%
64 threads
       minor-faults     16,777,409      8,607,527       2,289,766       1,202,264       598,405         67,587
       times in seconds 96.932617720    100.675418760   102.109880836   103.881733383   102.580199555   105.751194041
       stddev for time  ( +-  1.39% )   ( +-  1.06% )   ( +-  0.99% )   ( +-  0.76% )   ( +-  1.65% )   ( +-  1.60% )
       %chg time to baseline            3.86%           5.34%           7.16%           5.82%           9.09%
128 threads
       minor-faults     33,554,705      17,375,375      4,682,462       2,337,245       1,179,007       134,819
       times in seconds 128.766704495   115.659225437   120.353046307   115.291871270   115.450886036   113.991902150
       stddev for time  ( +-  2.93% )   ( +-  0.30% )   ( +-  2.93% )   ( +-  1.24% )   ( +-  1.03% )   ( +-  0.70% )
       %chg time to baseline            -10.17%         -6.53%          -10.46%         -10.34%         -11.47%

Random access 1GiB file
FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
1 thread
       minor-faults     17,155          8,678           2,126           1,097           581             134
       times in seconds 51.904430523    51.658017987    51.919270792    51.560531738    52.354431597    51.976469502
       stddev for time  ( +-  3.19% )   ( +-  1.35% )   ( +-  1.56% )   ( +-  0.91% )   ( +-  1.70% )   ( +-  2.02% )
       %chg time to baseline            -0.47%          0.02%           -0.66%          0.86%           0.13%
8 threads
       minor-faults     131,844         70,705          17,457          8,505           4,251           598
       times in seconds 58.162813956    54.991706305    54.952675791    55.323057492    54.755587379    53.376722828
       stddev for time  ( +-  1.44% )   ( +-  0.69% )   ( +-  1.23% )   ( +-  2.78% )   ( +-  1.90% )   ( +-  2.91% )
       %chg time to baseline            -5.45%          -5.52%          -4.88%          -5.86%          -8.22%
32 threads
       minor-faults     524,437         270,760         67,069          33,414          16,641          2,204
       times in seconds 69.981777072    76.539570015    79.753578505    76.245943618    77.254258344    79.072596831
       stddev for time  ( +-  2.81% )   ( +-  1.95% )   ( +-  2.66% )   ( +-  0.99% )   ( +-  2.35% )   ( +-  3.22% )
       %chg time to baseline            9.37%           13.96%          8.95%           10.39%          12.98%
64 threads
       minor-faults     1,049,117       527,451         134,016         66,638          33,391          4,559
       times in seconds 108.024517536   117.575067996   115.322659914   111.943998437   115.049450815   119.218450840
       stddev for time  ( +-  2.40% )   ( +-  1.77% )   ( +-  1.19% )   ( +-  3.29% )   ( +-  2.32% )   ( +-  1.42% )
       %chg time to baseline            8.84%           6.75%           3.62%           6.5%            10.3%
128 threads
       minor-faults     2,097,440       1,054,360       267,042         133,328         66,532          8,652
       times in seconds 155.055861167   153.059625968   152.449492156   151.024005282   150.844647770   155.954366718
       stddev for time  ( +-  1.32% )   ( +-  1.14% )   ( +-  1.32% )   ( +-  0.81% )   ( +-  0.75% )   ( +-  0.72% )
       %chg time to baseline            -1.28%          -1.68%          -2.59%          -2.71%          0.57%

In case of kernel build, fault around order (fao) value of 1 and 3 wins when compared to 4 (but bit noisy).
Incase of kernel rebuild, slowdown for fao > 0 is seen. Incase of synthetic test, there are sporadic agains, but mostly
slowdown. No clear sweet spot fao value that can be suggested for the ppc64/pseries with the current
performance data. Hence, patch suggest value of zero to the fao.

Worst case scenario: we touch one page every 16M to demonstrate overhead.

Touch only one page in page table in 16GiB file
FAULT_AROUND_ORDER      Baseline        1               3               4               5               8
1 thread
       minor-faults     1,104           1,090           1,071           1,068           1,065           1,063
       times in seconds 0.006583298     0.008531502     0.019733795     0.036033763     0.062300553     0.406857086
       stddev for time  ( +-  2.79% )   ( +-  2.42% )   ( +-  3.47% )   ( +-  2.81% )   ( +-  2.01% )   ( +-  1.33% )
8 threads
       minor-faults     8,279           8,264           8,245           8,243           8,239           8,240
       times in seconds 0.044572398     0.057211811     0.107606306     0.205626815     0.381679120     2.647979955
       stddev for time  ( +-  1.95% )   ( +-  2.98% )   ( +-  1.74% )   ( +-  2.80% )   ( +-  2.01% )   ( +-  1.86% )
32 threads
       minor-faults     32,879          32,864          32,849          32,845          32,839          32,843
       times in seconds 0.197659343     0.218486087     0.445116407     0.694235883     1.296894038     9.127517045
       stddev for time  ( +-  3.05% )   ( +-  3.05% )   ( +-  4.33% )   ( +-  3.08% )   ( +-  3.75% )   ( +-  0.56% )
64 threads
       minor-faults     65,680          65,664          65,646          65,645          65,640          65,647
       times in seconds 0.455537304     0.489688780     0.866490093     1.427393118     2.379628982     17.059295051
       stddev for time  ( +-  4.01% )   ( +-  4.13% )   ( +-  2.92% )   ( +-  1.68% )   ( +-  1.79% )   ( +-  0.48% )
128 threads
       minor-faults     131,279         131,265         131,250         131,245         131,241         131,254
       times in seconds 1.026880651     1.095327536     1.721728274     2.808233068     4.662729948     31.732848290
       stddev for time  ( +-  6.85% )   ( +-  4.09% )   ( +-  1.71% )   ( +-  3.45% )   ( +-  2.40% )   ( +-  0.68% )

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/pseries.h |    2 ++
 arch/powerpc/platforms/pseries/setup.c   |    5 +++++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 9921953..6e6c993 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -17,6 +17,8 @@ struct device_node;
 extern void request_event_sources_irqs(struct device_node *np,
 				       irq_handler_t handler, const char *name);
 
+extern unsigned int fault_around_order;
+
 #include <linux/of.h>
 
 extern void __init fw_hypertas_feature_init(const char *hypertas,
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 2db8cc6..4391c3c 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -465,6 +465,11 @@ static void __init pSeries_setup_arch(void)
 {
 	set_arch_panic_timeout(10, ARCH_PANIC_TIMEOUT);
 
+	/*
+	 * Defaulting to zero since no sweet spot value found in the performance test.
+	 */
+	fault_around_order = 0;
+
 	/* Discover PIC type and setup ppc_md accordingly */
 	pseries_discover_pic();
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-08  9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
  2014-05-08  9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
  2014-05-08  9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
@ 2014-05-15  8:25 ` Madhavan Srinivasan
  2014-05-15 17:28   ` Hugh Dickins
  2 siblings, 1 reply; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-15  8:25 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86
  Cc: benh, paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen


Hi Ingo,

	Do you have any comments for the latest version of the patchset. If
not, kindly can you pick it up as is.


With regards
Maddy

> Kirill A. Shutemov with 8c6e50b029 commit introduced
> vm_ops->map_pages() for mapping easy accessible pages around
> fault address in hope to reduce number of minor page faults.
> 
> This patch creates infrastructure to modify the FAULT_AROUND_ORDER
> value using mm/Kconfig. This will enable architecture maintainers
> to decide on suitable FAULT_AROUND_ORDER value based on
> performance data for that architecture. First patch also defaults
> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
> out the performance numbers for powerpc (platform pseries) and
> initialize the fault around order variable for pseries platform of
> powerpc.
> 
> V4 Changes:
>   Replaced the BUILD_BUG_ON with VM_BUG_ON.
>   Moved fault_around_pages() and fault_around_mask() functions outside of
>    #ifdef CONFIG_DEBUG_FS.
> 
> V3 Changes:
>   Replaced FAULT_AROUND_ORDER macro to a variable to support arch's that
>    supports sub platforms.
>   Made changes in commit messages.
> 
> V2 Changes:
>   Created Kconfig parameter for FAULT_AROUND_ORDER
>   Added check in do_read_fault to handle FAULT_AROUND_ORDER value of 0
>   Made changes in commit messages.
> 
> Madhavan Srinivasan (2):
>   mm: move FAULT_AROUND_ORDER to arch/
>   powerpc/pseries: init fault_around_order for pseries
> 
>  arch/powerpc/platforms/pseries/pseries.h |    2 ++
>  arch/powerpc/platforms/pseries/setup.c   |    5 +++++
>  mm/Kconfig                               |    8 ++++++++
>  mm/memory.c                              |   25 ++++++-------------------
>  4 files changed, 21 insertions(+), 19 deletions(-)
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-15  8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
@ 2014-05-15 17:28   ` Hugh Dickins
  2014-05-19  0:12     ` Rusty Russell
  0 siblings, 1 reply; 26+ messages in thread
From: Hugh Dickins @ 2014-05-15 17:28 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, akpm, riel, mgorman, ak, peterz,
	mingo, dave.hansen

On Thu, 15 May 2014, Madhavan Srinivasan wrote:
> 
> Hi Ingo,
> 
> 	Do you have any comments for the latest version of the patchset. If
> not, kindly can you pick it up as is.
> 
> 
> With regards
> Maddy
> 
> > Kirill A. Shutemov with 8c6e50b029 commit introduced
> > vm_ops->map_pages() for mapping easy accessible pages around
> > fault address in hope to reduce number of minor page faults.
> > 
> > This patch creates infrastructure to modify the FAULT_AROUND_ORDER
> > value using mm/Kconfig. This will enable architecture maintainers
> > to decide on suitable FAULT_AROUND_ORDER value based on
> > performance data for that architecture. First patch also defaults
> > FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
> > out the performance numbers for powerpc (platform pseries) and
> > initialize the fault around order variable for pseries platform of
> > powerpc.

Sorry for not commenting earlier - just reminded by this ping to Ingo.

I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use.

arch/powerpc/Kconfig suggests that Power supports base page size of
4k, 16k, 64k or 256k.

I would expect your optimal fault_around_order to depend very much on
the base page size.

Perhaps fault_around_size would provide a more useful default?

Hugh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-15 17:28   ` Hugh Dickins
@ 2014-05-19  0:12     ` Rusty Russell
  2014-05-19  3:05       ` Madhavan Srinivasan
  0 siblings, 1 reply; 26+ messages in thread
From: Rusty Russell @ 2014-05-19  0:12 UTC (permalink / raw)
  To: Hugh Dickins, Madhavan Srinivasan
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz, mingo,
	dave.hansen

Hugh Dickins <hughd@google.com> writes:
> On Thu, 15 May 2014, Madhavan Srinivasan wrote:
>> 
>> Hi Ingo,
>> 
>> 	Do you have any comments for the latest version of the patchset. If
>> not, kindly can you pick it up as is.
>> 
>> 
>> With regards
>> Maddy
>> 
>> > Kirill A. Shutemov with 8c6e50b029 commit introduced
>> > vm_ops->map_pages() for mapping easy accessible pages around
>> > fault address in hope to reduce number of minor page faults.
>> > 
>> > This patch creates infrastructure to modify the FAULT_AROUND_ORDER
>> > value using mm/Kconfig. This will enable architecture maintainers
>> > to decide on suitable FAULT_AROUND_ORDER value based on
>> > performance data for that architecture. First patch also defaults
>> > FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
>> > out the performance numbers for powerpc (platform pseries) and
>> > initialize the fault around order variable for pseries platform of
>> > powerpc.
>
> Sorry for not commenting earlier - just reminded by this ping to Ingo.
>
> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use.
>
> arch/powerpc/Kconfig suggests that Power supports base page size of
> 4k, 16k, 64k or 256k.
>
> I would expect your optimal fault_around_order to depend very much on
> the base page size.

It was 64k, which is what PPC64 uses on all the major distributions.
You really only get a choice of 4k and 64k with 64 bit power.

> Perhaps fault_around_size would provide a more useful default?

That seems to fit.  With 4k pages and order 4, you're asking for 64k.
Maddy's result shows 64k is also reasonable for 64k pages.

Perhaps we try to generalize from two data points (a slight improvement
over doing it from 1!), eg:

/* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */
unsigned int fault_around_order __read_mostly =
        (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT);

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-19  0:12     ` Rusty Russell
@ 2014-05-19  3:05       ` Madhavan Srinivasan
  2014-05-19 23:23         ` Hugh Dickins
  0 siblings, 1 reply; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-19  3:05 UTC (permalink / raw)
  To: Rusty Russell, Hugh Dickins
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, akpm, riel, mgorman, ak, peterz, mingo,
	dave.hansen

On Monday 19 May 2014 05:42 AM, Rusty Russell wrote:
> Hugh Dickins <hughd@google.com> writes:
>> On Thu, 15 May 2014, Madhavan Srinivasan wrote:
>>>
>>> Hi Ingo,
>>>
>>> 	Do you have any comments for the latest version of the patchset. If
>>> not, kindly can you pick it up as is.
>>>
>>>
>>> With regards
>>> Maddy
>>>
>>>> Kirill A. Shutemov with 8c6e50b029 commit introduced
>>>> vm_ops->map_pages() for mapping easy accessible pages around
>>>> fault address in hope to reduce number of minor page faults.
>>>>
>>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER
>>>> value using mm/Kconfig. This will enable architecture maintainers
>>>> to decide on suitable FAULT_AROUND_ORDER value based on
>>>> performance data for that architecture. First patch also defaults
>>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
>>>> out the performance numbers for powerpc (platform pseries) and
>>>> initialize the fault around order variable for pseries platform of
>>>> powerpc.
>>
>> Sorry for not commenting earlier - just reminded by this ping to Ingo.
>>
>> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use.
>>
>> arch/powerpc/Kconfig suggests that Power supports base page size of
>> 4k, 16k, 64k or 256k.
>>
>> I would expect your optimal fault_around_order to depend very much on
>> the base page size.
> 
> It was 64k, which is what PPC64 uses on all the major distributions.
> You really only get a choice of 4k and 64k with 64 bit power.
> 
This is true. PPC64 support multiple pagesize and yes the default page
size of 64k, is taken as base pagesize for the tests.

>> Perhaps fault_around_size would provide a more useful default?
> 
> That seems to fit.  With 4k pages and order 4, you're asking for 64k.
> Maddy's result shows 64k is also reasonable for 64k pages.
> 
> Perhaps we try to generalize from two data points (a slight improvement
> over doing it from 1!), eg:
> 
> /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */
> unsigned int fault_around_order __read_mostly =
>         (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT);
> 

This may be right. But these are the concerns, will not this make other
arch to pick default without any tuning and also this will remove the
compile time option to disable the feature?

Thanks for review
With regards
Maddy



> Cheers,
> Rusty.
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-19  3:05       ` Madhavan Srinivasan
@ 2014-05-19 23:23         ` Hugh Dickins
  2014-05-19 23:43           ` Andrew Morton
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Hugh Dickins @ 2014-05-19 23:23 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: Rusty Russell, Kirill A. Shutemov, linux-kernel, linuxppc-dev,
	linux-mm, linux-arch, x86, benh, paulus, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen

On Mon, 19 May 2014, Madhavan Srinivasan wrote:
> On Monday 19 May 2014 05:42 AM, Rusty Russell wrote:
> > Hugh Dickins <hughd@google.com> writes:
> >> On Thu, 15 May 2014, Madhavan Srinivasan wrote:
> >>>
> >>> Hi Ingo,
> >>>
> >>> 	Do you have any comments for the latest version of the patchset. If
> >>> not, kindly can you pick it up as is.
> >>>
> >>>
> >>> With regards
> >>> Maddy
> >>>
> >>>> Kirill A. Shutemov with 8c6e50b029 commit introduced
> >>>> vm_ops->map_pages() for mapping easy accessible pages around
> >>>> fault address in hope to reduce number of minor page faults.
> >>>>
> >>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER
> >>>> value using mm/Kconfig. This will enable architecture maintainers
> >>>> to decide on suitable FAULT_AROUND_ORDER value based on
> >>>> performance data for that architecture. First patch also defaults
> >>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
> >>>> out the performance numbers for powerpc (platform pseries) and
> >>>> initialize the fault around order variable for pseries platform of
> >>>> powerpc.
> >>
> >> Sorry for not commenting earlier - just reminded by this ping to Ingo.
> >>
> >> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use.
> >>
> >> arch/powerpc/Kconfig suggests that Power supports base page size of
> >> 4k, 16k, 64k or 256k.
> >>
> >> I would expect your optimal fault_around_order to depend very much on
> >> the base page size.
> > 
> > It was 64k, which is what PPC64 uses on all the major distributions.
> > You really only get a choice of 4k and 64k with 64 bit power.
> > 
> This is true. PPC64 support multiple pagesize and yes the default page
> size of 64k, is taken as base pagesize for the tests.
> 
> >> Perhaps fault_around_size would provide a more useful default?
> > 
> > That seems to fit.  With 4k pages and order 4, you're asking for 64k.
> > Maddy's result shows 64k is also reasonable for 64k pages.
> > 
> > Perhaps we try to generalize from two data points (a slight improvement
> > over doing it from 1!), eg:
> > 
> > /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */
> > unsigned int fault_around_order __read_mostly =
> >         (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT);

Rusty's bimodal answer doesn't seem the right starting point to me.

Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
the order of the fault-around size in bytes, and fault_around_pages()
use 1UL << (fault_around_order - PAGE_SHIFT)
- when that doesn't wrap, of course!

That would at least have a better chance of being appropriate for
architectures with 8k and 16k pages (Itanium springs to mind).

Not necessarily right for them, since each architecture may have
different faulting overheads; but a better chance of being right
than blindly assuming 4k or 64k pages for everyone.

I'd be glad to see that change go into v3.15: what do you think,
Kirill, are we too late to make such a change now?
Or do you see some objection to it?

> This may be right. But these are the concerns, will not this make other
> arch to pick default without any tuning

Wasn't FAULT_AROUND_ORDER 4 chosen solely on the basis of x86 4k pages?
Did other architectures, with other page sizes, back that default?
Clearly not powerpc.

> and also this will remove the
> compile time option to disable the feature?

Compile time option meaning your FAULT_AROUND_ORDER in mm/Kconfig
for v3.16?

I'm not sure whether Rusty was arguing against that or not.  I think
we are all three concerned to have a more sensible default than what's
there at present.  I don't feel very strongly about your Kconfig
option: I've no objection, if it were to default to byte order 16.

Hugh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-19 23:23         ` Hugh Dickins
@ 2014-05-19 23:43           ` Andrew Morton
  2014-05-20  0:44             ` Kirill A. Shutemov
  2014-05-20  1:14           ` Rusty Russell
  2014-05-20  2:06           ` Madhavan Srinivasan
  2 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2014-05-19 23:43 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Madhavan Srinivasan, Rusty Russell, Kirill A. Shutemov,
	linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, riel, mgorman, ak, peterz, mingo, dave.hansen

On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:

> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> the order of the fault-around size in bytes, and fault_around_pages()
> use 1UL << (fault_around_order - PAGE_SHIFT)

Yes.  And shame on me for missing it (this time!) at review.

There's still time to fix this.  Patches, please.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-19 23:43           ` Andrew Morton
@ 2014-05-20  0:44             ` Kirill A. Shutemov
  2014-05-20  6:22               ` Rusty Russell
  0 siblings, 1 reply; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-05-20  0:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Madhavan Srinivasan, Rusty Russell,
	Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm,
	linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo,
	dave.hansen

Andrew Morton wrote:
> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> 
> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> > the order of the fault-around size in bytes, and fault_around_pages()
> > use 1UL << (fault_around_order - PAGE_SHIFT)
> 
> Yes.  And shame on me for missing it (this time!) at review.
> 
> There's still time to fix this.  Patches, please.

Here it is. Made at 3.30 AM, build tested only.

I'll sign it off tomorrow after testing.

diff --git a/mm/memory.c b/mm/memory.c
index 037b812a9531..9d6941c9a9e4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3402,62 +3402,62 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	update_mmu_cache(vma, address, pte);
 }
 
-#define FAULT_AROUND_ORDER 4
+#define FAULT_AROUND_BYTES 65536
 
 #ifdef CONFIG_DEBUG_FS
-static unsigned int fault_around_order = FAULT_AROUND_ORDER;
+static unsigned int fault_around_bytes = FAULT_AROUND_BYTES;
 
-static int fault_around_order_get(void *data, u64 *val)
+static int fault_around_bytes_get(void *data, u64 *val)
 {
-	*val = fault_around_order;
+	*val = fault_around_bytes;
 	return 0;
 }
 
-static int fault_around_order_set(void *data, u64 val)
+static int fault_around_bytes_set(void *data, u64 val)
 {
-	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
-	if (1UL << val > PTRS_PER_PTE)
+	BUILD_BUG_ON(FAULT_AROUND_BYTES / PAGE_SIZE > PTRS_PER_PTE);
+	if (val / PAGE_SIZE > PTRS_PER_PTE)
 		return -EINVAL;
-	fault_around_order = val;
+	fault_around_bytes = val;
 	return 0;
 }
-DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops,
-		fault_around_order_get, fault_around_order_set, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops,
+		fault_around_bytes_get, fault_around_bytes_set, "%llu\n");
 
 static int __init fault_around_debugfs(void)
 {
 	void *ret;
 
-	ret = debugfs_create_file("fault_around_order",	0644, NULL, NULL,
-			&fault_around_order_fops);
+	ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL,
+			&fault_around_bytes_fops);
 	if (!ret)
-		pr_warn("Failed to create fault_around_order in debugfs");
+		pr_warn("Failed to create fault_around_bytes in debugfs");
 	return 0;
 }
 late_initcall(fault_around_debugfs);
 
 static inline unsigned long fault_around_pages(void)
 {
-	return 1UL << fault_around_order;
+	return fault_around_bytes / PAGE_SIZE;
 }
 
 static inline unsigned long fault_around_mask(void)
 {
-	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
+	return ~(round_down(fault_around_bytes, PAGE_SIZE) - 1);
 }
 #else
 static inline unsigned long fault_around_pages(void)
 {
 	unsigned long nr_pages;
 
-	nr_pages = 1UL << FAULT_AROUND_ORDER;
+	nr_pages = FAULT_AROUND_BYTES / PAGE_SIZE;
 	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
 	return nr_pages;
 }
 
 static inline unsigned long fault_around_mask(void)
 {
-	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
+	return ~(round_down(FAULT_AROUND_BYTES, PAGE_SIZE) - 1);
 }
 #endif
 
@@ -3515,7 +3515,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages) {
+	if (vma->vm_ops->map_pages && fault_around_pages() > 1) {
 		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 		do_fault_around(vma, address, pte, pgoff, flags);
 		if (!pte_same(*pte, orig_pte))
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-19 23:23         ` Hugh Dickins
  2014-05-19 23:43           ` Andrew Morton
@ 2014-05-20  1:14           ` Rusty Russell
  2014-05-20  2:34             ` Hugh Dickins
  2014-05-20  2:06           ` Madhavan Srinivasan
  2 siblings, 1 reply; 26+ messages in thread
From: Rusty Russell @ 2014-05-20  1:14 UTC (permalink / raw)
  To: Hugh Dickins, Madhavan Srinivasan
  Cc: Kirill A. Shutemov, linux-kernel, linuxppc-dev, linux-mm,
	linux-arch, x86, benh, paulus, akpm, riel, mgorman, ak, peterz,
	mingo, dave.hansen

Hugh Dickins <hughd@google.com> writes:
> On Mon, 19 May 2014, Madhavan Srinivasan wrote:
>> On Monday 19 May 2014 05:42 AM, Rusty Russell wrote:
>> > Hugh Dickins <hughd@google.com> writes:
>> >> On Thu, 15 May 2014, Madhavan Srinivasan wrote:
>> >>>
>> >>> Hi Ingo,
>> >>>
>> >>> 	Do you have any comments for the latest version of the patchset. If
>> >>> not, kindly can you pick it up as is.
>> >>>
>> >>>
>> >>> With regards
>> >>> Maddy
>> >>>
>> >>>> Kirill A. Shutemov with 8c6e50b029 commit introduced
>> >>>> vm_ops->map_pages() for mapping easy accessible pages around
>> >>>> fault address in hope to reduce number of minor page faults.
>> >>>>
>> >>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER
>> >>>> value using mm/Kconfig. This will enable architecture maintainers
>> >>>> to decide on suitable FAULT_AROUND_ORDER value based on
>> >>>> performance data for that architecture. First patch also defaults
>> >>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
>> >>>> out the performance numbers for powerpc (platform pseries) and
>> >>>> initialize the fault around order variable for pseries platform of
>> >>>> powerpc.
>> >>
>> >> Sorry for not commenting earlier - just reminded by this ping to Ingo.
>> >>
>> >> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use.
>> >>
>> >> arch/powerpc/Kconfig suggests that Power supports base page size of
>> >> 4k, 16k, 64k or 256k.
>> >>
>> >> I would expect your optimal fault_around_order to depend very much on
>> >> the base page size.
>> > 
>> > It was 64k, which is what PPC64 uses on all the major distributions.
>> > You really only get a choice of 4k and 64k with 64 bit power.
>> > 
>> This is true. PPC64 support multiple pagesize and yes the default page
>> size of 64k, is taken as base pagesize for the tests.
>> 
>> >> Perhaps fault_around_size would provide a more useful default?
>> > 
>> > That seems to fit.  With 4k pages and order 4, you're asking for 64k.
>> > Maddy's result shows 64k is also reasonable for 64k pages.
>> > 
>> > Perhaps we try to generalize from two data points (a slight improvement
>> > over doing it from 1!), eg:
>> > 
>> > /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */
>> > unsigned int fault_around_order __read_mostly =
>> >         (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT);
>
> Rusty's bimodal answer doesn't seem the right starting point to me.

?  It's not bimodal, it's graded.  I think you misread?

> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> the order of the fault-around size in bytes, and fault_around_pages()
> use 1UL << (fault_around_order - PAGE_SHIFT)
> - when that doesn't wrap, of course!
>
> That would at least have a better chance of being appropriate for
> architectures with 8k and 16k pages (Itanium springs to mind).

Well, from our two data points it seems that we want to fault in
64k at a time whatever our page size.  Perhaps it's clearer if the
code expresses itself that way.

> Wasn't FAULT_AROUND_ORDER 4 chosen solely on the basis of x86 4k pages?
> Did other architectures, with other page sizes, back that default?
> Clearly not powerpc.

Yeah, BenH flagged it as "we should test this" for powerpc, which is
what Maddy then did.

>> and also this will remove the
>> compile time option to disable the feature?
>
> Compile time option meaning your FAULT_AROUND_ORDER in mm/Kconfig
> for v3.16?
>
> I'm not sure whether Rusty was arguing against that or not.  I think
> we are all three concerned to have a more sensible default than what's
> there at present.  I don't feel very strongly about your Kconfig
> option: I've no objection, if it were to default to byte order 16.

I don't mind either.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-19 23:23         ` Hugh Dickins
  2014-05-19 23:43           ` Andrew Morton
  2014-05-20  1:14           ` Rusty Russell
@ 2014-05-20  2:06           ` Madhavan Srinivasan
  2 siblings, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-20  2:06 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rusty Russell, Kirill A. Shutemov, linux-kernel, linuxppc-dev,
	linux-mm, linux-arch, x86, benh, paulus, akpm, riel, mgorman, ak,
	peterz, mingo, dave.hansen

On Tuesday 20 May 2014 04:53 AM, Hugh Dickins wrote:
> On Mon, 19 May 2014, Madhavan Srinivasan wrote:
>> On Monday 19 May 2014 05:42 AM, Rusty Russell wrote:
>>> Hugh Dickins <hughd@google.com> writes:
>>>> On Thu, 15 May 2014, Madhavan Srinivasan wrote:
>>>>>
>>>>> Hi Ingo,
>>>>>
>>>>> 	Do you have any comments for the latest version of the patchset. If
>>>>> not, kindly can you pick it up as is.
>>>>>
>>>>>
>>>>> With regards
>>>>> Maddy
>>>>>
>>>>>> Kirill A. Shutemov with 8c6e50b029 commit introduced
>>>>>> vm_ops->map_pages() for mapping easy accessible pages around
>>>>>> fault address in hope to reduce number of minor page faults.
>>>>>>
>>>>>> This patch creates infrastructure to modify the FAULT_AROUND_ORDER
>>>>>> value using mm/Kconfig. This will enable architecture maintainers
>>>>>> to decide on suitable FAULT_AROUND_ORDER value based on
>>>>>> performance data for that architecture. First patch also defaults
>>>>>> FAULT_AROUND_ORDER Kconfig element to 4. Second patch list
>>>>>> out the performance numbers for powerpc (platform pseries) and
>>>>>> initialize the fault around order variable for pseries platform of
>>>>>> powerpc.
>>>>
>>>> Sorry for not commenting earlier - just reminded by this ping to Ingo.
>>>>
>>>> I didn't study your numbers, but nowhere did I see what PAGE_SIZE you use.
>>>>
>>>> arch/powerpc/Kconfig suggests that Power supports base page size of
>>>> 4k, 16k, 64k or 256k.
>>>>
>>>> I would expect your optimal fault_around_order to depend very much on
>>>> the base page size.
>>>
>>> It was 64k, which is what PPC64 uses on all the major distributions.
>>> You really only get a choice of 4k and 64k with 64 bit power.
>>>
>> This is true. PPC64 support multiple pagesize and yes the default page
>> size of 64k, is taken as base pagesize for the tests.
>>
>>>> Perhaps fault_around_size would provide a more useful default?
>>>
>>> That seems to fit.  With 4k pages and order 4, you're asking for 64k.
>>> Maddy's result shows 64k is also reasonable for 64k pages.
>>>
>>> Perhaps we try to generalize from two data points (a slight improvement
>>> over doing it from 1!), eg:
>>>
>>> /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */
>>> unsigned int fault_around_order __read_mostly =
>>>         (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT);
> 
> Rusty's bimodal answer doesn't seem the right starting point to me.
> 
> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> the order of the fault-around size in bytes, and fault_around_pages()
> use 1UL << (fault_around_order - PAGE_SHIFT)
> - when that doesn't wrap, of course!
> 
> That would at least have a better chance of being appropriate for
> architectures with 8k and 16k pages (Itanium springs to mind).
> 
> Not necessarily right for them, since each architecture may have
> different faulting overheads; but a better chance of being right
> than blindly assuming 4k or 64k pages for everyone.
> 
> I'd be glad to see that change go into v3.15: what do you think,
> Kirill, are we too late to make such a change now?
> Or do you see some objection to it?
> 
>> This may be right. But these are the concerns, will not this make other
>> arch to pick default without any tuning
> 
> Wasn't FAULT_AROUND_ORDER 4 chosen solely on the basis of x86 4k pages?
> Did other architectures, with other page sizes, back that default?
> Clearly not powerpc.

Ok.

> 
>> and also this will remove the
>> compile time option to disable the feature?
> 
> Compile time option meaning your FAULT_AROUND_ORDER in mm/Kconfig
> for v3.16?
> 
> I'm not sure whether Rusty was arguing against that or not I think

> we are all three concerned to have a more sensible default than what's
> there at present.  I don't feel very strongly about your Kconfig

Added it as one way to reset or disable the default value. But then I
guess we decided on having FAULT_AROUND_ORDER as a variable which is
more important than Kconfig option.

> option: I've no objection, if it were to default to byte order 16.
> 

Thanks for review
With regards
Maddy

> Hugh
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20  1:14           ` Rusty Russell
@ 2014-05-20  2:34             ` Hugh Dickins
  0 siblings, 0 replies; 26+ messages in thread
From: Hugh Dickins @ 2014-05-20  2:34 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Madhavan Srinivasan, Kirill A. Shutemov, linux-kernel,
	linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, akpm,
	riel, mgorman, ak, peterz, mingo, dave.hansen

On Tue, 20 May 2014, Rusty Russell wrote:
> Hugh Dickins <hughd@google.com> writes:
> >> On Monday 19 May 2014 05:42 AM, Rusty Russell wrote:
> >> > 
> >> > Perhaps we try to generalize from two data points (a slight improvement
> >> > over doing it from 1!), eg:
> >> > 
> >> > /* 4 seems good for 4k-page x86, 0 seems good for 64k page ppc64, so: */
> >> > unsigned int fault_around_order __read_mostly =
> >> >         (16 - PAGE_SHIFT < 0 ? 0 : 16 - PAGE_SHIFT);
> >
> > Rusty's bimodal answer doesn't seem the right starting point to me.
> 
> ?  It's not bimodal, it's graded.  I think you misread?

Yikes, worse than misread, more like I was too rude even to read: sorry!

Hugh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20  0:44             ` Kirill A. Shutemov
@ 2014-05-20  6:22               ` Rusty Russell
  2014-05-20  7:32                 ` Andrew Morton
  2014-05-20 10:27                 ` Kirill A. Shutemov
  0 siblings, 2 replies; 26+ messages in thread
From: Rusty Russell @ 2014-05-20  6:22 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton
  Cc: Hugh Dickins, Madhavan Srinivasan, Kirill A. Shutemov,
	linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, riel, mgorman, ak, peterz, mingo, dave.hansen

"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> Andrew Morton wrote:
>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
>> 
>> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
>> > the order of the fault-around size in bytes, and fault_around_pages()
>> > use 1UL << (fault_around_order - PAGE_SHIFT)
>> 
>> Yes.  And shame on me for missing it (this time!) at review.
>> 
>> There's still time to fix this.  Patches, please.
>
> Here it is. Made at 3.30 AM, build tested only.

Prefer on top of Maddy's patch which makes it always a variable, rather
than CONFIG_DEBUG_FS.  It's got enough hair as it is.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-05-08  9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
@ 2014-05-20  7:28   ` Andrew Morton
  2014-05-20  8:03     ` Madhavan Srinivasan
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2014-05-20  7:28 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, riel, mgorman, ak, peterz, mingo,
	dave.hansen

On Thu,  8 May 2014 14:58:16 +0530 Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:

> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -17,6 +17,8 @@ struct device_node;
>  extern void request_event_sources_irqs(struct device_node *np,
>  				       irq_handler_t handler, const char *name);
>  
> +extern unsigned int fault_around_order;

This isn't an appropriate header file for exporting something from core
mm - what happens if arch/mn10300 wants it?.

I guess include/linux/mm.h is the place.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20  6:22               ` Rusty Russell
@ 2014-05-20  7:32                 ` Andrew Morton
  2014-05-20  7:53                   ` Madhavan Srinivasan
  2014-05-20 10:27                 ` Kirill A. Shutemov
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2014-05-20  7:32 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Kirill A. Shutemov, Hugh Dickins, Madhavan Srinivasan,
	linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, riel, mgorman, ak, peterz, mingo, dave.hansen

On Tue, 20 May 2014 15:52:07 +0930 Rusty Russell <rusty@rustcorp.com.au> wrote:

> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> > Andrew Morton wrote:
> >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> >> 
> >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> >> > the order of the fault-around size in bytes, and fault_around_pages()
> >> > use 1UL << (fault_around_order - PAGE_SHIFT)
> >> 
> >> Yes.  And shame on me for missing it (this time!) at review.
> >> 
> >> There's still time to fix this.  Patches, please.
> >
> > Here it is. Made at 3.30 AM, build tested only.
> 
> Prefer on top of Maddy's patch which makes it always a variable, rather
> than CONFIG_DEBUG_FS.  It's got enough hair as it is.
> 

We're at 3.15-rc5 and this interface should be finalised for 3.16.  So
Kirrill's patch is pretty urgent and should come first.

Well.  It's only a debugfs interface at this stage so we are allowed to
change it later, but it's better not to.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20  7:32                 ` Andrew Morton
@ 2014-05-20  7:53                   ` Madhavan Srinivasan
  0 siblings, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-20  7:53 UTC (permalink / raw)
  To: Andrew Morton, Rusty Russell
  Cc: Kirill A. Shutemov, Hugh Dickins, linux-kernel, linuxppc-dev,
	linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak,
	peterz, mingo, dave.hansen

On Tuesday 20 May 2014 01:02 PM, Andrew Morton wrote:
> On Tue, 20 May 2014 15:52:07 +0930 Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>>> Andrew Morton wrote:
>>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
>>>>
>>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
>>>>> the order of the fault-around size in bytes, and fault_around_pages()
>>>>> use 1UL << (fault_around_order - PAGE_SHIFT)
>>>>
>>>> Yes.  And shame on me for missing it (this time!) at review.
>>>>
>>>> There's still time to fix this.  Patches, please.
>>>
>>> Here it is. Made at 3.30 AM, build tested only.
>>
>> Prefer on top of Maddy's patch which makes it always a variable, rather
>> than CONFIG_DEBUG_FS.  It's got enough hair as it is.
>>
> 
> We're at 3.15-rc5 and this interface should be finalised for 3.16.  So
> Kirrill's patch is pretty urgent and should come first.
> 
> Well.  It's only a debugfs interface at this stage so we are allowed to
> change it later, but it's better not to.
>
My patchset does not change the interface, but uses the current fault
around order variable from CONFIG_DEBUG_FS block to allow changes at
runtime, instead of having a constant and some cleanup.

Thanks for review
Regards
--Maddy




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries
  2014-05-20  7:28   ` Andrew Morton
@ 2014-05-20  8:03     ` Madhavan Srinivasan
  0 siblings, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-20  8:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, kirill.shutemov, rusty, riel, mgorman, ak, peterz, mingo,
	dave.hansen

On Tuesday 20 May 2014 12:58 PM, Andrew Morton wrote:
> On Thu,  8 May 2014 14:58:16 +0530 Madhavan Srinivasan <maddy@linux.vnet.ibm.com> wrote:
> 
>> --- a/arch/powerpc/platforms/pseries/pseries.h
>> +++ b/arch/powerpc/platforms/pseries/pseries.h
>> @@ -17,6 +17,8 @@ struct device_node;
>>  extern void request_event_sources_irqs(struct device_node *np,
>>  				       irq_handler_t handler, const char *name);
>>  
>> +extern unsigned int fault_around_order;
> 
> This isn't an appropriate header file for exporting something from core
> mm - what happens if arch/mn10300 wants it?.
>
> I guess include/linux/mm.h is the place.
> 

Rusty already suggested this. My bad.  Reason for adding it here was
that, I did the performance test for this platform. Will change and send
it out.

Thanks for review
Regards
Maddy


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20  6:22               ` Rusty Russell
  2014-05-20  7:32                 ` Andrew Morton
@ 2014-05-20 10:27                 ` Kirill A. Shutemov
  2014-05-20 19:59                   ` Andrew Morton
  2014-05-27  6:24                   ` Madhavan Srinivasan
  1 sibling, 2 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-05-20 10:27 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Kirill A. Shutemov, Andrew Morton, Hugh Dickins,
	Madhavan Srinivasan, Kirill A. Shutemov, linux-kernel,
	linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel,
	mgorman, ak, peterz, mingo, dave.hansen

Rusty Russell wrote:
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> > Andrew Morton wrote:
> >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> >> 
> >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> >> > the order of the fault-around size in bytes, and fault_around_pages()
> >> > use 1UL << (fault_around_order - PAGE_SHIFT)
> >> 
> >> Yes.  And shame on me for missing it (this time!) at review.
> >> 
> >> There's still time to fix this.  Patches, please.
> >
> > Here it is. Made at 3.30 AM, build tested only.
> 
> Prefer on top of Maddy's patch which makes it always a variable, rather
> than CONFIG_DEBUG_FS.  It's got enough hair as it is.

Something like this?

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Tue, 20 May 2014 13:02:03 +0300
Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order

There are evidences that faultaround feature is less relevant on
architectures with page size bigger then 4k. Which makes sense since
page fault overhead per byte of mapped area should be less there.

Let's rework the feature to specify faultaround area in bytes instead of
page order. It's 64 kilobytes for now.

The patch effectively disables faultaround on architectures with
page size >= 64k (like ppc64).

It's possible that some other size of faultaround area is relevant for a
platform. We can expose `fault_around_bytes' variable to arch-specific
code once such platforms will be found.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 62 +++++++++++++++++++++++--------------------------------------
 1 file changed, 23 insertions(+), 39 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 037b812a9531..252b319e8cdf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	update_mmu_cache(vma, address, pte);
 }
 
-#define FAULT_AROUND_ORDER 4
+static unsigned long fault_around_bytes = 65536;
+
+static inline unsigned long fault_around_pages(void)
+{
+	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
+}
+
+static inline unsigned long fault_around_mask(void)
+{
+	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
+}
 
-#ifdef CONFIG_DEBUG_FS
-static unsigned int fault_around_order = FAULT_AROUND_ORDER;
 
-static int fault_around_order_get(void *data, u64 *val)
+#ifdef CONFIG_DEBUG_FS
+static int fault_around_bytes_get(void *data, u64 *val)
 {
-	*val = fault_around_order;
+	*val = fault_around_bytes;
 	return 0;
 }
 
-static int fault_around_order_set(void *data, u64 val)
+static int fault_around_bytes_set(void *data, u64 val)
 {
-	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
-	if (1UL << val > PTRS_PER_PTE)
+	if (val / PAGE_SIZE > PTRS_PER_PTE)
 		return -EINVAL;
-	fault_around_order = val;
+	fault_around_bytes = val;
 	return 0;
 }
-DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops,
-		fault_around_order_get, fault_around_order_set, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops,
+		fault_around_bytes_get, fault_around_bytes_set, "%llu\n");
 
 static int __init fault_around_debugfs(void)
 {
 	void *ret;
 
-	ret = debugfs_create_file("fault_around_order",	0644, NULL, NULL,
-			&fault_around_order_fops);
+	ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL,
+			&fault_around_bytes_fops);
 	if (!ret)
-		pr_warn("Failed to create fault_around_order in debugfs");
+		pr_warn("Failed to create fault_around_bytes in debugfs");
 	return 0;
 }
 late_initcall(fault_around_debugfs);
-
-static inline unsigned long fault_around_pages(void)
-{
-	return 1UL << fault_around_order;
-}
-
-static inline unsigned long fault_around_mask(void)
-{
-	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
-}
-#else
-static inline unsigned long fault_around_pages(void)
-{
-	unsigned long nr_pages;
-
-	nr_pages = 1UL << FAULT_AROUND_ORDER;
-	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
-	return nr_pages;
-}
-
-static inline unsigned long fault_around_mask(void)
-{
-	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
-}
 #endif
 
 static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
@@ -3515,7 +3499,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages) {
+	if (vma->vm_ops->map_pages && fault_around_pages() > 1) {
 		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 		do_fault_around(vma, address, pte, pgoff, flags);
 		if (!pte_same(*pte, orig_pte))
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20 10:27                 ` Kirill A. Shutemov
@ 2014-05-20 19:59                   ` Andrew Morton
  2014-05-21 13:40                     ` Kirill A. Shutemov
  2014-05-27  6:24                   ` Madhavan Srinivasan
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2014-05-20 19:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Rusty Russell, Hugh Dickins, Madhavan Srinivasan, linux-kernel,
	linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel,
	mgorman, ak, peterz, mingo, dave.hansen

On Tue, 20 May 2014 13:27:38 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> Rusty Russell wrote:
> > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> > > Andrew Morton wrote:
> > >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> > >> 
> > >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> > >> > the order of the fault-around size in bytes, and fault_around_pages()
> > >> > use 1UL << (fault_around_order - PAGE_SHIFT)
> > >> 
> > >> Yes.  And shame on me for missing it (this time!) at review.
> > >> 
> > >> There's still time to fix this.  Patches, please.
> > >
> > > Here it is. Made at 3.30 AM, build tested only.
> > 
> > Prefer on top of Maddy's patch which makes it always a variable, rather
> > than CONFIG_DEBUG_FS.  It's got enough hair as it is.
> 
> Something like this?

This appears to be against mainline, not against Madhavan's patch.  As
mentioned previously, I'd prefer it that way but confused.


> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Tue, 20 May 2014 13:02:03 +0300
> Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order
> 
> There are evidences that faultaround feature is less relevant on
> architectures with page size bigger then 4k. Which makes sense since
> page fault overhead per byte of mapped area should be less there.
> 
> Let's rework the feature to specify faultaround area in bytes instead of
> page order. It's 64 kilobytes for now.
> 
> The patch effectively disables faultaround on architectures with
> page size >= 64k (like ppc64).
> 
> It's possible that some other size of faultaround area is relevant for a
> platform. We can expose `fault_around_bytes' variable to arch-specific
> code once such platforms will be found.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 62 +++++++++++++++++++++++--------------------------------------
>  1 file changed, 23 insertions(+), 39 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..252b319e8cdf 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>  	update_mmu_cache(vma, address, pte);
>  }
>  
> -#define FAULT_AROUND_ORDER 4
> +static unsigned long fault_around_bytes = 65536;
> +
> +static inline unsigned long fault_around_pages(void)
> +{
> +	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
> +}

I think we should round up, not down.  So if the user asks for 1kb,
they get one page.

So this becomes

	return PAGE_ALIGN(fault_around_bytes) / PAGE_SIZE;

> +static inline unsigned long fault_around_mask(void)
> +{
> +	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
> +}

And this has me a bit stumped.  It's not helpful that do_fault_around()
is undocumented.  Does it fault in N/2 pages ahead and N/2 pages
behind?  Or does it align the address down to the highest multiple of
fault_around_bytes?  It appears to be the latter, so the location of
the faultaround window around the fault address is basically random,
depending on what address userspace happened to pick.  I don't know why
we did this :(

Or something.  Can we please get some code commentary over
do_fault_around() describing this design decision and explaining the
reasoning behind it?


Also, "neast" is not a word.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20 19:59                   ` Andrew Morton
@ 2014-05-21 13:40                     ` Kirill A. Shutemov
  2014-05-21 20:34                       ` Andrew Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-05-21 13:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rusty Russell, Hugh Dickins,
	Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm,
	linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo,
	dave.hansen

Andrew Morton wrote:
> On Tue, 20 May 2014 13:27:38 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > Rusty Russell wrote:
> > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> > > > Andrew Morton wrote:
> > > >> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> > > >> 
> > > >> > Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> > > >> > the order of the fault-around size in bytes, and fault_around_pages()
> > > >> > use 1UL << (fault_around_order - PAGE_SHIFT)
> > > >> 
> > > >> Yes.  And shame on me for missing it (this time!) at review.
> > > >> 
> > > >> There's still time to fix this.  Patches, please.
> > > >
> > > > Here it is. Made at 3.30 AM, build tested only.
> > > 
> > > Prefer on top of Maddy's patch which makes it always a variable, rather
> > > than CONFIG_DEBUG_FS.  It's got enough hair as it is.
> > 
> > Something like this?
> 
> This appears to be against mainline, not against Madhavan's patch.  As
> mentioned previously, I'd prefer it that way but confused.
> 
> 
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Date: Tue, 20 May 2014 13:02:03 +0300
> > Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order
> > 
> > There are evidences that faultaround feature is less relevant on
> > architectures with page size bigger then 4k. Which makes sense since
> > page fault overhead per byte of mapped area should be less there.
> > 
> > Let's rework the feature to specify faultaround area in bytes instead of
> > page order. It's 64 kilobytes for now.
> > 
> > The patch effectively disables faultaround on architectures with
> > page size >= 64k (like ppc64).
> > 
> > It's possible that some other size of faultaround area is relevant for a
> > platform. We can expose `fault_around_bytes' variable to arch-specific
> > code once such platforms will be found.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/memory.c | 62 +++++++++++++++++++++++--------------------------------------
> >  1 file changed, 23 insertions(+), 39 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 037b812a9531..252b319e8cdf 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
> >  	update_mmu_cache(vma, address, pte);
> >  }
> >  
> > -#define FAULT_AROUND_ORDER 4
> > +static unsigned long fault_around_bytes = 65536;
> > +
> > +static inline unsigned long fault_around_pages(void)
> > +{
> > +	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
> > +}
> 
> I think we should round up, not down.  So if the user asks for 1kb,
> they get one page.
> 
> So this becomes
> 
> 	return PAGE_ALIGN(fault_around_bytes) / PAGE_SIZE;

See below.

> > +static inline unsigned long fault_around_mask(void)
> > +{
> > +	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
> > +}
> 
> And this has me a bit stumped.  It's not helpful that do_fault_around()
> is undocumented.  Does it fault in N/2 pages ahead and N/2 pages
> behind?  Or does it align the address down to the highest multiple of
> fault_around_bytes?  It appears to be the latter, so the location of
> the faultaround window around the fault address is basically random,
> depending on what address userspace happened to pick.  I don't know why
> we did this :(

When we call ->map_pages() we need to make sure that we stay within VMA
and the page table. We don't want to cross page table boundary, because
page table is what ptlock covers in split ptlock case.

I've designed the feature with fault area nominated in page order in mind
and I found it's easier to make sure we don't cross boundaries, if we
would align virtual address of fault around area to PAGE_SIZE <<
FAULT_AROUND_ORDER.

And yes fault address may be anywhere within the area. You can think about
this as a virtual page with size PAGE_SIZE << FAULT_AROUND_ORDER: no matter
what is fault address, we handle area naturally aligned to page size which
fault address belong to.

I've used rounddown_pow_of_two() in the patch to align to nearest page
order, not to page size, because that's what current do_fault_around()
expect to see. And roundup is not an option: nobody expects fault around
area to be 128k if fault_around_bytes set to 64k + 1 bytes.

If you think we need this I can rework do_fault_around() to handle
non-pow-of-two fault_around_pages(), but I don't think it's good idea to
do this for v3.15. Anyway, patch I've proposed allows change
fault_around_bytes only from DEBUG_FS and roundown should be good
enough there.

> Or something.  Can we please get some code commentary over
> do_fault_around() describing this design decision and explaining the
> reasoning behind it?

I'll do this. But if do_fault_around() rework is needed, I want to do that
first.

> Also, "neast" is not a word.

:facepalm:

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Wed, 21 May 2014 16:36:42 +0300
Subject: [PATCH] mm: fix typo in comment in do_fault_around()

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 252b319e8cdf..f76663c31da6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3460,7 +3460,7 @@ static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
 
 	/*
 	 *  max_pgoff is either end of page table or end of vma
-	 *  or fault_around_pages() from pgoff, depending what is neast.
+	 *  or fault_around_pages() from pgoff, depending what is nearest.
 	 */
 	max_pgoff = pgoff - ((start_addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)) +
 		PTRS_PER_PTE - 1;
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-21 13:40                     ` Kirill A. Shutemov
@ 2014-05-21 20:34                       ` Andrew Morton
  2014-05-23 12:28                         ` Kirill A. Shutemov
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2014-05-21 20:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Rusty Russell, Hugh Dickins, Madhavan Srinivasan, linux-kernel,
	linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel,
	mgorman, ak, peterz, mingo, dave.hansen

On Wed, 21 May 2014 16:40:27 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> > Or something.  Can we please get some code commentary over
> > do_fault_around() describing this design decision and explaining the
> > reasoning behind it?
> 
> I'll do this. But if do_fault_around() rework is needed, I want to do that
> first.

This sort of thing should be at least partially driven by observation
and I don't have the data for that.  My seat of the pants feel is that
after the first fault, accesses at higher addresses are more
common/probable than accesses at lower addresses.  In which case we
should see improvements by centering the window at some higher address
than the fault.  Much instrumentation and downstream analysis is needed
and the returns will be pretty small!

But we don't need to do all that right now.  Let's get the current
implementation wrapped up for 3.15: get the interface finalized (bytes,
not pages!) and get the current design decisions appropriately
documented.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-21 20:34                       ` Andrew Morton
@ 2014-05-23 12:28                         ` Kirill A. Shutemov
  0 siblings, 0 replies; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-05-23 12:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rusty Russell, Hugh Dickins,
	Madhavan Srinivasan, linux-kernel, linuxppc-dev, linux-mm,
	linux-arch, x86, benh, paulus, riel, mgorman, ak, peterz, mingo,
	dave.hansen

Andrew Morton wrote:
> On Wed, 21 May 2014 16:40:27 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > > Or something.  Can we please get some code commentary over
> > > do_fault_around() describing this design decision and explaining the
> > > reasoning behind it?
> > 
> > I'll do this. But if do_fault_around() rework is needed, I want to do that
> > first.
> 
> This sort of thing should be at least partially driven by observation
> and I don't have the data for that.  My seat of the pants feel is that
> after the first fault, accesses at higher addresses are more
> common/probable than accesses at lower addresses.

It's probably true for data, but the feature is mostly targeted to code pages
and situation is not that obvious to me with all jumps.

> But we don't need to do all that right now.  Let's get the current
> implementation wrapped up for 3.15: get the interface finalized (bytes,
> not pages!)

The patch above by thread is okay for that, right?

> and get the current design decisions appropriately documented.

Here it is. Based on patch to convert order->bytes.

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Fri, 23 May 2014 15:16:47 +0300
Subject: [PATCH] mm: document do_fault_around() feature

Some clarification on how faultaround works.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/memory.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 252b319e8cdf..8d723b8d3c86 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3404,6 +3404,10 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 
 static unsigned long fault_around_bytes = 65536;
 
+/*
+ * fault_around_pages() and fault_around_mask() round down fault_around_bytes
+ * to nearest page order. It's what do_fault_around() expects to see.
+ */
 static inline unsigned long fault_around_pages(void)
 {
 	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
@@ -3445,6 +3449,29 @@ static int __init fault_around_debugfs(void)
 late_initcall(fault_around_debugfs);
 #endif
 
+/*
+ * do_fault_around() tries to map few pages around the fault address. The hope
+ * is that the pages will be needed soon and this would lower the number of
+ * faults to handle.
+ *
+ * It uses vm_ops->map_pages() to map the pages, which skips the page if it's
+ * not ready to be mapped: not up-to-date, locked, etc.
+ *
+ * This function is called with the page table lock taken. In the split ptlock
+ * case the page table lock only protects only those entries which belong to
+ * page table corresponding to the fault address.
+ *
+ * This function don't cross the VMA boundaries in order to call map_pages()
+ * only once.
+ *
+ * fault_around_pages() defines how many pages we'll try to map.
+ * do_fault_around() expects it to be power of two and less or equal to
+ * PTRS_PER_PTE.
+ *
+ * The virtual address of the area that we map is naturally aligned to the
+ * fault_around_pages() (and therefore to page order). This way it's easier to
+ * guarantee that we don't cross the page table boundaries.
+ */
 static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
 		pte_t *pte, pgoff_t pgoff, unsigned int flags)
 {
-- 
 Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-20 10:27                 ` Kirill A. Shutemov
  2014-05-20 19:59                   ` Andrew Morton
@ 2014-05-27  6:24                   ` Madhavan Srinivasan
  2014-05-27 10:21                     ` Kirill A. Shutemov
  1 sibling, 1 reply; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-27  6:24 UTC (permalink / raw)
  To: Kirill A. Shutemov, Rusty Russell
  Cc: Andrew Morton, Hugh Dickins, linux-kernel, linuxppc-dev,
	linux-mm, linux-arch, x86, benh, paulus, riel, mgorman, ak,
	peterz, mingo, dave.hansen

On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote:
> Rusty Russell wrote:
>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>>> Andrew Morton wrote:
>>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
>>>>
>>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
>>>>> the order of the fault-around size in bytes, and fault_around_pages()
>>>>> use 1UL << (fault_around_order - PAGE_SHIFT)
>>>>
>>>> Yes.  And shame on me for missing it (this time!) at review.
>>>>
>>>> There's still time to fix this.  Patches, please.
>>>
>>> Here it is. Made at 3.30 AM, build tested only.
>>
>> Prefer on top of Maddy's patch which makes it always a variable, rather
>> than CONFIG_DEBUG_FS.  It's got enough hair as it is.
> 
> Something like this?
> 
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Tue, 20 May 2014 13:02:03 +0300
> Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order
> 
> There are evidences that faultaround feature is less relevant on
> architectures with page size bigger then 4k. Which makes sense since
> page fault overhead per byte of mapped area should be less there.
> 
> Let's rework the feature to specify faultaround area in bytes instead of
> page order. It's 64 kilobytes for now.
> 
> The patch effectively disables faultaround on architectures with
> page size >= 64k (like ppc64).
> 
> It's possible that some other size of faultaround area is relevant for a
> platform. We can expose `fault_around_bytes' variable to arch-specific
> code once such platforms will be found.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/memory.c | 62 +++++++++++++++++++++++--------------------------------------
>  1 file changed, 23 insertions(+), 39 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 037b812a9531..252b319e8cdf 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>  	update_mmu_cache(vma, address, pte);
>  }
> 
> -#define FAULT_AROUND_ORDER 4
> +static unsigned long fault_around_bytes = 65536;
> +
> +static inline unsigned long fault_around_pages(void)
> +{
> +	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
> +}
> +
> +static inline unsigned long fault_around_mask(void)
> +{
> +	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
> +}
> 
> -#ifdef CONFIG_DEBUG_FS
> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
> 
> -static int fault_around_order_get(void *data, u64 *val)
> +#ifdef CONFIG_DEBUG_FS
> +static int fault_around_bytes_get(void *data, u64 *val)
>  {
> -	*val = fault_around_order;
> +	*val = fault_around_bytes;
>  	return 0;
>  }
> 
> -static int fault_around_order_set(void *data, u64 val)
> +static int fault_around_bytes_set(void *data, u64 val)
>  {

Kindly ignore the question if not relevant. Even though we need root
access to alter the value, will we be fine with
negative value?.

Regards
Maddy

> -	BUILD_BUG_ON((1UL << FAULT_AROUND_ORDER) > PTRS_PER_PTE);
> -	if (1UL << val > PTRS_PER_PTE)
> +	if (val / PAGE_SIZE > PTRS_PER_PTE)
>  		return -EINVAL;
> -	fault_around_order = val;
> +	fault_around_bytes = val;
>  	return 0;
>  }
> -DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops,
> -		fault_around_order_get, fault_around_order_set, "%llu\n");
> +DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops,
> +		fault_around_bytes_get, fault_around_bytes_set, "%llu\n");
> 
>  static int __init fault_around_debugfs(void)
>  {
>  	void *ret;
> 
> -	ret = debugfs_create_file("fault_around_order",	0644, NULL, NULL,
> -			&fault_around_order_fops);
> +	ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL,
> +			&fault_around_bytes_fops);
>  	if (!ret)
> -		pr_warn("Failed to create fault_around_order in debugfs");
> +		pr_warn("Failed to create fault_around_bytes in debugfs");
>  	return 0;
>  }
>  late_initcall(fault_around_debugfs);
> -
> -static inline unsigned long fault_around_pages(void)
> -{
> -	return 1UL << fault_around_order;
> -}
> -
> -static inline unsigned long fault_around_mask(void)
> -{
> -	return ~((1UL << (PAGE_SHIFT + fault_around_order)) - 1);
> -}
> -#else
> -static inline unsigned long fault_around_pages(void)
> -{
> -	unsigned long nr_pages;
> -
> -	nr_pages = 1UL << FAULT_AROUND_ORDER;
> -	BUILD_BUG_ON(nr_pages > PTRS_PER_PTE);
> -	return nr_pages;
> -}
> -
> -static inline unsigned long fault_around_mask(void)
> -{
> -	return ~((1UL << (PAGE_SHIFT + FAULT_AROUND_ORDER)) - 1);
> -}
>  #endif
> 
>  static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
> @@ -3515,7 +3499,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * if page by the offset is not ready to be mapped (cold cache or
>  	 * something).
>  	 */
> -	if (vma->vm_ops->map_pages) {
> +	if (vma->vm_ops->map_pages && fault_around_pages() > 1) {
>  		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
>  		do_fault_around(vma, address, pte, pgoff, flags);
>  		if (!pte_same(*pte, orig_pte))
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-27  6:24                   ` Madhavan Srinivasan
@ 2014-05-27 10:21                     ` Kirill A. Shutemov
  2014-05-27 10:44                       ` Madhavan Srinivasan
  0 siblings, 1 reply; 26+ messages in thread
From: Kirill A. Shutemov @ 2014-05-27 10:21 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: Kirill A. Shutemov, Rusty Russell, Andrew Morton, Hugh Dickins,
	linux-kernel, linuxppc-dev, linux-mm, linux-arch, x86, benh,
	paulus, riel, mgorman, ak, peterz, mingo, dave.hansen

Madhavan Srinivasan wrote:
> On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote:
> > Rusty Russell wrote:
> >> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
> >>> Andrew Morton wrote:
> >>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
> >>>>
> >>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
> >>>>> the order of the fault-around size in bytes, and fault_around_pages()
> >>>>> use 1UL << (fault_around_order - PAGE_SHIFT)
> >>>>
> >>>> Yes.  And shame on me for missing it (this time!) at review.
> >>>>
> >>>> There's still time to fix this.  Patches, please.
> >>>
> >>> Here it is. Made at 3.30 AM, build tested only.
> >>
> >> Prefer on top of Maddy's patch which makes it always a variable, rather
> >> than CONFIG_DEBUG_FS.  It's got enough hair as it is.
> > 
> > Something like this?
> > 
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Date: Tue, 20 May 2014 13:02:03 +0300
> > Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order
> > 
> > There are evidences that faultaround feature is less relevant on
> > architectures with page size bigger then 4k. Which makes sense since
> > page fault overhead per byte of mapped area should be less there.
> > 
> > Let's rework the feature to specify faultaround area in bytes instead of
> > page order. It's 64 kilobytes for now.
> > 
> > The patch effectively disables faultaround on architectures with
> > page size >= 64k (like ppc64).
> > 
> > It's possible that some other size of faultaround area is relevant for a
> > platform. We can expose `fault_around_bytes' variable to arch-specific
> > code once such platforms will be found.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/memory.c | 62 +++++++++++++++++++++++--------------------------------------
> >  1 file changed, 23 insertions(+), 39 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 037b812a9531..252b319e8cdf 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
> >  	update_mmu_cache(vma, address, pte);
> >  }
> > 
> > -#define FAULT_AROUND_ORDER 4
> > +static unsigned long fault_around_bytes = 65536;
> > +
> > +static inline unsigned long fault_around_pages(void)
> > +{
> > +	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
> > +}
> > +
> > +static inline unsigned long fault_around_mask(void)
> > +{
> > +	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
> > +}
> > 
> > -#ifdef CONFIG_DEBUG_FS
> > -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
> > 
> > -static int fault_around_order_get(void *data, u64 *val)
> > +#ifdef CONFIG_DEBUG_FS
> > +static int fault_around_bytes_get(void *data, u64 *val)
> >  {
> > -	*val = fault_around_order;
> > +	*val = fault_around_bytes;
> >  	return 0;
> >  }
> > 
> > -static int fault_around_order_set(void *data, u64 val)
> > +static int fault_around_bytes_set(void *data, u64 val)
> >  {
> 
> Kindly ignore the question if not relevant. Even though we need root
> access to alter the value, will we be fine with
> negative value?.

val is u64. or I miss something?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc
  2014-05-27 10:21                     ` Kirill A. Shutemov
@ 2014-05-27 10:44                       ` Madhavan Srinivasan
  0 siblings, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2014-05-27 10:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Rusty Russell, Andrew Morton, Hugh Dickins, linux-kernel,
	linuxppc-dev, linux-mm, linux-arch, x86, benh, paulus, riel,
	mgorman, ak, peterz, mingo, dave.hansen

On Tuesday 27 May 2014 03:51 PM, Kirill A. Shutemov wrote:
> Madhavan Srinivasan wrote:
>> On Tuesday 20 May 2014 03:57 PM, Kirill A. Shutemov wrote:
>>> Rusty Russell wrote:
>>>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> writes:
>>>>> Andrew Morton wrote:
>>>>>> On Mon, 19 May 2014 16:23:07 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:
>>>>>>
>>>>>>> Shouldn't FAULT_AROUND_ORDER and fault_around_order be changed to be
>>>>>>> the order of the fault-around size in bytes, and fault_around_pages()
>>>>>>> use 1UL << (fault_around_order - PAGE_SHIFT)
>>>>>>
>>>>>> Yes.  And shame on me for missing it (this time!) at review.
>>>>>>
>>>>>> There's still time to fix this.  Patches, please.
>>>>>
>>>>> Here it is. Made at 3.30 AM, build tested only.
>>>>
>>>> Prefer on top of Maddy's patch which makes it always a variable, rather
>>>> than CONFIG_DEBUG_FS.  It's got enough hair as it is.
>>>
>>> Something like this?
>>>
>>> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>> Date: Tue, 20 May 2014 13:02:03 +0300
>>> Subject: [PATCH] mm: nominate faultaround area in bytes rather then page order
>>>
>>> There are evidences that faultaround feature is less relevant on
>>> architectures with page size bigger then 4k. Which makes sense since
>>> page fault overhead per byte of mapped area should be less there.
>>>
>>> Let's rework the feature to specify faultaround area in bytes instead of
>>> page order. It's 64 kilobytes for now.
>>>
>>> The patch effectively disables faultaround on architectures with
>>> page size >= 64k (like ppc64).
>>>
>>> It's possible that some other size of faultaround area is relevant for a
>>> platform. We can expose `fault_around_bytes' variable to arch-specific
>>> code once such platforms will be found.
>>>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> ---
>>>  mm/memory.c | 62 +++++++++++++++++++++++--------------------------------------
>>>  1 file changed, 23 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 037b812a9531..252b319e8cdf 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3402,63 +3402,47 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>>>  	update_mmu_cache(vma, address, pte);
>>>  }
>>>
>>> -#define FAULT_AROUND_ORDER 4
>>> +static unsigned long fault_around_bytes = 65536;
>>> +
>>> +static inline unsigned long fault_around_pages(void)
>>> +{
>>> +	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
>>> +}
>>> +
>>> +static inline unsigned long fault_around_mask(void)
>>> +{
>>> +	return ~(rounddown_pow_of_two(fault_around_bytes) - 1) & PAGE_MASK;
>>> +}
>>>
>>> -#ifdef CONFIG_DEBUG_FS
>>> -static unsigned int fault_around_order = FAULT_AROUND_ORDER;
>>>
>>> -static int fault_around_order_get(void *data, u64 *val)
>>> +#ifdef CONFIG_DEBUG_FS
>>> +static int fault_around_bytes_get(void *data, u64 *val)
>>>  {
>>> -	*val = fault_around_order;
>>> +	*val = fault_around_bytes;
>>>  	return 0;
>>>  }
>>>
>>> -static int fault_around_order_set(void *data, u64 val)
>>> +static int fault_around_bytes_set(void *data, u64 val)
>>>  {
>>
>> Kindly ignore the question if not relevant. Even though we need root
>> access to alter the value, will we be fine with
>> negative value?.
> ppc
> val is u64. or I miss something?
> 

My Bad. What I wanted to check was for all 0xf input and guess we are
fine. Sorry about that.

Regards
Maddy


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-05-27 10:44 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-08  9:28 [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
2014-05-08  9:28 ` [PATCH V4 1/2] mm: move FAULT_AROUND_ORDER to arch/ Madhavan Srinivasan
2014-05-08  9:28 ` [PATCH V4 2/2] powerpc/pseries: init fault_around_order for pseries Madhavan Srinivasan
2014-05-20  7:28   ` Andrew Morton
2014-05-20  8:03     ` Madhavan Srinivasan
2014-05-15  8:25 ` [PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc Madhavan Srinivasan
2014-05-15 17:28   ` Hugh Dickins
2014-05-19  0:12     ` Rusty Russell
2014-05-19  3:05       ` Madhavan Srinivasan
2014-05-19 23:23         ` Hugh Dickins
2014-05-19 23:43           ` Andrew Morton
2014-05-20  0:44             ` Kirill A. Shutemov
2014-05-20  6:22               ` Rusty Russell
2014-05-20  7:32                 ` Andrew Morton
2014-05-20  7:53                   ` Madhavan Srinivasan
2014-05-20 10:27                 ` Kirill A. Shutemov
2014-05-20 19:59                   ` Andrew Morton
2014-05-21 13:40                     ` Kirill A. Shutemov
2014-05-21 20:34                       ` Andrew Morton
2014-05-23 12:28                         ` Kirill A. Shutemov
2014-05-27  6:24                   ` Madhavan Srinivasan
2014-05-27 10:21                     ` Kirill A. Shutemov
2014-05-27 10:44                       ` Madhavan Srinivasan
2014-05-20  1:14           ` Rusty Russell
2014-05-20  2:34             ` Hugh Dickins
2014-05-20  2:06           ` Madhavan Srinivasan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).