* [PATCH v3] mm: make faultaround produce old ptes
@ 2018-01-22  5:40 Vinayak Menon
  2018-01-23 14:55 ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Vinayak Menon @ 2018-01-22  5:40 UTC (permalink / raw)
  To: linux-mm
  Cc: kirill.shutemov, akpm, minchan, catalin.marinas, will.deacon,
	ying.huang, riel, dave.hansen, mgorman, torvalds, jack,
	Vinayak Menon

Based on Kirill's patch [1].

Currently, faultaround code produces young pte.  This can screw up
vmscan behaviour[2], as it makes vmscan think that these pages are hot
and not push them out on first round.

During sparse file access, faultaround maps more pages and all of
them are young. Under memory pressure, this makes vmscan swap out anon
pages instead, or drop other page cache pages which would otherwise stay
resident.

Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
is set, so they can easily be reclaimed under memory pressure.

This can to some extent defeat the purpose of faultaround on machines
without hardware accessed bit as it will not help us with reducing the
number of minor page faults.

Making the faultaround ptes old results in a unixbench regression for some
architectures [3][4]. But on some architectures like arm64 it is not found
to cause any regression.

unixbench shell8 scores on arm64 v8.2 hardware with CONFIG_ARM64_HW_AFDBM
enabled  (5 runs min, max, avg):
Base: (741,748,744)
With this patch: (739,748,743)

So by default produce young ptes and provide a sysctl option to make the
ptes old.

[1] http://lkml.kernel.org/r/1463488366-47723-1-git-send-email-kirill.shutemov@linux.intel.com
[2] https://lkml.kernel.org/r/1460992636-711-1-git-send-email-vinmenon@codeaurora.org
[3] https://marc.info/?l=linux-kernel&m=146582237922378&w=2
[4] https://marc.info/?l=linux-mm&m=146589376909424&w=2

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
---

V3:
Fix the documentation (suggested by Kirill)

V2:
1. Removed the arch hook and want_old_faultaround_pte is made a sysctl
2. Renamed FAULT_FLAG_MKOLD to FAULT_FLAG_PREFAULT_OLD (suggested by Jan Kara)
3. Removed the saved fault address from vmf (suggested by Jan Kara)
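
For readers following the diff, the per-page flag handling can be modelled
in user space. The sketch below is a minimal illustration, not kernel code:
struct model_fault, struct model_pte, model_set_pte(), model_map_pages() and
count_young() are hypothetical stand-ins, and only FAULT_FLAG_PREFAULT_OLD
(0x200) and want_old_faultaround_pte are taken from the patch. It assumes
that a freshly built pte starts out young.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAULT_FLAG_PREFAULT_OLD 0x200u	/* same value as in the patch */

/* Hypothetical stand-ins; NOT the kernel's struct vm_fault or pte_t. */
struct model_fault {
	unsigned long pgoff;	/* page offset that actually faulted */
	unsigned int flags;
};

struct model_pte {
	bool young;
};

static int want_old_faultaround_pte;	/* models the proposed sysctl */

/* Models the hunk added to alloc_set_pte(). */
static struct model_pte model_set_pte(const struct model_fault *vmf)
{
	/* Assumption: the entry built from vm_page_prot starts out young. */
	struct model_pte entry = { .young = true };

	if (vmf->flags & FAULT_FLAG_PREFAULT_OLD)
		entry.young = false;	/* stands in for pte_mkold() */
	return entry;
}

/* Models the hunk added to the mapping loop in filemap_map_pages(). */
static void model_map_pages(struct model_fault *vmf, unsigned long start,
			    unsigned long end, struct model_pte *out)
{
	assert(end >= start);

	for (unsigned long index = start; index <= end; index++) {
		if (want_old_faultaround_pte) {
			/* Only the page that really faulted stays young. */
			if (index == vmf->pgoff)
				vmf->flags &= ~FAULT_FLAG_PREFAULT_OLD;
			else
				vmf->flags |= FAULT_FLAG_PREFAULT_OLD;
		}
		out[index - start] = model_set_pte(vmf);
	}
}

/* Counts how many modelled ptes are young. */
static size_t count_young(const struct model_pte *ptes, size_t n)
{
	size_t young = 0;

	for (size_t i = 0; i < n; i++)
		if (ptes[i].young)
			young++;
	return young;
}
```

In this model, a 16-page window with want_old_faultaround_pte set to 1 yields
a single young entry (the faulting offset), while a value of 0 leaves all 16
entries young, which is the default behaviour described above.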

 Documentation/sysctl/vm.txt | 22 ++++++++++++++++++++++
 include/linux/mm.h          |  3 +++
 kernel/sysctl.c             |  9 +++++++++
 mm/filemap.c                | 10 ++++++++++
 mm/memory.c                 |  4 ++++
 5 files changed, 48 insertions(+)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 17256f2..c3c9c6d 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -63,6 +63,7 @@ Currently, these files are in /proc/sys/vm:
 - vfs_cache_pressure
 - watermark_scale_factor
 - zone_reclaim_mode
+- want_old_faultaround_pte
 
 ==============================================================
 
@@ -887,4 +888,25 @@ Allowing regular swap effectively restricts allocations to the local
 node unless explicitly overridden by memory policies or cpuset
 configurations.
 
+=============================================================
+
+want_old_faultaround_pte:
+
+By default faultaround code produces young pte. When want_old_faultaround_pte is
+set to 1, faultaround produces old ptes.
+
+During sparse file access faultaround gets more pages mapped and when all of
+them are young (default), under memory pressure, this makes vmscan swap out anon
+pages instead, or to drop other page cache pages which otherwise stay resident.
+Setting want_old_faultaround_pte to 1 avoids this.
+
+Making the faultaround ptes old can result in performance regression on some
+architectures. This is due to cycles spent in micro-faults which would take page
+walk to set young bit in the pte. One such known test that shows a regression on
+x86 is unixbench shell8. Set want_old_faultaround_pte to 1 on architectures
+which do not show this regression or if the workload shows overall performance
+benefit with old faultaround ptes.
+
+The default value is 0.
+
 ============ End of Document =================================
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 63f7ba1..55b5667 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -302,6 +302,7 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
 #define FAULT_FLAG_USER		0x40	/* The fault originated in userspace */
 #define FAULT_FLAG_REMOTE	0x80	/* faulting for non current tsk/mm */
 #define FAULT_FLAG_INSTRUCTION  0x100	/* The fault was during an instruction fetch */
+#define FAULT_FLAG_PREFAULT_OLD	0x200	/* Make faultaround ptes old */
 
 #define FAULT_FLAG_TRACE \
 	{ FAULT_FLAG_WRITE,		"WRITE" }, \
@@ -2676,5 +2677,7 @@ static inline bool page_is_guard(struct page *page)
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int want_old_faultaround_pte;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f98f28c..2ab3a4e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1343,6 +1343,15 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 		.extra1		= &zero,
 		.extra2		= &one_hundred,
 	},
+	{
+		.procname       = "want_old_faultaround_pte",
+		.data           = &want_old_faultaround_pte,
+		.maxlen         = sizeof(want_old_faultaround_pte),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec_minmax,
+		.extra1         = &zero,
+		.extra2         = &one,
+	},
 #ifdef CONFIG_HUGETLB_PAGE
 	{
 		.procname	= "nr_hugepages",
diff --git a/mm/filemap.c b/mm/filemap.c
index 693f622..f58393d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -48,6 +48,8 @@
 
 #include <asm/mman.h>
 
+int want_old_faultaround_pte;
+
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -2677,6 +2679,14 @@ void filemap_map_pages(struct vm_fault *vmf,
 		if (vmf->pte)
 			vmf->pte += iter.index - last_pgoff;
 		last_pgoff = iter.index;
+
+		if (want_old_faultaround_pte) {
+			if (iter.index == vmf->pgoff)
+				vmf->flags &= ~FAULT_FLAG_PREFAULT_OLD;
+			else
+				vmf->flags |= FAULT_FLAG_PREFAULT_OLD;
+		}
+
 		if (alloc_set_pte(vmf, NULL, page))
 			goto unlock;
 		unlock_page(page);
diff --git a/mm/memory.c b/mm/memory.c
index c7f9a43..11412cc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3438,6 +3438,10 @@ int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 	entry = mk_pte(page, vma->vm_page_prot);
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+
+	if (vmf->flags & FAULT_FLAG_PREFAULT_OLD)
+		entry = pte_mkold(entry);
+
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-22  5:40 [PATCH v3] mm: make faultaround produce old ptes Vinayak Menon
@ 2018-01-23 14:55 ` Michal Hocko
  2018-01-23 14:55   ` Michal Hocko
  2018-01-23 15:38   ` Vinayak Menon
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2018-01-23 14:55 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

[Please cc linux-api when proposing user interface]

On Mon 22-01-18 11:10:14, Vinayak Menon wrote:
> Based on Kirill's patch [1].
> 
> Currently, faultaround code produces young pte.  This can screw up
> vmscan behaviour[2], as it makes vmscan think that these pages are hot
> and not push them out on first round.
> 
> During sparse file access faultaround gets more pages mapped and all of
> them are young. Under memory pressure, this makes vmscan swap out anon
> pages instead, or to drop other page cache pages which otherwise stay
> resident.
> 
> Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
> is set, so they can easily be reclaimed under memory pressure.
> 
> This can to some extend defeat the purpose of faultaround on machines
> without hardware accessed bit as it will not help us with reducing the
> number of minor page faults.

So we just want to add a knob to cripple the feature? Isn't it better to
simply disable it than to have two distinct implementations, which is
rather non-intuitive? I would bet that most users will be clueless
about how to set it or when to touch it at all. So we will end up with
random cargo cult hints all over the internet giving you your performance
back...

I really dislike this new interface. If the fault around doesn't work
for you then disable it.

> Making the faultaround ptes old results in a unixbench regression for some
> architectures [3][4]. But on some architectures like arm64 it is not found
> to cause any regression.
> 
> unixbench shell8 scores on arm64 v8.2 hardware with CONFIG_ARM64_HW_AFDBM
> enabled  (5 runs min, max, avg):
> Base: (741,748,744)
> With this patch: (739,748,743)
> 
> So by default produce young ptes and provide a sysctl option to make the
> ptes old.
>
> [1] http://lkml.kernel.org/r/1463488366-47723-1-git-send-email-kirill.shutemov@linux.intel.com
> [2] https://lkml.kernel.org/r/1460992636-711-1-git-send-email-vinmenon@codeaurora.org
> [3] https://marc.info/?l=linux-kernel&m=146582237922378&w=2
> [4] https://marc.info/?l=linux-mm&m=146589376909424&w=2
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-23 14:55 ` Michal Hocko
@ 2018-01-23 14:55   ` Michal Hocko
  2018-01-23 15:38   ` Vinayak Menon
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2018-01-23 14:55 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack, linux-api

On Tue 23-01-18 15:55:06, Michal Hocko wrote:
> [Please cc linux-api when proposing user interface]

now for real...

> On Mon 22-01-18 11:10:14, Vinayak Menon wrote:
> > Based on Kirill's patch [1].
> > 
> > Currently, faultaround code produces young pte.  This can screw up
> > vmscan behaviour[2], as it makes vmscan think that these pages are hot
> > and not push them out on first round.
> > 
> > During sparse file access faultaround gets more pages mapped and all of
> > them are young. Under memory pressure, this makes vmscan swap out anon
> > pages instead, or to drop other page cache pages which otherwise stay
> > resident.
> > 
> > Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
> > is set, so they can easily be reclaimed under memory pressure.
> > 
> > This can to some extend defeat the purpose of faultaround on machines
> > without hardware accessed bit as it will not help us with reducing the
> > number of minor page faults.
> 
> So we just want to add a knob to cripple the feature? Isn't it better to
> simply disable it than to have two distinct implementation which is
> rather non-intuitive and I would bet that most users will be clueless
> about how to set it or when to touch it at all. So we will end up with
> random cargo cult hints all over internet giving you your performance
> back...
> 
> I really dislike this new interface. If the fault around doesn't work
> for you then disable it.
> 
> > Making the faultaround ptes old results in a unixbench regression for some
> > architectures [3][4]. But on some architectures like arm64 it is not found
> > to cause any regression.
> > 
> > unixbench shell8 scores on arm64 v8.2 hardware with CONFIG_ARM64_HW_AFDBM
> > enabled  (5 runs min, max, avg):
> > Base: (741,748,744)
> > With this patch: (739,748,743)
> > 
> > So by default produce young ptes and provide a sysctl option to make the
> > ptes old.
> >
> > [1] http://lkml.kernel.org/r/1463488366-47723-1-git-send-email-kirill.shutemov@linux.intel.com
> > [2] https://lkml.kernel.org/r/1460992636-711-1-git-send-email-vinmenon@codeaurora.org
> > [3] https://marc.info/?l=linux-kernel&m=146582237922378&w=2
> > [4] https://marc.info/?l=linux-mm&m=146589376909424&w=2
> > 
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-23 14:55 ` Michal Hocko
  2018-01-23 14:55   ` Michal Hocko
@ 2018-01-23 15:38   ` Vinayak Menon
  2018-01-23 16:05     ` Michal Hocko
  1 sibling, 1 reply; 12+ messages in thread
From: Vinayak Menon @ 2018-01-23 15:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack



On 1/23/2018 8:25 PM, Michal Hocko wrote:
> [Please cc linux-api when proposing user interface]
>
> On Mon 22-01-18 11:10:14, Vinayak Menon wrote:
>> Based on Kirill's patch [1].
>>
>> Currently, faultaround code produces young pte.  This can screw up
>> vmscan behaviour[2], as it makes vmscan think that these pages are hot
>> and not push them out on first round.
>>
>> During sparse file access faultaround gets more pages mapped and all of
>> them are young. Under memory pressure, this makes vmscan swap out anon
>> pages instead, or to drop other page cache pages which otherwise stay
>> resident.
>>
>> Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
>> is set, so they can easily be reclaimed under memory pressure.
>>
>> This can to some extend defeat the purpose of faultaround on machines
>> without hardware accessed bit as it will not help us with reducing the
>> number of minor page faults.
> So we just want to add a knob to cripple the feature? Isn't it better to
> simply disable it than to have two distinct implementation which is
> rather non-intuitive and I would bet that most users will be clueless
> about how to set it or when to touch it at all. So we will end up with
> random cargo cult hints all over internet giving you your performance
> back...


If you are talking about non-HW access bit systems, then yes it would be better to disable faultaround
when want_old_faultaround_pte is set to 1, like Minchan did here https://patchwork.kernel.org/patch/9115901/
I can submit a patch for that.

> I really dislike this new interface. If the fault around doesn't work
> for you then disable it.


Faultaround works well for me on systems with HW access bit. But the benefit is reduced because of making the
faultaround ptes young [2]. Ideally they should be old as they are speculatively mapped and not really
accessed. But because of issues on certain architectures they need to be made young[3][4]. This patch is trying to
help the other architectures which can tolerate old ptes, by fixing the vmscan behaviour. And this is not a
theoretical problem that I am trying to fix. We have really seen the benefit of faultaround on arm mobile targets,
but the problem is the vmscan behaviour due to the young pte workaround. And this patch helps in fixing that.
Do you think something more needs to be added in the documentation to make things more clear on the flag usage ?

>
>> Making the faultaround ptes old results in a unixbench regression for some
>> architectures [3][4]. But on some architectures like arm64 it is not found
>> to cause any regression.
>>
>> unixbench shell8 scores on arm64 v8.2 hardware with CONFIG_ARM64_HW_AFDBM
>> enabled  (5 runs min, max, avg):
>> Base: (741,748,744)
>> With this patch: (739,748,743)
>>
>> So by default produce young ptes and provide a sysctl option to make the
>> ptes old.
>>
>> [1] http://lkml.kernel.org/r/1463488366-47723-1-git-send-email-kirill.shutemov@linux.intel.com
>> [2] https://lkml.kernel.org/r/1460992636-711-1-git-send-email-vinmenon@codeaurora.org
>> [3] https://marc.info/?l=linux-kernel&m=146582237922378&w=2
>> [4] https://marc.info/?l=linux-mm&m=146589376909424&w=2
>>
>> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-23 15:38   ` Vinayak Menon
@ 2018-01-23 16:05     ` Michal Hocko
  2018-01-24  9:05       ` Vinayak Menon
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-01-23 16:05 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

On Tue 23-01-18 21:08:36, Vinayak Menon wrote:
> 
> 
> On 1/23/2018 8:25 PM, Michal Hocko wrote:
> > [Please cc linux-api when proposing user interface]
> >
> > On Mon 22-01-18 11:10:14, Vinayak Menon wrote:
> >> Based on Kirill's patch [1].
> >>
> >> Currently, faultaround code produces young pte.  This can screw up
> >> vmscan behaviour[2], as it makes vmscan think that these pages are hot
> >> and not push them out on first round.
> >>
> >> During sparse file access faultaround gets more pages mapped and all of
> >> them are young. Under memory pressure, this makes vmscan swap out anon
> >> pages instead, or to drop other page cache pages which otherwise stay
> >> resident.
> >>
> >> Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
> >> is set, so they can easily be reclaimed under memory pressure.
> >>
> >> This can to some extend defeat the purpose of faultaround on machines
> >> without hardware accessed bit as it will not help us with reducing the
> >> number of minor page faults.
> > So we just want to add a knob to cripple the feature? Isn't it better to
> > simply disable it than to have two distinct implementation which is
> > rather non-intuitive and I would bet that most users will be clueless
> > about how to set it or when to touch it at all. So we will end up with
> > random cargo cult hints all over internet giving you your performance
> > back...
> 
> 
> If you are talking about non-HW access bit systems, then yes it would be better to disable faultaround
> when want_old_faultaround_pte is set to 1, like MInchan did here https://patchwork.kernel.org/patch/9115901/
> I can submit a patch for that.
> 
> > I really dislike this new interface. If the fault around doesn't work
> > for you then disable it.
> 
> 
> Faultaround works well for me on systems with HW access bit. But
> the benefit is reduced because of making the faultaround ptes young
> [2]. Ideally they should be old as they are speculatively mapped and
> not really accessed. But because of issues on certain architectures
> they need to be made young[3][4]. This patch is trying to help the
> other architectures which can tolerate old ptes, by fixing the vmscan
> behaviour. And this is not a theoretical problem that I am trying to
> fix. We have really seen the benefit of faultaround on arm mobile
> targets, but the problem is the vmscan behaviour due to the young
> pte workaround. And this patch helps in fixing that.  Do you think
> something more needs to be added in the documentation to make things
> more clear on the flag usage ?

No, I would either prefer auto-tuning or document that fault around
can lead to this behavior and recommend to disable it rather than add a
new knob.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-23 16:05     ` Michal Hocko
@ 2018-01-24  9:05       ` Vinayak Menon
  2018-01-24  9:38         ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Vinayak Menon @ 2018-01-24  9:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack


On 1/23/2018 9:35 PM, Michal Hocko wrote:
> On Tue 23-01-18 21:08:36, Vinayak Menon wrote:
>>
>> On 1/23/2018 8:25 PM, Michal Hocko wrote:
>>> [Please cc linux-api when proposing user interface]
>>>
>>> On Mon 22-01-18 11:10:14, Vinayak Menon wrote:
>>>> Based on Kirill's patch [1].
>>>>
>>>> Currently, faultaround code produces young pte.  This can screw up
>>>> vmscan behaviour[2], as it makes vmscan think that these pages are hot
>>>> and not push them out on first round.
>>>>
>>>> During sparse file access faultaround gets more pages mapped and all of
>>>> them are young. Under memory pressure, this makes vmscan swap out anon
>>>> pages instead, or to drop other page cache pages which otherwise stay
>>>> resident.
>>>>
>>>> Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
>>>> is set, so they can easily be reclaimed under memory pressure.
>>>>
>>>> This can to some extend defeat the purpose of faultaround on machines
>>>> without hardware accessed bit as it will not help us with reducing the
>>>> number of minor page faults.
>>> So we just want to add a knob to cripple the feature? Isn't it better to
>>> simply disable it than to have two distinct implementation which is
>>> rather non-intuitive and I would bet that most users will be clueless
>>> about how to set it or when to touch it at all. So we will end up with
>>> random cargo cult hints all over internet giving you your performance
>>> back...
>>
>> If you are talking about non-HW access bit systems, then yes it would be better to disable faultaround
>> when want_old_faultaround_pte is set to 1, like MInchan did here https://patchwork.kernel.org/patch/9115901/
>> I can submit a patch for that.
>>
>>> I really dislike this new interface. If the fault around doesn't work
>>> for you then disable it.
>>
>> Faultaround works well for me on systems with HW access bit. But
>> the benefit is reduced because of making the faultaround ptes young
>> [2]. Ideally they should be old as they are speculatively mapped and
>> not really accessed. But because of issues on certain architectures
>> they need to be made young[3][4]. This patch is trying to help the
>> other architectures which can tolerate old ptes, by fixing the vmscan
>> behaviour. And this is not a theoretical problem that I am trying to
>> fix. We have really seen the benefit of faultaround on arm mobile
>> targets, but the problem is the vmscan behaviour due to the young
>> pte workaround. And this patch helps in fixing that.  Do you think
>> something more needs to be added in the documentation to make things
>> more clear on the flag usage ?
> No, I would either prefer auto-tuning or document that fault around
> can lead to this behavior and recommend to disable it rather than add a
> new knob.


One of the objectives of making it a sysctl was to let user space tune it based on vmpressure [5]. But I am not
sure how effective it would be. The vmpressure increase itself can be because of making faultaround ptes
young [6] and it could be difficult to find a heuristic to enable/disable faultaround. And with the way vmpressure
works, it can happen that vmpressure values don't indicate exact vmscan behavior always. Same is the case with
auto-tuning based on vmpressure. Any other suggestions on how auto tuning can be implemented ?

Could you elaborate a bit on why you think sysctl is not a good option? Is it because of the difficulty for the
user to figure out how and when to use the interface? If the document clearly explains what the knob is, wouldn't it
be easy for the user to just try the knob and see if his workload benefits or not? It's not just non-x86 devices that can
benefit. There may be x86 workloads where the vmscan behavior masks the benefit of avoiding micro faults.

Or if you think sysctl is not the right place for such knobs, do you think it should be an expert level config option or a
kernel command line param ?
Since there are lots of mobile and embedded devices that can get the full benefits of faultaround with such an option,
I really don't think it is a good option to just document the problem and disable faultaround on those devices.

[5] https://www.spinics.net/lists/arm-kernel/msg622070.html
[6] https://lkml.org/lkml/2016/5/9/134

Thanks,
Vinayak


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-24  9:05       ` Vinayak Menon
@ 2018-01-24  9:38         ` Michal Hocko
  2018-01-24 10:43           ` Vinayak Menon
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-01-24  9:38 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

On Wed 24-01-18 14:35:54, Vinayak Menon wrote:
> 
> On 1/23/2018 9:35 PM, Michal Hocko wrote:
> > On Tue 23-01-18 21:08:36, Vinayak Menon wrote:
> >>
> >> On 1/23/2018 8:25 PM, Michal Hocko wrote:
> >>> [Please cc linux-api when proposing user interface]
> >>>
> >>> On Mon 22-01-18 11:10:14, Vinayak Menon wrote:
> >>>> Based on Kirill's patch [1].
> >>>>
> >>>> Currently, faultaround code produces young pte.  This can screw up
> >>>> vmscan behaviour[2], as it makes vmscan think that these pages are hot
> >>>> and not push them out on first round.
> >>>>
> >>>> During sparse file access faultaround gets more pages mapped and all of
> >>>> them are young. Under memory pressure, this makes vmscan swap out anon
> >>>> pages instead, or to drop other page cache pages which otherwise stay
> >>>> resident.
> >>>>
> >>>> Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
> >>>> is set, so they can easily be reclaimed under memory pressure.
> >>>>
> >>>> This can to some extend defeat the purpose of faultaround on machines
> >>>> without hardware accessed bit as it will not help us with reducing the
> >>>> number of minor page faults.
> >>> So we just want to add a knob to cripple the feature? Isn't it better to
> >>> simply disable it than to have two distinct implementation which is
> >>> rather non-intuitive and I would bet that most users will be clueless
> >>> about how to set it or when to touch it at all. So we will end up with
> >>> random cargo cult hints all over internet giving you your performance
> >>> back...
> >>
> >> If you are talking about non-HW access bit systems, then yes it would be better to disable faultaround
> >> when want_old_faultaround_pte is set to 1, like MInchan did here https://patchwork.kernel.org/patch/9115901/
> >> I can submit a patch for that.
> >>
> >>> I really dislike this new interface. If the fault around doesn't work
> >>> for you then disable it.
> >>
> >> Faultaround works well for me on systems with HW access bit. But
> >> the benefit is reduced because of making the faultaround ptes young
> >> [2]. Ideally they should be old as they are speculatively mapped and
> >> not really accessed. But because of issues on certain architectures
> >> they need to be made young[3][4]. This patch is trying to help the
> >> other architectures which can tolerate old ptes, by fixing the vmscan
> >> behaviour. And this is not a theoretical problem that I am trying to
> >> fix. We have really seen the benefit of faultaround on arm mobile
> >> targets, but the problem is the vmscan behaviour due to the young
> >> pte workaround. And this patch helps in fixing that.  Do you think
> >> something more needs to be added in the documentation to make things
> >> more clear on the flag usage ?
> > No, I would either prefer auto-tuning or document that fault around
> > can lead to this behavior and recommend to disable it rather than add a
> > new knob.
> 
> 
> One of the objectives of making it a sysctl was to let user space
> tune it based on vmpressure [5]. But I am not sure how effective it
> would be. The vmpressure increase itself can be because of making
> faultaround ptes young [6] and it could be difficult to find a
> heuristic to enable/disable faultaround. And with the way vmpressure
> works, it can happen that vmpressure values don't indicate exact
> vmscan behavior always. Same is the case with auto-tuning based
> on vmpressure. Any other suggestions on how auto tuning can be
> implemented ?

I would start simple. Just disable it on platforms which are known to
suffer from this heuristic. Do not try to invent a sysctl nobody will
know how to set up.

> Could you elaborate a bit on why you think sysctl is not a good option
> ? Is it because of the difficulty for the user to figure out how and
> when to use the interface ?

Absolutely. Not only that, but merely realizing that the fault
around is the culprit is not something most users will be able/willing
to do. So instead we will end up in yet another "disable THP because
that solves all the problems in the universe" cargo cult.

> If the document clearly explains what the knob is, wouldn't be easy
> for the user to just try the knob and see if his workload benefits
> or not. It's not just non-x86 devices that can benefit. There may be
> x86 workloads where the vmscan behavior masks the benefit of avoiding
> micro faults.

Try to be more realistic. We have way too many sysctls. Some of them are
really implementation specific and then it is not really trivial to get
rid of them because people tend to (think they) depend on them. This is
a user interface like any other and we do not add them without due
scrutiny. Moreover we do have an interface to suppress the effect of the
faultaround. Instead you are trying to add another tunable for something
that we can live without altogether. See my point?

> Or if you think sysctl is not the right place for such knobs, do you
> think it should be an expert level config option or a kernel command
> line param ?

No it doesn't make much difference. It is still a user interface. Maybe
one that is slightly easier to deprecate but just think about it. Do you
really need a faultaround so much you really want to fiddle with such
lowlevel stuff like old-vs-young pte bits?

> Since there are lots of mobile and embedded devices that can get the
> full benefits of faultaround with such an option, I really don't
> think it is a good option to just document the problem and disable
> faultaround on those devices.

Could you point me to some numbers that prove that? Your unixbench
doesn't sound overly convincing. And if this is really about some arches
then change them to use old ptes in an arch specific code. Do not make
it tunable.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-24  9:38         ` Michal Hocko
@ 2018-01-24 10:43           ` Vinayak Menon
  2018-01-24 11:11             ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Vinayak Menon @ 2018-01-24 10:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

On 1/24/2018 3:08 PM, Michal Hocko wrote:
> On Wed 24-01-18 14:35:54, Vinayak Menon wrote:
>> On 1/23/2018 9:35 PM, Michal Hocko wrote:
>>> On Tue 23-01-18 21:08:36, Vinayak Menon wrote:
>>>> On 1/23/2018 8:25 PM, Michal Hocko wrote:
>>>>> [Please cc linux-api when proposing user interface]
>>>>>
>>>>> On Mon 22-01-18 11:10:14, Vinayak Menon wrote:
>>>>>> Based on Kirill's patch [1].
>>>>>>
>>>>>> Currently, faultaround code produces young pte.  This can screw up
>>>>>> vmscan behaviour[2], as it makes vmscan think that these pages are hot
>>>>>> and not push them out on first round.
>>>>>>
>>>>>> During sparse file access faultaround gets more pages mapped and all of
>>>>>> them are young. Under memory pressure, this makes vmscan swap out anon
>>>>>> pages instead, or to drop other page cache pages which otherwise stay
>>>>>> resident.
>>>>>>
>>>>>> Modify faultaround to produce old ptes if sysctl 'want_old_faultaround_pte'
>>>>>> is set, so they can easily be reclaimed under memory pressure.
>>>>>>
>>>>>> This can to some extent defeat the purpose of faultaround on machines
>>>>>> without hardware accessed bit as it will not help us with reducing the
>>>>>> number of minor page faults.
>>>>> So we just want to add a knob to cripple the feature? Isn't it better to
>>>>> simply disable it than to have two distinct implementation which is
>>>>> rather non-intuitive and I would bet that most users will be clueless
>>>>> about how to set it or when to touch it at all. So we will end up with
>>>>> random cargo cult hints all over internet giving you your performance
>>>>> back...
>>>> If you are talking about non-HW access bit systems, then yes it would be better to disable faultaround
>>>> when want_old_faultaround_pte is set to 1, like Minchan did here https://patchwork.kernel.org/patch/9115901/
>>>> I can submit a patch for that.
>>>>
>>>>> I really dislike this new interface. If the fault around doesn't work
>>>>> for you then disable it.
>>>> Faultaround works well for me on systems with HW access bit. But
>>>> the benefit is reduced because of making the faultaround ptes young
>>>> [2]. Ideally they should be old as they are speculatively mapped and
>>>> not really accessed. But because of issues on certain architectures
>>>> they need to be made young[3][4]. This patch is trying to help the
>>>> other architectures which can tolerate old ptes, by fixing the vmscan
>>>> behaviour. And this is not a theoretical problem that I am trying to
>>>> fix. We have really seen the benefit of faultaround on arm mobile
>>>> targets, but the problem is the vmscan behaviour due to the young
>>>> pte workaround. And this patch helps in fixing that.  Do you think
>>>> something more needs to be added in the documentation to make things
>>>> more clear on the flag usage ?
>>> No, I would either prefer auto-tuning or document that fault around
>>> can lead to this behavior and recommend to disable it rather than add a
>>> new knob.
>>
>> One of the objectives of making it a sysctl was to let user space
>> tune it based on vmpressure [5]. But I am not sure how effective it
>> would be. The vmpressure increase itself can be because of making
>> faultaround ptes young [6] and it could be difficult to find a
>> heuristic to enable/disable faultaround. And with the way vmpressure
>> works, it can happen that vmpressure values don't indicate exact
>> vmscan behavior always. Same is the case with auto-tuning based
>> on vmpressure. Any other suggestions on how auto tuning can be
>> implemented ?
> I would start simple. Just disable it on platforms which are known to
> suffer from this heuristic. Do not try to invent a sysctl nobody will
> know how to set up.
>
>> Could you elaborate a bit on why you think sysctl is not a good option
>> ? Is it because of the difficulty for the user to figure out how and
>> when to use the interface ?
> Absolutely. Not only that but the mere fact to realize that the fault
> around is the culprit is not something most users will be able/willing
> to do. So instead we will end up in yet another "disable THP because
> that solves all the problems in universe" cargo cult.
>
>> If the document clearly explains what the knob is, wouldn't it be easy
>> for the user to just try the knob and see if his workload benefits
>> or not. It's not just non-x86 devices that can benefit. There may be
>> x86 workloads where the vmscan behavior masks the benefit of avoiding
>> micro faults.
> Try to be more realistic. We have way too many sysctls. Some of them are
> really implementation specific and then it is not really trivial to get
> rid of them because people tend to (think they) depend on them. This is
> a user interface like any others and we do not add them without a due
> scrutiny. Moreover we do have an interface to suppress the effect of the
> faultaround. Instead you are trying to add another tunable for something
> that we can live without altogether. See my point?

I agree on the sysctl part. But why should we disable faultaround and
not find a way to make it useful ? Let me put it this way. The original,
ideal behavior of making the ptes old was reverted because of the
unixbench regression. Isn't the current scenario similar to moving back
to old ptes and documenting that architectures which observe a unixbench
regression can disable faultaround ? Is it not better to find a way to
fix the problem ? If not a sysctl, why not a config option or something
like that ?

>> Or if you think sysctl is not the right place for such knobs, do you
>> think it should be an expert level config option or a kernel command
>> line param ?
> No it doesn't make much difference. It is still a user interface. Maybe
> one that is slightly easier to deprecate but just think about it. Do you
> really need a faultaround so much you really want to fiddle with such
> low-level stuff like old-vs-young pte bits?

Yes. See below.

>> Since there are lots of mobile and embedded devices that can get the
>> full benefits of faultaround with such an option, I really don't
>> think it is a good option to just document the problem and disable
>> faultaround on those devices.
> Could you point me to some numbers that prove that? Your unixbench
> doesn't sound overly convincing. 

The unixbench results were meant to show that there are devices that do
not show a unixbench regression and can work well with old ptes.

Due to the vmscan behavior with faultaround, we were actually disabling
the feature in our builds (Android). Recently it was noticed that
faultaround can actually help in reducing the app launch times for apps
which do a lot of library loading. In the cases I know about, a benefit
of 100-250ms in app launch time is seen with faultaround.

These are numbers for a tiny app like calculator.
                               App launch latency (ms)   Minor faults during the launch
No faultaround (base)          382                       35282
With Faultaround               322                       20114
With Fault around and new fix  325                       20848
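
As a rough cross-check of the table above, the relative changes can be
computed with a short script. This is only an illustrative sketch: the
figures are copied from the table, and the names BASE, VARIANTS and
reduction_pct are made up for this example, not taken from the patch.

```python
# Figures copied from the calculator app launch table in this message:
# (app launch latency in ms, minor faults during the launch).
BASE = (382, 35282)  # no faultaround
VARIANTS = {
    "with faultaround": (322, 20114),
    "with faultaround and new fix": (325, 20848),
}


def reduction_pct(base, value):
    """Return how much smaller value is than base, as a percentage of base."""
    return 100.0 * (base - value) / base


base_latency_ms, base_minor_faults = BASE
for name, (latency_ms, minor_faults) in VARIANTS.items():
    latency_gain = reduction_pct(base_latency_ms, latency_ms)
    fault_gain = reduction_pct(base_minor_faults, minor_faults)
    print(f"{name}: latency reduced by {latency_gain:.1f}%, "
          f"minor faults reduced by {fault_gain:.1f}%")
```

With these inputs, both faultaround rows come out at roughly 15% lower
launch latency and roughly 41-43% fewer minor faults than the base.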

The problem with faultaround starts when reclaim kicks in and that is described here https://lkml.org/lkml/2016/4/18/612

> And if this is really about some arches
> then change them to use old ptes in an arch specific code. Do not make
> it tunable.

This may not actually be specific to arch. Though the arm devices which
I have tested on do not show any unixbench regression, there can be
devices which may show one because of microarchitectural differences.
And as I said earlier, I think even x86 workloads can show benefits with
old ptes if the vmscan behaviour masks the faultaround advantage.

Thanks,
Vinayak


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-24 10:43           ` Vinayak Menon
@ 2018-01-24 11:11             ` Michal Hocko
  2018-01-24 12:09               ` Vinayak Menon
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-01-24 11:11 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

On Wed 24-01-18 16:13:06, Vinayak Menon wrote:
> On 1/24/2018 3:08 PM, Michal Hocko wrote:
[...]
> > Try to be more realistic. We have way too many sysctls. Some of them are
> > really implementation specific and then it is not really trivial to get
> > rid of them because people tend to (think they) depend on them. This is
> > a user interface like any others and we do not add them without a due
> > scrutiny. Moreover we do have an interface to suppress the effect of the
> > faultaround. Instead you are trying to add another tunable for something
> > that we can live without altogether. See my point?
> 
> I agree on the sysctl part. But why should we disable faultaround and
> not find a way to make it useful ?

I didn't say that. Please read what I've written. I really hate your new
sysctl, because that is not a solution. If you can find a different one
than disabling it then go ahead. But do not try to put burden to users
because they know what to set. Because they won't.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-24 11:11             ` Michal Hocko
@ 2018-01-24 12:09               ` Vinayak Menon
  2018-01-24 12:21                 ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Vinayak Menon @ 2018-01-24 12:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

On 1/24/2018 4:41 PM, Michal Hocko wrote:
> On Wed 24-01-18 16:13:06, Vinayak Menon wrote:
>> On 1/24/2018 3:08 PM, Michal Hocko wrote:
> [...]
>>> Try to be more realistic. We have way too many sysctls. Some of them are
>>> really implementation specific and then it is not really trivial to get
>>> rid of them because people tend to (think they) depend on them. This is
>>> a user interface like any others and we do not add them without a due
>>> scrutiny. Moreover we do have an interface to suppress the effect of the
>>> faultaround. Instead you are trying to add another tunable for something
>>> that we can live without altogether. See my point?
>> I agree on the sysctl part. But why should we disable faultaround and
>> not find a way to make it useful ?
> I didn't say that. Please read what I've written. I really hate your new
> sysctl, because that is not a solution. If you can find a different one
> than disabling it then go ahead. But do not try to put burden to users
> because they know what to set. Because they won't.

What about an expert level config option which is by default disabled ?
Whether to consider faultaround ptes as old or young is dependent on architectural details that can't be
gathered at runtime by reading some system registers. This needs to be figured out by experiments, just like
how a value for watermark_scale_factor is arrived at. So the user, in this case an engineer expert in this area
decides whether the option can be enabled or not in the build.
I agree that it need not be a sysctl, but what is the problem that you see in making it an expert level config ?
How is it a burden to a non-expert user ?

Thanks,
Vinayak


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-24 12:09               ` Vinayak Menon
@ 2018-01-24 12:21                 ` Michal Hocko
  2018-01-30 12:01                   ` Vinayak Menon
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2018-01-24 12:21 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

On Wed 24-01-18 17:39:44, Vinayak Menon wrote:
> On 1/24/2018 4:41 PM, Michal Hocko wrote:
> > On Wed 24-01-18 16:13:06, Vinayak Menon wrote:
> >> On 1/24/2018 3:08 PM, Michal Hocko wrote:
> > [...]
> >>> Try to be more realistic. We have way too many sysctls. Some of them are
> >>> really implementation specific and then it is not really trivial to get
> >>> rid of them because people tend to (think they) depend on them. This is
> >>> a user interface like any others and we do not add them without a due
> >>> scrutiny. Moreover we do have an interface to suppress the effect of the
> >>> faultaround. Instead you are trying to add another tunable for something
> >>> that we can live without altogether. See my point?
> >> I agree on the sysctl part. But why should we disable faultaround and
> >> not find a way to make it useful ?
> > I didn't say that. Please read what I've written. I really hate your new
> > sysctl, because that is not a solution. If you can find a different one
> > than disabling it then go ahead. But do not try to put burden to users
> > because they know what to set. Because they won't.
> 
> What about an expert level config option which is by default disabled ?

so we have way too many sysctls and it is hard for users to decide what
to do and now you are suggesting a config option instead? How come this
makes any sense?

> Whether to consider faultaround ptes as old or young is dependent on
> architectural details that can't be gathered at runtime by reading
> some system registers. This needs to be figured out by experiments,
> just like how a value for watermark_scale_factor is arrived at. So the
> user, in this case an engineer expert in this area decides whether the
> option can be enabled or not in the build.
> I agree that it need not be a sysctl, but what is the problem that
> you see in making it a expert level config ? How is it a burden to a
> non-expert user ?

Our config space is immense. Adding more on top will not bring any relief.
Just imagine that you get a bug report about a strange reclaim behavior.
Now you have one more aspect to consider.

Seriously, if a heuristic fails on somebody then just make it more
conservative. Maybe it is time to sit down and rethink how the fault
around should be implemented. No shortcuts and fancy tunables to paper
over those problems.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm: make faultaround produce old ptes
  2018-01-24 12:21                 ` Michal Hocko
@ 2018-01-30 12:01                   ` Vinayak Menon
  0 siblings, 0 replies; 12+ messages in thread
From: Vinayak Menon @ 2018-01-30 12:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, kirill.shutemov, akpm, minchan, catalin.marinas,
	will.deacon, ying.huang, riel, dave.hansen, mgorman, torvalds,
	jack

On 1/24/2018 5:51 PM, Michal Hocko wrote:
> On Wed 24-01-18 17:39:44, Vinayak Menon wrote:
>> On 1/24/2018 4:41 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 16:13:06, Vinayak Menon wrote:
>>>> On 1/24/2018 3:08 PM, Michal Hocko wrote:
>>> [...]
>>>>> Try to be more realistic. We have way too many sysctls. Some of them are
>>>>> really implementation specific and then it is not really trivial to get
>>>>> rid of them because people tend to (think they) depend on them. This is
>>>>> a user interface like any others and we do not add them without a due
>>>>> scrutiny. Moreover we do have an interface to suppress the effect of the
>>>>> faultaround. Instead you are trying to add another tunable for something
>>>>> that we can live without altogether. See my point?
>>>> I agree on the sysctl part. But why should we disable faultaround and
>>>> not find a way to make it useful ?
>>> I didn't say that. Please read what I've written. I really hate your new
>>> sysctl, because that is not a solution. If you can find a different one
>>> than disabling it then go ahead. But do not try to put burden to users
>>> because they know what to set. Because they won't.
>> What about an expert level config option which is by default disabled ?
> so we have way too many sysctls and it is hard for users to decide what
> to do and now you are suggesting a config option instead? How come this
> makes any sense?

Because by making it an expert level config we are reducing the number of users exposed to the configuration.

>> Whether to consider faultaround ptes as old or young is dependent on
>> architectural details that can't be gathered at runtime by reading
>> some system registers. This needs to be figured out by experiments,
>> just like how a value for watermark_scale_factor is arrived at. So the
>> user, in this case an engineer expert in this area decides whether the
>> option can be enabled or not in the build.
>> I agree that it need not be a sysctl, but what is the problem that
>> you see in making it a expert level config ? How is it a burden to a
>> non-expert user ?
> Our config space is immense. Adding more on top will not put a relief.
> Just imagine that you get a bug report about a strange reclaim behavior.
> Now you have a one more aspect to consider.
>
> Seriously, if a heuristic fails on somebody then just make it more
> conservative. Maybe it is time to sit down and rethink how the fault
> around should be implemented. No shortcuts and fancy tunables to paper
> over those problems.

Not sure if this is a fault around problem, because without the arch workaround to make the ptes young,
faultaround works well. But anyway let me see if I can do something to avoid tunables. Thanks.


end of thread, other threads:[~2018-01-30 12:01 UTC | newest]

Thread overview: 12+ messages
2018-01-22  5:40 [PATCH v3] mm: make faultaround produce old ptes Vinayak Menon
2018-01-23 14:55 ` Michal Hocko
2018-01-23 14:55   ` Michal Hocko
2018-01-23 15:38   ` Vinayak Menon
2018-01-23 16:05     ` Michal Hocko
2018-01-24  9:05       ` Vinayak Menon
2018-01-24  9:38         ` Michal Hocko
2018-01-24 10:43           ` Vinayak Menon
2018-01-24 11:11             ` Michal Hocko
2018-01-24 12:09               ` Vinayak Menon
2018-01-24 12:21                 ` Michal Hocko
2018-01-30 12:01                   ` Vinayak Menon
