* [PATCH] mm: make fault_around_bytes configurable
@ 2016-04-18 15:17 ` Vinayak Menon
0 siblings, 0 replies; 34+ messages in thread
From: Vinayak Menon @ 2016-04-18 15:17 UTC
To: linux-mm, linux-kernel
Cc: akpm, dan.j.williams, mgorman, vbabka, kirill.shutemov,
dave.hansen, hughd, Vinayak Menon
Mapping pages around a fault has been found to cause performance
degradation in certain use cases. The test performed here launches 10
apps one by one, doing something with each app, and then repeats the
same sequence once more, on an ARM 64-bit Android device with 2GB of
RAM. App launch times are better when the fault-around feature is
disabled by setting fault_around_bytes to the page size (4096 in this
case).
The tests were done on a 3.18 kernel. Four extra vmstat counters were
added for debugging: pgpgoutclean counts the clean pages reclaimed via
__delete_from_page_cache, while pageref_activate, pageref_activate_vm_exec
and pageref_keep count the mapped file pages activated and retained by
page_check_references.
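One way to collect counters like these is to snapshot /proc/vmstat before and after a run and subtract; the thread does not say how the numbers below were gathered, so this is only a sketch. /proc/vmstat prints one "name value" pair per line, and vmstat_get()/vmstat_read() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in text formatted like /proc/vmstat
 * ("name value" per line). Returns true and stores the value if found.
 */
bool vmstat_get(const char *buf, const char *name, unsigned long *out)
{
	size_t len = strlen(name);
	const char *line = buf;

	while (line && *line) {
		/* Require the full name followed by a space, so "pgpgout"
		 * does not match "pgpgoutclean". */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%lu", out) == 1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}

/* Read one live counter from /proc/vmstat (Linux only). */
bool vmstat_read(const char *name, unsigned long *out)
{
	static char buf[16384];	/* assumed large enough for /proc/vmstat */
	FILE *f = fopen("/proc/vmstat", "r");
	size_t n;

	if (!f)
		return false;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return vmstat_get(buf, name, out);
}
```

Note that pgpgoutclean and the pageref_* counters were local debugging additions for these tests, so a stock kernel's /proc/vmstat will not contain them.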
=== Without swap ===
                             3.18  3.18-fault_around_bytes=4096
---------------------------------------------------------------
workingset_refault         691100                        664339
workingset_activate        210379                        179139
pgpgin                    4676096                       4492780
pgpgout                    163967                         96711
pgpgoutclean              1090664                        990659
pgalloc_dma               3463111                       3328299
pgfree                    3502365                       3363866
pgactivate                 568134                        238570
pgdeactivate               752260                        392138
pageref_activate           315078                        121705
pageref_activate_vm_exec   162940                         55815
pageref_keep               141354                         51011
pgmajfault                  24863                         23633
pgrefill_dma              1116370                        544042
pgscan_kswapd_dma         1735186                       1234622
pgsteal_kswapd_dma        1121769                       1005725
pgscan_direct_dma           12966                          1090
pgsteal_direct_dma           6209                           967
slabs_scanned             1539849                        977351
pageoutrun                   1260                          1333
allocstall                     47                             7
=== With swap ===
                             3.18  3.18-fault_around_bytes=4096
---------------------------------------------------------------
workingset_refault         597687                        878109
workingset_activate        167169                        254037
pgpgin                    4035424                       5157348
pgpgout                    162151                         85231
pgpgoutclean               928587                       1225029
pswpin                      46033                         17100
pswpout                    237952                        127686
pgalloc_dma               3305034                       3542614
pgfree                    3354989                       3592132
pgactivate                 626468                        355275
pgdeactivate               990205                        771902
pageref_activate           294780                        157106
pageref_activate_vm_exec   141722                         63469
pageref_keep               121931                         63028
pgmajfault                  67818                         45643
pgrefill_dma              1324023                        977192
pgscan_kswapd_dma         1825267                       1720322
pgsteal_kswapd_dma        1181882                       1365500
pgscan_direct_dma           41957                          9622
pgsteal_direct_dma          25136                          6759
slabs_scanned              689575                        542705
pageoutrun                   1234                          1538
allocstall                    110                            26
It looks like fault_around puts more pressure on reclaim because more
pages are mapped, resulting in more IO activity, more faults, more
swapping, and more allocstalls.
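To put the reclaim-pressure argument in numbers, the relative change of a few counters can be computed directly from the tables. The sketch below hard-codes four rows of the "Without swap" table; the struct and function names are illustrative only.

```c
#include <assert.h>

/* A few counters copied from the "Without swap" table. */
struct vmstat_pair {
	const char *name;
	unsigned long base;	/* 3.18, default fault_around_bytes (64 KiB) */
	unsigned long fa4k;	/* 3.18, fault_around_bytes=4096 */
};

const struct vmstat_pair noswap[] = {
	{ "pgactivate",        568134, 238570 },
	{ "pgdeactivate",      752260, 392138 },
	{ "pgscan_direct_dma",  12966,   1090 },
	{ "allocstall",            47,      7 },
};

/* Percentage change from 'before' to 'after'; negative means a decrease. */
double pct_change(unsigned long before, unsigned long after)
{
	assert(before != 0);
	return 100.0 * ((double)after - (double)before) / (double)before;
}
```

For example, pct_change(47, 7) is about -85.1, so allocstall drops by roughly 85% with fault_around_bytes=4096, and pgactivate drops by roughly 58%.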
Make fault_around_bytes configurable so that it can be tuned to avoid
performance degradation.
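To make the knob concrete: with 4 KiB pages, the default of 65536 bytes lets one file fault map up to 16 pages that are already in the page cache, while 4096 maps only the faulting page, which is why that value effectively disables the feature. Below is a simplified userspace model of the window computation, loosely following do_fault_around() in mm/memory.c as an assumption; struct fa_vma and fa_window() are hypothetical, and the kernel's end-of-page-table limit is left out.

```c
#include <assert.h>

/* Hypothetical userspace model; 4 KiB pages assumed. */
#define FA_PAGE_SHIFT 12
#define FA_PAGE_SIZE  (1UL << FA_PAGE_SHIFT)
#define FA_PAGE_MASK  (~(FA_PAGE_SIZE - 1))

struct fa_vma {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_end;	/* one past the last address */
};

/*
 * Compute the [*start, *end) range that fault-around would try to map for
 * a fault at 'address' and return its size in pages. fault_around_bytes
 * must be a power of two >= FA_PAGE_SIZE.
 */
unsigned long fa_window(const struct fa_vma *vma, unsigned long address,
			unsigned long fault_around_bytes,
			unsigned long *start, unsigned long *end)
{
	unsigned long nr_pages = fault_around_bytes >> FA_PAGE_SHIFT;
	unsigned long mask = ~(nr_pages * FA_PAGE_SIZE - 1) & FA_PAGE_MASK;

	assert(nr_pages >= 1 && (nr_pages & (nr_pages - 1)) == 0);
	assert(address >= vma->vm_start && address < vma->vm_end);

	/* Align down to a fault_around_bytes boundary, but stay inside the VMA. */
	*start = address & mask;
	if (*start < vma->vm_start)
		*start = vma->vm_start;

	/* At most nr_pages pages, never past the end of the VMA. */
	*end = *start + nr_pages * FA_PAGE_SIZE;
	if (*end > vma->vm_end)
		*end = vma->vm_end;

	return (*end - *start) >> FA_PAGE_SHIFT;
}
```

With the default 65536 bytes, a fault at 0x412345 in a large mapping yields the 16-page window [0x410000, 0x420000); with 4096 it yields only the faulting page [0x412000, 0x413000).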
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
---
 mm/Kconfig  | 10 ++++++++++
 mm/memory.c |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index f644106..e3476fd 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -681,6 +681,16 @@ config ZONE_DEVICE
 
 	  If FS_DAX is enabled, then say Y.
 
+config FAULT_AROUND_BYTES
+	int
+	range 4096 65536
+	default 65536
+	help
+	  The number of bytes to be mapped around the fault. The default
+	  value of 64 kilobytes effectively disables faultaround on
+	  architectures with page size >= 64k, considering the fact that
+	  the feature is less relevant when page size is bigger than 4k.
+
 config FRAME_VECTOR
 	bool
 
diff --git a/mm/memory.c b/mm/memory.c
index 758b0b4..be06714 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2939,7 +2939,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 }
 
 static unsigned long fault_around_bytes __read_mostly =
-	rounddown_pow_of_two(65536);
+	rounddown_pow_of_two(CONFIG_FAULT_AROUND_BYTES);
 
 #ifdef CONFIG_DEBUG_FS
 static int fault_around_bytes_get(void *data, u64 *val)
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-18 15:17 ` Vinayak Menon
@ 2016-04-22 0:01 ` Andrew Morton
From: Andrew Morton @ 2016-04-22 0:01 UTC
To: Vinayak Menon
Cc: linux-mm, linux-kernel, dan.j.williams, mgorman, vbabka,
kirill.shutemov, dave.hansen, hughd
On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> Mapping pages around fault is found to cause performance degradation
> in certain use cases. The test performed here is launch of 10 apps
> one by one, doing something with the app each time, and then repeating
> the same sequence once more, on an ARM 64-bit Android device with 2GB
> of RAM. The time taken to launch the apps is found to be better when
> fault around feature is disabled by setting fault_around_bytes to page
> size (4096 in this case).
Well that's one workload, and a somewhat strange one. What is the
effect on other workloads (of which there are a lot!)?
> The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> for debugging. pgpgoutclean accounts the clean pages reclaimed via
> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> and pageref_keep accounts the mapped file pages activated and retained
> by page_check_references.
>
> Looks like with fault_around, there is more pressure on reclaim because
> of the presence of more mapped pages, resulting in more IO activity,
> more faults, more swapping, and allocstalls.
A few of those things did get a bit worse?
Do you have any data on actual wall-time changes? How much faster do
things become with the patch? If it is "0.1%" then I'd say "umm, no".
> Make fault_around_bytes configurable so that it can be tuned to avoid
> performance degradation.
It sounds like we need to be smarter about auto-tuning this thing.
Maybe the refault code could be taught to provide the feedback path but
that sounds hard.
Still. I do think it would be better to make this configurable at
runtime. Move the existing debugfs tunable into /proc/sys/vm (and
document it!). I do dislike adding even more tunables but this one
does make sense. People will want to run their workloads with various
values until they find the peak throughput, and requiring a kernel
rebuild for that is a huge pain.
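The existing debugfs knob already sanitizes written values, and a /proc/sys/vm handler would presumably need the same checks. The sketch below models in userspace what fault_around_bytes_set() in mm/memory.c appears to do in kernels of this period (reject a window larger than one page table, round down to a power of two, clamp to one page); treat those details as an assumption, and fa_set()/fa_rounddown_pow_of_two() as hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FA_PAGE_SIZE    4096UL	/* assumed 4 KiB pages */
#define FA_PTRS_PER_PTE 512UL	/* assumed PTEs per page table */

/* Largest power of two <= v; v must be non-zero (hypothetical helper). */
unsigned long fa_rounddown_pow_of_two(unsigned long v)
{
	unsigned long p = 1;

	assert(v != 0);
	while (p <= v / 2)
		p <<= 1;
	return p;
}

/* Validate and store a new fault_around_bytes value; returns 0 or -EINVAL. */
int fa_set(unsigned long *fault_around_bytes, uint64_t val)
{
	/* The window may not span more than one page table. */
	if (val / FA_PAGE_SIZE > FA_PTRS_PER_PTE)
		return -EINVAL;
	if (val > FA_PAGE_SIZE)
		*fault_around_bytes = fa_rounddown_pow_of_two((unsigned long)val);
	else
		*fault_around_bytes = FA_PAGE_SIZE;	/* one page: fault-around off */
	return 0;
}
```

Writing 100000, for instance, stores 65536, and writing 0 stores 4096, so user-supplied values never produce a window that is not a power of two.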
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 0:01 ` Andrew Morton
@ 2016-04-22 8:45 ` Vinayak Menon
From: Vinayak Menon @ 2016-04-22 8:45 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, dan.j.williams, mgorman, vbabka,
kirill.shutemov, dave.hansen, hughd
On 04/22/2016 05:31 AM, Andrew Morton wrote:
> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
>
>> Mapping pages around fault is found to cause performance degradation
>> in certain use cases. The test performed here is launch of 10 apps
>> one by one, doing something with the app each time, and then repeating
>> the same sequence once more, on an ARM 64-bit Android device with 2GB
>> of RAM. The time taken to launch the apps is found to be better when
>> fault around feature is disabled by setting fault_around_bytes to page
>> size (4096 in this case).
>
> Well that's one workload, and a somewhat strange one. What is the
> effect on other workloads (of which there are a lot!).
>
This workload emulates the way a user would use a mobile device:
opening an application, using it for some time, switching to the next,
and then coming back to the same application later. Another metric that
shows significant degradation on Android with fault_around is device
boot-up time. I have not tried any workloads other than these.
>> The tests were done on 3.18 kernel. 4 extra vmstat counters were added
>> for debugging. pgpgoutclean accounts the clean pages reclaimed via
>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
>> and pageref_keep accounts the mapped file pages activated and retained
>> by page_check_references.
>>
>> Looks like with fault_around, there is more pressure on reclaim because
>> of the presence of more mapped pages, resulting in more IO activity,
>> more faults, more swapping, and allocstalls.
>
> A few of those things did get a bit worse?
I think some numbers (like workingset, pgpgin and pgpgoutclean) look
better with fault_around because the increased number of mapped pages
results in fewer file pages being reclaimed (pageref_activate,
pageref_activate_vm_exec, pageref_keep above), at the cost of increased
swapping. Latency is far worse with fault_around_bytes + swap, possibly
because of the increased swapping, lower kswapd efficiency and more
allocstalls.
So the problem appears to be that unneeded pages are mapped around the
fault and page_check_references is unaware of this.
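The "decrease in kswapd efficiency" can be quantified if efficiency is taken to mean the share of scanned pages that kswapd actually reclaimed (pgsteal_kswapd_dma / pgscan_kswapd_dma). The thread does not define the metric, so this is an interpretation, and reclaim_efficiency() is a hypothetical name.

```c
#include <assert.h>

/* Percentage of scanned pages that kswapd actually reclaimed. */
double reclaim_efficiency(unsigned long pgsteal, unsigned long pgscan)
{
	assert(pgscan != 0 && pgsteal <= pgscan);
	return 100.0 * (double)pgsteal / (double)pgscan;
}
```

Applied to the tables, this gives roughly 64.8% (default) versus 79.4% (fault_around_bytes=4096) with swap, and roughly 64.6% versus 81.5% without swap.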
>
> Do you have any data on actual wall-time changes? How much faster do
> things become with the patch? If it is "0.1%" then I'd say "umm, no".
>
=== Without swap ===
                        3.18  3.18-fault_around_bytes=4096
Avg launch latency    1695ms  1300ms (23.3% lower)
Max launch latency    5097ms  3135ms (38.49% lower)
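The percentages above are reductions relative to the 3.18 baseline. A small helper pair (hypothetical names) reproduces them and also expresses each result as a speedup factor.

```c
#include <assert.h>

/* Latency reduction in percent, relative to the baseline run. */
double latency_reduction_pct(double base_ms, double new_ms)
{
	assert(base_ms > 0.0);
	return 100.0 * (base_ms - new_ms) / base_ms;
}

/* Equivalent speedup: how many times faster the new run is. */
double speedup(double base_ms, double new_ms)
{
	assert(new_ms > 0.0);
	return base_ms / new_ms;
}
```

For the average launch this gives about 23.3% (roughly 1.30x faster); for the maximum, about 38.49% (roughly 1.63x faster).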
>> Make fault_around_bytes configurable so that it can be tuned to avoid
>> performance degradation.
>
> It sounds like we need to be smarter about auto-tuning this thing.
> Maybe the refault code could be taught to provide the feedback path but
> that sounds hard.
>
> Still. I do think it would be better to make this configurable at
> runtime. Move the existing debugfs tunable into /proc/sys/vm (and
> document it!). I do dislike adding even more tunables but this one
> does make sense. People will want to run their workloads with various
> values until they find the peak throughput, and requiring a kernel
> rebuild for that is a huge pain.
>
I can send a v2 that makes this tunable at runtime via /proc/sys/vm.
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 8:45 ` Vinayak Menon
@ 2016-04-22 9:44 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2016-04-22 9:44 UTC
To: Vinayak Menon
Cc: Andrew Morton, linux-mm, linux-kernel, dan.j.williams, mgorman,
vbabka, kirill.shutemov, dave.hansen, hughd
On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> >
> >>Mapping pages around fault is found to cause performance degradation
> >>in certain use cases. The test performed here is launch of 10 apps
> >>one by one, doing something with the app each time, and then repeating
> >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> >>of RAM. The time taken to launch the apps is found to be better when
> >>fault around feature is disabled by setting fault_around_bytes to page
> >>size (4096 in this case).
> >
> >Well that's one workload, and a somewhat strange one. What is the
> >effect on other workloads (of which there are a lot!).
> >
> This workload emulates the way a user would use his mobile device, opening
> an application, using it for some time, switching to next, and then coming
> back to the same application later. Another stat which shows significant
> degradation on Android with fault_around is device boot up time. I have not
> tried any other workload other than these.
>
> >>The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> >>for debugging. pgpgoutclean accounts the clean pages reclaimed via
> >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> >>and pageref_keep accounts the mapped file pages activated and retained
> >>by page_check_references.
> >>
> >>Looks like with fault_around, there is more pressure on reclaim because
> >>of the presence of more mapped pages, resulting in more IO activity,
> >>more faults, more swapping, and allocstalls.
> >
> >A few of those things did get a bit worse?
> I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
> better with fault_around because, increased number of mapped pages is
> resulting in less number of file pages being reclaimed (pageref_activate,
> pageref_activate_vm_exec, pageref_keep above), but increased swapping.
> Latency numbers are far bad with fault_around_bytes + swap, possibly because
> of increased swapping, decrease in kswapd efficiency and increase in
> allocstalls.
> So the problem looks to be that unwanted pages are mapped around the fault
> and page_check_references is unaware of this.
Hm. It makes me think we should make the ptes set up by faultaround old.
Although it would defeat (to some extent) the purpose of faultaround on
architectures without a HW accessed bit :-/
Could you check if the patch below changes the situation?
It would require some more work to avoid marking the pte we actually faulted on as old.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a55e5be0894f..1066fabf17c3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -584,7 +584,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
}
void do_set_pte(struct vm_area_struct *vma, unsigned long address,
- struct page *page, pte_t *pte, bool write, bool anon);
+ struct page *page, pte_t *pte, bool write, bool anon, bool old);
#endif
/*
diff --git a/mm/filemap.c b/mm/filemap.c
index f2479af09da9..47ba88fd7192 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2189,7 +2189,7 @@ repeat:
if (file->f_ra.mmap_miss > 0)
file->f_ra.mmap_miss--;
addr = address + (page->index - vmf->pgoff) * PAGE_SIZE;
- do_set_pte(vma, addr, page, pte, false, false);
+ do_set_pte(vma, addr, page, pte, false, false, true);
unlock_page(page);
goto next;
unlock:
diff --git a/mm/memory.c b/mm/memory.c
index 93897f23cc11..fa3ac184eafd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2836,7 +2836,7 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
* vm_ops->map_pages.
*/
void do_set_pte(struct vm_area_struct *vma, unsigned long address,
- struct page *page, pte_t *pte, bool write, bool anon)
+ struct page *page, pte_t *pte, bool write, bool anon, bool old)
{
pte_t entry;
@@ -2844,6 +2844,8 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
entry = mk_pte(page, vma->vm_page_prot);
if (write)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (old)
+ entry = pte_mkold(entry);
if (anon) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, address, false);
@@ -2998,7 +3000,7 @@ static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
put_page(fault_page);
return ret;
}
- do_set_pte(vma, address, fault_page, pte, false, false);
+ do_set_pte(vma, address, fault_page, pte, false, false, false);
unlock_page(fault_page);
unlock_out:
pte_unmap_unlock(pte, ptl);
@@ -3050,7 +3052,7 @@ static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
}
goto uncharge_out;
}
- do_set_pte(vma, address, new_page, pte, true, true);
+ do_set_pte(vma, address, new_page, pte, true, true, false);
mem_cgroup_commit_charge(new_page, memcg, false, false);
lru_cache_add_active_or_unevictable(new_page, vma);
pte_unmap_unlock(pte, ptl);
@@ -3107,7 +3109,7 @@ static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
put_page(fault_page);
return ret;
}
- do_set_pte(vma, address, fault_page, pte, true, false);
+ do_set_pte(vma, address, fault_page, pte, true, false, false);
pte_unmap_unlock(pte, ptl);
if (set_page_dirty(fault_page))
--
Kirill A. Shutemov
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 8:45 ` Vinayak Menon
@ 2016-04-22 14:02 ` Minchan Kim
From: Minchan Kim @ 2016-04-22 14:02 UTC (permalink / raw)
To: Vinayak Menon
Cc: Andrew Morton, linux-mm, linux-kernel, dan.j.williams, mgorman,
vbabka, kirill.shutemov, dave.hansen, hughd
On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> >
> >>Mapping pages around fault is found to cause performance degradation
> >>in certain use cases. The test performed here is launch of 10 apps
> >>one by one, doing something with the app each time, and then repeating
> >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> >>of RAM. The time taken to launch the apps is found to be better when
> >>fault around feature is disabled by setting fault_around_bytes to page
> >>size (4096 in this case).
> >
> >Well that's one workload, and a somewhat strange one. What is the
> >effect on other workloads (of which there are a lot!).
> >
> This workload emulates the way a user would use his mobile device, opening
> an application, using it for some time, switching to next, and then coming
> back to the same application later. Another stat which shows significant
> degradation on Android with fault_around is device boot up time. I have not
> tried any other workload other than these.
>
> >>The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> >>for debugging. pgpgoutclean accounts the clean pages reclaimed via
> >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> >>and pageref_keep accounts the mapped file pages activated and retained
> >>by page_check_references.
> >>
> >>=== Without swap ===
> >>                          3.18      3.18-fault_around_bytes=4096
> >>-----------------------------------------------------------------------
> >>workingset_refault        691100    664339
> >>workingset_activate       210379    179139
> >>pgpgin                    4676096   4492780
> >>pgpgout                   163967    96711
> >>pgpgoutclean              1090664   990659
> >>pgalloc_dma               3463111   3328299
> >>pgfree                    3502365   3363866
> >>pgactivate                568134    238570
> >>pgdeactivate              752260    392138
> >>pageref_activate          315078    121705
> >>pageref_activate_vm_exec  162940    55815
> >>pageref_keep              141354    51011
> >>pgmajfault                24863     23633
> >>pgrefill_dma              1116370   544042
> >>pgscan_kswapd_dma         1735186   1234622
> >>pgsteal_kswapd_dma        1121769   1005725
> >>pgscan_direct_dma         12966     1090
> >>pgsteal_direct_dma        6209      967
> >>slabs_scanned             1539849   977351
> >>pageoutrun                1260      1333
> >>allocstall                47        7
> >>
> >>=== With swap ===
> >>                          3.18      3.18-fault_around_bytes=4096
> >>-----------------------------------------------------------------------
> >>workingset_refault        597687    878109
> >>workingset_activate       167169    254037
> >>pgpgin                    4035424   5157348
> >>pgpgout                   162151    85231
> >>pgpgoutclean              928587    1225029
> >>pswpin                    46033     17100
> >>pswpout                   237952    127686
> >>pgalloc_dma               3305034   3542614
> >>pgfree                    3354989   3592132
> >>pgactivate                626468    355275
> >>pgdeactivate              990205    771902
> >>pageref_activate          294780    157106
> >>pageref_activate_vm_exec  141722    63469
> >>pageref_keep              121931    63028
> >>pgmajfault                67818     45643
> >>pgrefill_dma              1324023   977192
> >>pgscan_kswapd_dma         1825267   1720322
> >>pgsteal_kswapd_dma        1181882   1365500
> >>pgscan_direct_dma         41957     9622
> >>pgsteal_direct_dma        25136     6759
> >>slabs_scanned             689575    542705
> >>pageoutrun                1234      1538
> >>allocstall                110       26
> >>
> >>Looks like with fault_around, there is more pressure on reclaim because
> >>of the presence of more mapped pages, resulting in more IO activity,
> >>more faults, more swapping, and allocstalls.
> >
> >A few of those things did get a bit worse?
> I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
> better with fault_around because, increased number of mapped pages is
> resulting in less number of file pages being reclaimed (pageref_activate,
> pageref_activate_vm_exec, pageref_keep above), but increased swapping.
> Latency numbers are far bad with fault_around_bytes + swap, possibly because
> of increased swapping, decrease in kswapd efficiency and increase in
> allocstalls.
> So the problem looks to be that unwanted pages are mapped around the fault
> and page_check_references is unaware of this.
page_check_references() makes a difference only when the pte has its accessed bit set:
enum page_references page_check_references(struct page *page)
{
	referenced_ptes = page_referenced(page);
	if (referenced_ptes) {
		...
		return PAGEREF_ACTIVATE;
	}
}
But map_pages doesn't mark the pages it maps ahead with pte_mkyoung(). IOW, those ptes are
already old. So I think page_check_references() shouldn't make any difference.
The other way it could affect reclaim is by putting more pressure on
slab shrinking:
unsigned long shrink_page_list()
{
	...
	/* Double the slab pressure for mapped and swapcache pages */
	if (page_mapped(page) || PageSwapCache(page))
		sc->nr_scanned++;
	...
}
But I'm not sure it can make that much of a difference.
Could you explain what I am missing?
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 14:02 ` Minchan Kim
@ 2016-04-22 14:11 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2016-04-22 14:11 UTC (permalink / raw)
To: Minchan Kim
Cc: Vinayak Menon, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Fri, Apr 22, 2016 at 11:02:16PM +0900, Minchan Kim wrote:
> On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > >
> > >>Mapping pages around fault is found to cause performance degradation
> > >>in certain use cases. The test performed here is launch of 10 apps
> > >>one by one, doing something with the app each time, and then repeating
> > >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> > >>of RAM. The time taken to launch the apps is found to be better when
> > >>fault around feature is disabled by setting fault_around_bytes to page
> > >>size (4096 in this case).
> > >
> > >Well that's one workload, and a somewhat strange one. What is the
> > >effect on other workloads (of which there are a lot!).
> > >
> > This workload emulates the way a user would use his mobile device, opening
> > an application, using it for some time, switching to next, and then coming
> > back to the same application later. Another stat which shows significant
> > degradation on Android with fault_around is device boot up time. I have not
> > tried any other workload other than these.
> >
> > >>The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> > >>for debugging. pgpgoutclean accounts the clean pages reclaimed via
> > >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > >>and pageref_keep accounts the mapped file pages activated and retained
> > >>by page_check_references.
> > >>
> > >>=== Without swap ===
> > >>                          3.18      3.18-fault_around_bytes=4096
> > >>-----------------------------------------------------------------------
> > >>workingset_refault        691100    664339
> > >>workingset_activate       210379    179139
> > >>pgpgin                    4676096   4492780
> > >>pgpgout                   163967    96711
> > >>pgpgoutclean              1090664   990659
> > >>pgalloc_dma               3463111   3328299
> > >>pgfree                    3502365   3363866
> > >>pgactivate                568134    238570
> > >>pgdeactivate              752260    392138
> > >>pageref_activate          315078    121705
> > >>pageref_activate_vm_exec  162940    55815
> > >>pageref_keep              141354    51011
> > >>pgmajfault                24863     23633
> > >>pgrefill_dma              1116370   544042
> > >>pgscan_kswapd_dma         1735186   1234622
> > >>pgsteal_kswapd_dma        1121769   1005725
> > >>pgscan_direct_dma         12966     1090
> > >>pgsteal_direct_dma        6209      967
> > >>slabs_scanned             1539849   977351
> > >>pageoutrun                1260      1333
> > >>allocstall                47        7
> > >>
> > >>=== With swap ===
> > >>                          3.18      3.18-fault_around_bytes=4096
> > >>-----------------------------------------------------------------------
> > >>workingset_refault        597687    878109
> > >>workingset_activate       167169    254037
> > >>pgpgin                    4035424   5157348
> > >>pgpgout                   162151    85231
> > >>pgpgoutclean              928587    1225029
> > >>pswpin                    46033     17100
> > >>pswpout                   237952    127686
> > >>pgalloc_dma               3305034   3542614
> > >>pgfree                    3354989   3592132
> > >>pgactivate                626468    355275
> > >>pgdeactivate              990205    771902
> > >>pageref_activate          294780    157106
> > >>pageref_activate_vm_exec  141722    63469
> > >>pageref_keep              121931    63028
> > >>pgmajfault                67818     45643
> > >>pgrefill_dma              1324023   977192
> > >>pgscan_kswapd_dma         1825267   1720322
> > >>pgsteal_kswapd_dma        1181882   1365500
> > >>pgscan_direct_dma         41957     9622
> > >>pgsteal_direct_dma        25136     6759
> > >>slabs_scanned             689575    542705
> > >>pageoutrun                1234      1538
> > >>allocstall                110       26
> > >>
> > >>Looks like with fault_around, there is more pressure on reclaim because
> > >>of the presence of more mapped pages, resulting in more IO activity,
> > >>more faults, more swapping, and allocstalls.
> > >
> > >A few of those things did get a bit worse?
> > I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
> > better with fault_around because, increased number of mapped pages is
> > resulting in less number of file pages being reclaimed (pageref_activate,
> > pageref_activate_vm_exec, pageref_keep above), but increased swapping.
> > Latency numbers are far bad with fault_around_bytes + swap, possibly because
> > of increased swapping, decrease in kswapd efficiency and increase in
> > allocstalls.
> > So the problem looks to be that unwanted pages are mapped around the fault
> > and page_check_references is unaware of this.
>
> The page_check_references makes difference only when pte has marked access_bit.
>
> enum page_references page_check_references(struct page *page)
> {
> referenced_ptes = page_referenced(page);
> if (referenced_ptes) {
> ...
> return PAGEREF_ACTIVATE
> }
> }
>
> But map_pages doesn't mark ahead pages as pte_mkyoung. IOW, ptes are already
> pte_mkold. So, I think page_check_reference shouldn't make any difference.
Actually, I've checked and mk_pte() produces young ptes for me. Not sure
why.
--
Kirill A. Shutemov
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 14:11 ` Kirill A. Shutemov
@ 2016-04-22 14:17 ` Kirill A. Shutemov
From: Kirill A. Shutemov @ 2016-04-22 14:17 UTC (permalink / raw)
To: Minchan Kim
Cc: Vinayak Menon, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Fri, Apr 22, 2016 at 05:11:41PM +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 11:02:16PM +0900, Minchan Kim wrote:
> > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > > On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > > >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > > >
> > > >>Mapping pages around the fault is found to cause performance degradation
> > > >>in certain use cases. The test performed here is the launch of 10 apps
> > > >>one by one, doing something with the app each time, and then repeating
> > > >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> > > >>of RAM. The time taken to launch the apps is found to be shorter when
> > > >>the fault-around feature is disabled by setting fault_around_bytes to
> > > >>the page size (4096 in this case).
> > > >
> > > >Well that's one workload, and a somewhat strange one. What is the
> > > >effect on other workloads (of which there are a lot!).
> > > >
> > > This workload emulates the way a user would use his mobile device: opening
> > > an application, using it for some time, switching to the next one, and then
> > > coming back to the same application later. Another stat which shows
> > > significant degradation on Android with fault_around is device boot-up
> > > time. I have not tried any workloads other than these.
> > >
> > > >>The tests were done on a 3.18 kernel. 4 extra vmstat counters were added
> > > >>for debugging. pgpgoutclean counts the clean pages reclaimed via
> > > >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > > >>and pageref_keep count the mapped file pages activated and retained
> > > >>by page_check_references.
> > > >>
> > > >>=== Without swap ===
> > > >>                          3.18      3.18-fault_around_bytes=4096
> > > >>-----------------------------------------------------------------------
> > > >>workingset_refault        691100    664339
> > > >>workingset_activate       210379    179139
> > > >>pgpgin                    4676096   4492780
> > > >>pgpgout                   163967    96711
> > > >>pgpgoutclean              1090664   990659
> > > >>pgalloc_dma               3463111   3328299
> > > >>pgfree                    3502365   3363866
> > > >>pgactivate                568134    238570
> > > >>pgdeactivate              752260    392138
> > > >>pageref_activate          315078    121705
> > > >>pageref_activate_vm_exec  162940    55815
> > > >>pageref_keep              141354    51011
> > > >>pgmajfault                24863     23633
> > > >>pgrefill_dma              1116370   544042
> > > >>pgscan_kswapd_dma         1735186   1234622
> > > >>pgsteal_kswapd_dma        1121769   1005725
> > > >>pgscan_direct_dma         12966     1090
> > > >>pgsteal_direct_dma        6209      967
> > > >>slabs_scanned             1539849   977351
> > > >>pageoutrun                1260      1333
> > > >>allocstall                47        7
> > > >>
> > > >>=== With swap ===
> > > >>                          3.18      3.18-fault_around_bytes=4096
> > > >>-----------------------------------------------------------------------
> > > >>workingset_refault        597687    878109
> > > >>workingset_activate       167169    254037
> > > >>pgpgin                    4035424   5157348
> > > >>pgpgout                   162151    85231
> > > >>pgpgoutclean              928587    1225029
> > > >>pswpin                    46033     17100
> > > >>pswpout                   237952    127686
> > > >>pgalloc_dma               3305034   3542614
> > > >>pgfree                    3354989   3592132
> > > >>pgactivate                626468    355275
> > > >>pgdeactivate              990205    771902
> > > >>pageref_activate          294780    157106
> > > >>pageref_activate_vm_exec  141722    63469
> > > >>pageref_keep              121931    63028
> > > >>pgmajfault                67818     45643
> > > >>pgrefill_dma              1324023   977192
> > > >>pgscan_kswapd_dma         1825267   1720322
> > > >>pgsteal_kswapd_dma        1181882   1365500
> > > >>pgscan_direct_dma         41957     9622
> > > >>pgsteal_direct_dma        25136     6759
> > > >>slabs_scanned             689575    542705
> > > >>pageoutrun                1234      1538
> > > >>allocstall                110       26
> > > >>
> > > >>Looks like with fault_around, there is more pressure on reclaim because
> > > >>of the presence of more mapped pages, resulting in more IO activity,
> > > >>more faults, more swapping, and allocstalls.
> > > >
> > > >A few of those things did get a bit worse?
> > > I think some numbers (like workingset, pgpgin, pgpgoutclean, etc.) look
> > > better with fault_around because the increased number of mapped pages
> > > results in fewer file pages being reclaimed (pageref_activate,
> > > pageref_activate_vm_exec, pageref_keep above), at the cost of increased
> > > swapping. Latency numbers are far worse with the default fault_around_bytes
> > > + swap, possibly because of increased swapping, decreased kswapd efficiency
> > > and more allocstalls.
> > > So the problem looks to be that unwanted pages are mapped around the fault
> > > and page_check_references is unaware of this.
> >
> > page_check_references makes a difference only when the pte has the
> > accessed bit set.
> >
> > enum page_references page_check_references(struct page *page)
> > {
> > 	int referenced_ptes;
> >
> > 	referenced_ptes = page_referenced(page);
> > 	if (referenced_ptes) {
> > 		...
> > 		return PAGEREF_ACTIVATE;
> > 	}
> > 	...
> > }
> >
> > But map_pages doesn't make the ptes of the pages mapped ahead young
> > (pte_mkyoung). IOW, those ptes are already old (pte_mkold). So I think
> > page_check_references shouldn't make any difference.
>
> Actually, I've checked and mk_pte() produces young ptes for me. Not sure
> why.
Ah, okay: _PAGE_ACCESSED is included in the pgprot mask, which is
reasonable to have if you handle a page fault for that address. But it
should be adjusted for faultaround.
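A sketch of the adjustment being suggested: keep the accessed bit only on the pte for the address that actually faulted, and start the neighbours old. This is a hypothetical user-space model, not the kernel's map_pages()/do_fault_around() code; SKETCH_PTE_ACCESSED stands in for _PAGE_ACCESSED and the flag values are invented. It also shows why setting fault_around_bytes to the page size effectively disables the feature: the window is then a single page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PTE_PRESENT	0x1u
#define SKETCH_PTE_ACCESSED	0x2u	/* stands in for _PAGE_ACCESSED */

/* As described above, the accessed bit is already part of the base
 * protection, so every pte built from it starts out young. */
static const uint32_t sketch_base_prot = SKETCH_PTE_PRESENT | SKETCH_PTE_ACCESSED;

/*
 * Fill ptes[] for one fault-around window starting at window_start.
 * Addresses are assumed page-aligned.  With old_around, only the pte of
 * fault_addr keeps the accessed bit.  Returns the number of ptes filled.
 */
static size_t map_fault_around(uint32_t *ptes, size_t max_ptes,
			       uintptr_t window_start, uintptr_t fault_addr,
			       unsigned long fault_around_bytes,
			       bool old_around)
{
	size_t nr_pages = fault_around_bytes / SKETCH_PAGE_SIZE;

	assert(nr_pages >= 1 && nr_pages <= max_ptes);
	assert(fault_addr >= window_start &&
	       fault_addr < window_start + nr_pages * SKETCH_PAGE_SIZE);

	for (size_t i = 0; i < nr_pages; i++) {
		uintptr_t addr = window_start + i * SKETCH_PAGE_SIZE;
		uint32_t pte = sketch_base_prot;

		/* Only the faulting address has really been accessed. */
		if (old_around && addr != fault_addr)
			pte &= ~SKETCH_PTE_ACCESSED;	/* like pte_mkold() */
		ptes[i] = pte;
	}
	return nr_pages;
}

/* Count the ptes reclaim would see as referenced (young). */
static size_t count_young(const uint32_t *ptes, size_t n)
{
	size_t young = 0;

	for (size_t i = 0; i < n; i++)
		if (ptes[i] & SKETCH_PTE_ACCESSED)
			young++;
	return young;
}
```

With a 16-page window (fault_around_bytes = 65536), building every pte from the base prot leaves all 16 young; with old_around set only the faulting one is young, and with fault_around_bytes = 4096 only one pte is installed at all.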
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 14:17 ` Kirill A. Shutemov
@ 2016-04-22 14:50 ` Minchan Kim
0 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-04-22 14:50 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Minchan Kim, Vinayak Menon, Andrew Morton, linux-mm,
linux-kernel, dan.j.williams, mgorman, vbabka, kirill.shutemov,
dave.hansen, hughd
On Fri, Apr 22, 2016 at 05:17:16PM +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 05:11:41PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Apr 22, 2016 at 11:02:16PM +0900, Minchan Kim wrote:
> > > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > > > On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > > > >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > > > >
> > > > >>Mapping pages around the fault is found to cause performance degradation
> > > > >>in certain use cases. The test performed here is the launch of 10 apps
> > > > >>one by one, doing something with the app each time, and then repeating
> > > > >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> > > > >>of RAM. The time taken to launch the apps is found to be shorter when
> > > > >>the fault-around feature is disabled by setting fault_around_bytes to
> > > > >>the page size (4096 in this case).
> > > > >
> > > > >Well that's one workload, and a somewhat strange one. What is the
> > > > >effect on other workloads (of which there are a lot!).
> > > > >
> > > > This workload emulates the way a user would use his mobile device: opening
> > > > an application, using it for some time, switching to the next one, and then
> > > > coming back to the same application later. Another stat which shows
> > > > significant degradation on Android with fault_around is device boot-up
> > > > time. I have not tried any workloads other than these.
> > > >
> > > > >>The tests were done on a 3.18 kernel. 4 extra vmstat counters were added
> > > > >>for debugging. pgpgoutclean counts the clean pages reclaimed via
> > > > >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > > > >>and pageref_keep count the mapped file pages activated and retained
> > > > >>by page_check_references.
> > > > >>
> > > > >>=== Without swap ===
> > > > >>                          3.18      3.18-fault_around_bytes=4096
> > > > >>-----------------------------------------------------------------------
> > > > >>workingset_refault        691100    664339
> > > > >>workingset_activate       210379    179139
> > > > >>pgpgin                    4676096   4492780
> > > > >>pgpgout                   163967    96711
> > > > >>pgpgoutclean              1090664   990659
> > > > >>pgalloc_dma               3463111   3328299
> > > > >>pgfree                    3502365   3363866
> > > > >>pgactivate                568134    238570
> > > > >>pgdeactivate              752260    392138
> > > > >>pageref_activate          315078    121705
> > > > >>pageref_activate_vm_exec  162940    55815
> > > > >>pageref_keep              141354    51011
> > > > >>pgmajfault                24863     23633
> > > > >>pgrefill_dma              1116370   544042
> > > > >>pgscan_kswapd_dma         1735186   1234622
> > > > >>pgsteal_kswapd_dma        1121769   1005725
> > > > >>pgscan_direct_dma         12966     1090
> > > > >>pgsteal_direct_dma        6209      967
> > > > >>slabs_scanned             1539849   977351
> > > > >>pageoutrun                1260      1333
> > > > >>allocstall                47        7
> > > > >>
> > > > >>=== With swap ===
> > > > >>                          3.18      3.18-fault_around_bytes=4096
> > > > >>-----------------------------------------------------------------------
> > > > >>workingset_refault        597687    878109
> > > > >>workingset_activate       167169    254037
> > > > >>pgpgin                    4035424   5157348
> > > > >>pgpgout                   162151    85231
> > > > >>pgpgoutclean              928587    1225029
> > > > >>pswpin                    46033     17100
> > > > >>pswpout                   237952    127686
> > > > >>pgalloc_dma               3305034   3542614
> > > > >>pgfree                    3354989   3592132
> > > > >>pgactivate                626468    355275
> > > > >>pgdeactivate              990205    771902
> > > > >>pageref_activate          294780    157106
> > > > >>pageref_activate_vm_exec  141722    63469
> > > > >>pageref_keep              121931    63028
> > > > >>pgmajfault                67818     45643
> > > > >>pgrefill_dma              1324023   977192
> > > > >>pgscan_kswapd_dma         1825267   1720322
> > > > >>pgsteal_kswapd_dma        1181882   1365500
> > > > >>pgscan_direct_dma         41957     9622
> > > > >>pgsteal_direct_dma        25136     6759
> > > > >>slabs_scanned             689575    542705
> > > > >>pageoutrun                1234      1538
> > > > >>allocstall                110       26
> > > > >>
> > > > >>Looks like with fault_around, there is more pressure on reclaim because
> > > > >>of the presence of more mapped pages, resulting in more IO activity,
> > > > >>more faults, more swapping, and allocstalls.
> > > > >
> > > > >A few of those things did get a bit worse?
> > > > I think some numbers (like workingset, pgpgin, pgpgoutclean, etc.) look
> > > > better with fault_around because the increased number of mapped pages
> > > > results in fewer file pages being reclaimed (pageref_activate,
> > > > pageref_activate_vm_exec, pageref_keep above), at the cost of increased
> > > > swapping. Latency numbers are far worse with the default fault_around_bytes
> > > > + swap, possibly because of increased swapping, decreased kswapd efficiency
> > > > and more allocstalls.
> > > > So the problem looks to be that unwanted pages are mapped around the fault
> > > > and page_check_references is unaware of this.
> > >
> > > page_check_references makes a difference only when the pte has the
> > > accessed bit set.
> > >
> > > enum page_references page_check_references(struct page *page)
> > > {
> > > 	int referenced_ptes;
> > >
> > > 	referenced_ptes = page_referenced(page);
> > > 	if (referenced_ptes) {
> > > 		...
> > > 		return PAGEREF_ACTIVATE;
> > > 	}
> > > 	...
> > > }
> > >
> > > But map_pages doesn't make the ptes of the pages mapped ahead young
> > > (pte_mkyoung). IOW, those ptes are already old (pte_mkold). So I think
> > > page_check_references shouldn't make any difference.
> >
> > Actually, I've checked and mk_pte() produces young ptes for me. Not sure
> > why.
>
> Ah, okay: _PAGE_ACCESSED is included in the pgprot mask, which is
> reasonable to have if you handle a page fault for that address. But it
> should be adjusted for faultaround.
Thanks for pointing that out so quickly!
Your suggestion does make sense to me.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 9:44 ` Kirill A. Shutemov
@ 2016-04-22 15:09 ` Minchan Kim
0 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-04-22 15:09 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Vinayak Menon, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Fri, Apr 22, 2016 at 12:44:30PM +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > >
> > >>Mapping pages around the fault is found to cause performance degradation
> > >>in certain use cases. The test performed here is the launch of 10 apps
> > >>one by one, doing something with the app each time, and then repeating
> > >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> > >>of RAM. The time taken to launch the apps is found to be shorter when
> > >>the fault-around feature is disabled by setting fault_around_bytes to
> > >>the page size (4096 in this case).
> > >
> > >Well that's one workload, and a somewhat strange one. What is the
> > >effect on other workloads (of which there are a lot!).
> > >
> > This workload emulates the way a user would use his mobile device: opening
> > an application, using it for some time, switching to the next one, and then
> > coming back to the same application later. Another stat which shows
> > significant degradation on Android with fault_around is device boot-up
> > time. I have not tried any workloads other than these.
> >
> > >>The tests were done on a 3.18 kernel. 4 extra vmstat counters were added
> > >>for debugging. pgpgoutclean counts the clean pages reclaimed via
> > >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > >>and pageref_keep count the mapped file pages activated and retained
> > >>by page_check_references.
> > >>
> > >>=== Without swap ===
> > >>                          3.18      3.18-fault_around_bytes=4096
> > >>-----------------------------------------------------------------------
> > >>workingset_refault        691100    664339
> > >>workingset_activate       210379    179139
> > >>pgpgin                    4676096   4492780
> > >>pgpgout                   163967    96711
> > >>pgpgoutclean              1090664   990659
> > >>pgalloc_dma               3463111   3328299
> > >>pgfree                    3502365   3363866
> > >>pgactivate                568134    238570
> > >>pgdeactivate              752260    392138
> > >>pageref_activate          315078    121705
> > >>pageref_activate_vm_exec  162940    55815
> > >>pageref_keep              141354    51011
> > >>pgmajfault                24863     23633
> > >>pgrefill_dma              1116370   544042
> > >>pgscan_kswapd_dma         1735186   1234622
> > >>pgsteal_kswapd_dma        1121769   1005725
> > >>pgscan_direct_dma         12966     1090
> > >>pgsteal_direct_dma        6209      967
> > >>slabs_scanned             1539849   977351
> > >>pageoutrun                1260      1333
> > >>allocstall                47        7
> > >>
> > >>=== With swap ===
> > >>                          3.18      3.18-fault_around_bytes=4096
> > >>-----------------------------------------------------------------------
> > >>workingset_refault        597687    878109
> > >>workingset_activate       167169    254037
> > >>pgpgin                    4035424   5157348
> > >>pgpgout                   162151    85231
> > >>pgpgoutclean              928587    1225029
> > >>pswpin                    46033     17100
> > >>pswpout                   237952    127686
> > >>pgalloc_dma               3305034   3542614
> > >>pgfree                    3354989   3592132
> > >>pgactivate                626468    355275
> > >>pgdeactivate              990205    771902
> > >>pageref_activate          294780    157106
> > >>pageref_activate_vm_exec  141722    63469
> > >>pageref_keep              121931    63028
> > >>pgmajfault                67818     45643
> > >>pgrefill_dma              1324023   977192
> > >>pgscan_kswapd_dma         1825267   1720322
> > >>pgsteal_kswapd_dma        1181882   1365500
> > >>pgscan_direct_dma         41957     9622
> > >>pgsteal_direct_dma        25136     6759
> > >>slabs_scanned             689575    542705
> > >>pageoutrun                1234      1538
> > >>allocstall                110       26
> > >>
> > >>Looks like with fault_around, there is more pressure on reclaim because
> > >>of the presence of more mapped pages, resulting in more IO activity,
> > >>more faults, more swapping, and allocstalls.
> > >
> > >A few of those things did get a bit worse?
> > I think some numbers (like workingset, pgpgin, pgpgoutclean, etc.) look
> > better with fault_around because the increased number of mapped pages
> > results in fewer file pages being reclaimed (pageref_activate,
> > pageref_activate_vm_exec, pageref_keep above), at the cost of increased
> > swapping. Latency numbers are far worse with the default fault_around_bytes
> > + swap, possibly because of increased swapping, decreased kswapd efficiency
> > and more allocstalls.
> > So the problem looks to be that unwanted pages are mapped around the fault
> > and page_check_references is unaware of this.
>
> Hm. It makes me think we should make the ptes set up by faultaround old.
>
> Although, it would defeat (to some extent) the purpose of faultaround on
> architectures without a HW accessed bit :-/
So faultaround should be disabled on architectures without a HW accessed
bit? As you said, it would defeat the faultaround benefit. It also adds
reclaim overhead, because rmap has to walk those extra ptes to remove
them, and it puts more pressure on slab.
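The trade-off raised here can be sketched by counting traps. The model assumes, as is common, that an architecture without a hardware accessed bit emulates it by trapping on the first access through an old pte. count_traps() and struct sketch_config are hypothetical, and the model ignores that a software accessed-bit fault is much cheaper than a full page-cache fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_config {
	bool faultaround;	/* map the whole window on the first fault */
	bool old_around;	/* neighbouring ptes start out old */
	bool hw_accessed_bit;	/* the CPU sets the accessed bit itself */
};

/*
 * Count the traps taken when a task touches pages 0..nr_touched-1 of one
 * fault-around window, in order.  Page 0 is the address that faults first.
 */
static size_t count_traps(size_t window_pages, size_t nr_touched,
			  struct sketch_config cfg)
{
	size_t traps = 0;

	assert(nr_touched <= window_pages);
	for (size_t i = 0; i < nr_touched; i++) {
		if (i == 0 || !cfg.faultaround) {
			/* No pte yet: a real page fault. */
			traps++;
		} else if (cfg.old_around && !cfg.hw_accessed_bit) {
			/* Mapped but old: trap so software can mark it young. */
			traps++;
		}
		/* Otherwise the access completes without a trap. */
	}
	return traps;
}
```

Touching all 16 pages of a window costs 16 traps without fault-around, 1 with young fault-around ptes, 1 with old ptes on hardware that sets the accessed bit itself, but 16 again with old ptes when the bit is emulated in software. Only the per-trap cost improves in that last case, which is why old ptes would partly defeat fault-around on such architectures.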
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] mm: make fault_around_bytes configurable
@ 2016-04-22 15:09 ` Minchan Kim
0 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-04-22 15:09 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Vinayak Menon, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Fri, Apr 22, 2016 at 12:44:30PM +0300, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > >
> > >>Mapping pages around fault is found to cause performance degradation
> > >>in certain use cases. The test performed here is launch of 10 apps
> > >>one by one, doing something with the app each time, and then repeating
> > >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> > >>of RAM. The time taken to launch the apps is found to be better when
> > >>fault around feature is disabled by setting fault_around_bytes to page
> > >>size (4096 in this case).
> > >
> > >Well that's one workload, and a somewhat strange one. What is the
> > >effect on other workloads (of which there are a lot!).
> > >
> > This workload emulates the way a user would use his mobile device, opening
> > an application, using it for some time, switching to next, and then coming
> > back to the same application later. Another stat which shows significant
> > degradation on Android with fault_around is device boot up time. I have not
> > tried any other workload other than these.
> >
> > >>The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> > >>for debugging. pgpgoutclean accounts the clean pages reclaimed via
> > >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > >>and pageref_keep accounts the mapped file pages activated and retained
> > >>by page_check_references.
> > >>
> > >>=== Without swap ===
> > >> 3.18 3.18-fault_around_bytes=4096
> > >>-----------------------------------------------------------------------
> > >>workingset_refault 691100 664339
> > >>workingset_activate 210379 179139
> > >>pgpgin 4676096 4492780
> > >>pgpgout 163967 96711
> > >>pgpgoutclean 1090664 990659
> > >>pgalloc_dma 3463111 3328299
> > >>pgfree 3502365 3363866
> > >>pgactivate 568134 238570
> > >>pgdeactivate 752260 392138
> > >>pageref_activate 315078 121705
> > >>pageref_activate_vm_exec 162940 55815
> > >>pageref_keep 141354 51011
> > >>pgmajfault 24863 23633
> > >>pgrefill_dma 1116370 544042
> > >>pgscan_kswapd_dma 1735186 1234622
> > >>pgsteal_kswapd_dma 1121769 1005725
> > >>pgscan_direct_dma 12966 1090
> > >>pgsteal_direct_dma 6209 967
> > >>slabs_scanned 1539849 977351
> > >>pageoutrun 1260 1333
> > >>allocstall 47 7
> > >>
> > >>=== With swap ===
> > >>                          3.18      3.18-fault_around_bytes=4096
> > >>-----------------------------------------------------------------------
> > >>workingset_refault        597687    878109
> > >>workingset_activate       167169    254037
> > >>pgpgin                    4035424   5157348
> > >>pgpgout                   162151    85231
> > >>pgpgoutclean              928587    1225029
> > >>pswpin                    46033     17100
> > >>pswpout                   237952    127686
> > >>pgalloc_dma               3305034   3542614
> > >>pgfree                    3354989   3592132
> > >>pgactivate                626468    355275
> > >>pgdeactivate              990205    771902
> > >>pageref_activate          294780    157106
> > >>pageref_activate_vm_exec  141722    63469
> > >>pageref_keep              121931    63028
> > >>pgmajfault                67818     45643
> > >>pgrefill_dma              1324023   977192
> > >>pgscan_kswapd_dma         1825267   1720322
> > >>pgsteal_kswapd_dma        1181882   1365500
> > >>pgscan_direct_dma         41957     9622
> > >>pgsteal_direct_dma        25136     6759
> > >>slabs_scanned             689575    542705
> > >>pageoutrun                1234      1538
> > >>allocstall                110       26
> > >>
> > >>Looks like with fault_around, there is more pressure on reclaim because
> > >>of the presence of more mapped pages, resulting in more IO activity,
> > >>more faults, more swapping, and allocstalls.
> > >
> > >A few of those things did get a bit worse?
> > I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
> > better with fault_around because, increased number of mapped pages is
> > resulting in less number of file pages being reclaimed (pageref_activate,
> > pageref_activate_vm_exec, pageref_keep above), but increased swapping.
> > Latency numbers are far bad with fault_around_bytes + swap, possibly because
> > of increased swapping, decrease in kswapd efficiency and increase in
> > allocstalls.
> > So the problem looks to be that unwanted pages are mapped around the fault
> > and page_check_references is unaware of this.
>
> Hm. It makes me think we should make ptes setup by faultaround old.
>
> Although, it would defeat (to some extend) purpose of faultaround on
> architectures without HW accessed bit :-/
So, should faultaround be disabled on architectures without a HW accessed
bit? As you said, it would defeat the faultaround benefit. It also adds
reclaim overhead, because rmap has to walk those extra PTEs to unmap them,
and it puts more pressure on slab.
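To make "mapping pages around the fault" concrete, the sketch below models which addresses fault-around would try to populate for a single fault. It is a simplified user-space model loosely based on do_fault_around() in kernels of this era, not kernel code: the 4 KiB page size, the 512-entry page table, and the names fa_window, fault_around_window() and fa_pages() are assumptions of the sketch. Only pages already in the page cache would actually be mapped, and that lookup is not modelled.

```c
#include <assert.h>

/* Assumed geometry for this sketch: 4 KiB pages, 512 PTEs per page table. */
#define SKETCH_PAGE_SIZE   4096UL
#define SKETCH_PTRS_PER_PT 512UL

/* Hypothetical result type: candidate range [start, end). */
struct fa_window {
	unsigned long start;
	unsigned long end;
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Candidate range for a fault at 'addr' in a VMA [vm_start, vm_end).
 * 'fab' stands for fault_around_bytes and is assumed to be a power-of-two
 * multiple of the page size, no larger than one page table's span.
 */
static struct fa_window fault_around_window(unsigned long addr,
					    unsigned long vm_start,
					    unsigned long vm_end,
					    unsigned long fab)
{
	unsigned long pt_span = SKETCH_PAGE_SIZE * SKETCH_PTRS_PER_PT;
	unsigned long pt_end = (addr & ~(pt_span - 1)) + pt_span;
	struct fa_window w;

	/* Align down to fault_around_bytes, but never before the VMA start. */
	w.start = addr & ~(fab - 1);
	if (w.start < vm_start)
		w.start = vm_start;

	/* At most 'fab' bytes, stopping at the VMA end or the page-table end. */
	w.end = min_ul(min_ul(w.start + fab, vm_end), pt_end);
	return w;
}

/* Number of pages in the candidate range. */
static unsigned long fa_pages(struct fa_window w)
{
	return (w.end - w.start) / SKETCH_PAGE_SIZE;
}
```

With a 64 KiB window a single fault can map up to 16 neighbouring pages in this model; with fault_around_bytes=4096 the window collapses to the faulting page, which matches how the tests in this thread disabled the feature.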
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 15:09 ` Minchan Kim
@ 2016-04-22 15:16 ` Kirill A. Shutemov
-1 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2016-04-22 15:16 UTC
To: Minchan Kim
Cc: Vinayak Menon, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Sat, Apr 23, 2016 at 12:09:46AM +0900, Minchan Kim wrote:
> On Fri, Apr 22, 2016 at 12:44:30PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > > On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > > >On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > > >
> > > >>Mapping pages around fault is found to cause performance degradation
> > > >>in certain use cases. The test performed here is launch of 10 apps
> > > >>one by one, doing something with the app each time, and then repeating
> > > >>the same sequence once more, on an ARM 64-bit Android device with 2GB
> > > >>of RAM. The time taken to launch the apps is found to be better when
> > > >>fault around feature is disabled by setting fault_around_bytes to page
> > > >>size (4096 in this case).
> > > >
> > > >Well that's one workload, and a somewhat strange one. What is the
> > > >effect on other workloads (of which there are a lot!).
> > > >
> > > This workload emulates the way a user would use his mobile device, opening
> > > an application, using it for some time, switching to next, and then coming
> > > back to the same application later. Another stat which shows significant
> > > degradation on Android with fault_around is device boot up time. I have not
> > > tried any other workload other than these.
> > >
> > > >>The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> > > >>for debugging. pgpgoutclean accounts the clean pages reclaimed via
> > > >>__delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > > >>and pageref_keep accounts the mapped file pages activated and retained
> > > >>by page_check_references.
> > > >>
> > > >>=== Without swap ===
> > > >>                          3.18      3.18-fault_around_bytes=4096
> > > >>-----------------------------------------------------------------------
> > > >>workingset_refault        691100    664339
> > > >>workingset_activate       210379    179139
> > > >>pgpgin                    4676096   4492780
> > > >>pgpgout                   163967    96711
> > > >>pgpgoutclean              1090664   990659
> > > >>pgalloc_dma               3463111   3328299
> > > >>pgfree                    3502365   3363866
> > > >>pgactivate                568134    238570
> > > >>pgdeactivate              752260    392138
> > > >>pageref_activate          315078    121705
> > > >>pageref_activate_vm_exec  162940    55815
> > > >>pageref_keep              141354    51011
> > > >>pgmajfault                24863     23633
> > > >>pgrefill_dma              1116370   544042
> > > >>pgscan_kswapd_dma         1735186   1234622
> > > >>pgsteal_kswapd_dma        1121769   1005725
> > > >>pgscan_direct_dma         12966     1090
> > > >>pgsteal_direct_dma        6209      967
> > > >>slabs_scanned             1539849   977351
> > > >>pageoutrun                1260      1333
> > > >>allocstall                47        7
> > > >>
> > > >>=== With swap ===
> > > >>                          3.18      3.18-fault_around_bytes=4096
> > > >>-----------------------------------------------------------------------
> > > >>workingset_refault        597687    878109
> > > >>workingset_activate       167169    254037
> > > >>pgpgin                    4035424   5157348
> > > >>pgpgout                   162151    85231
> > > >>pgpgoutclean              928587    1225029
> > > >>pswpin                    46033     17100
> > > >>pswpout                   237952    127686
> > > >>pgalloc_dma               3305034   3542614
> > > >>pgfree                    3354989   3592132
> > > >>pgactivate                626468    355275
> > > >>pgdeactivate              990205    771902
> > > >>pageref_activate          294780    157106
> > > >>pageref_activate_vm_exec  141722    63469
> > > >>pageref_keep              121931    63028
> > > >>pgmajfault                67818     45643
> > > >>pgrefill_dma              1324023   977192
> > > >>pgscan_kswapd_dma         1825267   1720322
> > > >>pgsteal_kswapd_dma        1181882   1365500
> > > >>pgscan_direct_dma         41957     9622
> > > >>pgsteal_direct_dma        25136     6759
> > > >>slabs_scanned             689575    542705
> > > >>pageoutrun                1234      1538
> > > >>allocstall                110       26
> > > >>
> > > >>Looks like with fault_around, there is more pressure on reclaim because
> > > >>of the presence of more mapped pages, resulting in more IO activity,
> > > >>more faults, more swapping, and allocstalls.
> > > >
> > > >A few of those things did get a bit worse?
> > > I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
> > > better with fault_around because, increased number of mapped pages is
> > > resulting in less number of file pages being reclaimed (pageref_activate,
> > > pageref_activate_vm_exec, pageref_keep above), but increased swapping.
> > > Latency numbers are far bad with fault_around_bytes + swap, possibly because
> > > of increased swapping, decrease in kswapd efficiency and increase in
> > > allocstalls.
> > > So the problem looks to be that unwanted pages are mapped around the fault
> > > and page_check_references is unaware of this.
> >
> > Hm. It makes me think we should make ptes setup by faultaround old.
> >
> > Although, it would defeat (to some extend) purpose of faultaround on
> > architectures without HW accessed bit :-/
>
> So, faultaround should be disabled for non HW access bit architecture?
Not necessarily; it needs to be tested. On those architectures, after
faultaround we would take faults just to set the accessed bit, which should
be cheaper than a fault on a pte_none() entry.
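The trade-off discussed here (installing fault-around PTEs old, and what that costs where there is no hardware accessed bit) can be illustrated with a toy model. This is not kernel code: the three-state entries, the fixed window covering the whole array, and all names (toy_pte, toy_stats, toy_access, toy_count_young) are hypothetical. It counts two kinds of faults: a full fault on an empty entry, which in the model also performs fault-around over the window, and a cheaper fault taken only to set the accessed bit on an old entry when the MMU cannot do it itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-state model of a PTE; not the kernel's pte_t. */
enum toy_pte { TOY_NONE, TOY_OLD, TOY_YOUNG };

struct toy_stats {
	unsigned long full_faults;   /* fault on an empty entry (pte_none) */
	unsigned long young_faults;  /* fault only to set the accessed bit */
};

/*
 * One user access to entry 'idx' of a window of 'n' entries.
 * hw_af: the MMU sets the accessed bit itself.
 * mkold: fault-around installs neighbouring entries old instead of young.
 */
static void toy_access(enum toy_pte *ptes, size_t n, size_t idx,
		       bool hw_af, bool mkold, struct toy_stats *st)
{
	if (ptes[idx] == TOY_NONE) {
		st->full_faults++;
		/* Fault-around: fill every empty entry in the window. */
		for (size_t i = 0; i < n; i++)
			if (ptes[i] == TOY_NONE)
				ptes[i] = (i == idx || !mkold) ? TOY_YOUNG : TOY_OLD;
	} else if (ptes[idx] == TOY_OLD) {
		if (!hw_af)
			st->young_faults++;  /* software sets the accessed bit */
		ptes[idx] = TOY_YOUNG;
	}
	/* TOY_YOUNG: plain hit, no fault at all. */
}

/* Entries a reclaim-side reference check would treat as recently used. */
static size_t toy_count_young(const enum toy_pte *ptes, size_t n)
{
	size_t young = 0;

	for (size_t i = 0; i < n; i++)
		if (ptes[i] == TOY_YOUNG)
			young++;
	return young;
}
```

In this model, touching two pages of a 16-entry window with young fault-around entries leaves all 16 looking referenced, which mirrors the concern that page_check_references() cannot tell speculatively mapped pages from used ones. With old entries only the two touched pages look referenced, at the price of one extra cheap fault when there is no hardware accessed bit.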
> As you said, it would defeat faultaround benefit. As well, it adds reclaim
> overhead because rmap should handle it to remove ptes and more pressure to slab.
--
Kirill A. Shutemov
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-22 9:44 ` Kirill A. Shutemov
@ 2016-04-25 11:51 ` Vinayak Menon
-1 siblings, 0 replies; 34+ messages in thread
From: Vinayak Menon @ 2016-04-25 11:51 UTC
To: Kirill A. Shutemov
Cc: Andrew Morton, linux-mm, linux-kernel, dan.j.williams, mgorman,
vbabka, kirill.shutemov, dave.hansen, hughd
On 4/22/2016 3:14 PM, Kirill A. Shutemov wrote:
> On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
>> On 04/22/2016 05:31 AM, Andrew Morton wrote:
>>> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
>>>
>>>> Mapping pages around fault is found to cause performance degradation
>>>> in certain use cases. The test performed here is launch of 10 apps
>>>> one by one, doing something with the app each time, and then repeating
>>>> the same sequence once more, on an ARM 64-bit Android device with 2GB
>>>> of RAM. The time taken to launch the apps is found to be better when
>>>> fault around feature is disabled by setting fault_around_bytes to page
>>>> size (4096 in this case).
>>> Well that's one workload, and a somewhat strange one. What is the
>>> effect on other workloads (of which there are a lot!).
>>>
>> This workload emulates the way a user would use his mobile device, opening
>> an application, using it for some time, switching to next, and then coming
>> back to the same application later. Another stat which shows significant
>> degradation on Android with fault_around is device boot up time. I have not
>> tried any other workload other than these.
>>
>>>> The tests were done on 3.18 kernel. 4 extra vmstat counters were added
>>>> for debugging. pgpgoutclean accounts the clean pages reclaimed via
>>>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
>>>> and pageref_keep accounts the mapped file pages activated and retained
>>>> by page_check_references.
>>>>
>>>> === Without swap ===
>>>>                           3.18      3.18-fault_around_bytes=4096
>>>> -----------------------------------------------------------------------
>>>> workingset_refault        691100    664339
>>>> workingset_activate       210379    179139
>>>> pgpgin                    4676096   4492780
>>>> pgpgout                   163967    96711
>>>> pgpgoutclean              1090664   990659
>>>> pgalloc_dma               3463111   3328299
>>>> pgfree                    3502365   3363866
>>>> pgactivate                568134    238570
>>>> pgdeactivate              752260    392138
>>>> pageref_activate          315078    121705
>>>> pageref_activate_vm_exec  162940    55815
>>>> pageref_keep              141354    51011
>>>> pgmajfault                24863     23633
>>>> pgrefill_dma              1116370   544042
>>>> pgscan_kswapd_dma         1735186   1234622
>>>> pgsteal_kswapd_dma        1121769   1005725
>>>> pgscan_direct_dma         12966     1090
>>>> pgsteal_direct_dma        6209      967
>>>> slabs_scanned             1539849   977351
>>>> pageoutrun                1260      1333
>>>> allocstall                47        7
>>>>
>>>> === With swap ===
>>>>                           3.18      3.18-fault_around_bytes=4096
>>>> -----------------------------------------------------------------------
>>>> workingset_refault        597687    878109
>>>> workingset_activate       167169    254037
>>>> pgpgin                    4035424   5157348
>>>> pgpgout                   162151    85231
>>>> pgpgoutclean              928587    1225029
>>>> pswpin                    46033     17100
>>>> pswpout                   237952    127686
>>>> pgalloc_dma               3305034   3542614
>>>> pgfree                    3354989   3592132
>>>> pgactivate                626468    355275
>>>> pgdeactivate              990205    771902
>>>> pageref_activate          294780    157106
>>>> pageref_activate_vm_exec  141722    63469
>>>> pageref_keep              121931    63028
>>>> pgmajfault                67818     45643
>>>> pgrefill_dma              1324023   977192
>>>> pgscan_kswapd_dma         1825267   1720322
>>>> pgsteal_kswapd_dma        1181882   1365500
>>>> pgscan_direct_dma         41957     9622
>>>> pgsteal_direct_dma        25136     6759
>>>> slabs_scanned             689575    542705
>>>> pageoutrun                1234      1538
>>>> allocstall                110       26
>>>>
>>>> Looks like with fault_around, there is more pressure on reclaim because
>>>> of the presence of more mapped pages, resulting in more IO activity,
>>>> more faults, more swapping, and allocstalls.
>>> A few of those things did get a bit worse?
>> I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
>> better with fault_around because, increased number of mapped pages is
>> resulting in less number of file pages being reclaimed (pageref_activate,
>> pageref_activate_vm_exec, pageref_keep above), but increased swapping.
>> Latency numbers are far bad with fault_around_bytes + swap, possibly because
>> of increased swapping, decrease in kswapd efficiency and increase in
>> allocstalls.
>> So the problem looks to be that unwanted pages are mapped around the fault
>> and page_check_references is unaware of this.
> Hm. It makes me think we should make ptes setup by faultaround old.
>
> Although, it would defeat (to some extend) purpose of faultaround on
> architectures without HW accessed bit :-/
>
> Could you check if the patch below changes the situation?
> It would require some more work to not mark the pte we've got fault for old.
The last column shows the values with the patch applied (fab = fault_around_bytes):
                          3.18      3.18-fab=4096  3.18-Kirill's-fix
--------------------------------------------------------------------
workingset_refault        597687    878109         790207
workingset_activate       167169    254037         207912
pgpgin                    4035424   5157348        4793116
pgpgout                   162151    85231          85539
pgpgoutclean              928587    1225029        1129088
pswpin                    46033     17100          8926
pswpout                   237952    127686         103435
pgalloc_dma               3305034   3542614        3401000
pgfree                    3354989   3592132        3457783
pgactivate                626468    355275         326716
pgdeactivate              990205    771902         697392
pageref_activate          294780    157106         138451
pageref_activate_vm_exec  141722    63469          64585
pageref_keep              121931    63028          65811
pgmajfault                67818     45643          34944
pgrefill_dma              1324023   977192         874497
pgscan_kswapd_dma         1825267   1720322        1577483
pgsteal_kswapd_dma        1181882   1365500        1243968
pgscan_direct_dma         41957     9622           9387
pgsteal_direct_dma        25136     6759           7108
slabs_scanned             689575    542705         618839
pageoutrun                1234      1538           1450
allocstall                110       26             13
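The earlier explanation attributes part of the latency to lower kswapd efficiency. One simple way to read that from these counters is the steal/scan ratio (pgsteal_kswapd_dma / pgscan_kswapd_dma), together with the relative change of a counter between two columns. The helpers below are a hypothetical sketch of that arithmetic; the inputs used to check them are the with-swap numbers from the table above.

```c
#include <assert.h>

/* Fraction of the pages kswapd scanned that it actually reclaimed. */
static double steal_scan_ratio(unsigned long pgsteal, unsigned long pgscan)
{
	return pgscan ? (double)pgsteal / (double)pgscan : 0.0;
}

/* Relative change from 'before' to 'after', in percent. */
static double pct_change(unsigned long before, unsigned long after)
{
	if (before == 0)
		return 0.0;
	return 100.0 * ((double)after - (double)before) / (double)before;
}
```

On these numbers the ratio is about 0.65 for 3.18, about 0.79 for fab=4096 and about 0.79 with Kirill's fix, while slabs_scanned rises by about 14% from the fab=4096 column to the fix column.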
Everything seems to have improved except slabs_scanned, possibly because
of this check, which Minchan pointed out and which results in higher
pressure on slabs:

	/* Double the slab pressure for mapped and swapcache pages */
	if (page_mapped(page) || PageSwapCache(page))
		sc->nr_scanned++;

I had added some traces to monitor the vmpressure values. Those also seem
to be high, possibly for the same reason.
Should the pressure be doubled only if the page is mapped and referenced?
There is a big improvement in average latency, but it is still 5% higher
than with fault_around disabled. I will try to debug this further.
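The question of doubling the pressure only for referenced pages can be sketched as a comparison of two counting rules. This is a hypothetical user-space model, not a patch: toy_page and both functions are made up for illustration, each scanned page is assumed to contribute one unit before the quoted check adds a second, and keeping the swap-cache case unchanged in the variant is an assumption, since the question only mentions mapped pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state visible to the scanner in this sketch. */
struct toy_page {
	bool mapped;      /* stands in for page_mapped() */
	bool swapcache;   /* stands in for PageSwapCache() */
	bool referenced;  /* a young PTE was found for the page */
};

/* Rule quoted above: mapped or swap-cache pages count twice. */
static unsigned long scan_units_current(const struct toy_page *p, size_t n)
{
	unsigned long nr_scanned = 0;

	for (size_t i = 0; i < n; i++) {
		nr_scanned++;
		if (p[i].mapped || p[i].swapcache)
			nr_scanned++;
	}
	return nr_scanned;
}

/* Variant from the question: mapped pages count twice only if referenced. */
static unsigned long scan_units_referenced_only(const struct toy_page *p,
						size_t n)
{
	unsigned long nr_scanned = 0;

	for (size_t i = 0; i < n; i++) {
		nr_scanned++;
		if ((p[i].mapped && p[i].referenced) || p[i].swapcache)
			nr_scanned++;
	}
	return nr_scanned;
}
```

With four mapped pages of which only one was referenced (for example, three mapped speculatively by fault-around) plus one unmapped page, the current rule yields 9 units and the variant 6, so the slab pressure that, per the discussion above, follows this count would be lower in that case.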
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-04-25 11:51 ` Vinayak Menon
@ 2016-05-09 7:32 ` Minchan Kim
-1 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-05-09 7:32 UTC
To: Vinayak Menon
Cc: Kirill A. Shutemov, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
Hello,
On Mon, Apr 25, 2016 at 05:21:11PM +0530, Vinayak Menon wrote:
>
>
> On 4/22/2016 3:14 PM, Kirill A. Shutemov wrote:
> > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> >> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> >>> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> >>>
> >>>> Mapping pages around fault is found to cause performance degradation
> >>>> in certain use cases. The test performed here is launch of 10 apps
> >>>> one by one, doing something with the app each time, and then repeating
> >>>> the same sequence once more, on an ARM 64-bit Android device with 2GB
> >>>> of RAM. The time taken to launch the apps is found to be better when
> >>>> fault around feature is disabled by setting fault_around_bytes to page
> >>>> size (4096 in this case).
> >>> Well that's one workload, and a somewhat strange one. What is the
> >>> effect on other workloads (of which there are a lot!).
> >>>
> >> This workload emulates the way a user would use his mobile device, opening
> >> an application, using it for some time, switching to next, and then coming
> >> back to the same application later. Another stat which shows significant
> >> degradation on Android with fault_around is device boot up time. I have not
> >> tried any other workload other than these.
> >>
> >>>> The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> >>>> for debugging. pgpgoutclean accounts the clean pages reclaimed via
> >>>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> >>>> and pageref_keep accounts the mapped file pages activated and retained
> >>>> by page_check_references.
> >>>>
> >>>> === Without swap ===
> >>>>                           3.18      3.18-fault_around_bytes=4096
> >>>> -----------------------------------------------------------------------
> >>>> workingset_refault        691100    664339
> >>>> workingset_activate       210379    179139
> >>>> pgpgin                    4676096   4492780
> >>>> pgpgout                   163967    96711
> >>>> pgpgoutclean              1090664   990659
> >>>> pgalloc_dma               3463111   3328299
> >>>> pgfree                    3502365   3363866
> >>>> pgactivate                568134    238570
> >>>> pgdeactivate              752260    392138
> >>>> pageref_activate          315078    121705
> >>>> pageref_activate_vm_exec  162940    55815
> >>>> pageref_keep              141354    51011
> >>>> pgmajfault                24863     23633
> >>>> pgrefill_dma              1116370   544042
> >>>> pgscan_kswapd_dma         1735186   1234622
> >>>> pgsteal_kswapd_dma        1121769   1005725
> >>>> pgscan_direct_dma         12966     1090
> >>>> pgsteal_direct_dma        6209      967
> >>>> slabs_scanned             1539849   977351
> >>>> pageoutrun                1260      1333
> >>>> allocstall                47        7
> >>>>
> >>>> === With swap ===
> >>>>                           3.18      3.18-fault_around_bytes=4096
> >>>> -----------------------------------------------------------------------
> >>>> workingset_refault        597687    878109
> >>>> workingset_activate       167169    254037
> >>>> pgpgin                    4035424   5157348
> >>>> pgpgout                   162151    85231
> >>>> pgpgoutclean              928587    1225029
> >>>> pswpin                    46033     17100
> >>>> pswpout                   237952    127686
> >>>> pgalloc_dma               3305034   3542614
> >>>> pgfree                    3354989   3592132
> >>>> pgactivate                626468    355275
> >>>> pgdeactivate              990205    771902
> >>>> pageref_activate          294780    157106
> >>>> pageref_activate_vm_exec  141722    63469
> >>>> pageref_keep              121931    63028
> >>>> pgmajfault                67818     45643
> >>>> pgrefill_dma              1324023   977192
> >>>> pgscan_kswapd_dma         1825267   1720322
> >>>> pgsteal_kswapd_dma        1181882   1365500
> >>>> pgscan_direct_dma         41957     9622
> >>>> pgsteal_direct_dma        25136     6759
> >>>> slabs_scanned             689575    542705
> >>>> pageoutrun                1234      1538
> >>>> allocstall                110       26
> >>>>
> >>>> Looks like with fault_around, there is more pressure on reclaim because
> >>>> of the presence of more mapped pages, resulting in more IO activity,
> >>>> more faults, more swapping, and allocstalls.
> >>> A few of those things did get a bit worse?
> >> I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
> >> better with fault_around because, increased number of mapped pages is
> >> resulting in less number of file pages being reclaimed (pageref_activate,
> >> pageref_activate_vm_exec, pageref_keep above), but increased swapping.
> >> Latency numbers are far bad with fault_around_bytes + swap, possibly because
> >> of increased swapping, decrease in kswapd efficiency and increase in
> >> allocstalls.
> >> So the problem looks to be that unwanted pages are mapped around the fault
> >> and page_check_references is unaware of this.
> > Hm. It makes me think we should make ptes setup by faultaround old.
> >
> > Although, it would defeat (to some extend) purpose of faultaround on
> > architectures without HW accessed bit :-/
> >
> > Could you check if the patch below changes the situation?
> > It would require some more work to not mark the pte we've got fault for old.
>
> Column at the end shows the values with the patch
>
>                                 3.18   3.18-fab=4096   3.18-Kirill's-fix
> ------------------------------------------------------------------------
> workingset_refault            597687          878109              790207
> workingset_activate           167169          254037              207912
> pgpgin                       4035424         5157348             4793116
> pgpgout                       162151           85231               85539
> pgpgoutclean                  928587         1225029             1129088
> pswpin                         46033           17100                8926
> pswpout                       237952          127686              103435
> pgalloc_dma                  3305034         3542614             3401000
> pgfree                       3354989         3592132             3457783
> pgactivate                    626468          355275              326716
> pgdeactivate                  990205          771902              697392
> pageref_activate              294780          157106              138451
> pageref_activate_vm_exec      141722           63469               64585
> pageref_keep                  121931           63028               65811
> pgmajfault                     67818           45643               34944
> pgrefill_dma                 1324023          977192              874497
> pgscan_kswapd_dma            1825267         1720322             1577483
> pgsteal_kswapd_dma           1181882         1365500             1243968
> pgscan_direct_dma              41957            9622                9387
> pgsteal_direct_dma             25136            6759                7108
> slabs_scanned                 689575          542705              618839
> pageoutrun                      1234            1538                1450
> allocstall                       110              26                  13
>
> Everything seems to have improved except slabs_scanned, possibly because
> of this check which Minchan pointed out, that results in higher pressure on slabs.
>
>     if (page_mapped(page) || PageSwapCache(page))
>             sc->nr_scanned++;
>
> I had added some traces to monitor the vmpressure values. Those also seems to
> be high, possibly because of the same reason.
>
> Should the pressure be doubled only if page is mapped and referenced ?
Yes, pte_mkold is not perfect at the moment.

Anyway, the above heuristic has been there for a long time, maybe since
before I was born :) (I don't want to argue about why it's there and
whether it's right.) So I'm really hesitant to change it, since that
might bite some workloads. (I don't mean I'm against it; I just don't
want to make the change myself, to avoid potential blame.) IOW,
Kirill's fault_around broke it too, so that could bite some workloads.

At the very least, as Vinayak mentioned, it would change the vmpressure
level, so vmpressure users can be affected. AFAIK, some embedded vendors
rely on vmpressure to control memory management, so it will hurt them.
Slab shrinking behavior changed as well. Unfortunately, I don't know
whether any workload depends on it.

As another regression, in my company's product we snapshot a process
together with its working set for later fast resume. For that, we have
treated pte-mapped pages as the working set, but since fault-around was
merged the snapshot has started to include non-working-set pages. That
increases the snapshot image size, so we need more storage space and
everything slows down. I guess mincore(2) users will be affected too.

Additional note: in the embedded world there are lots of ARM products
without a HW access bit (although ARM has recently started to support
it), and for them sequential file access workloads matter much less
than memory reclaim. So fault_around's benefit there could be highly
limited compared to architectures with a HW access bit running server
workloads.

I want to ask again.
I guess we could disable fault_around via a kernel parameter, but does
it sound reasonable to enable fault_around by default on every arch at
the cost of the regressions above?

I'm not against that. I just want some fixes for the regression to go
to -stable.
>
> There is big improvement in avg latency, but still 5% higher than with fault_around
> disabled. I will try to debug this further.
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-05-09 7:32 ` Minchan Kim
@ 2016-05-10 2:48 ` Minchan Kim
-1 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-05-10 2:48 UTC (permalink / raw)
To: Vinayak Menon
Cc: Kirill A. Shutemov, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Mon, May 09, 2016 at 04:32:51PM +0900, Minchan Kim wrote:
> Hello,
>
> On Mon, Apr 25, 2016 at 05:21:11PM +0530, Vinayak Menon wrote:
> >
> >
> > On 4/22/2016 3:14 PM, Kirill A. Shutemov wrote:
> > > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > >> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > >>> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > >>>
> > >>>> Mapping pages around fault is found to cause performance degradation
> > >>>> in certain use cases. The test performed here is launch of 10 apps
> > >>>> one by one, doing something with the app each time, and then repeating
> > >>>> the same sequence once more, on an ARM 64-bit Android device with 2GB
> > >>>> of RAM. The time taken to launch the apps is found to be better when
> > >>>> fault around feature is disabled by setting fault_around_bytes to page
> > >>>> size (4096 in this case).
> > >>> Well that's one workload, and a somewhat strange one. What is the
> > >>> effect on other workloads (of which there are a lot!).
> > >>>
> > >> This workload emulates the way a user would use his mobile device, opening
> > >> an application, using it for some time, switching to next, and then coming
> > >> back to the same application later. Another stat which shows significant
> > >> degradation on Android with fault_around is device boot up time. I have not
> > >> tried any other workload other than these.
> > >>
> > >>>> The tests were done on 3.18 kernel. 4 extra vmstat counters were added
> > >>>> for debugging. pgpgoutclean accounts the clean pages reclaimed via
> > >>>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > >>>> and pageref_keep accounts the mapped file pages activated and retained
> > >>>> by page_check_references.
> > >>>>
> > >>>> === Without swap ===
> > >>>> 3.18 3.18-fault_around_bytes=4096
> > >>>> -----------------------------------------------------------------------
> > >>>> workingset_refault 691100 664339
> > >>>> workingset_activate 210379 179139
> > >>>> pgpgin 4676096 4492780
> > >>>> pgpgout 163967 96711
> > >>>> pgpgoutclean 1090664 990659
> > >>>> pgalloc_dma 3463111 3328299
> > >>>> pgfree 3502365 3363866
> > >>>> pgactivate 568134 238570
> > >>>> pgdeactivate 752260 392138
> > >>>> pageref_activate 315078 121705
> > >>>> pageref_activate_vm_exec 162940 55815
> > >>>> pageref_keep 141354 51011
> > >>>> pgmajfault 24863 23633
> > >>>> pgrefill_dma 1116370 544042
> > >>>> pgscan_kswapd_dma 1735186 1234622
> > >>>> pgsteal_kswapd_dma 1121769 1005725
> > >>>> pgscan_direct_dma 12966 1090
> > >>>> pgsteal_direct_dma 6209 967
> > >>>> slabs_scanned 1539849 977351
> > >>>> pageoutrun 1260 1333
> > >>>> allocstall 47 7
> > >>>>
> > >>>> === With swap ===
> > >>>> 3.18 3.18-fault_around_bytes=4096
> > >>>> -----------------------------------------------------------------------
> > >>>> workingset_refault 597687 878109
> > >>>> workingset_activate 167169 254037
> > >>>> pgpgin 4035424 5157348
> > >>>> pgpgout 162151 85231
> > >>>> pgpgoutclean 928587 1225029
> > >>>> pswpin 46033 17100
> > >>>> pswpout 237952 127686
> > >>>> pgalloc_dma 3305034 3542614
> > >>>> pgfree 3354989 3592132
> > >>>> pgactivate 626468 355275
> > >>>> pgdeactivate 990205 771902
> > >>>> pageref_activate 294780 157106
> > >>>> pageref_activate_vm_exec 141722 63469
> > >>>> pageref_keep 121931 63028
> > >>>> pgmajfault 67818 45643
> > >>>> pgrefill_dma 1324023 977192
> > >>>> pgscan_kswapd_dma 1825267 1720322
> > >>>> pgsteal_kswapd_dma 1181882 1365500
> > >>>> pgscan_direct_dma 41957 9622
> > >>>> pgsteal_direct_dma 25136 6759
> > >>>> slabs_scanned 689575 542705
> > >>>> pageoutrun 1234 1538
> > >>>> allocstall 110 26
> > >>>>
> > >>>> Looks like with fault_around, there is more pressure on reclaim because
> > >>>> of the presence of more mapped pages, resulting in more IO activity,
> > >>>> more faults, more swapping, and allocstalls.
> > >>> A few of those things did get a bit worse?
> > >> I think some numbers (like workingset, pgpgin, pgpgoutclean etc) looks
> > >> better with fault_around because, increased number of mapped pages is
> > >> resulting in less number of file pages being reclaimed (pageref_activate,
> > >> pageref_activate_vm_exec, pageref_keep above), but increased swapping.
> > >> Latency numbers are far bad with fault_around_bytes + swap, possibly because
> > >> of increased swapping, decrease in kswapd efficiency and increase in
> > >> allocstalls.
> > >> So the problem looks to be that unwanted pages are mapped around the fault
> > >> and page_check_references is unaware of this.
> > > Hm. It makes me think we should make ptes setup by faultaround old.
> > >
> > > Although, it would defeat (to some extend) purpose of faultaround on
> > > architectures without HW accessed bit :-/
> > >
> > > Could you check if the patch below changes the situation?
> > > It would require some more work to not mark the pte we've got fault for old.
> >
> > Column at the end shows the values with the patch
> >
> >                                 3.18   3.18-fab=4096   3.18-Kirill's-fix
> > ------------------------------------------------------------------------
> > workingset_refault            597687          878109              790207
> > workingset_activate           167169          254037              207912
> > pgpgin                       4035424         5157348             4793116
> > pgpgout                       162151           85231               85539
> > pgpgoutclean                  928587         1225029             1129088
> > pswpin                         46033           17100                8926
> > pswpout                       237952          127686              103435
> > pgalloc_dma                  3305034         3542614             3401000
> > pgfree                       3354989         3592132             3457783
> > pgactivate                    626468          355275              326716
> > pgdeactivate                  990205          771902              697392
> > pageref_activate              294780          157106              138451
> > pageref_activate_vm_exec      141722           63469               64585
> > pageref_keep                  121931           63028               65811
> > pgmajfault                     67818           45643               34944
> > pgrefill_dma                 1324023          977192              874497
> > pgscan_kswapd_dma            1825267         1720322             1577483
> > pgsteal_kswapd_dma           1181882         1365500             1243968
> > pgscan_direct_dma              41957            9622                9387
> > pgsteal_direct_dma             25136            6759                7108
> > slabs_scanned                 689575          542705              618839
> > pageoutrun                      1234            1538                1450
> > allocstall                       110              26                  13
> >
> > Everything seems to have improved except slabs_scanned, possibly because
> > of this check which Minchan pointed out, that results in higher pressure on slabs.
> >
> >     if (page_mapped(page) || PageSwapCache(page))
> >             sc->nr_scanned++;
> >
> > I had added some traces to monitor the vmpressure values. Those also seems to
> > be high, possibly because of the same reason.
> >
> > Should the pressure be doubled only if page is mapped and referenced ?
>
> Yes, pte_mkold is not perfect at the moment.
>
> Anyway, the above heuristic has been there for a long time, maybe since
> before I was born :) (I don't want to argue about why it's there and
> whether it's right.) So I'm really hesitant to change it, since that
> might bite some workloads. (I don't mean I'm against it; I just don't
> want to make the change myself, to avoid potential blame.) IOW,
> Kirill's fault_around broke it too, so that could bite some workloads.
>
> At the very least, as Vinayak mentioned, it would change the vmpressure
> level, so vmpressure users can be affected. AFAIK, some embedded vendors
> rely on vmpressure to control memory management, so it will hurt them.
> Slab shrinking behavior changed as well. Unfortunately, I don't know
> whether any workload depends on it.
>
> As another regression, in my company's product we snapshot a process
> together with its working set for later fast resume. For that, we have
> treated pte-mapped pages as the working set, but since fault-around was
> merged the snapshot has started to include non-working-set pages. That
> increases the snapshot image size, so we need more storage space and
> everything slows down. I guess mincore(2) users will be affected too.
>
> Additional note: in the embedded world there are lots of ARM products
> without a HW access bit (although ARM has recently started to support
> it), and for them sequential file access workloads matter much less
> than memory reclaim. So fault_around's benefit there could be highly
> limited compared to architectures with a HW access bit running server
> workloads.
>
> I want to ask again.
> I guess we could disable fault_around via a kernel parameter, but does
> it sound reasonable to enable fault_around by default on every arch at
> the cost of the regressions above?
>
> I'm not against that. I just want some fixes for the regression to go
> to -stable.
>
> >
> > There is big improvement in avg latency, but still 5% higher than with fault_around
> > disabled. I will try to debug this further.
I did a quick test on my ARM machine:
sequential read of every word of a 512M file through mmap.
= vanilla fault_around=4096 =
minor fault: 131291
elapsed time(usec): 6686236
= vanilla fault_around=65536 =
minor fault: 12577
elapsed time(usec): 6586959
I tested 3 times and the results seemed to be stable.
Minor faults were reduced by 90%. That's a huge win, but looking at the
elapsed time, it's not a huge win: just about 1.5%.
= pte_mkold applied fault_around=4096 =
minor fault: 131291
elapsed time(usec): 6608358
= pte_mkold applied fault_around=65536 =
minor fault: 143609
elapsed time(usec): 6772520
I tested 3 times and the results seemed to be stable.
With pte_mkold applied, minor faults actually increased and elapsed time
was slower with fault_around.
The gain is really not clear.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-05-10 2:48 ` Minchan Kim
@ 2016-05-16 14:18 ` Minchan Kim
-1 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-05-16 14:18 UTC (permalink / raw)
To: Minchan Kim, Kirill A. Shutemov
Cc: Vinayak Menon, Kirill A. Shutemov, Andrew Morton, linux-mm,
linux-kernel, dan.j.williams, mgorman, vbabka, kirill.shutemov,
dave.hansen, hughd
On Tue, May 10, 2016 at 11:48:42AM +0900, Minchan Kim wrote:
> On Mon, May 09, 2016 at 04:32:51PM +0900, Minchan Kim wrote:
> > Hello,
> >
> > On Mon, Apr 25, 2016 at 05:21:11PM +0530, Vinayak Menon wrote:
> > >
> > >
> > > On 4/22/2016 3:14 PM, Kirill A. Shutemov wrote:
> > > > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > > >> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > > >>> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > > >>>
> > > >>>> Mapping pages around the fault is found to cause performance degradation
> > > >>>> in certain use cases. The test performed here is the launch of 10 apps
> > > >>>> one by one, doing something with the app each time, and then repeating
> > > >>>> the same sequence once more, on an ARM 64-bit Android device with 2GB
> > > >>>> of RAM. The time taken to launch the apps is found to be shorter when
> > > >>>> the fault-around feature is disabled by setting fault_around_bytes to the page
> > > >>>> size (4096 in this case).
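Why a fault_around_bytes equal to the page size turns the feature off follows from how the window around a fault is chosen. The sketch below is a simplified userspace model written from memory of mm/memory.c (fault_around_bytes_set() and do_fault_around() in kernels of roughly this era); the sketch_* names, the fa_window struct and the 4K page / 512-entry page table constants are assumptions, so check the real source before relying on details. It also ignores that only pages already uptodate in the page cache, and ptes not yet populated, actually get mapped. The requested size is rounded down to a power of two (never below one page), the window is aligned to that size, then clipped to the VMA and to the current page table, so with 4096 bytes the window is just the faulting page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4K pages, 512 ptes per last-level page table (2MiB). */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_PTRS_PER_PTE 512UL

/* Model of the value check applied when fault_around_bytes is written:
 * reject windows larger than one page table, otherwise round down to a
 * power of two that is at least one page. Returns 0 on success. */
int sketch_set_fault_around_bytes(uint64_t val, unsigned long *out)
{
	if (val / SKETCH_PAGE_SIZE > SKETCH_PTRS_PER_PTE)
		return -1;
	unsigned long bytes = SKETCH_PAGE_SIZE;
	while ((uint64_t)bytes * 2 <= val)
		bytes *= 2;
	*out = bytes;
	return 0;
}

/* Half-open address range [start, end) that fault-around would try to map. */
struct fa_window {
	unsigned long start, end;
};

/* Window for a fault at addr in a VMA [vm_start, vm_end): aligned to
 * fault_around_bytes, then clipped to the VMA and to the page table
 * that holds the start address. */
struct fa_window sketch_fault_around_window(unsigned long addr,
					    unsigned long fault_around_bytes,
					    unsigned long vm_start,
					    unsigned long vm_end)
{
	unsigned long mask = ~(fault_around_bytes - 1) & ~(SKETCH_PAGE_SIZE - 1);
	unsigned long pt_span = SKETCH_PTRS_PER_PTE * SKETCH_PAGE_SIZE;
	struct fa_window w;

	w.start = addr & mask;
	if (w.start < vm_start)
		w.start = vm_start;
	w.end = w.start + fault_around_bytes;

	unsigned long pt_end = (w.start & ~(pt_span - 1)) + pt_span;
	if (w.end > pt_end)
		w.end = pt_end;
	if (w.end > vm_end)
		w.end = vm_end;
	return w;
}
```

For example, a fault at 0x15000 in a VMA spanning [0x10000, 0x100000) gives the 16-page window [0x10000, 0x20000) with 65536 bytes, but only [0x15000, 0x16000), the faulting page itself, with 4096.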
> > > >>> Well that's one workload, and a somewhat strange one. What is the
> > > >>> effect on other workloads (of which there are a lot!)?
> > > >>>
> > > >> This workload emulates the way a user would use his mobile device, opening
> > > >> an application, using it for some time, switching to the next one, and then coming
> > > >> back to the same application later. Another stat that shows significant
> > > >> degradation on Android with fault_around is device boot-up time. I have not
> > > >> tried any workloads other than these.
> > > >>
> > > >>>> The tests were done on a 3.18 kernel. 4 extra vmstat counters were added
> > > >>>> for debugging. pgpgoutclean counts the clean pages reclaimed via
> > > >>>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > > >>>> and pageref_keep count the mapped file pages activated and retained
> > > >>>> by page_check_references.
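The thread does not say how the counters were collected. One plausible way, shown here only as a sketch, is to snapshot /proc/vmstat (one "name value" pair per line) before and after a run and subtract. The function names below are hypothetical. Note that pgpgoutclean and the pageref_* counters exist only in the patched test kernel, so on a stock kernel a lookup for them simply fails.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find the value of counter `name` in text laid out like /proc/vmstat.
 * Only exact names match, so "pageref_activate" does not pick up
 * "pageref_activate_vm_exec". Returns 0 on success, -1 if absent. */
int vmstat_lookup(const char *text, const char *name, unsigned long long *val)
{
	size_t nlen = strlen(name);

	for (const char *line = text; line && *line; ) {
		if (strncmp(line, name, nlen) == 0 && line[nlen] == ' ')
			return sscanf(line + nlen + 1, "%llu", val) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++; /* step past the newline to the next entry */
	}
	return -1;
}

/* Change in one counter between a "before" and an "after" snapshot. */
int vmstat_delta(const char *before, const char *after, const char *name,
		 long long *delta)
{
	unsigned long long b, a;

	if (vmstat_lookup(before, name, &b) || vmstat_lookup(after, name, &a))
		return -1;
	*delta = (long long)(a - b);
	return 0;
}

/* Read a snapshot of a file such as /proc/vmstat into buf (NUL-terminated).
 * Returns the number of bytes read, or 0 on error. */
size_t vmstat_snapshot(const char *path, char *buf, size_t cap)
{
	if (cap == 0)
		return 0;
	FILE *f = fopen(path, "r");
	if (!f)
		return 0;
	size_t n = fread(buf, 1, cap - 1, f);
	buf[n] = '\0';
	fclose(f);
	return n;
}
```

Matching on the exact name plus the following space matters here because several counter names are prefixes of others.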
> > > >>>>
> > > >>>> === Without swap ===
> > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > >>>> -----------------------------------------------------------------------
> > > >>>> workingset_refault        691100    664339
> > > >>>> workingset_activate       210379    179139
> > > >>>> pgpgin                    4676096   4492780
> > > >>>> pgpgout                   163967    96711
> > > >>>> pgpgoutclean              1090664   990659
> > > >>>> pgalloc_dma               3463111   3328299
> > > >>>> pgfree                    3502365   3363866
> > > >>>> pgactivate                568134    238570
> > > >>>> pgdeactivate              752260    392138
> > > >>>> pageref_activate          315078    121705
> > > >>>> pageref_activate_vm_exec  162940    55815
> > > >>>> pageref_keep              141354    51011
> > > >>>> pgmajfault                24863     23633
> > > >>>> pgrefill_dma              1116370   544042
> > > >>>> pgscan_kswapd_dma         1735186   1234622
> > > >>>> pgsteal_kswapd_dma        1121769   1005725
> > > >>>> pgscan_direct_dma         12966     1090
> > > >>>> pgsteal_direct_dma        6209      967
> > > >>>> slabs_scanned             1539849   977351
> > > >>>> pageoutrun                1260      1333
> > > >>>> allocstall                47        7
> > > >>>>
> > > >>>> === With swap ===
> > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > >>>> -----------------------------------------------------------------------
> > > >>>> workingset_refault        597687    878109
> > > >>>> workingset_activate       167169    254037
> > > >>>> pgpgin                    4035424   5157348
> > > >>>> pgpgout                   162151    85231
> > > >>>> pgpgoutclean              928587    1225029
> > > >>>> pswpin                    46033     17100
> > > >>>> pswpout                   237952    127686
> > > >>>> pgalloc_dma               3305034   3542614
> > > >>>> pgfree                    3354989   3592132
> > > >>>> pgactivate                626468    355275
> > > >>>> pgdeactivate              990205    771902
> > > >>>> pageref_activate          294780    157106
> > > >>>> pageref_activate_vm_exec  141722    63469
> > > >>>> pageref_keep              121931    63028
> > > >>>> pgmajfault                67818     45643
> > > >>>> pgrefill_dma              1324023   977192
> > > >>>> pgscan_kswapd_dma         1825267   1720322
> > > >>>> pgsteal_kswapd_dma        1181882   1365500
> > > >>>> pgscan_direct_dma         41957     9622
> > > >>>> pgsteal_direct_dma        25136     6759
> > > >>>> slabs_scanned             689575    542705
> > > >>>> pageoutrun                1234      1538
> > > >>>> allocstall                110       26
> > > >>>>
> > > >>>> It looks like with fault_around there is more pressure on reclaim because
> > > >>>> of the presence of more mapped pages, resulting in more IO activity,
> > > >>>> more faults, more swapping, and more allocstalls.
> > > >>> A few of those things did get a bit worse?
> > > >> I think some numbers (like workingset, pgpgin, pgpgoutclean etc) look
> > > >> better with fault_around because the increased number of mapped pages
> > > >> results in fewer file pages being reclaimed (pageref_activate,
> > > >> pageref_activate_vm_exec, pageref_keep above), but swapping increases.
> > > >> Latency numbers are far worse with fault_around + swap, possibly because
> > > >> of increased swapping, decreased kswapd efficiency and more
> > > >> allocstalls.
> > > >> So the problem looks to be that unwanted pages are mapped around the fault
> > > >> and page_check_references is unaware of this.
> > > > Hm. It makes me think we should make ptes set up by faultaround old.
> > > >
> > > > Although, it would defeat (to some extent) the purpose of faultaround on
> > > > architectures without a HW accessed bit :-/
> > > >
> > > > Could you check if the patch below changes the situation?
> > > > It would require some more work to not mark as old the pte we've got the fault for.
> > >
> > > The last column shows the values with the patch:
> > >
> > >                           3.18      3.18-fab=4096   3.18-Kirill's-fix
> > > ---------------------------------------------------------
> > > workingset_refault        597687    878109          790207
> > > workingset_activate       167169    254037          207912
> > > pgpgin                    4035424   5157348         4793116
> > > pgpgout                   162151    85231           85539
> > > pgpgoutclean              928587    1225029         1129088
> > > pswpin                    46033     17100           8926
> > > pswpout                   237952    127686          103435
> > > pgalloc_dma               3305034   3542614         3401000
> > > pgfree                    3354989   3592132         3457783
> > > pgactivate                626468    355275          326716
> > > pgdeactivate              990205    771902          697392
> > > pageref_activate          294780    157106          138451
> > > pageref_activate_vm_exec  141722    63469           64585
> > > pageref_keep              121931    63028           65811
> > > pgmajfault                67818     45643           34944
> > > pgrefill_dma              1324023   977192          874497
> > > pgscan_kswapd_dma         1825267   1720322         1577483
> > > pgsteal_kswapd_dma        1181882   1365500         1243968
> > > pgscan_direct_dma         41957     9622            9387
> > > pgsteal_direct_dma        25136     6759            7108
> > > slabs_scanned             689575    542705          618839
> > > pageoutrun                1234      1538            1450
> > > allocstall                110       26              13
> > >
> > > Everything seems to have improved except slabs_scanned, possibly because
> > > of this check, which Minchan pointed out and which results in higher pressure on slabs:
> > >
> > > 	if (page_mapped(page) || PageSwapCache(page))
> > > 		sc->nr_scanned++;
> > >
> > > I had added some traces to monitor the vmpressure values. Those also seem to
> > > be high, possibly for the same reason.
> > >
> > > Should the pressure be doubled only if the page is mapped and referenced?
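The effect being asked about can be illustrated with a toy model; none of this is kernel code and the struct and function names are hypothetical. count_scanned_current() mirrors the quoted check (every scanned page counts once, mapped or swapcache pages a second time), count_scanned_referenced_only() is the variant in the question (keeping the swapcache rule unchanged is an assumption of this sketch), and pressure_percent() is a simplification of mm/vmpressure.c, which reports roughly the share of scanned pages that were not reclaimed. Pages mapped by fault-around but never touched stay unreferenced, so under the current rule they inflate the scan count even when they are reclaimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one page seen by the reclaim scan. */
struct toy_page {
	bool mapped;     /* like page_mapped() */
	bool swapcache;  /* like PageSwapCache() */
	bool referenced; /* a mapping pte had its accessed bit set */
	bool reclaimed;  /* the scan freed it */
};

/* Quoted heuristic: count each page once, mapped/swapcache pages twice. */
unsigned long count_scanned_current(const struct toy_page *p, size_t n)
{
	unsigned long scanned = 0;

	for (size_t i = 0; i < n; i++) {
		scanned++;
		if (p[i].mapped || p[i].swapcache)
			scanned++;
	}
	return scanned;
}

/* Variant from the question: double-count a mapped page only if it was
 * referenced (swapcache pages keep the extra count, as an assumption). */
unsigned long count_scanned_referenced_only(const struct toy_page *p, size_t n)
{
	unsigned long scanned = 0;

	for (size_t i = 0; i < n; i++) {
		scanned++;
		if ((p[i].mapped && p[i].referenced) || p[i].swapcache)
			scanned++;
	}
	return scanned;
}

unsigned long count_reclaimed(const struct toy_page *p, size_t n)
{
	unsigned long reclaimed = 0;

	for (size_t i = 0; i < n; i++)
		if (p[i].reclaimed)
			reclaimed++;
	return reclaimed;
}

/* Simplified vmpressure-style level: percent of scanned pages not reclaimed. */
unsigned long pressure_percent(unsigned long scanned, unsigned long reclaimed)
{
	if (scanned == 0 || reclaimed >= scanned)
		return 0;
	return 100 - reclaimed * 100 / scanned;
}
```

With 10 pages of which 6 are mapped but only 2 referenced, and the 8 unreferenced ones freed, the current rule counts 16 scanned pages (about 50% pressure) while the variant counts 12 (about 34%), even though reclaim succeeded on every page it could.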
> >
> > Yes, pte_mkold is not perfect at the moment.
> >
> > Anyway, the above heuristic has been there for a long time, maybe since I was
> > born :) (I don't want to argue about why it's there or whether it's right.) So
> > I'm really hesitant to change it, since that might bite some workloads.
> > (I don't mean I'm against it; I just don't want to make the change myself,
> > to avoid potential blame.) IOW, Kirill's fault_around broke it too, so it
> > could bite some workloads.
> >
> > At least, as Vinayak mentioned, it changes the vmpressure level, so users of
> > vmpressure can be affected. AFAIK, some vendors on the embedded side rely on
> > vmpressure to control memory management, so it will hurt them.
> > Slab shrinking behavior changed as well. Unfortunately, I don't
> > know which workloads depend on it.
> >
> > Another regression is in my company's product: we snapshot a process
> > with its workingset for fast resume later. For that, we have treated
> > pte-mapped pages as the workingset for the snapshot, but the snapshot started to include
> > non-workingset pages once fault-around was merged. That means the snapshot
> > image size increased, so we need more storage space and the whole thing
> > slows down. I guess mincore(2) users will be affected.
> >
> > Additional note: there are lots of ARM products in the embedded world without a HW access
> > bit, although ARM has started to support it recently, and there sequential
> > file access workloads are not important compared to memory reclaim.
> > So, fault_around's benefit could be highly limited compared to HW-access-bit
> > architectures on server workloads.
> >
> > I want to ask again.
> > I guess we could disable fault_around with a kernel parameter, but does it
> > sound reasonable to enable fault_around by default on every arch
> > at the cost of the above regressions?
> >
> > I'm not against that. I just want some fixes for the
> > regression to go to -stable.
> >
> > >
> > > There is a big improvement in avg latency, but it is still 5% higher than with fault_around
> > > disabled. I will try to debug this further.
>
> I did a quick test on my ARM machine.
>
> Test: mmap a 512M file and read every word sequentially.
>
> = vanilla fault_around=4096 =
> minor fault: 131291
> elapsed time(usec): 6686236
>
> = vanilla fault_around=65536 =
> minor fault: 12577
> elapsed time(usec): 6586959
>
> I tested 3 times and the results seemed to be stable.
> Minor faults were reduced by 90%. That looks like a huge win, but judging by elapsed
> time it's not a huge win: just about 1.5%.
>
> = pte_mkold applied fault_around=4096 =
> minor fault: 131291
> elapsed time(usec): 6608358
>
> = pte_mkold applied fault_around=65536 =
> minor fault: 143609
> elapsed time(usec): 6772520
>
> I tested 3 times and the results seemed to be stable.
> Minor faults actually increased (by about 9%) and elapsed time was slower
> (by about 2.5%) with fault_around.
> The gain is really not clear.
Kirill,
You wanted a test on a system without a HW access bit, and I did one.
What's your opinion?
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-05-16 14:18 ` Minchan Kim
@ 2016-05-16 14:29 ` Kirill A. Shutemov
-1 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2016-05-16 14:29 UTC (permalink / raw)
To: Minchan Kim
Cc: Vinayak Menon, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Mon, May 16, 2016 at 11:18:54PM +0900, Minchan Kim wrote:
> On Tue, May 10, 2016 at 11:48:42AM +0900, Minchan Kim wrote:
> > On Mon, May 09, 2016 at 04:32:51PM +0900, Minchan Kim wrote:
> > > Hello,
> > >
> > > On Mon, Apr 25, 2016 at 05:21:11PM +0530, Vinayak Menon wrote:
> > > >
> > > >
> > > > On 4/22/2016 3:14 PM, Kirill A. Shutemov wrote:
> > > > > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > > > >> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > > > >>> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > > > >>>
> > > > >>>> Mapping pages around the fault is found to cause performance degradation
> > > > >>>> in certain use cases. The test performed here is the launch of 10 apps
> > > > >>>> one by one, doing something with the app each time, and then repeating
> > > > >>>> the same sequence once more, on an ARM 64-bit Android device with 2GB
> > > > >>>> of RAM. The time taken to launch the apps is found to be shorter when
> > > > >>>> the fault-around feature is disabled by setting fault_around_bytes to the page
> > > > >>>> size (4096 in this case).
> > > > >>> Well that's one workload, and a somewhat strange one. What is the
> > > > >>> effect on other workloads (of which there are a lot!)?
> > > > >>>
> > > > >> This workload emulates the way a user would use his mobile device, opening
> > > > >> an application, using it for some time, switching to the next one, and then coming
> > > > >> back to the same application later. Another stat that shows significant
> > > > >> degradation on Android with fault_around is device boot-up time. I have not
> > > > >> tried any workloads other than these.
> > > > >>
> > > > >>>> The tests were done on a 3.18 kernel. 4 extra vmstat counters were added
> > > > >>>> for debugging. pgpgoutclean counts the clean pages reclaimed via
> > > > >>>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > > > >>>> and pageref_keep count the mapped file pages activated and retained
> > > > >>>> by page_check_references.
> > > > >>>>
> > > > >>>> === Without swap ===
> > > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > > >>>> -----------------------------------------------------------------------
> > > > >>>> workingset_refault        691100    664339
> > > > >>>> workingset_activate       210379    179139
> > > > >>>> pgpgin                    4676096   4492780
> > > > >>>> pgpgout                   163967    96711
> > > > >>>> pgpgoutclean              1090664   990659
> > > > >>>> pgalloc_dma               3463111   3328299
> > > > >>>> pgfree                    3502365   3363866
> > > > >>>> pgactivate                568134    238570
> > > > >>>> pgdeactivate              752260    392138
> > > > >>>> pageref_activate          315078    121705
> > > > >>>> pageref_activate_vm_exec  162940    55815
> > > > >>>> pageref_keep              141354    51011
> > > > >>>> pgmajfault                24863     23633
> > > > >>>> pgrefill_dma              1116370   544042
> > > > >>>> pgscan_kswapd_dma         1735186   1234622
> > > > >>>> pgsteal_kswapd_dma        1121769   1005725
> > > > >>>> pgscan_direct_dma         12966     1090
> > > > >>>> pgsteal_direct_dma        6209      967
> > > > >>>> slabs_scanned             1539849   977351
> > > > >>>> pageoutrun                1260      1333
> > > > >>>> allocstall                47        7
> > > > >>>>
> > > > >>>> === With swap ===
> > > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > > >>>> -----------------------------------------------------------------------
> > > > >>>> workingset_refault        597687    878109
> > > > >>>> workingset_activate       167169    254037
> > > > >>>> pgpgin                    4035424   5157348
> > > > >>>> pgpgout                   162151    85231
> > > > >>>> pgpgoutclean              928587    1225029
> > > > >>>> pswpin                    46033     17100
> > > > >>>> pswpout                   237952    127686
> > > > >>>> pgalloc_dma               3305034   3542614
> > > > >>>> pgfree                    3354989   3592132
> > > > >>>> pgactivate                626468    355275
> > > > >>>> pgdeactivate              990205    771902
> > > > >>>> pageref_activate          294780    157106
> > > > >>>> pageref_activate_vm_exec  141722    63469
> > > > >>>> pageref_keep              121931    63028
> > > > >>>> pgmajfault                67818     45643
> > > > >>>> pgrefill_dma              1324023   977192
> > > > >>>> pgscan_kswapd_dma         1825267   1720322
> > > > >>>> pgsteal_kswapd_dma        1181882   1365500
> > > > >>>> pgscan_direct_dma         41957     9622
> > > > >>>> pgsteal_direct_dma        25136     6759
> > > > >>>> slabs_scanned             689575    542705
> > > > >>>> pageoutrun                1234      1538
> > > > >>>> allocstall                110       26
> > > > >>>>
> > > > >>>> It looks like with fault_around there is more pressure on reclaim because
> > > > >>>> of the presence of more mapped pages, resulting in more IO activity,
> > > > >>>> more faults, more swapping, and more allocstalls.
> > > > >>> A few of those things did get a bit worse?
> > > > >> I think some numbers (like workingset, pgpgin, pgpgoutclean etc) look
> > > > >> better with fault_around because the increased number of mapped pages
> > > > >> results in fewer file pages being reclaimed (pageref_activate,
> > > > >> pageref_activate_vm_exec, pageref_keep above), but swapping increases.
> > > > >> Latency numbers are far worse with fault_around + swap, possibly because
> > > > >> of increased swapping, decreased kswapd efficiency and more
> > > > >> allocstalls.
> > > > >> So the problem looks to be that unwanted pages are mapped around the fault
> > > > >> and page_check_references is unaware of this.
> > > > > Hm. It makes me think we should make ptes set up by faultaround old.
> > > > >
> > > > > Although, it would defeat (to some extent) the purpose of faultaround on
> > > > > architectures without a HW accessed bit :-/
> > > > >
> > > > > Could you check if the patch below changes the situation?
> > > > > It would require some more work to not mark as old the pte we've got the fault for.
> > > >
> > > > The last column shows the values with the patch:
> > > >
> > > >                           3.18      3.18-fab=4096   3.18-Kirill's-fix
> > > > ---------------------------------------------------------
> > > > workingset_refault        597687    878109          790207
> > > > workingset_activate       167169    254037          207912
> > > > pgpgin                    4035424   5157348         4793116
> > > > pgpgout                   162151    85231           85539
> > > > pgpgoutclean              928587    1225029         1129088
> > > > pswpin                    46033     17100           8926
> > > > pswpout                   237952    127686          103435
> > > > pgalloc_dma               3305034   3542614         3401000
> > > > pgfree                    3354989   3592132         3457783
> > > > pgactivate                626468    355275          326716
> > > > pgdeactivate              990205    771902          697392
> > > > pageref_activate          294780    157106          138451
> > > > pageref_activate_vm_exec  141722    63469           64585
> > > > pageref_keep              121931    63028           65811
> > > > pgmajfault                67818     45643           34944
> > > > pgrefill_dma              1324023   977192          874497
> > > > pgscan_kswapd_dma         1825267   1720322         1577483
> > > > pgsteal_kswapd_dma        1181882   1365500         1243968
> > > > pgscan_direct_dma         41957     9622            9387
> > > > pgsteal_direct_dma        25136     6759            7108
> > > > slabs_scanned             689575    542705          618839
> > > > pageoutrun                1234      1538            1450
> > > > allocstall                110       26              13
> > > >
> > > > Everything seems to have improved except slabs_scanned, possibly because
> > > > of this check, which Minchan pointed out and which results in higher pressure on slabs:
> > > >
> > > > 	if (page_mapped(page) || PageSwapCache(page))
> > > > 		sc->nr_scanned++;
> > > >
> > > > I had added some traces to monitor the vmpressure values. Those also seem to
> > > > be high, possibly for the same reason.
> > > >
> > > > Should the pressure be doubled only if the page is mapped and referenced?
> > >
> > > Yes, pte_mkold is not perfect at the moment.
> > >
> > > Anyway, the above heuristic has been there for a long time, maybe since I was
> > > born :) (I don't want to argue about why it's there or whether it's right.) So
> > > I'm really hesitant to change it, since that might bite some workloads.
> > > (I don't mean I'm against it; I just don't want to make the change myself,
> > > to avoid potential blame.) IOW, Kirill's fault_around broke it too, so it
> > > could bite some workloads.
> > >
> > > At least, as Vinayak mentioned, it changes the vmpressure level, so users of
> > > vmpressure can be affected. AFAIK, some vendors on the embedded side rely on
> > > vmpressure to control memory management, so it will hurt them.
> > > Slab shrinking behavior changed as well. Unfortunately, I don't
> > > know which workloads depend on it.
> > >
> > > Another regression is in my company's product: we snapshot a process
> > > with its workingset for fast resume later. For that, we have treated
> > > pte-mapped pages as the workingset for the snapshot, but the snapshot started to include
> > > non-workingset pages once fault-around was merged. That means the snapshot
> > > image size increased, so we need more storage space and the whole thing
> > > slows down. I guess mincore(2) users will be affected.
> > >
> > > Additional note: there are lots of ARM products in the embedded world without a HW access
> > > bit, although ARM has started to support it recently, and there sequential
> > > file access workloads are not important compared to memory reclaim.
> > > So, fault_around's benefit could be highly limited compared to HW-access-bit
> > > architectures on server workloads.
> > >
> > > I want to ask again.
> > > I guess we could disable fault_around with a kernel parameter, but does it
> > > sound reasonable to enable fault_around by default on every arch
> > > at the cost of the above regressions?
> > >
> > > I'm not against that. I just want some fixes for the
> > > regression to go to -stable.
> > >
> > > >
> > > > There is a big improvement in avg latency, but it is still 5% higher than with fault_around
> > > > disabled. I will try to debug this further.
> >
> > I did a quick test on my ARM machine.
> >
> > Test: mmap a 512M file and read every word sequentially.
> >
> > = vanilla fault_around=4096 =
> > minor fault: 131291
> > elapsed time(usec): 6686236
> >
> > = vanilla fault_around=65536 =
> > minor fault: 12577
> > elapsed time(usec): 6586959
> >
> > I tested 3 times and the results seemed to be stable.
> > Minor faults were reduced by 90%. That looks like a huge win, but judging by elapsed
> > time it's not a huge win: just about 1.5%.
> >
> > = pte_mkold applied fault_around=4096 =
> > minor fault: 131291
> > elapsed time(usec): 6608358
> >
> > = pte_mkold applied fault_around=65536 =
> > minor fault: 143609
> > elapsed time(usec): 6772520
> >
> > I tested 3 times and the results seemed to be stable.
> > Minor faults actually increased (by about 9%) and elapsed time was slower
> > (by about 2.5%) with fault_around.
> > The gain is really not clear.
>
> Kirill,
> You wanted a test on a system without a HW access bit, and I did one.
> What's your opinion?
Sorry for the late response.
My patch is incomplete: we need to find a way to not mark the pte as old if we
are handling the page fault for the address that pte represents.
Once this is done, the number of page faults shouldn't be higher with
fault-around enabled, even on machines without a hardware accessed bit. This
will address the performance regression with the patch on such machines.
I'll try to find time to update the patch soon.
--
Kirill A. Shutemov
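The fault-count argument for machines without a hardware accessed bit can be checked with a toy model. This is not kernel code: the names are hypothetical, it assumes that touching a present but old pte traps so software can set the accessed bit (as on ARM cores without hardware access-flag updates), and it scans pages strictly in order with windows aligned like fault_around_bytes. Mapping the whole window old, faulting pte included, costs one extra fault per window; keeping only the faulting pte young brings the count back to one fault per page, the same as with fault-around disabled. The model says nothing about reclaim, which is the reason for marking ptes old in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fault-around policies compared in the thread (names are hypothetical). */
enum fa_policy {
	FA_OFF,                 /* fault_around_bytes == PAGE_SIZE: map only the faulting page */
	FA_YOUNG,               /* vanilla fault-around: whole window mapped young */
	FA_ALL_OLD,             /* pte_mkold as tested: whole window old, faulting pte too */
	FA_OLD_EXCEPT_FAULTING, /* proposed follow-up: faulting pte young, neighbours old */
};

struct toy_pte {
	bool present;
	bool young;
};

/* Count faults for a front-to-back scan of npages pages on a machine
 * without a hardware accessed bit: an access retries until its pte is
 * present and young, and every trap on the way counts as one fault. */
unsigned long count_faults_sequential(struct toy_pte *ptes, size_t npages,
				      size_t window, enum fa_policy policy)
{
	unsigned long faults = 0;

	assert(window > 0);
	for (size_t i = 0; i < npages; i++)
		ptes[i] = (struct toy_pte){ .present = false, .young = false };

	for (size_t i = 0; i < npages; i++) {
		while (!(ptes[i].present && ptes[i].young)) {
			faults++;
			if (ptes[i].present) {
				/* accessed-bit fault: software marks the pte young */
				ptes[i].young = true;
				continue;
			}
			/* missing pte: map the faulting page, plus its window */
			size_t start = i, end = i + 1;
			if (policy != FA_OFF) {
				start = i - i % window;
				end = start + window < npages ? start + window : npages;
			}
			for (size_t j = start; j < end; j++) {
				if (ptes[j].present)
					continue; /* populated ptes are left alone */
				ptes[j].present = true;
				ptes[j].young = policy == FA_OFF || policy == FA_YOUNG ||
						(policy == FA_OLD_EXCEPT_FAULTING && j == i);
			}
		}
	}
	return faults;
}
```

For a 512M mapping with 4K pages and a 16-page (65536-byte) window, the model gives 131072, 8192, 139264 and 131072 faults for the four policies. Minchan's measured counts (131291, 12577, 143609) move in the same direction; the gaps presumably come from faults the model does not see.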
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-05-16 14:29 ` Kirill A. Shutemov
@ 2016-05-16 14:56 ` Minchan Kim
-1 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-05-16 14:56 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Minchan Kim, Vinayak Menon, Andrew Morton, linux-mm,
linux-kernel, dan.j.williams, mgorman, vbabka, kirill.shutemov,
dave.hansen, hughd
On Mon, May 16, 2016 at 05:29:00PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 16, 2016 at 11:18:54PM +0900, Minchan Kim wrote:
> > On Tue, May 10, 2016 at 11:48:42AM +0900, Minchan Kim wrote:
> > > On Mon, May 09, 2016 at 04:32:51PM +0900, Minchan Kim wrote:
> > > > Hello,
> > > >
> > > > On Mon, Apr 25, 2016 at 05:21:11PM +0530, Vinayak Menon wrote:
> > > > >
> > > > >
> > > > > On 4/22/2016 3:14 PM, Kirill A. Shutemov wrote:
> > > > > > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > > > > >> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > > > > >>> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > > > > >>>
> > > > > >>>> Mapping pages around fault is found to cause performance degradation
> > > > > >>>> in certain use cases. The test performed here is the launch of 10 apps
> > > > > >>>> one by one, doing something with the app each time, and then repeating
> > > > > >>>> the same sequence once more, on an ARM 64-bit Android device with 2GB
> > > > > >>>> of RAM. The time taken to launch the apps is found to be better when
> > > > > >>>> the fault-around feature is disabled by setting fault_around_bytes to
> > > > > >>>> the page size (4096 in this case).
> > > > > >>> Well that's one workload, and a somewhat strange one. What is the
> > > > > >>> effect on other workloads (of which there are a lot!).
> > > > > >>>
> > > > > >> This workload emulates the way a user would use his mobile device, opening
> > > > > >> an application, using it for some time, switching to the next, and then
> > > > > >> coming back to the same application later. Another stat which shows
> > > > > >> significant degradation on Android with fault_around is device boot-up
> > > > > >> time. I have not tried any workloads other than these.
> > > > > >>
> > > > > >>>> The tests were done on a 3.18 kernel. 4 extra vmstat counters were added
> > > > > >>>> for debugging. pgpgoutclean counts the clean pages reclaimed via
> > > > > >>>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > > > > >>>> and pageref_keep count the mapped file pages activated and retained
> > > > > >>>> by page_check_references.
> > > > > >>>>
> > > > > >>>> === Without swap ===
> > > > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > > > >>>> -----------------------------------------------------------------------
> > > > > >>>> workingset_refault        691100    664339
> > > > > >>>> workingset_activate       210379    179139
> > > > > >>>> pgpgin                    4676096   4492780
> > > > > >>>> pgpgout                   163967    96711
> > > > > >>>> pgpgoutclean              1090664   990659
> > > > > >>>> pgalloc_dma               3463111   3328299
> > > > > >>>> pgfree                    3502365   3363866
> > > > > >>>> pgactivate                568134    238570
> > > > > >>>> pgdeactivate              752260    392138
> > > > > >>>> pageref_activate          315078    121705
> > > > > >>>> pageref_activate_vm_exec  162940    55815
> > > > > >>>> pageref_keep              141354    51011
> > > > > >>>> pgmajfault                24863     23633
> > > > > >>>> pgrefill_dma              1116370   544042
> > > > > >>>> pgscan_kswapd_dma         1735186   1234622
> > > > > >>>> pgsteal_kswapd_dma        1121769   1005725
> > > > > >>>> pgscan_direct_dma         12966     1090
> > > > > >>>> pgsteal_direct_dma        6209      967
> > > > > >>>> slabs_scanned             1539849   977351
> > > > > >>>> pageoutrun                1260      1333
> > > > > >>>> allocstall                47        7
> > > > > >>>>
> > > > > >>>> === With swap ===
> > > > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > > > >>>> -----------------------------------------------------------------------
> > > > > >>>> workingset_refault        597687    878109
> > > > > >>>> workingset_activate       167169    254037
> > > > > >>>> pgpgin                    4035424   5157348
> > > > > >>>> pgpgout                   162151    85231
> > > > > >>>> pgpgoutclean              928587    1225029
> > > > > >>>> pswpin                    46033     17100
> > > > > >>>> pswpout                   237952    127686
> > > > > >>>> pgalloc_dma               3305034   3542614
> > > > > >>>> pgfree                    3354989   3592132
> > > > > >>>> pgactivate                626468    355275
> > > > > >>>> pgdeactivate              990205    771902
> > > > > >>>> pageref_activate          294780    157106
> > > > > >>>> pageref_activate_vm_exec  141722    63469
> > > > > >>>> pageref_keep              121931    63028
> > > > > >>>> pgmajfault                67818     45643
> > > > > >>>> pgrefill_dma              1324023   977192
> > > > > >>>> pgscan_kswapd_dma         1825267   1720322
> > > > > >>>> pgsteal_kswapd_dma        1181882   1365500
> > > > > >>>> pgscan_direct_dma         41957     9622
> > > > > >>>> pgsteal_direct_dma        25136     6759
> > > > > >>>> slabs_scanned             689575    542705
> > > > > >>>> pageoutrun                1234      1538
> > > > > >>>> allocstall                110       26
> > > > > >>>>
> > > > > >>>> It looks like with fault_around there is more pressure on reclaim
> > > > > >>>> because of the presence of more mapped pages, resulting in more IO
> > > > > >>>> activity, more faults, more swapping, and more allocstalls.
> > > > > >>> A few of those things did get a bit worse?
> > > > > >> I think some numbers (like workingset, pgpgin, pgpgoutclean etc.) look
> > > > > >> better with fault_around because the increased number of mapped pages
> > > > > >> results in fewer file pages being reclaimed (pageref_activate,
> > > > > >> pageref_activate_vm_exec, pageref_keep above), but swapping increases.
> > > > > >> Latency numbers are far worse with fault_around_bytes + swap, possibly
> > > > > >> because of increased swapping, decreased kswapd efficiency and more
> > > > > >> allocstalls.
> > > > > >> So the problem looks to be that unwanted pages are mapped around the fault
> > > > > >> and page_check_references is unaware of this.
> > > > > > Hm. It makes me think we should make the ptes set up by faultaround old.
> > > > > >
> > > > > > Although it would defeat (to some extent) the purpose of faultaround on
> > > > > > architectures without a HW accessed bit :-/
> > > > > >
> > > > > > Could you check if the patch below changes the situation?
> > > > > > It would require some more work to not mark the pte we got the fault for as old.
> > > > >
> > > > > The column at the end shows the values with the patch:
> > > > >
> > > > >                           3.18      3.18-fab=4096  3.18-Kirill's-fix
> > > > > ---------------------------------------------------------
> > > > > workingset_refault        597687    878109         790207
> > > > > workingset_activate       167169    254037         207912
> > > > > pgpgin                    4035424   5157348        4793116
> > > > > pgpgout                   162151    85231          85539
> > > > > pgpgoutclean              928587    1225029        1129088
> > > > > pswpin                    46033     17100          8926
> > > > > pswpout                   237952    127686         103435
> > > > > pgalloc_dma               3305034   3542614        3401000
> > > > > pgfree                    3354989   3592132        3457783
> > > > > pgactivate                626468    355275         326716
> > > > > pgdeactivate              990205    771902         697392
> > > > > pageref_activate          294780    157106         138451
> > > > > pageref_activate_vm_exec  141722    63469          64585
> > > > > pageref_keep              121931    63028          65811
> > > > > pgmajfault                67818     45643          34944
> > > > > pgrefill_dma              1324023   977192         874497
> > > > > pgscan_kswapd_dma         1825267   1720322        1577483
> > > > > pgsteal_kswapd_dma        1181882   1365500        1243968
> > > > > pgscan_direct_dma         41957     9622           9387
> > > > > pgsteal_direct_dma        25136     6759           7108
> > > > > slabs_scanned             689575    542705         618839
> > > > > pageoutrun                1234      1538           1450
> > > > > allocstall                110       26             13
> > > > >
> > > > > Everything seems to have improved except slabs_scanned, possibly because
> > > > > of this check, which Minchan pointed out and which results in higher
> > > > > pressure on slabs:
> > > > >
> > > > >         if (page_mapped(page) || PageSwapCache(page))
> > > > >                 sc->nr_scanned++;
> > > > >
> > > > > I had added some traces to monitor the vmpressure values. Those also seem
> > > > > to be high, possibly for the same reason.
> > > > >
> > > > > Should the pressure be doubled only if the page is mapped and referenced?
> > > >
> > > > Yes, pte_mkold is not perfect at the moment.
> > > >
> > > > Anyway, the above heuristic has been there for a long time, maybe since
> > > > I was born :) (I don't want to argue about why it's there or whether it's
> > > > right.) So I'm really hesitant to change it, because that might bite some
> > > > workloads. (I don't mean I'm against it; I just don't want to make the
> > > > change myself and take the potential blame.) IOW, Kirill's fault_around
> > > > broke it too, so it could bite some workloads.
> > > >
> > > > At least, as Vinayak mentioned, it would change the vmpressure level, so
> > > > users of vmpressure can be affected. AFAIK, some vendors on the embedded
> > > > side rely on vmpressure to control memory management, so it will hurt them.
> > > > Slab shrinking behavior was changed as well. Unfortunately, I don't know
> > > > which workloads depend on it.
> > > >
> > > > As another regression in my company's product: we snapshot a process
> > > > together with its working set for later fast resume. For that, we have
> > > > treated pte-mapped pages as the working set, but since fault-around was
> > > > merged the snapshot has started to include non-working-set pages. This
> > > > increases the snapshot image size, so we need more storage space and
> > > > everything slows down. I guess mincore(2) users will be affected, too.
> > > >
> > > > Additional note: there are lots of ARM products in the embedded world
> > > > without a HW access bit, although ARM has started to support it recently,
> > > > and sequential file access workloads are not as important there as memory
> > > > reclaim. So fault_around's benefit could be highly limited compared to
> > > > HW access bit architectures running server workloads.
> > > >
> > > > I want to ask again.
> > > > I guess we could disable fault_around with a kernel parameter, but does
> > > > it sound reasonable to enable fault_around by default on every arch
> > > > at the cost of the above regression?
> > > >
> > > > I'm not against that. I just want fixes for the regression to go
> > > > to -stable.
> > > >
> > > > >
> > > > > There is a big improvement in avg latency, but it is still 5% higher than
> > > > > with fault_around disabled. I will try to debug this further.
> > >
> > > I did a quick test on my ARM machine.
> > >
> > > Test: mmap a 512M file and read every word sequentially.
> > >
> > > = vanilla fault_around=4096 =
> > > minor fault: 131291
> > > elapsed time(usec): 6686236
> > >
> > > = vanilla fault_around=65536 =
> > > minor fault: 12577
> > > elapsed time(usec): 6586959
> > >
> > > I tested 3 times and the results seemed to be stable.
> > > Minor faults were reduced by 90%. That's a huge win, but looking at the
> > > elapsed time, the gain is not huge: just about 1.5%.
> > >
> > > = pte_mkold applied fault_around=4096 =
> > > minor fault: 131291
> > > elapsed time(usec): 6608358
> > >
> > > = pte_mkold applied fault_around=65536 =
> > > minor fault: 143609
> > > elapsed time(usec): 6772520
> > >
> > > I tested 3 times and the results seemed to be stable.
> > > With fault_around, minor faults actually increased and the elapsed
> > > time was slower.
> > > The gain is really not clear.
> >
> > Kirill,
> > You wanted a test on a non-HW access bit system, and I did one.
> > What's your opinion?
>
> Sorry for the late response.
>
> My patch is incomplete: we need to find a way not to mark the pte as old
> if we are handling the page fault for the address that pte represents.
I'm sure you can handle it, but my point is that there wouldn't be a big gain
on non-HW access bit systems even if you do. To be more precise, since I don't
have every non-HW access bit architecture: at least for the current mobile
workloads on the ARM system I have, the benefit wouldn't be huge.
One more thing.
I tested the workload on a quad-core system whose core speed is not slow
compared to other recent mobile phone SoCs. Even when I ran the benchmark
without pte_mkold, the benefit was within noise because the storage is
really slow, so major faults are the dominant factor. So I switched the test
storage from eMMC to eSATA, and only then did I manage to see a small
benefit from fault_around without pte_mkold.
However, let's consider the side effects of fault_around:
1. Increased slab shrinking compared to before
2. Higher vmpressure levels compared to before
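Item 1 ties back to the shrink_page_list() check Vinayak quoted: each
scanned page bumps sc->nr_scanned once, and mapped or swap-cache pages bump
it a second time, so extra pages mapped by fault-around inflate the scan
count that, per this thread, feeds slab shrinking and vmpressure. The toy
model below contrasts that rule with one reading of Vinayak's "only if
mapped and referenced" question; struct scan_page and both helpers are
hypothetical, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the per-page state the reclaim check looks at. */
struct scan_page {
	bool mapped;		/* page_mapped() */
	bool swapcache;		/* PageSwapCache() */
	bool referenced;	/* a young pte was found for it */
};

/*
 * Rule as quoted in the thread: every page counts once, and mapped or
 * swap-cache pages count a second time.
 */
static unsigned long nr_scanned_current(const struct scan_page *p, size_t n)
{
	unsigned long nr = 0;

	for (size_t i = 0; i < n; i++) {
		nr++;
		if (p[i].mapped || p[i].swapcache)
			nr++;
	}
	return nr;
}

/*
 * One reading of the proposal: only mapped pages that were actually
 * referenced count twice; swap-cache pages are left as they are.
 */
static unsigned long nr_scanned_referenced(const struct scan_page *p, size_t n)
{
	unsigned long nr = 0;

	for (size_t i = 0; i < n; i++) {
		nr++;
		if ((p[i].mapped && p[i].referenced) || p[i].swapcache)
			nr++;
	}
	return nr;
}
```

A page that fault-around mapped but nobody touched (mapped, not
referenced) adds two to the count under the current rule and one under the
variant, which is the difference the slabs_scanned numbers hint at.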
Considering those regressions on my system, it's really not worth trying
at the moment.
That's why I wanted to disable fault_around by default on non-HW access
bit systems.
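Here "disable" means setting fault_around_bytes to the page size. A
simplified model of how the window is sized, loosely following 3.18's
do_fault_around() (the constants, struct fa_window and both helpers are
assumptions for illustration, not kernel API), shows why 4096 leaves only
the faulting page while the 65536 default covers 16 pages.

```c
#include <stddef.h>

/* Assumed constants: 4K pages, 512 ptes per page table. */
#define SIM_PAGE_SIZE		4096UL
#define SIM_PTRS_PER_PTE	512UL
#define SIM_TABLE_SPAN		(SIM_PAGE_SIZE * SIM_PTRS_PER_PTE)

struct fa_window {
	unsigned long start;	/* first address mapped (inclusive) */
	unsigned long end;	/* end of the window (exclusive) */
};

/*
 * Align the faulting address down to fault_around_bytes (assumed to be a
 * power of two, at least one page), never start before the VMA, and stop
 * at whichever comes first: fault_around_bytes from the start, the end of
 * the VMA, or the end of the page table holding the faulting pte.
 */
static struct fa_window fault_around_window(unsigned long addr,
					    unsigned long fault_around_bytes,
					    unsigned long vm_start,
					    unsigned long vm_end)
{
	unsigned long mask = ~(fault_around_bytes - 1) & ~(SIM_PAGE_SIZE - 1);
	unsigned long table_end = (addr & ~(SIM_TABLE_SPAN - 1)) + SIM_TABLE_SPAN;
	struct fa_window w;

	w.start = addr & mask;
	if (w.start < vm_start)
		w.start = vm_start;

	w.end = w.start + fault_around_bytes;
	if (w.end > vm_end)
		w.end = vm_end;
	if (w.end > table_end)
		w.end = table_end;
	return w;
}

static size_t fa_window_pages(struct fa_window w)
{
	return (w.end - w.start) / SIM_PAGE_SIZE;
}
```

With a 64K window, a sequential pass over a fully cached 512M file would
need at best 512M / 64K = 8192 faults instead of 512M / 4K = 131072, the
same order as the 12577 vs 131291 minor faults quoted above; the gap from
the ideal is plausibly because fault-around only maps pages that are
already in the page cache.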
>
> Once this is done, the number of page faults shouldn't be higher with
> fault-around enabled, even on machines without a hardware accessed bit.
> This will address the performance regression with the patch on such
> machines.
Even if you solve that, I guess the benefit would be marginal on
some architectures, and we should still solve the side effects above.
>
> I'll try to find time to update the patch soon.
I hope you can solve the regressions above as well.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH] mm: make fault_around_bytes configurable
@ 2016-05-16 14:56 ` Minchan Kim
0 siblings, 0 replies; 34+ messages in thread
From: Minchan Kim @ 2016-05-16 14:56 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Minchan Kim, Vinayak Menon, Andrew Morton, linux-mm,
linux-kernel, dan.j.williams, mgorman, vbabka, kirill.shutemov,
dave.hansen, hughd
On Mon, May 16, 2016 at 05:29:00PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 16, 2016 at 11:18:54PM +0900, Minchan Kim wrote:
> > On Tue, May 10, 2016 at 11:48:42AM +0900, Minchan Kim wrote:
> > > On Mon, May 09, 2016 at 04:32:51PM +0900, Minchan Kim wrote:
> > > > Hello,
> > > >
> > > > On Mon, Apr 25, 2016 at 05:21:11PM +0530, Vinayak Menon wrote:
> > > > >
> > > > >
> > > > > On 4/22/2016 3:14 PM, Kirill A. Shutemov wrote:
> > > > > > On Fri, Apr 22, 2016 at 02:15:08PM +0530, Vinayak Menon wrote:
> > > > > >> On 04/22/2016 05:31 AM, Andrew Morton wrote:
> > > > > >>> On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:
> > > > > >>>
> > > > > >>>> Mapping pages around fault is found to cause performance degradation
> > > > > >>>> in certain use cases. The test performed here is the launch of 10 apps
> > > > > >>>> one by one, doing something with the app each time, and then repeating
> > > > > >>>> the same sequence once more, on an ARM 64-bit Android device with 2GB
> > > > > >>>> of RAM. The time taken to launch the apps is found to be better when
> > > > > >>>> the fault-around feature is disabled by setting fault_around_bytes to
> > > > > >>>> the page size (4096 in this case).
> > > > > >>> Well that's one workload, and a somewhat strange one. What is the
> > > > > >>> effect on other workloads (of which there are a lot!).
> > > > > >>>
> > > > > >> This workload emulates the way a user would use his mobile device, opening
> > > > > >> an application, using it for some time, switching to the next, and then
> > > > > >> coming back to the same application later. Another stat which shows
> > > > > >> significant degradation on Android with fault_around is device boot-up
> > > > > >> time. I have not tried any workloads other than these.
> > > > > >>
> > > > > >>>> The tests were done on a 3.18 kernel. 4 extra vmstat counters were added
> > > > > >>>> for debugging. pgpgoutclean counts the clean pages reclaimed via
> > > > > >>>> __delete_from_page_cache. pageref_activate, pageref_activate_vm_exec,
> > > > > >>>> and pageref_keep count the mapped file pages activated and retained
> > > > > >>>> by page_check_references.
> > > > > >>>>
> > > > > >>>> === Without swap ===
> > > > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > > > >>>> -----------------------------------------------------------------------
> > > > > >>>> workingset_refault        691100    664339
> > > > > >>>> workingset_activate       210379    179139
> > > > > >>>> pgpgin                    4676096   4492780
> > > > > >>>> pgpgout                   163967    96711
> > > > > >>>> pgpgoutclean              1090664   990659
> > > > > >>>> pgalloc_dma               3463111   3328299
> > > > > >>>> pgfree                    3502365   3363866
> > > > > >>>> pgactivate                568134    238570
> > > > > >>>> pgdeactivate              752260    392138
> > > > > >>>> pageref_activate          315078    121705
> > > > > >>>> pageref_activate_vm_exec  162940    55815
> > > > > >>>> pageref_keep              141354    51011
> > > > > >>>> pgmajfault                24863     23633
> > > > > >>>> pgrefill_dma              1116370   544042
> > > > > >>>> pgscan_kswapd_dma         1735186   1234622
> > > > > >>>> pgsteal_kswapd_dma        1121769   1005725
> > > > > >>>> pgscan_direct_dma         12966     1090
> > > > > >>>> pgsteal_direct_dma        6209      967
> > > > > >>>> slabs_scanned             1539849   977351
> > > > > >>>> pageoutrun                1260      1333
> > > > > >>>> allocstall                47        7
> > > > > >>>>
> > > > > >>>> === With swap ===
> > > > > >>>>                           3.18      3.18-fault_around_bytes=4096
> > > > > >>>> -----------------------------------------------------------------------
> > > > > >>>> workingset_refault        597687    878109
> > > > > >>>> workingset_activate       167169    254037
> > > > > >>>> pgpgin                    4035424   5157348
> > > > > >>>> pgpgout                   162151    85231
> > > > > >>>> pgpgoutclean              928587    1225029
> > > > > >>>> pswpin                    46033     17100
> > > > > >>>> pswpout                   237952    127686
> > > > > >>>> pgalloc_dma               3305034   3542614
> > > > > >>>> pgfree                    3354989   3592132
> > > > > >>>> pgactivate                626468    355275
> > > > > >>>> pgdeactivate              990205    771902
> > > > > >>>> pageref_activate          294780    157106
> > > > > >>>> pageref_activate_vm_exec  141722    63469
> > > > > >>>> pageref_keep              121931    63028
> > > > > >>>> pgmajfault                67818     45643
> > > > > >>>> pgrefill_dma              1324023   977192
> > > > > >>>> pgscan_kswapd_dma         1825267   1720322
> > > > > >>>> pgsteal_kswapd_dma        1181882   1365500
> > > > > >>>> pgscan_direct_dma         41957     9622
> > > > > >>>> pgsteal_direct_dma        25136     6759
> > > > > >>>> slabs_scanned             689575    542705
> > > > > >>>> pageoutrun                1234      1538
> > > > > >>>> allocstall                110       26
> > > > > >>>>
> > > > > >>>> It looks like with fault_around there is more pressure on reclaim
> > > > > >>>> because of the presence of more mapped pages, resulting in more IO
> > > > > >>>> activity, more faults, more swapping, and more allocstalls.
> > > > > >>> A few of those things did get a bit worse?
> > > > > >> I think some numbers (like workingset, pgpgin, pgpgoutclean etc.) look
> > > > > >> better with fault_around because the increased number of mapped pages
> > > > > >> results in fewer file pages being reclaimed (pageref_activate,
> > > > > >> pageref_activate_vm_exec, pageref_keep above), but swapping increases.
> > > > > >> Latency numbers are far worse with fault_around_bytes + swap, possibly
> > > > > >> because of increased swapping, decreased kswapd efficiency and more
> > > > > >> allocstalls.
> > > > > >> So the problem looks to be that unwanted pages are mapped around the fault
> > > > > >> and page_check_references is unaware of this.
> > > > > > Hm. It makes me think we should make the ptes set up by faultaround old.
> > > > > >
> > > > > > Although it would defeat (to some extent) the purpose of faultaround on
> > > > > > architectures without a HW accessed bit :-/
> > > > > >
> > > > > > Could you check if the patch below changes the situation?
> > > > > > It would require some more work to not mark the pte we got the fault for as old.
> > > > >
> > > > > The column at the end shows the values with the patch:
> > > > >
> > > > >                           3.18      3.18-fab=4096  3.18-Kirill's-fix
> > > > > ---------------------------------------------------------
> > > > > workingset_refault        597687    878109         790207
> > > > > workingset_activate       167169    254037         207912
> > > > > pgpgin                    4035424   5157348        4793116
> > > > > pgpgout                   162151    85231          85539
> > > > > pgpgoutclean              928587    1225029        1129088
> > > > > pswpin                    46033     17100          8926
> > > > > pswpout                   237952    127686         103435
> > > > > pgalloc_dma               3305034   3542614        3401000
> > > > > pgfree                    3354989   3592132        3457783
> > > > > pgactivate                626468    355275         326716
> > > > > pgdeactivate              990205    771902         697392
> > > > > pageref_activate          294780    157106         138451
> > > > > pageref_activate_vm_exec  141722    63469          64585
> > > > > pageref_keep              121931    63028          65811
> > > > > pgmajfault                67818     45643          34944
> > > > > pgrefill_dma              1324023   977192         874497
> > > > > pgscan_kswapd_dma         1825267   1720322        1577483
> > > > > pgsteal_kswapd_dma        1181882   1365500        1243968
> > > > > pgscan_direct_dma         41957     9622           9387
> > > > > pgsteal_direct_dma        25136     6759           7108
> > > > > slabs_scanned             689575    542705         618839
> > > > > pageoutrun                1234      1538           1450
> > > > > allocstall                110       26             13
> > > > >
> > > > > Everything seems to have improved except slabs_scanned, possibly because
> > > > > of this check, which Minchan pointed out and which puts more pressure on
> > > > > slabs:
> > > > >
> > > > >         if (page_mapped(page) || PageSwapCache(page))
> > > > >                 sc->nr_scanned++;
> > > > >
> > > > > I had added some traces to monitor the vmpressure values. Those also seem
> > > > > to be high, possibly for the same reason.
> > > > >
> > > > > Should the pressure be doubled only if the page is mapped and referenced?
> > > >
> > > > Yes, pte_mkold is not perfect at the moment.
> > > >
> > > > Anyway, the above heuristic has been there for a long time, maybe since
> > > > before I was born :) (I don't want to argue about why it's there or
> > > > whether it's right.) So I'm really hesitant to change it, because that
> > > > might bite some workloads. (I don't mean I'm against it; I just don't want
> > > > to make the change myself and take the potential blame.) IOW, Kirill's
> > > > fault_around broke it too, so it could bite some workloads.
> > > >
> > > > At least, as Vinayak mentioned, it would change the vmpressure level, so
> > > > vmpressure users can be affected. AFAIK, some embedded vendors rely on
> > > > vmpressure to control memory management, so it will hurt them.
> > > > Slab shrinking behavior changed as well. Unfortunately, I don't know of
> > > > any workload that depends on it.
> > > >
> > > > Another regression hit my company's product: we snapshot a process's
> > > > working set for fast resume later. For that, we treat pte-mapped pages as
> > > > the working set, but since fault-around was merged the snapshot has
> > > > started to include non-working-set pages. That means the snapshot image is
> > > > larger, so we need more storage space and things slow down. I guess
> > > > mincore(2) users will be affected.
> > > >
> > > > Additional note: in the embedded world there are lots of ARM products
> > > > without a HW access bit (although ARM has recently started to support
> > > > one), and sequential file access workloads matter less there than memory
> > > > reclaim. So fault_around's benefit could be highly limited compared to
> > > > server workloads on architectures with a HW access bit.
> > > >
> > > > I want to ask again.
> > > > I guess we could disable fault_around with a kernel parameter, but does
> > > > it sound reasonable to enable fault_around by default on every arch at
> > > > the cost of the above regressions?
> > > >
> > > > I'm not against that. All I want is for some fixes for the regression to
> > > > go to -stable.
> > > >
> > > > >
> > > > > There is a big improvement in average latency, but it is still 5% higher
> > > > > than with fault_around disabled. I will try to debug this further.
> > >
> > > I did a quick test on my ARM machine.
> > >
> > > Workload: mmap a 512M file and read every word sequentially.
> > >
> > > = vanilla fault_around=4096 =
> > > minor fault: 131291
> > > elapsed time(usec): 6686236
> > >
> > > = vanilla fault_around=65536 =
> > > minor fault: 12577
> > > elapsed time(usec): 6586959
> > >
> > > I tested 3 times and the results seemed stable.
> > > Minor faults dropped by 90%. That looks like a huge win, but judging by
> > > elapsed time it is not: just about 1.5%.
> > >
> > > = pte_mkold applied fault_around=4096 =
> > > minor fault: 131291
> > > elapsed time(usec): 6608358
> > >
> > > = pte_mkold applied fault_around=65536 =
> > > minor fault: 143609
> > > elapsed time(usec): 6772520
> > >
> > > I tested 3 times and the results seemed stable.
> > > Minor faults actually increased, and elapsed time was longer with
> > > fault_around.
> > > Gain is really not clear.
> >
> > Kirill,
> > You wanted a test on a system without a HW access bit, and I did one.
> > What's your opinion?
>
> Sorry for the late response.
>
> My patch is incomplete: we need to find a way not to mark the pte old if we
> are handling the page fault for the address that pte maps.
I'm sure you can handle it, but my point is that even then there wouldn't be
a big gain on systems without a HW access bit. To be more precise, since I
don't have every such architecture: at least the current mobile workload on
the ARM system I have wouldn't benefit much.
One more thing.
I tested the workload on a quad-core system whose cores are not slow compared
to other recent mobile phone SoCs. Even when I ran the benchmark without
pte_mkold, the benefit was within noise, because storage is really slow and
major faults are the dominant factor. So I switched the test storage from
eMMC to eSATA, and then I finally managed to see a small benefit from
fault_around without pte_mkold.

However, let's consider the side effects of fault_around:

1. Increased slab shrinking compared to before
2. Higher vmpressure levels compared to before

Considering those regressions on my system, it's really not worth trying at
the moment.
That's why I wanted to disable fault_around by default on systems without a
HW access bit.
>
> Once this is done, the number of page faults shouldn't be higher with
> fault-around enabled, even on machines without a hardware accessed bit. This
> will address the performance regression with the patch on such machines.
Even if you solve that, I guess the benefit would be marginal on some
architectures, and we would still need to solve the above side effects.
>
> I'll try to find time to update the patch soon.
I hope you can solve the above regressions as well.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH] mm: make fault_around_bytes configurable
2016-05-16 14:56 ` Minchan Kim
@ 2016-05-17 12:34 ` Kirill A. Shutemov
0 siblings, 0 replies; 34+ messages in thread
From: Kirill A. Shutemov @ 2016-05-17 12:34 UTC (permalink / raw)
To: Minchan Kim
Cc: Vinayak Menon, Andrew Morton, linux-mm, linux-kernel,
dan.j.williams, mgorman, vbabka, kirill.shutemov, dave.hansen,
hughd
On Mon, May 16, 2016 at 11:56:32PM +0900, Minchan Kim wrote:
> On Mon, May 16, 2016 at 05:29:00PM +0300, Kirill A. Shutemov wrote:
> > > Kirill,
> > > You wanted a test on a system without a HW access bit, and I did one.
> > > What's your opinion?
> >
> > Sorry for the late response.
> >
> > My patch is incomplete: we need to find a way not to mark the pte old if we
> > are handling the page fault for the address that pte maps.
>
> I'm sure you can handle it, but my point is that even then there wouldn't be
> a big gain on systems without a HW access bit. To be more precise, since I
> don't have every such architecture: at least the current mobile workload on
> the ARM system I have wouldn't benefit much.
> One more thing.
> I tested the workload on a quad-core system whose cores are not slow compared
> to other recent mobile phone SoCs. Even when I ran the benchmark without
> pte_mkold, the benefit was within noise, because storage is really slow and
> major faults are the dominant factor. So I switched the test storage from
> eMMC to eSATA, and then I finally managed to see a small benefit from
> fault_around without pte_mkold.
>
> However, let's consider the side effects of fault_around:
>
> 1. Increased slab shrinking compared to before
> 2. Higher vmpressure levels compared to before
>
> Considering those regressions on my system, it's really not worth trying at
> the moment.
> That's why I wanted to disable fault_around by default on systems without a
> HW access bit.
Feel free to post such a patch. I guess it's reasonable.
> > Once this is done, the number of page faults shouldn't be higher with
> > fault-around enabled, even on machines without a hardware accessed bit. This
> > will address the performance regression with the patch on such machines.
>
> Even if you solve that, I guess the benefit would be marginal on some
> architectures, and we would still need to solve the above side effects.
>
> >
> > I'll try to find time to update the patch soon.
>
> I hope you can solve the above regressions as well.
The patch is posted. Please test.
--
Kirill A. Shutemov
Thread overview: 34+ messages
2016-04-18 15:17 [PATCH] mm: make fault_around_bytes configurable Vinayak Menon
2016-04-22 0:01 ` Andrew Morton
2016-04-22 8:45 ` Vinayak Menon
2016-04-22 9:44 ` Kirill A. Shutemov
2016-04-22 15:09 ` Minchan Kim
2016-04-22 15:16 ` Kirill A. Shutemov
2016-04-25 11:51 ` Vinayak Menon
2016-05-09 7:32 ` Minchan Kim
2016-05-10 2:48 ` Minchan Kim
2016-05-16 14:18 ` Minchan Kim
2016-05-16 14:29 ` Kirill A. Shutemov
2016-05-16 14:56 ` Minchan Kim
2016-05-17 12:34 ` Kirill A. Shutemov
2016-04-22 14:02 ` Minchan Kim
2016-04-22 14:11 ` Kirill A. Shutemov
2016-04-22 14:17 ` Kirill A. Shutemov
2016-04-22 14:50 ` Minchan Kim