* [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
@ 2017-02-24 6:49 Jia He
2017-02-24 8:49 ` Michal Hocko
0 siblings, 1 reply; 8+ messages in thread
From: Jia He @ 2017-02-24 6:49 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, Andrew Morton, Johannes Weiner, Mel Gorman,
Vlastimil Babka, Michal Hocko, Minchan Kim, Rik van Riel, Jia He
On a NUMA server, the topology looks like:
available: 3 nodes (0,2-3)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus: 0 1 2 3 4 5 6 7
node 2 size: 15299 MB
node 2 free: 289 MB
node 3 cpus:
node 3 size: 15336 MB
node 3 free: 184 MB
node distances:
node 0 2 3
0: 10 40 40
2: 40 10 20
3: 40 20 10
When I try to dynamically allocate more hugepages than the system's total free
memory:
e.g. echo 4000 >/proc/sys/vm/nr_hugepages
Then kswapd takes 100% CPU for a long time (more than 3 hours, with no sign of
ending)
top result:
top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers
KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3
The root cause is that kswapd3 is woken up and then tries to reclaim again and
again but makes no progress. In the end the allocated hugepages fall short of
4000.
HugePages_Total: 1864
HugePages_Free: 1864
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
At that time, even though there are no reclaimable pages on node 3, kswapd3
will not go to sleep.
Node 3, zone DMA
per-node stats
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_isolated_anon 0
nr_isolated_file 0
nr_pages_scanned 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_writeback_temp 0
nr_shmem 0
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 0
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 0
nr_written 0
pages free 2951
min 2821
low 3526
high 4231
node_scanned 0
spanned 245760
present 245760
managed 245388
nr_free_pages 2951
nr_zone_inactive_anon 0
nr_zone_active_anon 0
nr_zone_inactive_file 0
nr_zone_active_file 0
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_slab_reclaimable 46
nr_slab_unreclaimable 90
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
numa_hit 2257
numa_miss 0
numa_foreign 0
numa_interleave 982
numa_local 0
numa_other 2257
nr_free_cma 0
protection: (0, 0, 0, 0)
This could be called a misconfiguration, but it seems quite easy to hit on
NUMA machines with large differences in node sizes.
Furthermore, once most of the memory in node 3 is consumed, every allocation
slow path may wake up kswapd3 and make things worse:
__alloc_pages_slowpath
wake_all_kswapds
wakeup_kswapd
This patch resolves the issue in 2 ways:
1. In prepare_kswapd_sleep, kswapd only stays awake to reclaim when a zone is
both unbalanced and still has reclaimable pages
2. Don't wake up kswapd if there are no reclaimable pages in that node
After this patch:
top - 07:29:43 up 3 min, 1 user, load average: 0.12, 0.13, 0.06
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 97.8 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 31371520 total, 938112 used, 30433408 free, 5504 buffers
KiB Swap: 6284224 total, 0 used, 6284224 free. 632448 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
78 root 20 0 0 0 0 S 0.000 0.000 0:00.00 kswapd3
Changes:
V2: - fix incorrect condition for assignment of node_has_reclaimable_pages
- improve the commit description
Signed-off-by: Jia He <hejianet@gmail.com>
---
mm/vmscan.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 532a2a7..7c5a563 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3139,7 +3139,8 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
if (!managed_zone(zone))
continue;
- if (!zone_balanced(zone, order, classzone_idx))
+ if (!zone_balanced(zone, order, classzone_idx)
+ && zone_reclaimable_pages(zone))
return false;
}
@@ -3502,6 +3503,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
{
pg_data_t *pgdat;
int z;
+ int node_has_reclaimable_pages = 0;
if (!managed_zone(zone))
return;
@@ -3522,8 +3524,15 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
if (zone_balanced(zone, order, classzone_idx))
return;
+
+ if (zone_reclaimable_pages(zone))
+ node_has_reclaimable_pages = 1;
}
+ /* Don't wake kswapd if no zone has reclaimable pages */
+ if (!node_has_reclaimable_pages)
+ return;
+
trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
wake_up_interruptible(&pgdat->kswapd_wait);
}
--
1.8.5.6
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
2017-02-24 6:49 [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages Jia He
@ 2017-02-24 8:49 ` Michal Hocko
2017-02-24 16:51 ` Johannes Weiner
0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-02-24 8:49 UTC (permalink / raw)
To: Jia He
Cc: linux-mm, linux-kernel, Andrew Morton, Johannes Weiner,
Mel Gorman, Vlastimil Babka, Minchan Kim, Rik van Riel
On Fri 24-02-17 14:49:52, Jia He wrote:
> [...]
I believe we should pursue the proposal from Johannes which is more
generic and copes with corner cases much better.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
2017-02-24 8:49 ` Michal Hocko
@ 2017-02-24 16:51 ` Johannes Weiner
2017-02-27 6:04 ` hejianet
2017-02-27 8:50 ` Michal Hocko
0 siblings, 2 replies; 8+ messages in thread
From: Johannes Weiner @ 2017-02-24 16:51 UTC (permalink / raw)
To: Michal Hocko
Cc: Jia He, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
Vlastimil Babka, Minchan Kim, Rik van Riel
On Fri, Feb 24, 2017 at 09:49:50AM +0100, Michal Hocko wrote:
> I believe we should pursue the proposal from Johannes which is more
> generic and copes with corner cases much better.
Jia, can you try this? I'll put the cleanups in follow-up patches.
---
>From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Fri, 24 Feb 2017 10:56:32 -0500
Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
Jia He reports a problem with kswapd spinning at 100% CPU when
requesting more hugepages than memory available in the system:
$ echo 4000 >/proc/sys/vm/nr_hugepages
top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers
KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3
At that time, there are no reclaimable pages left in the node, but as
kswapd fails to restore the high watermarks it refuses to go to sleep.
Kswapd needs to back away from nodes that fail to balance. Up until
1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
kswapd had such a mechanism. It considered zones whose theoretically
reclaimable pages it had reclaimed six times over as unreclaimable and
backed away from them. This guard was erroneously removed as the patch
changed the definition of a balanced node.
However, simply restoring this code wouldn't help in the case reported
here: there *are* no reclaimable pages that could be scanned until the
threshold is met. Kswapd would stay awake anyway.
Introduce a new and much simpler way of backing off. If kswapd runs
through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
page, make it back off from the node. This is the same number of shots
direct reclaim takes before declaring OOM. Kswapd will go to sleep on
that node until a direct reclaimer manages to reclaim some pages, thus
proving the node reclaimable again.
Reported-by: Jia He <hejianet@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/mmzone.h | 2 ++
include/linux/swap.h | 1 +
mm/page_alloc.c | 6 ------
mm/vmscan.c | 20 ++++++++++++++++++++
4 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8e02b3750fe0..d2c50ab6ae40 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -630,6 +630,8 @@ typedef struct pglist_data {
int kswapd_order;
enum zone_type kswapd_classzone_idx;
+ int kswapd_failures; /* Number of 'reclaimed == 0' runs */
+
#ifdef CONFIG_COMPACTION
int kcompactd_max_order;
enum zone_type kcompactd_classzone_idx;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 45e91dd6716d..5c06581a730b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -288,6 +288,7 @@ extern void lru_cache_add_active_or_unevictable(struct page *page,
struct vm_area_struct *vma);
/* linux/mm/vmscan.c */
+#define MAX_RECLAIM_RETRIES 16
extern unsigned long zone_reclaimable_pages(struct zone *zone);
extern unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat);
extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 614cd0397ce3..83f0442f07fa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3516,12 +3516,6 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
}
/*
- * Maximum number of reclaim retries without any progress before OOM killer
- * is consider as the only way to move forward.
- */
-#define MAX_RECLAIM_RETRIES 16
-
-/*
* Checks whether it makes sense to retry the reclaim to make a forward progress
* for the given allocation request.
* The reclaim feedback represented by did_some_progress (any progress during
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 26c3b405ef34..8e9bdd172182 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2626,6 +2626,15 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
} while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
sc->nr_scanned - nr_scanned, sc));
+ /*
+ * Kswapd gives up on balancing particular nodes after too
+ * many failures to reclaim anything from them. If reclaim
+ * progress happens, reset the failure counter. A successful
+ * direct reclaim run will knock a stuck kswapd loose again.
+ */
+ if (reclaimable)
+ pgdat->kswapd_failures = 0;
+
return reclaimable;
}
@@ -3134,6 +3143,10 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
if (waitqueue_active(&pgdat->pfmemalloc_wait))
wake_up_all(&pgdat->pfmemalloc_wait);
+ /* Hopeless node, leave it to direct reclaim */
+ if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+ return true;
+
for (i = 0; i <= classzone_idx; i++) {
struct zone *zone = pgdat->node_zones + i;
@@ -3316,6 +3329,9 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
sc.priority--;
} while (sc.priority >= 1);
+ if (!sc.nr_reclaimed)
+ pgdat->kswapd_failures++;
+
out:
/*
* Return the order kswapd stopped reclaiming at as
@@ -3515,6 +3531,10 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
if (!waitqueue_active(&pgdat->kswapd_wait))
return;
+ /* Hopeless node, leave it to direct reclaim */
+ if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+ return;
+
/* Only wake kswapd if all zones are unbalanced */
for (z = 0; z <= classzone_idx; z++) {
zone = pgdat->node_zones + z;
--
2.11.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
2017-02-24 16:51 ` Johannes Weiner
@ 2017-02-27 6:04 ` hejianet
2017-02-27 8:50 ` Michal Hocko
1 sibling, 0 replies; 8+ messages in thread
From: hejianet @ 2017-02-27 6:04 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko
Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
Vlastimil Babka, Minchan Kim, Rik van Riel
Hi
Tested-by: Jia He <hejianet@gmail.com>
cat /proc/meminfo
[...]
CmaFree: 0 kB
HugePages_Total: 1831
HugePages_Free: 1831
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
top - 06:50:29 up 1:26, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.2 sy, 0.0 ni, 99.6 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 31371520 total, 30577664 used, 793856 free, 256 buffers
KiB Swap: 6284224 total, 128 used, 6284096 free. 281280 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
79 root 20 0 0 0 0 S 0.000 0.000 0:00.00 kswapd3
On 25/02/2017 12:51 AM, Johannes Weiner wrote:
> On Fri, Feb 24, 2017 at 09:49:50AM +0100, Michal Hocko wrote:
>> I believe we should pursue the proposal from Johannes which is more
>> generic and copes with corner cases much better.
>
> Jia, can you try this? I'll put the cleanups in follow-up patches.
>
> [...]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
2017-02-24 16:51 ` Johannes Weiner
2017-02-27 6:04 ` hejianet
@ 2017-02-27 8:50 ` Michal Hocko
2017-02-27 17:06 ` Johannes Weiner
1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-02-27 8:50 UTC (permalink / raw)
To: Johannes Weiner
Cc: Jia He, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
Vlastimil Babka, Minchan Kim, Rik van Riel
On Fri 24-02-17 11:51:05, Johannes Weiner wrote:
[...]
> >From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Fri, 24 Feb 2017 10:56:32 -0500
> Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
>
> Jia He reports a problem with kswapd spinning at 100% CPU when
> requesting more hugepages than memory available in the system:
>
> $ echo 4000 >/proc/sys/vm/nr_hugepages
>
> top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01
> Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
> KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers
> KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3
>
> At that time, there are no reclaimable pages left in the node, but as
> kswapd fails to restore the high watermarks it refuses to go to sleep.
>
> Kswapd needs to back away from nodes that fail to balance. Up until
> 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
> kswapd had such a mechanism. It considered zones whose theoretically
> reclaimable pages it had reclaimed six times over as unreclaimable and
> backed away from them. This guard was erroneously removed as the patch
> changed the definition of a balanced node.
>
> However, simply restoring this code wouldn't help in the case reported
> here: there *are* no reclaimable pages that could be scanned until the
> threshold is met. Kswapd would stay awake anyway.
>
> Introduce a new and much simpler way of backing off. If kswapd runs
> through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
> page, make it back off from the node. This is the same number of shots
> direct reclaim takes before declaring OOM. Kswapd will go to sleep on
> that node until a direct reclaimer manages to reclaim some pages, thus
> proving the node reclaimable again.
Yes, this looks nice & simple. I would just be worried about [1] a bit.
Maybe that is worth a separate patch though.
[1] http://lkml.kernel.org/r/20170223111609.hlncnvokhq3quxwz@dhcp22.suse.cz
> Reported-by: Jia He <hejianet@gmail.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
I would have just one more suggestion. Please move MAX_RECLAIM_RETRIES
to mm/internal.h. This is MM internal thing and there is no need to make
it visible.
> [...]
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
2017-02-27 8:50 ` Michal Hocko
@ 2017-02-27 17:06 ` Johannes Weiner
2017-02-27 17:29 ` Michal Hocko
2017-02-28 1:53 ` hejianet
0 siblings, 2 replies; 8+ messages in thread
From: Johannes Weiner @ 2017-02-27 17:06 UTC (permalink / raw)
To: Michal Hocko
Cc: Jia He, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
Vlastimil Babka, Minchan Kim, Rik van Riel
On Mon, Feb 27, 2017 at 09:50:24AM +0100, Michal Hocko wrote:
> On Fri 24-02-17 11:51:05, Johannes Weiner wrote:
> [...]
> > [...]
>
> Yes, this looks nice & simple. I would just be a bit worried about [1].
> Maybe that is worth a separate patch, though.
>
> [1] http://lkml.kernel.org/r/20170223111609.hlncnvokhq3quxwz@dhcp22.suse.cz
I think I'd prefer the simplicity of keeping this contained inside
vmscan.c, as an interaction between direct reclaimers and kswapd, as
well as leaving the wakeup tied to actually seeing reclaimable pages
rather than merely producing free pages (should we also add a kick
to a large munmap(), for example?).
OOM kills come with such high latencies that I cannot imagine a
slightly quicker kswapd restart would matter in practice.
> > Reported-by: Jia He <hejianet@gmail.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> I would have just one more suggestion. Please move MAX_RECLAIM_RETRIES
> to mm/internal.h. This is MM internal thing and there is no need to make
> it visible.
Good point, I'll move it.
* Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
2017-02-27 17:06 ` Johannes Weiner
@ 2017-02-27 17:29 ` Michal Hocko
2017-02-28 1:53 ` hejianet
1 sibling, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-02-27 17:29 UTC (permalink / raw)
To: Johannes Weiner
Cc: Jia He, linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
Vlastimil Babka, Minchan Kim, Rik van Riel
On Mon 27-02-17 12:06:34, Johannes Weiner wrote:
> On Mon, Feb 27, 2017 at 09:50:24AM +0100, Michal Hocko wrote:
> > On Fri 24-02-17 11:51:05, Johannes Weiner wrote:
> > [...]
> > > >From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > Date: Fri, 24 Feb 2017 10:56:32 -0500
> > > Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
> > >
> > > Jia He reports a problem with kswapd spinning at 100% CPU when
> > > requesting more hugepages than memory available in the system:
> > >
> > > $ echo 4000 >/proc/sys/vm/nr_hugepages
> > >
> > > top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01
> > > Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
> > > %Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > > KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers
> > > KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
> > >
> > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > > 76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3
> > >
> > > At that time, there are no reclaimable pages left in the node, but as
> > > kswapd fails to restore the high watermarks it refuses to go to sleep.
> > >
> > > Kswapd needs to back away from nodes that fail to balance. Up until
> > > 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
> > > kswapd had such a mechanism. It considered zones whose theoretically
> > > reclaimable pages it had reclaimed six times over as unreclaimable and
> > > backed away from them. This guard was erroneously removed as the patch
> > > changed the definition of a balanced node.
> > >
> > > However, simply restoring this code wouldn't help in the case reported
> > > here: there *are* no reclaimable pages that could be scanned until the
> > > threshold is met. Kswapd would stay awake anyway.
> > >
> > > Introduce a new and much simpler way of backing off. If kswapd runs
> > > through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
> > > page, make it back off from the node. This is the same number of shots
> > > direct reclaim takes before declaring OOM. Kswapd will go to sleep on
> > > that node until a direct reclaimer manages to reclaim some pages, thus
> > > proving the node reclaimable again.
> >
> > Yes, this looks nice & simple. I would just be a bit worried about [1].
> > Maybe that is worth a separate patch, though.
> >
> > [1] http://lkml.kernel.org/r/20170223111609.hlncnvokhq3quxwz@dhcp22.suse.cz
>
> I think I'd prefer the simplicity of keeping this contained inside
> vmscan.c, as an interaction between direct reclaimers and kswapd, as
> well as leaving the wakeup tied to actually seeing reclaimable pages
> rather than merely producing free pages (should we also add a kick
> to a large munmap(), for example?).
OK, that is a good point as well. I was about to argue that a runaway
mlock process killed by the OOM killer should restart kswapd, otherwise
the following operation would be quite surprising. But you are right
that there are other sources of large amounts of free pages. So let's
keep it simple for now and only do something based on freed pages if
that turns out to be needed.
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages
2017-02-27 17:06 ` Johannes Weiner
2017-02-27 17:29 ` Michal Hocko
@ 2017-02-28 1:53 ` hejianet
1 sibling, 0 replies; 8+ messages in thread
From: hejianet @ 2017-02-28 1:53 UTC (permalink / raw)
To: Johannes Weiner, Michal Hocko
Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
Vlastimil Babka, Minchan Kim, Rik van Riel
Hi Johannes,
I have another concern:
kswapd -> balance_pgdat -> age_active_anon
This code path does some background work to age the anon lists. Will
this patch have an impact on that if the failure count exceeds 16 and
kswapd is no longer woken up?
B.R.
Jia
On 28/02/2017 1:06 AM, Johannes Weiner wrote:
> On Mon, Feb 27, 2017 at 09:50:24AM +0100, Michal Hocko wrote:
>> On Fri 24-02-17 11:51:05, Johannes Weiner wrote:
>> [...]
>>> >From 29fefdca148e28830e0934d4e6cceb95ed2ee36e Mon Sep 17 00:00:00 2001
>>> From: Johannes Weiner <hannes@cmpxchg.org>
>>> Date: Fri, 24 Feb 2017 10:56:32 -0500
>>> Subject: [PATCH] mm: vmscan: disable kswapd on unreclaimable nodes
>>>
>>> Jia He reports a problem with kswapd spinning at 100% CPU when
>>> requesting more hugepages than memory available in the system:
>>>
>>> $ echo 4000 >/proc/sys/vm/nr_hugepages
>>>
>>> top - 13:42:59 up 3:37, 1 user, load average: 1.09, 1.03, 1.01
>>> Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
>>> %Cpu(s): 0.0 us, 12.5 sy, 0.0 ni, 85.5 id, 2.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> KiB Mem: 31371520 total, 30915136 used, 456384 free, 320 buffers
>>> KiB Swap: 6284224 total, 115712 used, 6168512 free. 48192 cached Mem
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>> 76 root 20 0 0 0 0 R 100.0 0.000 217:17.29 kswapd3
>>>
>>> At that time, there are no reclaimable pages left in the node, but as
>>> kswapd fails to restore the high watermarks it refuses to go to sleep.
>>>
>>> Kswapd needs to back away from nodes that fail to balance. Up until
>>> 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of nodes")
>>> kswapd had such a mechanism. It considered zones whose theoretically
>>> reclaimable pages it had reclaimed six times over as unreclaimable and
>>> backed away from them. This guard was erroneously removed as the patch
>>> changed the definition of a balanced node.
>>>
>>> However, simply restoring this code wouldn't help in the case reported
>>> here: there *are* no reclaimable pages that could be scanned until the
>>> threshold is met. Kswapd would stay awake anyway.
>>>
>>> Introduce a new and much simpler way of backing off. If kswapd runs
>>> through MAX_RECLAIM_RETRIES (16) cycles without reclaiming a single
>>> page, make it back off from the node. This is the same number of shots
>>> direct reclaim takes before declaring OOM. Kswapd will go to sleep on
>>> that node until a direct reclaimer manages to reclaim some pages, thus
>>> proving the node reclaimable again.
>>
>> Yes, this looks nice & simple. I would just be a bit worried about [1].
>> Maybe that is worth a separate patch, though.
>>
>> [1] http://lkml.kernel.org/r/20170223111609.hlncnvokhq3quxwz@dhcp22.suse.cz
>
> I think I'd prefer the simplicity of keeping this contained inside
> vmscan.c, as an interaction between direct reclaimers and kswapd, as
> well as leaving the wakeup tied to actually seeing reclaimable pages
> rather than merely producing free pages (should we also add a kick
> to a large munmap(), for example?).
>
> OOM kills come with such high latencies that I cannot imagine a
> slightly quicker kswapd restart would matter in practice.
>
>>> Reported-by: Jia He <hejianet@gmail.com>
>>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thanks!
>
>> I would have just one more suggestion. Please move MAX_RECLAIM_RETRIES
>> to mm/internal.h. This is MM internal thing and there is no need to make
>> it visible.
>
> Good point, I'll move it.
>
Thread overview: 8+ messages
2017-02-24 6:49 [PATCH v2] mm/vmscan: fix high cpu usage of kswapd if there are no reclaimable pages Jia He
2017-02-24 8:49 ` Michal Hocko
2017-02-24 16:51 ` Johannes Weiner
2017-02-27 6:04 ` hejianet
2017-02-27 8:50 ` Michal Hocko
2017-02-27 17:06 ` Johannes Weiner
2017-02-27 17:29 ` Michal Hocko
2017-02-28 1:53 ` hejianet