* kswapd0: excessive CPU usage
@ 2012-10-11  8:52 Jiri Slaby
  2012-10-11 13:44 ` Valdis.Kletnieks
  2012-10-11 22:14 ` kswapd0: excessive " Andrew Morton
  0 siblings, 2 replies; 52+ messages in thread
From: Jiri Slaby @ 2012-10-11  8:52 UTC (permalink / raw)
  To: linux-mm, LKML, Jiri Slaby

Hi,

with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1
minute or so. If I try to suspend to RAM, this trace appears:

kswapd0         R  running task        0   577      2 0x00000000
 0000000000000000 00000000000000c0 cccccccccccccccd ffff8801c4146800
 ffff8801c4b15c88 ffffffff8116ee05 0000000000003e32 ffff8801c3a79000
 ffff8801c4b15ca8 ffffffff8116fdf8 ffff8801c480f398 ffff8801c3a79000
Call Trace:
 [<ffffffff8116ee05>] ? put_super+0x25/0x40
 [<ffffffff8116fdd4>] ? grab_super_passive+0x24/0xa0
 [<ffffffff8116ff99>] ? prune_super+0x149/0x1b0
 [<ffffffff81131531>] ? shrink_slab+0xa1/0x2d0
 [<ffffffff8113452d>] ? kswapd+0x66d/0xb60
 [<ffffffff81133ec0>] ? try_to_free_pages+0x180/0x180
 [<ffffffff810a2770>] ? kthread+0xc0/0xd0
 [<ffffffff810a26b0>] ? kthread_create_on_node+0x130/0x130
 [<ffffffff816a6c9c>] ? ret_from_fork+0x7c/0x90
 [<ffffffff810a26b0>] ? kthread_create_on_node+0x130/0x130

# cat /proc/vmstat
nr_free_pages 239962
nr_inactive_anon 89825
nr_active_anon 711136
nr_inactive_file 60386
nr_active_file 46668
nr_unevictable 0
nr_mlock 0
nr_anon_pages 500678
nr_mapped 41319
nr_file_pages 319317
nr_dirty 45
nr_writeback 0
nr_slab_reclaimable 21909
nr_slab_unreclaimable 21598
nr_page_table_pages 12131
nr_kernel_stack 491
nr_unstable 0
nr_bounce 0
nr_vmscan_write 1674280
nr_vmscan_immediate_reclaim 301662
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 212263
nr_dirtied 10620227
nr_written 9260939
nr_anon_transparent_hugepages 172
nr_free_cma 0
nr_dirty_threshold 31459
nr_dirty_background_threshold 15729
pgpgin 31311778
pgpgout 38987552
pswpin 0
pswpout 0
pgalloc_dma 0
pgalloc_dma32 245169455
pgalloc_normal 279685864
pgalloc_movable 0
pgfree 537318727
pgactivate 13126755
pgdeactivate 2482953
pgfault 645947575
pgmajfault 193427
pgrefill_dma 0
pgrefill_dma32 1124272
pgrefill_normal 1998033
pgrefill_movable 0
pgsteal_kswapd_dma 0
pgsteal_kswapd_dma32 2531015
pgsteal_kswapd_normal 3403006
pgsteal_kswapd_movable 0
pgsteal_direct_dma 0
pgsteal_direct_dma32 362488
pgsteal_direct_normal 1134511
pgsteal_direct_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 2693620
pgscan_kswapd_normal 5836491
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 368374
pgscan_direct_normal 1658486
pgscan_direct_movable 0
pgscan_direct_throttle 0
pginodesteal 258410
slabs_scanned 86459392
kswapd_inodesteal 3907549
kswapd_low_wmark_hit_quickly 15408
kswapd_high_wmark_hit_quickly 23113
kswapd_skip_congestion_wait 10
pageoutrun 2165627235
allocstall 11256
pgrotated 219624
compact_blocks_moved 4862077
compact_pages_moved 1970005
compact_pagemigrate_failed 1726156
compact_stall 21275
compact_fail 6589
compact_success 14686
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 2799
unevictable_pgs_scanned 0
unevictable_pgs_rescued 22563
unevictable_pgs_mlocked 22563
unevictable_pgs_munlocked 22563
unevictable_pgs_cleared 0
unevictable_pgs_stranded 0
thp_fault_alloc 18725
thp_fault_fallback 64868
thp_collapse_alloc 9216
thp_collapse_alloc_failed 2031
thp_split 2146

Any ideas what it could be?

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-11  8:52 kswapd0: excessive CPU usage Jiri Slaby
@ 2012-10-11 13:44 ` Valdis.Kletnieks
  2012-10-11 15:34   ` Jiri Slaby
  2012-10-11 22:14 ` kswapd0: excessive " Andrew Morton
  1 sibling, 1 reply; 52+ messages in thread
From: Valdis.Kletnieks @ 2012-10-11 13:44 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-mm, LKML, Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 1675 bytes --]

On Thu, 11 Oct 2012 10:52:28 +0200, Jiri Slaby said:
> Hi,
>
> with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1
> minute or so.

> [<ffffffff8116ee05>] ? put_super+0x25/0x40
> [<ffffffff8116fdd4>] ? grab_super_passive+0x24/0xa0
> [<ffffffff8116ff99>] ? prune_super+0x149/0x1b0
> [<ffffffff81131531>] ? shrink_slab+0xa1/0x2d0
> [<ffffffff8113452d>] ? kswapd+0x66d/0xb60
> [<ffffffff81133ec0>] ? try_to_free_pages+0x180/0x180
> [<ffffffff810a2770>] ? kthread+0xc0/0xd0
> [<ffffffff810a26b0>] ? kthread_create_on_node+0x130/0x130
> [<ffffffff816a6c9c>] ? ret_from_fork+0x7c/0x90
> [<ffffffff810a26b0>] ? kthread_create_on_node+0x130/0x130

I don't know what it is, I haven't finished bisecting it - but I can
confirm that I started seeing the same problem 2 or 3 weeks ago.

Note that said call trace does *NOT* require a suspend - I don't do
suspend on my laptop and I'm seeing kswapd burn CPU with similar traces.
# cat /proc/31/stack
[<ffffffff81110306>] grab_super_passive+0x44/0x76
[<ffffffff81110372>] prune_super+0x3a/0x13c
[<ffffffff810dc52a>] shrink_slab+0x95/0x301
[<ffffffff810defb7>] kswapd+0x5c8/0x902
[<ffffffff8104eea4>] kthread+0x9d/0xa5
[<ffffffff815ccfac>] ret_from_fork+0x7c/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
# cat /proc/31/stack
[<ffffffff8110f5af>] put_super+0x29/0x2d
[<ffffffff8110f637>] drop_super+0x1b/0x20
[<ffffffff81110462>] prune_super+0x12a/0x13c
[<ffffffff810dc52a>] shrink_slab+0x95/0x301
[<ffffffff810defb7>] kswapd+0x5c8/0x902
[<ffffffff8104eea4>] kthread+0x9d/0xa5
[<ffffffff815ccfac>] ret_from_fork+0x7c/0x90
[<ffffffffffffffff>] 0xffffffffffffffff

So at least we know we're not hallucinating. :)

[-- Attachment #2: Type: application/pgp-signature, Size: 865 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread
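[Editorial note: the two snapshots above are in effect manual stack sampling - if repeated reads of /proc/<pid>/stack keep landing in the same code path, that is where the thread spins. A minimal offline sketch of the tallying step (the sampling loop over /proc is omitted; the two traces quoted above serve as canned samples, and the helper name is made up for illustration):]

```python
from collections import Counter

def top_frames(samples):
    """Tally the innermost resolvable symbol of each /proc/<pid>/stack sample.

    A frame line looks like: [<ffffffff81110306>] grab_super_passive+0x44/0x76
    If one symbol family dominates across many samples, that's the hot path.
    """
    tally = Counter()
    for sample in samples:
        for line in sample.splitlines():
            _, _, rest = line.partition("] ")
            if rest and "+" in rest:
                tally[rest.split("+", 1)[0]] += 1
                break  # count only the innermost frame of this sample
    return tally

samples = [
    "[<ffffffff81110306>] grab_super_passive+0x44/0x76\n"
    "[<ffffffff81110372>] prune_super+0x3a/0x13c",
    "[<ffffffff8110f5af>] put_super+0x29/0x2d\n"
    "[<ffffffff8110f637>] drop_super+0x1b/0x20",
]
print(top_frames(samples).most_common())
# both samples land in the superblock shrinker called from shrink_slab()
```
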
* Re: kswapd0: excessive CPU usage
  2012-10-11 13:44 ` Valdis.Kletnieks
@ 2012-10-11 15:34   ` Jiri Slaby
  2012-10-11 17:56     ` Valdis.Kletnieks
  0 siblings, 1 reply; 52+ messages in thread
From: Jiri Slaby @ 2012-10-11 15:34 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-mm, LKML, Jiri Slaby

On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
> So at least we know we're not hallucinating. :)

Just a thought? Do you have raid?

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-11 15:34 ` Jiri Slaby
@ 2012-10-11 17:56   ` Valdis.Kletnieks
  2012-10-11 17:59     ` Jiri Slaby
  0 siblings, 1 reply; 52+ messages in thread
From: Valdis.Kletnieks @ 2012-10-11 17:56 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: linux-mm, LKML, Jiri Slaby

[-- Attachment #1: Type: text/plain, Size: 315 bytes --]

On Thu, 11 Oct 2012 17:34:24 +0200, Jiri Slaby said:
> On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
> > So at least we know we're not hallucinating. :)
>
> Just a thought? Do you have raid?

Nope, just a 160G laptop spinning hard drive. Filesystems are ext4
on LVM on a cryptoLUKS partition on /dev/sda2.

[-- Attachment #2: Type: application/pgp-signature, Size: 865 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-11 17:56 ` Valdis.Kletnieks
@ 2012-10-11 17:59   ` Jiri Slaby
  2012-10-11 18:19     ` Valdis.Kletnieks
  0 siblings, 1 reply; 52+ messages in thread
From: Jiri Slaby @ 2012-10-11 17:59 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Jiri Slaby, linux-mm, LKML

On 10/11/2012 07:56 PM, Valdis.Kletnieks@vt.edu wrote:
> On Thu, 11 Oct 2012 17:34:24 +0200, Jiri Slaby said:
>> On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
>>> So at least we know we're not hallucinating. :)
>>
>> Just a thought? Do you have raid?
>
> Nope, just a 160G laptop spinning hard drive. Filesystems are ext4
> on LVM on a cryptoLUKS partition on /dev/sda2.

Ok, it's maybe compaction. Do you have CONFIG_COMPACTION=y?

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-11 17:59 ` Jiri Slaby
@ 2012-10-11 18:19   ` Valdis.Kletnieks
  2012-10-11 22:08     ` kswapd0: excessive " Jiri Slaby
  0 siblings, 1 reply; 52+ messages in thread
From: Valdis.Kletnieks @ 2012-10-11 18:19 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Jiri Slaby, linux-mm, LKML

[-- Attachment #1: Type: text/plain, Size: 608 bytes --]

On Thu, 11 Oct 2012 19:59:33 +0200, Jiri Slaby said:
> On 10/11/2012 07:56 PM, Valdis.Kletnieks@vt.edu wrote:
> > On Thu, 11 Oct 2012 17:34:24 +0200, Jiri Slaby said:
> >> On 10/11/2012 03:44 PM, Valdis.Kletnieks@vt.edu wrote:
> >>> So at least we know we're not hallucinating. :)
> >>
> >> Just a thought? Do you have raid?
> >
> > Nope, just a 160G laptop spinning hard drive. Filesystems are ext4
> > on LVM on a cryptoLUKS partition on /dev/sda2.
>
> Ok, it's maybe compaction. Do you have CONFIG_COMPACTION=y?

# zgrep COMPAC /proc/config.gz
CONFIG_COMPACTION=y

Hope that tells you something useful.

[-- Attachment #2: Type: application/pgp-signature, Size: 865 bytes --]

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-11 18:19 ` Valdis.Kletnieks
@ 2012-10-11 22:08   ` Jiri Slaby
  2012-10-12 12:37     ` Jiri Slaby
  0 siblings, 1 reply; 52+ messages in thread
From: Jiri Slaby @ 2012-10-11 22:08 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Jiri Slaby, linux-mm, LKML

On 10/11/2012 08:19 PM, Valdis.Kletnieks@vt.edu wrote:
> # zgrep COMPAC /proc/config.gz
> CONFIG_COMPACTION=y
>
> Hope that tells you something useful.

It just supports another theory of mine. This seems to fix it for me:

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1830,8 +1830,8 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
 	 */
 	pages_for_compaction = (2UL << sc->order);
-	pages_for_compaction = scale_for_compaction(pages_for_compaction,
-			lruvec, sc);
+/*	pages_for_compaction = scale_for_compaction(pages_for_compaction,
+			lruvec, sc);*/
 	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
 	if (nr_swap_pages > 0)
 		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);

And for you?

(It's an effective revert of "mm: vmscan: scale number of pages
reclaimed by reclaim/compaction based on failures".)

regards,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
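[Editorial note: the call being commented out scales the reclaim target by the zone's compaction deferral shift (the helper's body appears in Mel's follow-up patch in this thread: `pages_for_compaction <<= zone->compact_defer_shift`). A toy model of that arithmetic shows why kswapd could reclaim so much; the 4 KiB page size, the order-9 THP size, and the deferral-shift cap of 6 are assumptions based on typical x86_64 settings and the upstream COMPACT_MAX_DEFER_SHIFT of that era, not facts stated in the thread:]

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

def pages_for_compaction(order, compact_defer_shift):
    """Toy model of should_continue_reclaim()'s reclaim target with the
    patch being reverted: base (2 << order) pages, left-shifted once more
    for every consecutive compaction failure."""
    return (2 << order) << compact_defer_shift

# order-9 is the 2 MiB THP allocation size; the defer shift grows with failures
for shift in range(7):
    pages = pages_for_compaction(9, shift)
    print(f"defer_shift={shift}: {pages} pages ({pages * PAGE_SIZE // 2**20} MiB)")
```

At the maximum shift the target is 65536 pages (256 MiB) of reclaim per pass to satisfy a single 2 MiB allocation, which matches the symptom of kswapd spinning and scanning far more than needed.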
* Re: kswapd0: excessive CPU usage
  2012-10-11 22:08 ` kswapd0: excessive " Jiri Slaby
@ 2012-10-12 12:37   ` Jiri Slaby
  2012-10-12 13:57     ` Mel Gorman
  0 siblings, 1 reply; 52+ messages in thread
From: Jiri Slaby @ 2012-10-12 12:37 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Jiri Slaby, linux-mm, LKML, Mel Gorman, Andrew Morton

On 10/12/2012 12:08 AM, Jiri Slaby wrote:
> (It's an effective revert of "mm: vmscan: scale number of pages
> reclaimed by reclaim/compaction based on failures".)

Given kswapd had hours of runtime in ps/top output yesterday in the
morning and after the revert it's now 2 minutes in sum for the last 24h,
I would say, it's gone.

Mel, you wrote me it's unlikely to be that patch, but not impossible in
the end. Can you take a look, please? If you need some trace-cmd output
or anything, just let us know.

This is x86_64, 6G of RAM, no swap. FWIW EXT4, SLUB, COMPACTION all
enabled/used.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-12 12:37 ` Jiri Slaby
@ 2012-10-12 13:57   ` Mel Gorman
  2012-10-15  9:54     ` Jiri Slaby
  0 siblings, 1 reply; 52+ messages in thread
From: Mel Gorman @ 2012-10-12 13:57 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton

On Fri, Oct 12, 2012 at 02:37:58PM +0200, Jiri Slaby wrote:
> On 10/12/2012 12:08 AM, Jiri Slaby wrote:
> > (It's an effective revert of "mm: vmscan: scale number of pages
> > reclaimed by reclaim/compaction based on failures".)
>
> Given kswapd had hours of runtime in ps/top output yesterday in the
> morning and after the revert it's now 2 minutes in sum for the last 24h,
> I would say, it's gone.
>
> Mel, you wrote me it's unlikely the patch, but not impossible in the
> end. Can you take a look, please? If you need some trace-cmd output or
> anything, just let us know.
>
> This is x86_64, 6G of RAM, no swap. FWIW EXT4, SLUB, COMPACTION all
> enabled/used.
>

Can you monitor the behaviour of this patch please? Please keep a
particular eye on kswapd activity and the amount of free memory. If free
memory is spiking it might indicate that kswapd is still too aggressive
with the loss of the __GFP_NO_KSWAPD flag.

One way to tell is to record /proc/vmstat over time and see what the
pgsteal_* figures look like. If they are climbing aggressively during
what should be normal usage then it might show that kswapd is still too
aggressive when asked to reclaim for THP.

Thanks very much.

---8<---
mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim

Jiri Slaby reported the following:

	(It's an effective revert of "mm: vmscan: scale number of pages
	reclaimed by reclaim/compaction based on failures".)
	Given kswapd had hours of runtime in ps/top output yesterday in the
	morning and after the revert it's now 2 minutes in sum for the last 24h,
	I would say, it's gone.

The intention of the patch in question was to compensate for the loss of
lumpy reclaim. Part of the reason lumpy reclaim worked is because it
aggressively reclaimed pages and this patch was meant to be a sane
compromise.

When compaction fails, it gets deferred and both compaction and
reclaim/compaction are deferred to avoid excessive reclaim. However,
since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up
each time and continues reclaiming, which was not taken into account
when the patch was developed.

As it is not taking deferred compaction into account in this path, it
scans aggressively before falling out and making the compaction_deferred
check in compaction_ready. This patch avoids kswapd scaling pages for
reclaim and leaves the aggressive reclaim to the process attempting the
THP allocation.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2624edc..2b7edfa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 #ifdef CONFIG_COMPACTION
 /*
  * If compaction is deferred for sc->order then scale the number of pages
- * reclaimed based on the number of consecutive allocation failures
+ * reclaimed based on the number of consecutive allocation failures. This
+ * scaling only happens for direct reclaim as it is about to attempt
+ * compaction. If compaction fails, future allocations will be deferred
+ * and reclaim avoided. On the other hand, kswapd does not take compaction
+ * deferral into account so if it scaled, it could scan excessively even
+ * though allocations are temporarily not being attempted.
  */
 static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
 			struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct zone *zone = lruvec_zone(lruvec);
 
-	if (zone->compact_order_failed <= sc->order)
+	if (zone->compact_order_failed <= sc->order &&
+	    !current_is_kswapd())
 		pages_for_compaction <<= zone->compact_defer_shift;
 	return pages_for_compaction;
 }

^ permalink raw reply related	[flat|nested] 52+ messages in thread
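[Editorial note: one way to do the monitoring Mel asks for is to snapshot /proc/vmstat periodically and diff the pgsteal_* counters. A small sketch of the diffing step; the snapshot strings stand in for two reads of /proc/vmstat, and the delta numbers are invented for illustration:]

```python
def vmstat_delta(before, after, prefix="pgsteal_"):
    """Parse two /proc/vmstat snapshots ('name value' per line) and
    return how much each counter matching the prefix grew between them."""
    def parse(text):
        return {name: int(value)
                for name, value in (line.split() for line in text.strip().splitlines())}
    old, new = parse(before), parse(after)
    return {k: new[k] - old[k] for k in new if k.startswith(prefix)}

# made-up snapshots standing in for two reads of /proc/vmstat some minutes apart
snap1 = """pgsteal_kswapd_normal 3403006
pgsteal_direct_normal 1134511
pgfault 645947575"""
snap2 = """pgsteal_kswapd_normal 3690000
pgsteal_direct_normal 1134600
pgfault 646000000"""

print(vmstat_delta(snap1, snap2))
# steadily climbing pgsteal_kswapd_* deltas during what should be normal
# usage would suggest kswapd is still reclaiming too aggressively for THP
```
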
* Re: kswapd0: excessive CPU usage
  2012-10-12 13:57 ` Mel Gorman
@ 2012-10-15  9:54   ` Jiri Slaby
  2012-10-15 11:09     ` Mel Gorman
  0 siblings, 1 reply; 52+ messages in thread
From: Jiri Slaby @ 2012-10-15  9:54 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton

On 10/12/2012 03:57 PM, Mel Gorman wrote:
> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
>
> Jiri Slaby reported the following:
>
> 	(It's an effective revert of "mm: vmscan: scale number of pages
> 	reclaimed by reclaim/compaction based on failures".)
> 	Given kswapd had hours of runtime in ps/top output yesterday in the
> 	morning and after the revert it's now 2 minutes in sum for the last 24h,
> 	I would say, it's gone.
>
> The intention of the patch in question was to compensate for the loss of
> lumpy reclaim. Part of the reason lumpy reclaim worked is because it
> aggressively reclaimed pages and this patch was meant to be a
> sane compromise.
>
> When compaction fails, it gets deferred and both compaction and
> reclaim/compaction are deferred to avoid excessive reclaim. However, since
> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
> and continues reclaiming which was not taken into account when the patch
> was developed.
>
> As it is not taking deferred compaction into account in this path it scans
> aggressively before falling out and making the compaction_deferred check in
> compaction_ready. This patch avoids kswapd scaling pages for reclaim and
> leaves the aggressive reclaim to the process attempting the THP
> allocation.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/vmscan.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2624edc..2b7edfa 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
>  #ifdef CONFIG_COMPACTION
>  /*
>   * If compaction is deferred for sc->order then scale the number of pages
> - * reclaimed based on the number of consecutive allocation failures
> + * reclaimed based on the number of consecutive allocation failures. This
> + * scaling only happens for direct reclaim as it is about to attempt
> + * compaction. If compaction fails, future allocations will be deferred
> + * and reclaim avoided. On the other hand, kswapd does not take compaction
> + * deferral into account so if it scaled, it could scan excessively even
> + * though allocations are temporarily not being attempted.
>   */
>  static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
>  			struct lruvec *lruvec, struct scan_control *sc)
>  {
>  	struct zone *zone = lruvec_zone(lruvec);
>
> -	if (zone->compact_order_failed <= sc->order)
> +	if (zone->compact_order_failed <= sc->order &&
> +	    !current_is_kswapd())
>  		pages_for_compaction <<= zone->compact_defer_shift;
>  	return pages_for_compaction;
>  }

Yes, applying this instead of the revert fixes the issue as well.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-15  9:54 ` Jiri Slaby
@ 2012-10-15 11:09   ` Mel Gorman
  2012-10-29 10:52     ` Thorsten Leemhuis
  2012-11-02 10:44     ` Zdenek Kabelac
  0 siblings, 2 replies; 52+ messages in thread
From: Mel Gorman @ 2012-10-15 11:09 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton

On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
> On 10/12/2012 03:57 PM, Mel Gorman wrote:
> > mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
> >
> > Jiri Slaby reported the following:
> >
> > 	(It's an effective revert of "mm: vmscan: scale number of pages
> > 	reclaimed by reclaim/compaction based on failures".)
> > 	Given kswapd had hours of runtime in ps/top output yesterday in the
> > 	morning and after the revert it's now 2 minutes in sum for the last 24h,
> > 	I would say, it's gone.
> >
> > The intention of the patch in question was to compensate for the loss of
> > lumpy reclaim. Part of the reason lumpy reclaim worked is because it
> > aggressively reclaimed pages and this patch was meant to be a
> > sane compromise.
> >
> > When compaction fails, it gets deferred and both compaction and
> > reclaim/compaction are deferred to avoid excessive reclaim. However, since
> > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
> > and continues reclaiming which was not taken into account when the patch
> > was developed.
> >
> > As it is not taking deferred compaction into account in this path it scans
> > aggressively before falling out and making the compaction_deferred check in
> > compaction_ready. This patch avoids kswapd scaling pages for reclaim and
> > leaves the aggressive reclaim to the process attempting the THP
> > allocation.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  mm/vmscan.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 2624edc..2b7edfa 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> >  #ifdef CONFIG_COMPACTION
> >  /*
> >   * If compaction is deferred for sc->order then scale the number of pages
> > - * reclaimed based on the number of consecutive allocation failures
> > + * reclaimed based on the number of consecutive allocation failures. This
> > + * scaling only happens for direct reclaim as it is about to attempt
> > + * compaction. If compaction fails, future allocations will be deferred
> > + * and reclaim avoided. On the other hand, kswapd does not take compaction
> > + * deferral into account so if it scaled, it could scan excessively even
> > + * though allocations are temporarily not being attempted.
> >   */
> >  static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> >  			struct lruvec *lruvec, struct scan_control *sc)
> >  {
> >  	struct zone *zone = lruvec_zone(lruvec);
> >
> > -	if (zone->compact_order_failed <= sc->order)
> > +	if (zone->compact_order_failed <= sc->order &&
> > +	    !current_is_kswapd())
> >  		pages_for_compaction <<= zone->compact_defer_shift;
> >  	return pages_for_compaction;
> >  }
>
> Yes, applying this instead of the revert fixes the issue as well.
>

Thanks Jiri.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-15 11:09 ` Mel Gorman
@ 2012-10-29 10:52   ` Thorsten Leemhuis
  2012-10-30 19:18     ` Mel Gorman
  0 siblings, 1 reply; 52+ messages in thread
From: Thorsten Leemhuis @ 2012-10-29 10:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton

Hi!

On 15.10.2012 13:09, Mel Gorman wrote:
> On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
>> On 10/12/2012 03:57 PM, Mel Gorman wrote:
>>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
>>> Jiri Slaby reported the following:
> [...]
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 2624edc..2b7edfa 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
>>>  #ifdef CONFIG_COMPACTION
>>>  /*
>>>   * If compaction is deferred for sc->order then scale the number of pages
>>> - * reclaimed based on the number of consecutive allocation failures
>>> + * reclaimed based on the number of consecutive allocation failures. This
>>> + * scaling only happens for direct reclaim as it is about to attempt
>>> + * compaction. If compaction fails, future allocations will be deferred
>>> + * and reclaim avoided. On the other hand, kswapd does not take compaction
>>> + * deferral into account so if it scaled, it could scan excessively even
>>> + * though allocations are temporarily not being attempted.
>>>   */
>>>  static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
>>>  			struct lruvec *lruvec, struct scan_control *sc)
>>>  {
>>>  	struct zone *zone = lruvec_zone(lruvec);
>>>
>>> -	if (zone->compact_order_failed <= sc->order)
>>> +	if (zone->compact_order_failed <= sc->order &&
>>> +	    !current_is_kswapd())
>>>  		pages_for_compaction <<= zone->compact_defer_shift;
>>>  	return pages_for_compaction;
>>>  }
>> Yes, applying this instead of the revert fixes the issue as well.

Just wondering, is there a reason why this patch wasn't applied to
mainline? Did it simply fall through the cracks? Or am I missing
something?

I'm asking because I think I still see the issue on
3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are
hitting it, too:
https://bugzilla.redhat.com/show_bug.cgi?id=866988

Or are we seeing something different which just looks similar? I can
test the patch if it needs further testing, but from the discussion
I got the impression that everything is clear and the patch ready
for merging.

CU
knurd

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-29 10:52 ` Thorsten Leemhuis
@ 2012-10-30 19:18   ` Mel Gorman
  2012-10-31 11:25     ` Thorsten Leemhuis
  2012-11-04 16:36     ` Rik van Riel
  0 siblings, 2 replies; 52+ messages in thread
From: Mel Gorman @ 2012-10-30 19:18 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton

On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote:
> Hi!
>
> On 15.10.2012 13:09, Mel Gorman wrote:
> >On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
> >>On 10/12/2012 03:57 PM, Mel Gorman wrote:
> >>>mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
> >>>Jiri Slaby reported the following:
> [...]
> >>>diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>index 2624edc..2b7edfa 100644
> >>>--- a/mm/vmscan.c
> >>>+++ b/mm/vmscan.c
> >>>@@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> >>> #ifdef CONFIG_COMPACTION
> >>> /*
> >>>  * If compaction is deferred for sc->order then scale the number of pages
> >>>- * reclaimed based on the number of consecutive allocation failures
> >>>+ * reclaimed based on the number of consecutive allocation failures. This
> >>>+ * scaling only happens for direct reclaim as it is about to attempt
> >>>+ * compaction. If compaction fails, future allocations will be deferred
> >>>+ * and reclaim avoided. On the other hand, kswapd does not take compaction
> >>>+ * deferral into account so if it scaled, it could scan excessively even
> >>>+ * though allocations are temporarily not being attempted.
> >>>  */
> >>> static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> >>> 			struct lruvec *lruvec, struct scan_control *sc)
> >>> {
> >>> 	struct zone *zone = lruvec_zone(lruvec);
> >>>
> >>>-	if (zone->compact_order_failed <= sc->order)
> >>>+	if (zone->compact_order_failed <= sc->order &&
> >>>+	    !current_is_kswapd())
> >>> 		pages_for_compaction <<= zone->compact_defer_shift;
> >>> 	return pages_for_compaction;
> >>> }
> >>Yes, applying this instead of the revert fixes the issue as well.
>
> Just wondering, is there a reason why this patch wasn't applied to
> mainline? Did it simply fall through the cracks? Or am I missing
> something?
>

It's because a problem was reported related to the patch (off-list,
whoops). I'm waiting to hear if a second patch fixes the problem or not.

> I'm asking because I think I still see the issue on
> 3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are
> hitting it, too:
> https://bugzilla.redhat.com/show_bug.cgi?id=866988
>

I like the steps to reproduce. Is step 3 profit?

> Or are we seeing something different which just looks similar? I can
> test the patch if it needs further testing, but from the discussion
> I got the impression that everything is clear and the patch ready
> for merging.

It could be the same issue. Can you test with the "mm: vmscan: scale
number of pages reclaimed by reclaim/compaction only in direct reclaim"
patch and the following on top please?

Thanks.

---8<---
mm: page_alloc: Do not wake kswapd if the request is for THP but deferred

Since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd gets woken
for every THP request in the slow path. If compaction has been deferred
the waker will not compact or enter direct reclaim on its own behalf but
kswapd is still woken to reclaim free pages that no one may consume. If
compaction was deferred because pages and slab were not reclaimable then
kswapd is just consuming cycles for no gain.
This patch avoids waking kswapd if the compaction has been deferred.
It'll still wake when compaction is running to reduce the latency of THP
allocations.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..e72674c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2378,6 +2378,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
 	return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS);
 }
 
+/* Returns true if the allocation is likely for THP */
+static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order)
+{
+	if (order == pageblock_order &&
+	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+		return true;
+	return false;
+}
+
 static inline struct page *
 __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
@@ -2416,7 +2425,15 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		goto nopage;
 
 restart:
-	wake_all_kswapd(order, zonelist, high_zoneidx,
+	/*
+	 * kswapd is woken except when this is a THP request and compaction
+	 * is deferred. If we are backing off reclaim/compaction then kswapd
+	 * should not be awake aggressively reclaiming with no consumers of
+	 * the freed pages
+	 */
+	if (!(is_thp_alloc(gfp_mask, order) &&
+	      compaction_deferred(preferred_zone, order)))
+		wake_all_kswapd(order, zonelist, high_zoneidx,
 					zone_idx(preferred_zone));
 
 	/*
@@ -2494,7 +2511,7 @@ rebalance:
 	 * system then fail the allocation instead of entering direct reclaim.
 	 */
 	if ((deferred_compaction || contended_compaction) &&
-	    (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
+	    is_thp_alloc(gfp_mask, order))
 		goto nopage;
 
 	/* Try direct reclaim and then allocating */

^ permalink raw reply related	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-10-30 19:18 ` Mel Gorman
@ 2012-10-31 11:25   ` Thorsten Leemhuis
  2012-10-31 15:04     ` Mel Gorman
  0 siblings, 1 reply; 52+ messages in thread
From: Thorsten Leemhuis @ 2012-10-31 11:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton

On 30.10.2012 20:18, Mel Gorman wrote:
> On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote:
>> On 15.10.2012 13:09, Mel Gorman wrote:
>>> On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
>>>> On 10/12/2012 03:57 PM, Mel Gorman wrote:
>>>>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
>>>>> Jiri Slaby reported the following:
> [...]
>>>> Yes, applying this instead of the revert fixes the issue as well.
>> Just wondering, is there a reason why this patch wasn't applied to
>> mainline? Did it simply fall through the cracks? Or am I missing
>> something?
> It's because a problem was reported related to the patch (off-list,
> whoops). I'm waiting to hear if a second patch fixes the problem or not.

Anything in particular I should look out for while testing?

>> I'm asking because I think I still see the issue on
>> 3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are
>> hitting it, too:
>> https://bugzilla.redhat.com/show_bug.cgi?id=866988
> I like the steps to reproduce.

One of those cases where the bugzilla bug template was not very
helpful or where it was not used as intended (you decide) :-)

> Is step 3 profit?

Yes, but psst, don't tell anyone; step 4 (world domination! for
real!) is also hidden to keep that part of the big plan a secret for
now ;-)

>> Or are we seeing something different which just looks similar? I can
>> test the patch if it needs further testing, but from the discussion
>> I got the impression that everything is clear and the patch ready
>> for merging.
> It could be the same issue. Can you test with the "mm: vmscan: scale
> number of pages reclaimed by reclaim/compaction only in direct reclaim"
> patch and the following on top please?

Built a vanilla mainline kernel with those two patches and installed
it on the machine where I was seeing high kswapd0 load on 3.7-rc3.
Ran it an hour yesterday and a few hours today; seems the patches fix
the issue for me, as kswapd behaves:

$ LC_ALL=C ps -aux | grep 'kswapd'
root        62  0.0  0.0      0     0 ?        S    Oct30   0:05 [kswapd0]

So everything is looking fine again so far thx to the two patches
-- hopefully it stays that way even after hitting "send" in my
mailer in a few seconds.

CU
knurd

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-10-31 11:25 ` Thorsten Leemhuis @ 2012-10-31 15:04 ` Mel Gorman 0 siblings, 0 replies; 52+ messages in thread From: Mel Gorman @ 2012-10-31 15:04 UTC (permalink / raw) To: Thorsten Leemhuis Cc: Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton On Wed, Oct 31, 2012 at 12:25:13PM +0100, Thorsten Leemhuis wrote: > On 30.10.2012 20:18, Mel Gorman wrote: > >On Mon, Oct 29, 2012 at 11:52:03AM +0100, Thorsten Leemhuis wrote: > >>On 15.10.2012 13:09, Mel Gorman wrote: > >>>On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: > >>>>On 10/12/2012 03:57 PM, Mel Gorman wrote: > >>>>>mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim > >>>>>Jiri Slaby reported the following: > >[...] > >>>>Yes, applying this instead of the revert fixes the issue as well. > >>Just wondering, is there a reason why this patch wasn't applied to > >>mainline? Did it simply fall through the cracks? Or am I missing > >>something? > >It's because a problem was reported related to the patch (off-list, > >whoops). I'm waiting to hear if a second patch fixes the problem or not. > > Anything in particular I should look out for while testing? > Excessive reclaim, high CPU usage by kswapd, processes getting stuck in isolate_migratepages or isolate_freepages. > >>I'm asking because I think I still see the issue on > >>3.7-rc2-git-checkout-from-friday. Seems Fedora rawhide users are > >>hitting it, too: > >>https://bugzilla.redhat.com/show_bug.cgi?id=866988 > >I like the steps to reproduce. > > One of those cases where the bugzilla bug template was not very > helpful or where it was not used as intended (you decide) :-) > It wins at entertainment value if nothing else :) > >Is step 3 profit? > > Yes, but psst, don't tell anyone; step 4 (world domination! for > real!) is also hidden to keep that part of the big plan a secret for > now ;-) > No doubt it's the default private comment #1 ! 
> >>Or are we seeing something different which just looks similar? I can > >>test the patch if it needs further testing, but from the discussion > >>I got the impression that everything is clear and the patch ready > >>for merging. > >It could be the same issue. Can you test with the "mm: vmscan: scale > >number of pages reclaimed by reclaim/compaction only in direct reclaim" > >patch and the following on top please? > > Built a vanilla mainline kernel with those two patches and installed > it on the machine where I was seeing problems high kswapd0 load on > 3.7-rc3. Ran it an hour yesterday and a few hours today; seems the > patches fix the issue for me as kswapd behaves: > > $ LC_ALL=C ps -aux | grep 'kswapd' > root 62 0.0 0.0 0 0 ? S Oct30 0:05 [kswapd0] > > So everything is looking fine again so far thx to the two patches > -- hopefully it stays that way even after hitting "send" in my > mailer in a few seconds. > Ok, great. Keep an eye on it please. If Jiri Slaby reports similar success then I'll collapse the two patches together and resend to Andrew. Thanks. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-10-30 19:18 ` Mel Gorman 2012-10-31 11:25 ` Thorsten Leemhuis @ 2012-11-04 16:36 ` Rik van Riel 1 sibling, 0 replies; 52+ messages in thread From: Rik van Riel @ 2012-11-04 16:36 UTC (permalink / raw) To: Mel Gorman Cc: Thorsten Leemhuis, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton On 10/30/2012 03:18 PM, Mel Gorman wrote: > restart: > - wake_all_kswapd(order, zonelist, high_zoneidx, > + /* > + * kswapd is woken except when this is a THP request and compaction > + * is deferred. If we are backing off reclaim/compaction then kswapd > + * should not be awake aggressively reclaiming with no consumers of > + * the freed pages > + */ > + if (!(is_thp_alloc(gfp_mask, order) && > + compaction_deferred(preferred_zone, order))) > + wake_all_kswapd(order, zonelist, high_zoneidx, > zone_idx(preferred_zone)); What is special about thp allocations here? Surely other large allocations that keep failing should get the same treatment, of not waking up kswapd if compaction is deferred? ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-10-15 11:09 ` Mel Gorman 2012-10-29 10:52 ` Thorsten Leemhuis @ 2012-11-02 10:44 ` Zdenek Kabelac 2012-11-02 10:53 ` Jiri Slaby 1 sibling, 1 reply; 52+ messages in thread From: Zdenek Kabelac @ 2012-11-02 10:44 UTC (permalink / raw) To: Mel Gorman Cc: Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton Dne 15.10.2012 13:09, Mel Gorman napsal(a): > On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote: >> On 10/12/2012 03:57 PM, Mel Gorman wrote: >>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim >>> >>> Jiri Slaby reported the following: >>> >>> (It's an effective revert of "mm: vmscan: scale number of pages >>> reclaimed by reclaim/compaction based on failures".) >>> Given kswapd had hours of runtime in ps/top output yesterday in the >>> morning and after the revert it's now 2 minutes in sum for the last 24h, >>> I would say, it's gone. >>> >>> The intention of the patch in question was to compensate for the loss of >>> lumpy reclaim. Part of the reason lumpy reclaim worked is because it >>> aggressively reclaimed pages and this patch was meant to be a >>> sane compromise. >>> >>> When compaction fails, it gets deferred and both compaction and >>> reclaim/compaction is deferred avoid excessive reclaim. However, since >>> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time >>> and continues reclaiming which was not taken into account when the patch >>> was developed. >>> >>> As it is not taking deferred compaction into account in this path it scans >>> aggressively before falling out and making the compaction_deferred check in >>> compaction_ready. This patch avoids kswapd scaling pages for reclaim and >>> leaves the aggressive reclaim to the process attempting the THP >>> allocation. 
>>> >>> Signed-off-by: Mel Gorman <mgorman@suse.de> >>> --- >>> mm/vmscan.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index 2624edc..2b7edfa 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc) >>> #ifdef CONFIG_COMPACTION >>> /* >>> * If compaction is deferred for sc->order then scale the number of pages >>> - * reclaimed based on the number of consecutive allocation failures >>> + * reclaimed based on the number of consecutive allocation failures. This >>> + * scaling only happens for direct reclaim as it is about to attempt >>> + * compaction. If compaction fails, future allocations will be deferred >>> + * and reclaim avoided. On the other hand, kswapd does not take compaction >>> + * deferral into account so if it scaled, it could scan excessively even >>> + * though allocations are temporarily not being attempted. >>> */ >>> static unsigned long scale_for_compaction(unsigned long pages_for_compaction, >>> struct lruvec *lruvec, struct scan_control *sc) >>> { >>> struct zone *zone = lruvec_zone(lruvec); >>> >>> - if (zone->compact_order_failed <= sc->order) >>> + if (zone->compact_order_failed <= sc->order && >>> + !current_is_kswapd()) >>> pages_for_compaction <<= zone->compact_defer_shift; >>> return pages_for_compaction; >>> } >> >> Yes, applying this instead of the revert fixes the issue as well. 
>> > I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive CPU usage - mainly after suspend/resume Here is just simple kswapd backtrace from running kernel: kswapd0 R running task 0 30 2 0x00000000 ffff8801331ddae8 0000000000000082 ffff880135b8a340 0000000000000008 ffff880135b8a340 ffff8801331ddfd8 ffff8801331ddfd8 ffff8801331ddfd8 ffff880071db8000 ffff880135b8a340 0000000000000286 ffff8801331dc000 Call Trace: [<ffffffff81555cd2>] preempt_schedule+0x42/0x60 [<ffffffff81557b75>] _raw_spin_unlock+0x55/0x60 [<ffffffff811929d1>] put_super+0x31/0x40 [<ffffffff81192aa2>] drop_super+0x22/0x30 [<ffffffff81193be9>] prune_super+0x149/0x1b0 [<ffffffff81141e2a>] shrink_slab+0xba/0x510 [<ffffffff81185baa>] ? mem_cgroup_iter+0x17a/0x2e0 [<ffffffff81185afa>] ? mem_cgroup_iter+0xca/0x2e0 [<ffffffff811450f9>] balance_pgdat+0x629/0x7f0 [<ffffffff81145434>] kswapd+0x174/0x620 [<ffffffff8106fd20>] ? __init_waitqueue_head+0x60/0x60 [<ffffffff811452c0>] ? balance_pgdat+0x7f0/0x7f0 [<ffffffff8106f50b>] kthread+0xdb/0xe0 [<ffffffff8106f430>] ? kthread_create_on_node+0x140/0x140 [<ffffffff8155fb1c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106f430>] ? kthread_create_on_node+0x140/0x140 Zdenek ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-02 10:44 ` Zdenek Kabelac @ 2012-11-02 10:53 ` Jiri Slaby 2012-11-02 19:45 ` Jiri Slaby 0 siblings, 1 reply; 52+ messages in thread From: Jiri Slaby @ 2012-11-02 10:53 UTC (permalink / raw) To: Zdenek Kabelac Cc: Mel Gorman, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>> Yes, applying this instead of the revert fixes the issue as well. > > I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive > CPU usage - mainly after suspend/resume > > Here is just simple kswapd backtrace from running kernel: Yup, this is what we were seeing with the former patch only too. Try to apply the other one too: https://patchwork.kernel.org/patch/1673231/ For me I would say, it is fixed by the two patches now. I won't be able to report later, since I'm leaving to a conference tomorrow. > kswapd0 R running task 0 30 2 0x00000000 ... > [<ffffffff81141e2a>] shrink_slab+0xba/0x510 thanks, -- js suse labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-02 10:53 ` Jiri Slaby @ 2012-11-02 19:45 ` Jiri Slaby 2012-11-04 11:26 ` Zdenek Kabelac ` (2 more replies) 0 siblings, 3 replies; 52+ messages in thread From: Jiri Slaby @ 2012-11-02 19:45 UTC (permalink / raw) To: Mel Gorman Cc: Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton On 11/02/2012 11:53 AM, Jiri Slaby wrote: > On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>> Yes, applying this instead of the revert fixes the issue as well. >> >> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >> CPU usage - mainly after suspend/resume >> >> Here is just simple kswapd backtrace from running kernel: > > Yup, this is what we were seeing with the former patch only too. Try to > apply the other one too: > https://patchwork.kernel.org/patch/1673231/ > > For me I would say, it is fixed by the two patches now. I won't be able > to report later, since I'm leaving to a conference tomorrow. Damn it. It recurred right now, with both patches applied, after I started a java program which consumed some more memory. Though there are still 2 gigs free, kswapd is spinning: [<ffffffff810b00da>] __cond_resched+0x2a/0x40 [<ffffffff811318a0>] shrink_slab+0x1c0/0x2d0 [<ffffffff8113478d>] kswapd+0x66d/0xb60 [<ffffffff810a25d0>] kthread+0xc0/0xd0 [<ffffffff816aa29c>] ret_from_fork+0x7c/0xb0 [<ffffffffffffffff>] 0xffffffffffffffff -- js suse labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-02 19:45 ` Jiri Slaby @ 2012-11-04 11:26 ` Zdenek Kabelac 2012-11-05 14:24 ` [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Mel Gorman 2012-11-09 4:22 ` kswapd0: excessive CPU usage Seth Jennings 2 siblings, 0 replies; 52+ messages in thread From: Zdenek Kabelac @ 2012-11-04 11:26 UTC (permalink / raw) To: Jiri Slaby Cc: Mel Gorman, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton Dne 2.11.2012 20:45, Jiri Slaby napsal(a): > On 11/02/2012 11:53 AM, Jiri Slaby wrote: >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>> Yes, applying this instead of the revert fixes the issue as well. >>> >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>> CPU usage - mainly after suspend/resume >>> >>> Here is just simple kswapd backtrace from running kernel: >> >> Yup, this is what we were seeing with the former patch only too. Try to >> apply the other one too: >> https://patchwork.kernel.org/patch/1673231/ >> >> For me I would say, it is fixed by the two patches now. I won't be able >> to report later, since I'm leaving to a conference tomorrow. > > Damn it. It recurred right now, with both patches applied. After I > started a java program which consumed some more memory. Though there are > still 2 gigs free, kswap is spinning: > [<ffffffff810b00da>] __cond_resched+0x2a/0x40 > [<ffffffff811318a0>] shrink_slab+0x1c0/0x2d0 > [<ffffffff8113478d>] kswapd+0x66d/0xb60 > [<ffffffff810a25d0>] kthread+0xc0/0xd0 > [<ffffffff816aa29c>] ret_from_fork+0x7c/0xb0 > [<ffffffffffffffff>] 0xffffffffffffffff > Yep - wanted to report myself again and noticed your replay. Yes - I've now also both patches installed - and I still observe kswapd eating my CPU. It seems (at least for me) that prior suspend and resume is way to trigger it more frequently. 
However there is a change in behaviour - while before kswapd was running almost indefinitely, now the CPU spikes are in the range of minutes. (i.e. uptime ~2days - kswapd has over 32minutes CPU time) My machine has 4GB, and no swap (disabled) firefox (22mins), thunderbird(3mins) and pidgin(0.5min) are the 3 most memory and CPU hungry apps for this moment. Zdenek ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" 2012-11-02 19:45 ` Jiri Slaby 2012-11-04 11:26 ` Zdenek Kabelac @ 2012-11-05 14:24 ` Mel Gorman 2012-11-06 10:15 ` Johannes Hirte 2012-11-09 9:12 ` Mel Gorman 2012-11-09 4:22 ` kswapd0: excessive CPU usage Seth Jennings 2 siblings, 2 replies; 52+ messages in thread From: Mel Gorman @ 2012-11-05 14:24 UTC (permalink / raw) To: Andrew Morton Cc: Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm, Rik van Riel, Jiri Slaby, LKML Jiri Slaby reported the following: (It's an effective revert of "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures".) Given kswapd had hours of runtime in ps/top output yesterday in the morning and after the revert it's now 2 minutes in sum for the last 24h, I would say, it's gone. The intention of the patch in question was to compensate for the loss of lumpy reclaim. Part of the reason lumpy reclaim worked is because it aggressively reclaimed pages and this patch was meant to be a sane compromise. When compaction fails, it gets deferred and both compaction and reclaim/compaction are deferred to avoid excessive reclaim. However, since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time and continues reclaiming which was not taken into account when the patch was developed. Attempts to address the problem ended up just changing the shape of the problem instead of fixing it. The release window gets closer and while a THP allocation failing is not a major problem, kswapd chewing up a lot of CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" and will be revisited in the future. 
Signed-off-by: Mel Gorman <mgorman@suse.de> --- mm/vmscan.c | 25 ------------------------- 1 file changed, 25 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 2624edc..e081ee8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc) return false; } -#ifdef CONFIG_COMPACTION -/* - * If compaction is deferred for sc->order then scale the number of pages - * reclaimed based on the number of consecutive allocation failures - */ -static unsigned long scale_for_compaction(unsigned long pages_for_compaction, - struct lruvec *lruvec, struct scan_control *sc) -{ - struct zone *zone = lruvec_zone(lruvec); - - if (zone->compact_order_failed <= sc->order) - pages_for_compaction <<= zone->compact_defer_shift; - return pages_for_compaction; -} -#else -static unsigned long scale_for_compaction(unsigned long pages_for_compaction, - struct lruvec *lruvec, struct scan_control *sc) -{ - return pages_for_compaction; -} -#endif - /* * Reclaim/compaction is used for high-order allocation requests. It reclaims * order-0 pages before compacting the zone. should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec, * inactive lists are large enough, continue reclaiming */ pages_for_compaction = (2UL << sc->order); - - pages_for_compaction = scale_for_compaction(pages_for_compaction, - lruvec, sc); inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); if (nr_swap_pages > 0) inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON); ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" 2012-11-05 14:24 ` [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Mel Gorman @ 2012-11-06 10:15 ` Johannes Hirte 2012-11-09 8:36 ` Mel Gorman 2012-11-09 9:12 ` Mel Gorman 1 sibling, 1 reply; 52+ messages in thread From: Johannes Hirte @ 2012-11-06 10:15 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm, Rik van Riel, Jiri Slaby, LKML Am Mon, 5 Nov 2012 14:24:49 +0000 schrieb Mel Gorman <mgorman@suse.de>: > Jiri Slaby reported the following: > > (It's an effective revert of "mm: vmscan: scale number of > pages reclaimed by reclaim/compaction based on failures".) Given > kswapd had hours of runtime in ps/top output yesterday in the morning > and after the revert it's now 2 minutes in sum for the last > 24h, I would say, it's gone. > > The intention of the patch in question was to compensate for the loss > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > it aggressively reclaimed pages and this patch was meant to be a sane > compromise. > > When compaction fails, it gets deferred and both compaction and > reclaim/compaction is deferred avoid excessive reclaim. However, since > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each > time and continues reclaiming which was not taken into account when > the patch was developed. > > Attempts to address the problem ended up just changing the shape of > the problem instead of fixing it. The release window gets closer and > while a THP allocation failing is not a major problem, kswapd chewing > up a lot of CPU is. This patch reverts "mm: vmscan: scale number of > pages reclaimed by reclaim/compaction based on failures" and will be > revisited in the future. 
> > Signed-off-by: Mel Gorman <mgorman@suse.de> > --- > mm/vmscan.c | 25 ------------------------- > 1 file changed, 25 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 2624edc..e081ee8 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct > scan_control *sc) return false; > } > > -#ifdef CONFIG_COMPACTION > -/* > - * If compaction is deferred for sc->order then scale the number of > pages > - * reclaimed based on the number of consecutive allocation failures > - */ > -static unsigned long scale_for_compaction(unsigned long > pages_for_compaction, > - struct lruvec *lruvec, struct scan_control > *sc) -{ > - struct zone *zone = lruvec_zone(lruvec); > - > - if (zone->compact_order_failed <= sc->order) > - pages_for_compaction <<= zone->compact_defer_shift; > - return pages_for_compaction; > -} > -#else > -static unsigned long scale_for_compaction(unsigned long > pages_for_compaction, > - struct lruvec *lruvec, struct scan_control > *sc) -{ > - return pages_for_compaction; > -} > -#endif > - > /* > * Reclaim/compaction is used for high-order allocation requests. It > reclaims > * order-0 pages before compacting the zone. > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline > bool should_continue_reclaim(struct lruvec *lruvec, > * inactive lists are large enough, continue reclaiming > */ > pages_for_compaction = (2UL << sc->order); > - > - pages_for_compaction = > scale_for_compaction(pages_for_compaction, > - lruvec, sc); > inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); > if (nr_swap_pages > 0) > inactive_lru_pages += get_lru_size(lruvec, > LRU_INACTIVE_ANON); -- Even with this patch I see kswapd0 very often on top. Much more than with kernel 3.6. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" 2012-11-06 10:15 ` Johannes Hirte @ 2012-11-09 8:36 ` Mel Gorman 2012-11-14 21:43 ` Johannes Hirte 0 siblings, 1 reply; 52+ messages in thread From: Mel Gorman @ 2012-11-09 8:36 UTC (permalink / raw) To: Johannes Hirte Cc: Andrew Morton, Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm, Rik van Riel, Jiri Slaby, LKML On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote: > Am Mon, 5 Nov 2012 14:24:49 +0000 > schrieb Mel Gorman <mgorman@suse.de>: > > > Jiri Slaby reported the following: > > > > (It's an effective revert of "mm: vmscan: scale number of > > pages reclaimed by reclaim/compaction based on failures".) Given > > kswapd had hours of runtime in ps/top output yesterday in the morning > > and after the revert it's now 2 minutes in sum for the last > > 24h, I would say, it's gone. > > > > The intention of the patch in question was to compensate for the loss > > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > > it aggressively reclaimed pages and this patch was meant to be a sane > > compromise. > > > > When compaction fails, it gets deferred and both compaction and > > reclaim/compaction is deferred avoid excessive reclaim. However, since > > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each > > time and continues reclaiming which was not taken into account when > > the patch was developed. > > > > Attempts to address the problem ended up just changing the shape of > > the problem instead of fixing it. The release window gets closer and > > while a THP allocation failing is not a major problem, kswapd chewing > > up a lot of CPU is. This patch reverts "mm: vmscan: scale number of > > pages reclaimed by reclaim/compaction based on failures" and will be > > revisited in the future. 
> > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > --- > > mm/vmscan.c | 25 ------------------------- > > 1 file changed, 25 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 2624edc..e081ee8 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct > > scan_control *sc) return false; > > } > > > > -#ifdef CONFIG_COMPACTION > > -/* > > - * If compaction is deferred for sc->order then scale the number of > > pages > > - * reclaimed based on the number of consecutive allocation failures > > - */ > > -static unsigned long scale_for_compaction(unsigned long > > pages_for_compaction, > > - struct lruvec *lruvec, struct scan_control > > *sc) -{ > > - struct zone *zone = lruvec_zone(lruvec); > > - > > - if (zone->compact_order_failed <= sc->order) > > - pages_for_compaction <<= zone->compact_defer_shift; > > - return pages_for_compaction; > > -} > > -#else > > -static unsigned long scale_for_compaction(unsigned long > > pages_for_compaction, > > - struct lruvec *lruvec, struct scan_control > > *sc) -{ > > - return pages_for_compaction; > > -} > > -#endif > > - > > /* > > * Reclaim/compaction is used for high-order allocation requests. It > > reclaims > > * order-0 pages before compacting the zone. > > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static inline > > bool should_continue_reclaim(struct lruvec *lruvec, > > * inactive lists are large enough, continue reclaiming > > */ > > pages_for_compaction = (2UL << sc->order); > > - > > - pages_for_compaction = > > scale_for_compaction(pages_for_compaction, > > - lruvec, sc); > > inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE); > > if (nr_swap_pages > 0) > > inactive_lru_pages += get_lru_size(lruvec, > > LRU_INACTIVE_ANON); -- > > Even with this patch I see kswapd0 very often on top. Much more than > with kernel 3.6. How severe is the CPU usage? 
The higher usage can be explained by "mm: remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to reduce the amount of time processes spend in compaction but will result in the CPU cost being incurred by kswapd. Is it really high like the bug was reporting with high usage over long periods of time or do you just see it using 2-6% of CPU for short periods? Thanks. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" 2012-11-09 8:36 ` Mel Gorman @ 2012-11-14 21:43 ` Johannes Hirte 0 siblings, 0 replies; 52+ messages in thread From: Johannes Hirte @ 2012-11-14 21:43 UTC (permalink / raw) To: Mel Gorman Cc: Andrew Morton, Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm, Rik van Riel, Jiri Slaby, LKML Am Fri, 9 Nov 2012 08:36:37 +0000 schrieb Mel Gorman <mgorman@suse.de>: > On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote: > > Am Mon, 5 Nov 2012 14:24:49 +0000 > > schrieb Mel Gorman <mgorman@suse.de>: > > > > > Jiri Slaby reported the following: > > > > > > (It's an effective revert of "mm: vmscan: scale number of > > > pages reclaimed by reclaim/compaction based on failures".) Given > > > kswapd had hours of runtime in ps/top output yesterday in the > > > morning and after the revert it's now 2 minutes in sum for the > > > last 24h, I would say, it's gone. > > > > > > The intention of the patch in question was to compensate for the > > > loss of lumpy reclaim. Part of the reason lumpy reclaim worked is > > > because it aggressively reclaimed pages and this patch was meant > > > to be a sane compromise. > > > > > > When compaction fails, it gets deferred and both compaction and > > > reclaim/compaction is deferred avoid excessive reclaim. However, > > > since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is > > > woken up each time and continues reclaiming which was not taken > > > into account when the patch was developed. > > > > > > Attempts to address the problem ended up just changing the shape > > > of the problem instead of fixing it. The release window gets > > > closer and while a THP allocation failing is not a major problem, > > > kswapd chewing up a lot of CPU is. This patch reverts "mm: > > > vmscan: scale number of pages reclaimed by reclaim/compaction > > > based on failures" and will be revisited in the future. 
> > > > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > > --- > > > mm/vmscan.c | 25 ------------------------- > > > 1 file changed, 25 deletions(-) > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > index 2624edc..e081ee8 100644 > > > --- a/mm/vmscan.c > > > +++ b/mm/vmscan.c > > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct > > > scan_control *sc) return false; > > > } > > > > > > -#ifdef CONFIG_COMPACTION > > > -/* > > > - * If compaction is deferred for sc->order then scale the number > > > of pages > > > - * reclaimed based on the number of consecutive allocation > > > failures > > > - */ > > > -static unsigned long scale_for_compaction(unsigned long > > > pages_for_compaction, > > > - struct lruvec *lruvec, struct > > > scan_control *sc) -{ > > > - struct zone *zone = lruvec_zone(lruvec); > > > - > > > - if (zone->compact_order_failed <= sc->order) > > > - pages_for_compaction <<= > > > zone->compact_defer_shift; > > > - return pages_for_compaction; > > > -} > > > -#else > > > -static unsigned long scale_for_compaction(unsigned long > > > pages_for_compaction, > > > - struct lruvec *lruvec, struct > > > scan_control *sc) -{ > > > - return pages_for_compaction; > > > -} > > > -#endif > > > - > > > /* > > > * Reclaim/compaction is used for high-order allocation > > > requests. It reclaims > > > * order-0 pages before compacting the zone. > > > should_continue_reclaim() returns @@ -1829,9 +1807,6 @@ static > > > inline bool should_continue_reclaim(struct lruvec *lruvec, > > > * inactive lists are large enough, continue reclaiming > > > */ > > > pages_for_compaction = (2UL << sc->order); > > > - > > > - pages_for_compaction = > > > scale_for_compaction(pages_for_compaction, > > > - lruvec, sc); > > > inactive_lru_pages = get_lru_size(lruvec, > > > LRU_INACTIVE_FILE); if (nr_swap_pages > 0) > > > inactive_lru_pages += get_lru_size(lruvec, > > > LRU_INACTIVE_ANON); -- > > > > Even with this patch I see kswapd0 very often on top. 
Much more than > > with kernel 3.6. > > How severe is the CPU usage? The higher usage can be explained by "mm: > remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to > reduce the amount of time processes spend in compaction but will > result in the CPU cost being incurred by kswapd. > > Is it really high like the bug was reporting with high usage over long > periods of time or do you just see it using 2-6% of CPU for short > periods? It is really high. With compile jobs (make -j4 on dual core) I've seen kswapd0 consuming at least 50% CPU most of the time. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" 2012-11-05 14:24 ` [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Mel Gorman 2012-11-06 10:15 ` Johannes Hirte @ 2012-11-09 9:12 ` Mel Gorman 1 sibling, 0 replies; 52+ messages in thread From: Mel Gorman @ 2012-11-09 9:12 UTC (permalink / raw) To: Andrew Morton Cc: Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm, Rik van Riel, Jiri Slaby, LKML On Mon, Nov 05, 2012 at 02:24:49PM +0000, Mel Gorman wrote: > Jiri Slaby reported the following: > > (It's an effective revert of "mm: vmscan: scale number of pages > reclaimed by reclaim/compaction based on failures".) Given kswapd > had hours of runtime in ps/top output yesterday in the morning > and after the revert it's now 2 minutes in sum for the last 24h, > I would say, it's gone. > > The intention of the patch in question was to compensate for the loss > of lumpy reclaim. Part of the reason lumpy reclaim worked is because > it aggressively reclaimed pages and this patch was meant to be a sane > compromise. > > When compaction fails, it gets deferred and both compaction and > reclaim/compaction is deferred avoid excessive reclaim. However, since > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time > and continues reclaiming which was not taken into account when the patch > was developed. > > Attempts to address the problem ended up just changing the shape of the > problem instead of fixing it. The release window gets closer and while a > THP allocation failing is not a major problem, kswapd chewing up a lot of > CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed > by reclaim/compaction based on failures" and will be revisited in the future. 
> > Signed-off-by: Mel Gorman <mgorman@suse.de> Andrew, can you pick up this patch please and drop mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-only-in-direct-reclaim.patch ? There are mixed reports on how much it helps but it comes down to "this fixes a problem" versus "kswapd is still showing higher usage". I think the higher kswapd usage is explained by the removal of __GFP_NO_KSWAPD and so while higher usage is bad, it is not necessarily unjustified. Ideally it would have been proven that having kswapd doing the work reduced application stalls in direct reclaim but unfortunately I do not have concrete evidence of that at this time. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-02 19:45 ` Jiri Slaby 2012-11-04 11:26 ` Zdenek Kabelac 2012-11-05 14:24 ` [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Mel Gorman @ 2012-11-09 4:22 ` Seth Jennings 2012-11-09 8:07 ` Zdenek Kabelac 2012-11-09 8:40 ` Mel Gorman 2 siblings, 2 replies; 52+ messages in thread From: Seth Jennings @ 2012-11-09 4:22 UTC (permalink / raw) To: Jiri Slaby Cc: Mel Gorman, Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings On 11/02/2012 02:45 PM, Jiri Slaby wrote: > On 11/02/2012 11:53 AM, Jiri Slaby wrote: >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>> Yes, applying this instead of the revert fixes the issue as well. >>> >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>> CPU usage - mainly after suspend/resume >>> >>> Here is just simple kswapd backtrace from running kernel: >> >> Yup, this is what we were seeing with the former patch only too. Try to >> apply the other one too: >> https://patchwork.kernel.org/patch/1673231/ >> >> For me I would say, it is fixed by the two patches now. I won't be able >> to report later, since I'm leaving to a conference tomorrow. > > Damn it. It recurred right now, with both patches applied. After I > started a java program which consumed some more memory. Though there are > still 2 gigs free, kswap is spinning: > [<ffffffff810b00da>] __cond_resched+0x2a/0x40 > [<ffffffff811318a0>] shrink_slab+0x1c0/0x2d0 > [<ffffffff8113478d>] kswapd+0x66d/0xb60 > [<ffffffff810a25d0>] kthread+0xc0/0xd0 > [<ffffffff816aa29c>] ret_from_fork+0x7c/0xb0 > [<ffffffffffffffff>] 0xffffffffffffffff I'm also hitting this issue in v3.7-rc4. It appears that the last release not affected by this issue was v3.3.
Bisecting the changes included for v3.4-rc1 showed that this commit introduced the issue: fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c Author: Rik van Riel <riel@redhat.com> Date: Wed Mar 21 16:33:51 2012 -0700 vmscan: reclaim at order 0 when compaction is enabled ... This is plausible since the issue seems to be in the kswapd + compaction realm. I've yet to figure out exactly what about this commit results in kswapd spinning. I would be interested if someone can confirm this finding. -- Seth ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-09 4:22 ` kswapd0: excessive CPU usage Seth Jennings @ 2012-11-09 8:07 ` Zdenek Kabelac 2012-11-09 9:06 ` Mel Gorman 2012-11-09 8:40 ` Mel Gorman 1 sibling, 1 reply; 52+ messages in thread From: Zdenek Kabelac @ 2012-11-09 8:07 UTC (permalink / raw) To: Seth Jennings Cc: Jiri Slaby, Mel Gorman, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings Dne 9.11.2012 05:22, Seth Jennings napsal(a): > On 11/02/2012 02:45 PM, Jiri Slaby wrote: >> On 11/02/2012 11:53 AM, Jiri Slaby wrote: >>> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote: >>>>>> Yes, applying this instead of the revert fixes the issue as well. >>>> >>>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive >>>> CPU usage - mainly after suspend/resume >>>> >>>> Here is just simple kswapd backtrace from running kernel: >>> >>> Yup, this is what we were seeing with the former patch only too. Try to >>> apply the other one too: >>> https://patchwork.kernel.org/patch/1673231/ >>> >>> For me I would say, it is fixed by the two patches now. I won't be able >>> to report later, since I'm leaving to a conference tomorrow. >> >> Damn it. It recurred right now, with both patches applied. After I >> started a java program which consumed some more memory. Though there are >> still 2 gigs free, kswap is spinning: >> [<ffffffff810b00da>] __cond_resched+0x2a/0x40 >> [<ffffffff811318a0>] shrink_slab+0x1c0/0x2d0 >> [<ffffffff8113478d>] kswapd+0x66d/0xb60 >> [<ffffffff810a25d0>] kthread+0xc0/0xd0 >> [<ffffffff816aa29c>] ret_from_fork+0x7c/0xb0 >> [<ffffffffffffffff>] 0xffffffffffffffff > > I'm also hitting this issue in v3.7-rc4. It appears that the last > release not effected by this issue was v3.3. 
Bisecting the changes > included for v3.4-rc1 showed that this commit introduced the issue: > > fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > Author: Rik van Riel <riel@redhat.com> > Date: Wed Mar 21 16:33:51 2012 -0700 > > vmscan: reclaim at order 0 when compaction is enabled > ... > > This is plausible since the issue seems to be in the kswapd + compaction > realm. I've yet to figure out exactly what about this commit results in > kswapd spinning. > > I would be interested if someone can confirm this finding. > > -- > Seth > On my system 3.7-rc4 the problem seems to be effectively solved by revert patch: https://lkml.org/lkml/2012/11/5/308 i.e. in 2 days uptime kswapd0 eats 6 seconds which is IMHO ok - I'm not observing any busy loops on CPU with kswapd0. Zdenek ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-09 8:07 ` Zdenek Kabelac @ 2012-11-09 9:06 ` Mel Gorman 2012-11-11 9:13 ` Zdenek Kabelac 0 siblings, 1 reply; 52+ messages in thread From: Mel Gorman @ 2012-11-09 9:06 UTC (permalink / raw) To: Zdenek Kabelac Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings On Fri, Nov 09, 2012 at 09:07:45AM +0100, Zdenek Kabelac wrote: > >fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit > >commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c > >Author: Rik van Riel <riel@redhat.com> > >Date: Wed Mar 21 16:33:51 2012 -0700 > > > > vmscan: reclaim at order 0 when compaction is enabled > >... > > > >This is plausible since the issue seems to be in the kswapd + compaction > >realm. I've yet to figure out exactly what about this commit results in > >kswapd spinning. > > > >I would be interested if someone can confirm this finding. > > > >-- > >Seth > > > > > On my system 3.7-rc4 the problem seems to be effectively solved by > revert patch: https://lkml.org/lkml/2012/11/5/308 > Ok, while there is still a question on whether it's enough I think it's sensible to at least start with the obvious one. Thanks very much. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-09 9:06 ` Mel Gorman @ 2012-11-11 9:13 ` Zdenek Kabelac 2012-11-12 11:37 ` [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Mel Gorman 2012-11-12 12:19 ` kswapd0: excessive CPU usage Mel Gorman 0 siblings, 2 replies; 52+ messages in thread From: Zdenek Kabelac @ 2012-11-11 9:13 UTC (permalink / raw) To: Mel Gorman Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings Dne 9.11.2012 10:06, Mel Gorman napsal(a): > On Fri, Nov 09, 2012 at 09:07:45AM +0100, Zdenek Kabelac wrote: >>> fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit >>> commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c >>> Author: Rik van Riel <riel@redhat.com> >>> Date: Wed Mar 21 16:33:51 2012 -0700 >>> >>> vmscan: reclaim at order 0 when compaction is enabled >>> ... >>> >>> This is plausible since the issue seems to be in the kswapd + compaction >>> realm. I've yet to figure out exactly what about this commit results in >>> kswapd spinning. >>> >>> I would be interested if someone can confirm this finding. >>> >>> -- >>> Seth >>> >> >> >> On my system 3.7-rc4 the problem seems to be effectively solved by >> revert patch: https://lkml.org/lkml/2012/11/5/308 >> > > Ok, while there is still a question on whether it's enough I think it's > sensible to at least start with the obvious one. > Hmm, so it just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless as before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again.
(And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 Call Trace: [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 [<ffffffff81192971>] put_super+0x31/0x40 [<ffffffff81192a42>] drop_super+0x22/0x30 [<ffffffff81193b89>] prune_super+0x149/0x1b0 [<ffffffff81141e2a>] shrink_slab+0xba/0x510 [<ffffffff81185b4a>] ? mem_cgroup_iter+0x17a/0x2e0 [<ffffffff81185a9a>] ? mem_cgroup_iter+0xca/0x2e0 [<ffffffff81145099>] balance_pgdat+0x629/0x7f0 [<ffffffff811453d4>] kswapd+0x174/0x620 [<ffffffff8106fd20>] ? __init_waitqueue_head+0x60/0x60 [<ffffffff81145260>] ? balance_pgdat+0x7f0/0x7f0 [<ffffffff8106f50b>] kthread+0xdb/0xe0 [<ffffffff8106f430>] ? kthread_create_on_node+0x140/0x140 [<ffffffff8155fa1c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106f430>] ? 
kthread_create_on_node+0x140/0x140 runnable tasks: task PID tree-key switches prio exec-runtime sum-exec sum-sleep ---------------------------------------------------------------------------------------------------------- kswapd0 30 8689943.729790 36266 120 8689943.729790 201495.640629 56609485.489414 / kworker/0:1 14790 8689937.729790 16969 120 8689937.729790 374.385996 150405.181652 / R bash 14855 821.749268 50 120 821.749268 24.027535 5252.291128 /autogroup-304 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 146 CPU 1: hi: 186, btch: 31 usd: 135 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 131 CPU 1: hi: 186, btch: 31 usd: 132 active_anon:726521 inactive_anon:26442 isolated_anon:0 active_file:77765 inactive_file:76890 isolated_file:0 unevictable:12 dirty:4 writeback:0 unstable:0 free:40261 slab_reclaimable:12414 slab_unreclaimable:9694 mapped:26382 shmem:162712 pagetables:6618 bounce:0 free_cma:0 DMA free:15676kB min:272kB low:340kB high:408kB active_anon:208kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:208kB slab_reclaimable:8kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:126072kB min:51776kB low:64720kB high:77664kB active_anon:2175104kB inactive_anon:98976kB active_file:296252kB inactive_file:297648kB unevictable:48kB isolated(anon):0kB isolated(file):0kB present:3021960kB mlocked:48kB dirty:12kB writeback:0kB mapped:77664kB shmem:620388kB slab_reclaimable:19128kB slab_unreclaimable:6292kB kernel_stack:624kB pagetables:8900kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no lowmem_reserve[]: 0 0 885 885 Normal free:19296kB min:15532kB low:19412kB high:23296kB active_anon:730772kB inactive_anon:6792kB active_file:14808kB inactive_file:9912kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:0kB dirty:4kB writeback:0kB mapped:27864kB shmem:30252kB slab_reclaimable:30520kB slab_unreclaimable:32476kB kernel_stack:2496kB pagetables:17572kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 1*4kB 1*8kB 3*16kB 2*32kB 3*64kB 2*128kB 3*256kB 2*512kB 3*1024kB 3*2048kB 1*4096kB = 15676kB DMA32: 730*4kB 328*8kB 223*16kB 123*32kB 182*64kB 96*128kB 172*256kB 56*512kB 12*1024kB 1*2048kB 1*4096kB = 128120kB Normal: 600*4kB 384*8kB 164*16kB 122*32kB 40*64kB 7*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 19296kB 317367 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 642501 pages shared 869271 pages non-shared ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-11 9:13 ` Zdenek Kabelac @ 2012-11-12 11:37 ` Mel Gorman 2012-11-16 19:14 ` Josh Boyer 2012-11-20 9:18 ` Glauber Costa 2012-11-12 12:19 ` kswapd0: excessive CPU usage Mel Gorman 1 sibling, 2 replies; 52+ messages in thread From: Mel Gorman @ 2012-11-12 11:37 UTC (permalink / raw) To: Zdenek Kabelac Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again. (And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 Call Trace: [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 [<ffffffff81192971>] put_super+0x31/0x40 [<ffffffff81192a42>] drop_super+0x22/0x30 [<ffffffff81193b89>] prune_super+0x149/0x1b0 [<ffffffff81141e2a>] shrink_slab+0xba/0x510 The sysrq+m indicates the system has no swap so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process then compaction will be deferred for a time and direct reclaim is avoided. 
However, if there is a storm of THP requests that are simply rejected, it will still be the case that kswapd is awake for a prolonged period of time as pgdat->kswapd_max_order is updated each time. This is noticed by the main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead it will loop, shrinking a small number of pages and calling shrink_slab() on each iteration. The temptation is to supply a patch that checks if kswapd was woken for THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not backed up by proper testing. As 3.7 is very close to release and this is not a bug we should release with, a safer path is to revert "mm: remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the balance_pgdat() logic in general. Signed-off-by: Mel Gorman <mgorman@suse.de> --- drivers/mtd/mtdcore.c | 6 ++++-- include/linux/gfp.h | 5 ++++- include/trace/events/gfpflags.h | 1 + mm/page_alloc.c | 7 ++++--- 4 files changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c index 374c46d..ec794a7 100644 --- a/drivers/mtd/mtdcore.c +++ b/drivers/mtd/mtdcore.c @@ -1077,7 +1077,8 @@ EXPORT_SYMBOL_GPL(mtd_writev); * until the request succeeds or until the allocation size falls below * the system page size. This attempts to make sure it does not adversely * impact system performance, so when allocating more than one page, we - * ask the memory allocator to avoid re-trying. + * ask the memory allocator to avoid re-trying, swapping, writing back + * or performing I/O. * * Note, this function also makes sure that the allocated buffer is aligned to * the MTD device's min. I/O unit, i.e. the "mtd->writesize" value.
@@ -1091,7 +1092,8 @@ EXPORT_SYMBOL_GPL(mtd_writev); */ void *mtd_kmalloc_up_to(const struct mtd_info *mtd, size_t *size) { - gfp_t flags = __GFP_NOWARN | __GFP_WAIT | __GFP_NORETRY; + gfp_t flags = __GFP_NOWARN | __GFP_WAIT | + __GFP_NORETRY | __GFP_NO_KSWAPD; size_t min_alloc = max_t(size_t, mtd->writesize, PAGE_SIZE); void *kbuf; diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 02c1c971..d0a7967 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -31,6 +31,7 @@ struct vm_area_struct; #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u #define ___GFP_NOTRACK 0x200000u +#define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u #define ___GFP_WRITE 0x1000000u @@ -85,6 +86,7 @@ struct vm_area_struct; #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */ #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */ +#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ @@ -114,7 +116,8 @@ struct vm_area_struct; __GFP_MOVABLE) #define GFP_IOFS (__GFP_IO | __GFP_FS) #define GFP_TRANSHUGE (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ - __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) + __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \ + __GFP_NO_KSWAPD) #ifdef CONFIG_NUMA #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY) diff --git a/include/trace/events/gfpflags.h b/include/trace/events/gfpflags.h index 9391706..d6fd8e5 100644 --- a/include/trace/events/gfpflags.h +++ b/include/trace/events/gfpflags.h @@ -36,6 +36,7 @@ {(unsigned long)__GFP_RECLAIMABLE, "GFP_RECLAIMABLE"}, \ {(unsigned long)__GFP_MOVABLE, "GFP_MOVABLE"}, \ {(unsigned long)__GFP_NOTRACK, "GFP_NOTRACK"}, \ + {(unsigned long)__GFP_NO_KSWAPD, "GFP_NO_KSWAPD"}, \ {(unsigned long)__GFP_OTHER_NODE, 
"GFP_OTHER_NODE"} \ ) : "GFP_NOWAIT" diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bb90971..7228260 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2416,8 +2416,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto nopage; restart: - wake_all_kswapd(order, zonelist, high_zoneidx, - zone_idx(preferred_zone)); + if (!(gfp_mask & __GFP_NO_KSWAPD)) + wake_all_kswapd(order, zonelist, high_zoneidx, + zone_idx(preferred_zone)); /* * OK, we're below the kswapd watermark and have kicked background @@ -2494,7 +2495,7 @@ rebalance: * system then fail the allocation instead of entering direct reclaim. */ if ((deferred_compaction || contended_compaction) && - (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) + (gfp_mask & __GFP_NO_KSWAPD)) goto nopage; /* Try direct reclaim and then allocating */ ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-12 11:37 ` [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Mel Gorman @ 2012-11-16 19:14 ` Josh Boyer 2012-11-16 19:51 ` Andrew Morton 2012-11-16 20:06 ` Mel Gorman 2012-11-20 9:18 ` Glauber Costa 1 sibling, 2 replies; 52+ messages in thread From: Josh Boyer @ 2012-11-16 19:14 UTC (permalink / raw) To: Mel Gorman Cc: Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman <mgorman@suse.de> wrote: > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction > based on failures" reverted, Zdenek Kabelac reported the following > > Hmm, so it's just took longer to hit the problem and observe > kswapd0 spinning on my CPU again - it's not as endless like before - > but still it easily eats minutes - it helps to turn off Firefox > or TB (memory hungry apps) so kswapd0 stops soon - and restart > those apps again. (And I still have like >1GB of cached memory) > > kswapd0 R running task 0 30 2 0x00000000 > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > Call Trace: > [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 > [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 > [<ffffffff81192971>] put_super+0x31/0x40 > [<ffffffff81192a42>] drop_super+0x22/0x30 > [<ffffffff81193b89>] prune_super+0x149/0x1b0 > [<ffffffff81141e2a>] shrink_slab+0xba/0x510 > > The sysrq+m indicates the system has no swap so it'll never reclaim > anonymous pages as part of reclaim/compaction. That is one part of the > problem but not the root cause as file-backed pages could also be reclaimed. > > The likely underlying problem is that kswapd is woken up or kept awake > for each THP allocation request in the page allocator slow path. 
> > If compaction fails for the requesting process then compaction will be > deferred for a time and direct reclaim is avoided. However, if there > are a storm of THP requests that are simply rejected, it will still > be the the case that kswapd is awake for a prolonged period of time > as pgdat->kswapd_max_order is updated each time. This is noticed by > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > Instead it will loopp, shrinking a small number of pages and calling > shrink_slab() on each iteration. > > The temptation is to supply a patch that checks if kswapd was woken for > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > backed up by proper testing. As 3.7 is very close to release and this is > not a bug we should release with, a safer path is to revert "mm: remove > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > balance_pgdat() logic in general. > > Signed-off-by: Mel Gorman <mgorman@suse.de> Does anyone know if this is queued to go into 3.7 somewhere? I looked a bit and can't find it in a tree. We have a few reports of Fedora rawhide users hitting this. josh ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-16 19:14 ` Josh Boyer @ 2012-11-16 19:51 ` Andrew Morton 2012-11-20 1:43 ` Valdis.Kletnieks 2012-11-16 20:06 ` Mel Gorman 1 sibling, 1 reply; 52+ messages in thread From: Andrew Morton @ 2012-11-16 19:51 UTC (permalink / raw) To: Josh Boyer Cc: Mel Gorman, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Rik van Riel, Robert Jennings On Fri, 16 Nov 2012 14:14:47 -0500 Josh Boyer <jwboyer@gmail.com> wrote: > > The temptation is to supply a patch that checks if kswapd was woken for > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > > backed up by proper testing. As 3.7 is very close to release and this is > > not a bug we should release with, a safer path is to revert "mm: remove > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > > balance_pgdat() logic in general. > > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > Does anyone know if this is queued to go into 3.7 somewhere? I looked > a bit and can't find it in a tree. We have a few reports of Fedora > rawhide users hitting this. Still thinking about it. We're reverting quite a lot of material lately. mm-revert-mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-based-on-failures.patch and revert-mm-fix-up-zone-present-pages.patch are queued for 3.7. I'll toss this one in there as well, but I can't say I'm feeling terribly confident. How is Valdis's machine nowadays? ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-16 19:51 ` Andrew Morton @ 2012-11-20 1:43 ` Valdis.Kletnieks 0 siblings, 0 replies; 52+ messages in thread From: Valdis.Kletnieks @ 2012-11-20 1:43 UTC (permalink / raw) To: Andrew Morton Cc: Josh Boyer, Mel Gorman, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Jiri Slaby, linux-mm, LKML, Rik van Riel, Robert Jennings [-- Attachment #1: Type: text/plain, Size: 1743 bytes --] On Fri, 16 Nov 2012 11:51:24 -0800, Andrew Morton said: > On Fri, 16 Nov 2012 14:14:47 -0500 > Josh Boyer <jwboyer@gmail.com> wrote: > > > > The temptation is to supply a patch that checks if kswapd was woken for > > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > > > backed up by proper testing. As 3.7 is very close to release and this is > > > not a bug we should release with, a safer path is to revert "mm: remove > > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > > > balance_pgdat() logic in general. > > > > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > > > Does anyone know if this is queued to go into 3.7 somewhere? I looked > > a bit and can't find it in a tree. We have a few reports of Fedora > > rawhide users hitting this. > > Still thinking about it. We're reverting quite a lot of material > lately. > mm-revert-mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-based-on-failures.patch > and revert-mm-fix-up-zone-present-pages.patch are queued for 3.7. > > I'll toss this one in there as well, but I can't say I'm feeling > terribly confident. How is Valdis's machine nowadays? I admit possibly having lost the plot. With the two patches you mention stuck on top of next-20121114, I'm seeing less kswapd issues but am still tripping over them on occasion. It seems to be related to uptime - I don't see any for a few hours, but they become more frequent. I was seeing quite a few of them yesterday after I had a 30-hour uptime. 
I'll stick Mel's "mm: remove __GFP_NO_KSWAPD" patch on this evening and let you know what happens (might be a day or two before I have definitive results, as usually my laptop gets rebooted twice a day). [-- Attachment #2: Type: application/pgp-signature, Size: 865 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-16 19:14 ` Josh Boyer 2012-11-16 19:51 ` Andrew Morton @ 2012-11-16 20:06 ` Mel Gorman 2012-11-20 15:38 ` Josh Boyer 1 sibling, 1 reply; 52+ messages in thread From: Mel Gorman @ 2012-11-16 20:06 UTC (permalink / raw) To: Josh Boyer Cc: Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: > On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman <mgorman@suse.de> wrote: > > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction > > based on failures" reverted, Zdenek Kabelac reported the following > > > > Hmm, so it's just took longer to hit the problem and observe > > kswapd0 spinning on my CPU again - it's not as endless like before - > > but still it easily eats minutes - it helps to turn off Firefox > > or TB (memory hungry apps) so kswapd0 stops soon - and restart > > those apps again. (And I still have like >1GB of cached memory) > > > > kswapd0 R running task 0 30 2 0x00000000 > > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > > Call Trace: > > [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 > > [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 > > [<ffffffff81192971>] put_super+0x31/0x40 > > [<ffffffff81192a42>] drop_super+0x22/0x30 > > [<ffffffff81193b89>] prune_super+0x149/0x1b0 > > [<ffffffff81141e2a>] shrink_slab+0xba/0x510 > > > > The sysrq+m indicates the system has no swap so it'll never reclaim > > anonymous pages as part of reclaim/compaction. That is one part of the > > problem but not the root cause as file-backed pages could also be reclaimed. 
> > > > The likely underlying problem is that kswapd is woken up or kept awake > > for each THP allocation request in the page allocator slow path. > > > > If compaction fails for the requesting process then compaction will be > > deferred for a time and direct reclaim is avoided. However, if there > > are a storm of THP requests that are simply rejected, it will still > > be the the case that kswapd is awake for a prolonged period of time > > as pgdat->kswapd_max_order is updated each time. This is noticed by > > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > > Instead it will loopp, shrinking a small number of pages and calling > > shrink_slab() on each iteration. > > > > The temptation is to supply a patch that checks if kswapd was woken for > > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > > backed up by proper testing. As 3.7 is very close to release and this is > > not a bug we should release with, a safer path is to revert "mm: remove > > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the > > balance_pgdat() logic in general. > > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > Does anyone know if this is queued to go into 3.7 somewhere? I looked > a bit and can't find it in a tree. We have a few reports of Fedora > rawhide users hitting this. > No, because I was waiting to hear if a) it worked and preferably if the alternative "less safe" option worked. This close to release it might be better to just go with the safe option. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-16 20:06 ` Mel Gorman @ 2012-11-20 15:38 ` Josh Boyer 2012-11-20 16:13 ` Bruno Wolff III ` (2 more replies) 0 siblings, 3 replies; 52+ messages in thread From: Josh Boyer @ 2012-11-20 15:38 UTC (permalink / raw) To: Mel Gorman Cc: Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings, Thorsten Leemhuis, bruno On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman <mgorman@suse.de> wrote: > On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: >> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman <mgorman@suse.de> wrote: >> > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction >> > based on failures" reverted, Zdenek Kabelac reported the following >> > >> > Hmm, so it's just took longer to hit the problem and observe >> > kswapd0 spinning on my CPU again - it's not as endless like before - >> > but still it easily eats minutes - it helps to turn off Firefox >> > or TB (memory hungry apps) so kswapd0 stops soon - and restart >> > those apps again. (And I still have like >1GB of cached memory) >> > >> > kswapd0 R running task 0 30 2 0x00000000 >> > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 >> > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 >> > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 >> > Call Trace: >> > [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 >> > [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 >> > [<ffffffff81192971>] put_super+0x31/0x40 >> > [<ffffffff81192a42>] drop_super+0x22/0x30 >> > [<ffffffff81193b89>] prune_super+0x149/0x1b0 >> > [<ffffffff81141e2a>] shrink_slab+0xba/0x510 >> > >> > The sysrq+m indicates the system has no swap so it'll never reclaim >> > anonymous pages as part of reclaim/compaction. That is one part of the >> > problem but not the root cause as file-backed pages could also be reclaimed. 
>> > >> > The likely underlying problem is that kswapd is woken up or kept awake >> > for each THP allocation request in the page allocator slow path. >> > >> > If compaction fails for the requesting process then compaction will be >> > deferred for a time and direct reclaim is avoided. However, if there >> > are a storm of THP requests that are simply rejected, it will still >> > be the the case that kswapd is awake for a prolonged period of time >> > as pgdat->kswapd_max_order is updated each time. This is noticed by >> > the main kswapd() loop and it will not call kswapd_try_to_sleep(). >> > Instead it will loopp, shrinking a small number of pages and calling >> > shrink_slab() on each iteration. >> > >> > The temptation is to supply a patch that checks if kswapd was woken for >> > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not >> > backed up by proper testing. As 3.7 is very close to release and this is >> > not a bug we should release with, a safer path is to revert "mm: remove >> > __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the >> > balance_pgdat() logic in general. >> > >> > Signed-off-by: Mel Gorman <mgorman@suse.de> >> >> Does anyone know if this is queued to go into 3.7 somewhere? I looked >> a bit and can't find it in a tree. We have a few reports of Fedora >> rawhide users hitting this. >> > > No, because I was waiting to hear if a) it worked and preferably if the > alternative "less safe" option worked. This close to release it might be > better to just go with the safe option. We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 and people say this revert patch doesn't seem to make the issue go away fully. Thorsten has created another kernel with the other patch applied for testing. At least I think that is the latest status from the bug. Hopefully the commenters will chime in. josh ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-20 15:38 ` Josh Boyer @ 2012-11-20 16:13 ` Bruno Wolff III 2012-11-20 17:43 ` Thorsten Leemhuis 2012-11-21 15:08 ` Mel Gorman 2 siblings, 0 replies; 52+ messages in thread From: Bruno Wolff III @ 2012-11-20 16:13 UTC (permalink / raw) To: Josh Boyer Cc: Mel Gorman, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings, Thorsten Leemhuis On Tue, Nov 20, 2012 at 10:38:45 -0500, Josh Boyer <jwboyer@gmail.com> wrote: > >We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 >and people say this revert patch doesn't seem to make the issue go away >fully. Thorsten has created another kernel with the other patch applied >for testing. > >At least I think that is the latest status from the bug. Hopefully the >commenters will chime in. I am seeing kswapd0 hogging a cpu right now. I have two rsyncs and an md sync running and a couple of large memory processes (java and firefox) idle. I haven't been seeing this happen as often as previously. Before doing a yum update with an rsync was pretty good at triggering the problem. Now, not so much. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-20 15:38 ` Josh Boyer 2012-11-20 16:13 ` Bruno Wolff III @ 2012-11-20 17:43 ` Thorsten Leemhuis 2012-11-23 15:20 ` Thorsten Leemhuis 2012-11-21 15:08 ` Mel Gorman 2 siblings, 1 reply; 52+ messages in thread From: Thorsten Leemhuis @ 2012-11-20 17:43 UTC (permalink / raw) To: Josh Boyer Cc: Mel Gorman, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings, bruno On 20.11.2012 16:38, Josh Boyer wrote: > On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman <mgorman@suse.de> wrote: >> On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: >>> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman <mgorman@suse.de> wrote: >>>> With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction >>>> based on failures" reverted, Zdenek Kabelac reported the following >>>> >>>> Hmm, so it's just took longer to hit the problem and observe >>>> kswapd0 spinning on my CPU again - it's not as endless like before - >>>> but still it easily eats minutes - it helps to turn off Firefox >>>> or TB (memory hungry apps) so kswapd0 stops soon - and restart >>>> those apps again. (And I still have like >1GB of cached memory) >>>> >>>> kswapd0 R running task 0 30 2 0x00000000 >>>> ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 >>>> ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 >>>> ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 >>>> Call Trace: >>>> [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 >>>> [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 >>>> [<ffffffff81192971>] put_super+0x31/0x40 >>>> [<ffffffff81192a42>] drop_super+0x22/0x30 >>>> [<ffffffff81193b89>] prune_super+0x149/0x1b0 >>>> [<ffffffff81141e2a>] shrink_slab+0xba/0x510 >>>> >>>> The sysrq+m indicates the system has no swap so it'll never reclaim >>>> anonymous pages as part of reclaim/compaction. 
That is one part of the >>>> problem but not the root cause as file-backed pages could also be reclaimed. >>>> >>>> The likely underlying problem is that kswapd is woken up or kept awake >>>> for each THP allocation request in the page allocator slow path. >>>> >>>> If compaction fails for the requesting process then compaction will be >>>> deferred for a time and direct reclaim is avoided. However, if there >>>> is a storm of THP requests that are simply rejected, it will still >>>> be the case that kswapd is awake for a prolonged period of time >>>> as pgdat->kswapd_max_order is updated each time. This is noticed by >>>> the main kswapd() loop and it will not call kswapd_try_to_sleep(). >>>> Instead it will loop, shrinking a small number of pages and calling >>>> shrink_slab() on each iteration. >>>> >>>> The temptation is to supply a patch that checks if kswapd was woken for >>>> THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not >>>> backed up by proper testing. As 3.7 is very close to release and this is >>>> not a bug we should release with, a safer path is to revert "mm: remove >>>> __GFP_NO_KSWAPD" for now and revisit it with a view to ironing out the >>>> balance_pgdat() logic in general. >>>> >>>> Signed-off-by: Mel Gorman <mgorman@suse.de> >>> >>> Does anyone know if this is queued to go into 3.7 somewhere? I looked >>> a bit and can't find it in a tree. We have a few reports of Fedora >>> rawhide users hitting this. >> >> No, because I was waiting to hear if a) it worked and preferably if the >> alternative "less safe" option worked. This close to release it might be >> better to just go with the safe option. > > We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 > and people say this revert patch doesn't seem to make the issue go away > fully. Thorsten has created another kernel with the other patch applied > for testing. > > At least I think that is the latest status from the bug.
Hopefully the > commenters will chime in. The short story from my current point of view is: * my main machine at home where I initially saw the issue that started this thread seems to be running fine with rc6 and the "safe" patch Mel posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran a rc5 kernel with the revert that went into rc6 and the "safe" patch -- that worked fine for a few days, too. * I have a second machine where I started to use 3.7-rc kernels only yesterday (the machine triggered a bug in the radeon driver that seems to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac mentions in this thread. I wasn't able to look closer at it, but simply tried rc6 with the safe patch, which didn't help. I'm now running rc6 with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 I can't yet tell if it helps. If the problem shows up again I'll try to capture more debugging data via sysrq -- there wasn't any time for that when I was running rc6 with the safe patch, sorry. Thorsten ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-20 17:43 ` Thorsten Leemhuis @ 2012-11-23 15:20 ` Thorsten Leemhuis 2012-11-27 11:12 ` Mel Gorman 0 siblings, 1 reply; 52+ messages in thread From: Thorsten Leemhuis @ 2012-11-23 15:20 UTC (permalink / raw) To: Josh Boyer Cc: Mel Gorman, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings, bruno Thorsten Leemhuis wrote on 20.11.2012 18:43: > On 20.11.2012 16:38, Josh Boyer wrote: > > The short story from my current point of view is: Quick update, in case anybody is interested: > * my main machine at home where I initially saw the issue that started > this thread seems to be running fine with rc6 and the "safe" patch Mel > posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran a rc5 > kernel with the revert that went into rc6 and the "safe" patch -- that > worked fine for a few days, too. On this machine I'm running a rc6 kernel + the fix for the accounting bug(¹) that went into mainline ~40 hours ago + the "riskier" patch Mel posted in https://lkml.org/lkml/2012/11/12/151 Up to now everything works fine. (¹) https://lkml.org/lkml/2012/11/21/362 > * I have a second machine where I started to use 3.7-rc kernels only > yesterday (the machine triggered a bug in the radeon driver that seems > to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac > mentions in this thread. I wasn't able to look closer at it, but simply > tried rc6 with the safe patch, which didn't help. I'm now running rc6 > with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 > I can't yet tell if it helps. If the problems shows up again I'll try to > capture more debugging data via sysrq -- there wasn't any time for that > when I was running rc6 with the safe patch, sorry. This machine is now also behaving fine with above mentioned rc6 kernel + the two patches. 
It seems the accounting bug was the root cause for the problems this machine showed. CU Thorsten ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-23 15:20 ` Thorsten Leemhuis @ 2012-11-27 11:12 ` Mel Gorman 0 siblings, 0 replies; 52+ messages in thread From: Mel Gorman @ 2012-11-27 11:12 UTC (permalink / raw) To: Thorsten Leemhuis Cc: Josh Boyer, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings, bruno On Fri, Nov 23, 2012 at 04:20:48PM +0100, Thorsten Leemhuis wrote: > Thorsten Leemhuis wrote on 20.11.2012 18:43: > > On 20.11.2012 16:38, Josh Boyer wrote: > > > > The short story from my current point of view is: > > Quick update, in case anybody is interested: > > > * my main machine at home where I initially saw the issue that started > > this thread seems to be running fine with rc6 and the "safe" patch Mel > > posted in https://lkml.org/lkml/2012/11/12/113 Before that I ran a rc5 > > kernel with the revert that went into rc6 and the "safe" patch -- that > > worked fine for a few days, too. > > On this machine I'm running a rc6 kernel + the fix for the accounting > bug(¹) that went into mainline ~40 hours ago + the "riskier" patch Mel > posted in https://lkml.org/lkml/2012/11/12/151 > > Up to now everything works fine. > > (¹) https://lkml.org/lkml/2012/11/21/362 > That's good news, thanks for the follow up. Maybe 3.7 will not be a complete disaster with respect to THP after all this. The riskier patch was not picked up simply because it was riskier and would still be vulnerable to the effective infinite loop Johannes found in kswapd. It'll all need to be revisited. > > * I have a second machine where I started to use 3.7-rc kernels only > > yesterday (the machine triggered a bug in the radeon driver that seems > > to be fixed in rc6) which showed symptoms like the ones Zdenek Kabelac > > mentions in this thread. I wasn't able to look closer at it, but simply > > tried rc6 with the safe patch, which didn't help.
I'm now running rc6 > > with the "riskier" patch from https://lkml.org/lkml/2012/11/12/151 > > I can't yet tell if it helps. If the problem shows up again I'll try to > > capture more debugging data via sysrq -- there wasn't any time for that > > when I was running rc6 with the safe patch, sorry. > > This machine is now also behaving fine with above mentioned rc6 kernel + > the two patches. It seems the accounting bug was the root cause for the > problems this machine showed. > For some yes, for others no. Others are getting stuck within effective infinite loops in kswapd and the trigger cases are different although the symptoms look similar. Thanks again. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-20 15:38 ` Josh Boyer 2012-11-20 16:13 ` Bruno Wolff III 2012-11-20 17:43 ` Thorsten Leemhuis @ 2012-11-21 15:08 ` Mel Gorman 2 siblings, 0 replies; 52+ messages in thread From: Mel Gorman @ 2012-11-21 15:08 UTC (permalink / raw) To: Josh Boyer Cc: Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings, Thorsten Leemhuis, bruno On Tue, Nov 20, 2012 at 10:38:45AM -0500, Josh Boyer wrote: > On Fri, Nov 16, 2012 at 3:06 PM, Mel Gorman <mgorman@suse.de> wrote: > > On Fri, Nov 16, 2012 at 02:14:47PM -0500, Josh Boyer wrote: > >> On Mon, Nov 12, 2012 at 6:37 AM, Mel Gorman <mgorman@suse.de> wrote: > >> > With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction > >> > based on failures" reverted, Zdenek Kabelac reported the following > >> > > >> > Hmm, so it's just took longer to hit the problem and observe > >> > kswapd0 spinning on my CPU again - it's not as endless like before - > >> > but still it easily eats minutes - it helps to turn off Firefox > >> > or TB (memory hungry apps) so kswapd0 stops soon - and restart > >> > those apps again. (And I still have like >1GB of cached memory) > >> > > >> > kswapd0 R running task 0 30 2 0x00000000 > >> > ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 > >> > ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 > >> > ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 > >> > Call Trace: > >> > [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 > >> > [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 > >> > [<ffffffff81192971>] put_super+0x31/0x40 > >> > [<ffffffff81192a42>] drop_super+0x22/0x30 > >> > [<ffffffff81193b89>] prune_super+0x149/0x1b0 > >> > [<ffffffff81141e2a>] shrink_slab+0xba/0x510 > >> > > >> > The sysrq+m indicates the system has no swap so it'll never reclaim > >> > anonymous pages as part of reclaim/compaction. 
That is one part of the > >> > problem but not the root cause as file-backed pages could also be reclaimed. > >> > > >> > The likely underlying problem is that kswapd is woken up or kept awake > >> > for each THP allocation request in the page allocator slow path. > >> > > >> > If compaction fails for the requesting process then compaction will be > >> > deferred for a time and direct reclaim is avoided. However, if there > >> > is a storm of THP requests that are simply rejected, it will still > >> > be the case that kswapd is awake for a prolonged period of time > >> > as pgdat->kswapd_max_order is updated each time. This is noticed by > >> > the main kswapd() loop and it will not call kswapd_try_to_sleep(). > >> > Instead it will loop, shrinking a small number of pages and calling > >> > shrink_slab() on each iteration. > >> > > >> > The temptation is to supply a patch that checks if kswapd was woken for > >> > THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not > >> > backed up by proper testing. As 3.7 is very close to release and this is > >> > not a bug we should release with, a safer path is to revert "mm: remove > >> > __GFP_NO_KSWAPD" for now and revisit it with a view to ironing out the > >> > balance_pgdat() logic in general. > >> > > >> > Signed-off-by: Mel Gorman <mgorman@suse.de> > >> > >> Does anyone know if this is queued to go into 3.7 somewhere? I looked > >> a bit and can't find it in a tree. We have a few reports of Fedora > >> rawhide users hitting this. > >> > > > > No, because I was waiting to hear if a) it worked and preferably if the > > alternative "less safe" option worked. This close to release it might be > > better to just go with the safe option. > > We've been tracking it in https://bugzilla.redhat.com/show_bug.cgi?id=866988 > and people say this revert patch doesn't seem to make the issue go away > fully. Thorsten has created another kernel with the other patch applied > for testing.
> There is also a potential accounting bug that could be affecting this. https://lkml.org/lkml/2012/11/20/613 . NR_FREE_PAGES affects watermark calculations. If it drifts too far then processes would keep entering direct reclaim and waking kswapd even if there is no need to. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-12 11:37 ` [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Mel Gorman 2012-11-16 19:14 ` Josh Boyer @ 2012-11-20 9:18 ` Glauber Costa 2012-11-20 20:18 ` Andrew Morton 1 sibling, 1 reply; 52+ messages in thread From: Glauber Costa @ 2012-11-20 9:18 UTC (permalink / raw) To: Mel Gorman Cc: Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings On 11/12/2012 03:37 PM, Mel Gorman wrote: > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 02c1c971..d0a7967 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -31,6 +31,7 @@ struct vm_area_struct; > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > #define ___GFP_NOTRACK 0x200000u > +#define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > #define ___GFP_WRITE 0x1000000u Keep in mind that this bit has been reused in -mm. If this patch needs to be reverted, we'll need to first change the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it would break things. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-20 9:18 ` Glauber Costa @ 2012-11-20 20:18 ` Andrew Morton 2012-11-21 8:30 ` Glauber Costa 0 siblings, 1 reply; 52+ messages in thread From: Andrew Morton @ 2012-11-20 20:18 UTC (permalink / raw) To: Glauber Costa Cc: Mel Gorman, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Rik van Riel, Robert Jennings On Tue, 20 Nov 2012 13:18:19 +0400 Glauber Costa <glommer@parallels.com> wrote: > On 11/12/2012 03:37 PM, Mel Gorman wrote: > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > > index 02c1c971..d0a7967 100644 > > --- a/include/linux/gfp.h > > +++ b/include/linux/gfp.h > > @@ -31,6 +31,7 @@ struct vm_area_struct; > > #define ___GFP_THISNODE 0x40000u > > #define ___GFP_RECLAIMABLE 0x80000u > > #define ___GFP_NOTRACK 0x200000u > > +#define ___GFP_NO_KSWAPD 0x400000u > > #define ___GFP_OTHER_NODE 0x800000u > > #define ___GFP_WRITE 0x1000000u > > Keep in mind that this bit has been reused in -mm. > If this patch needs to be reverted, we'll need to first change > the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it > would break things. I presently have /* Plain integer GFP bitmasks. Do not use this directly. 
*/ #define ___GFP_DMA 0x01u #define ___GFP_HIGHMEM 0x02u #define ___GFP_DMA32 0x04u #define ___GFP_MOVABLE 0x08u #define ___GFP_WAIT 0x10u #define ___GFP_HIGH 0x20u #define ___GFP_IO 0x40u #define ___GFP_FS 0x80u #define ___GFP_COLD 0x100u #define ___GFP_NOWARN 0x200u #define ___GFP_REPEAT 0x400u #define ___GFP_NOFAIL 0x800u #define ___GFP_NORETRY 0x1000u #define ___GFP_MEMALLOC 0x2000u #define ___GFP_COMP 0x4000u #define ___GFP_ZERO 0x8000u #define ___GFP_NOMEMALLOC 0x10000u #define ___GFP_HARDWALL 0x20000u #define ___GFP_THISNODE 0x40000u #define ___GFP_RECLAIMABLE 0x80000u #define ___GFP_KMEMCG 0x100000u #define ___GFP_NOTRACK 0x200000u #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u #define ___GFP_WRITE 0x1000000u and #define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */ #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) Which I think is OK? I'd forgotten about __GFP_BITS_SHIFT. Should we do this? --- a/include/linux/gfp.h~a +++ a/include/linux/gfp.h @@ -35,6 +35,7 @@ struct vm_area_struct; #define ___GFP_NO_KSWAPD 0x400000u #define ___GFP_OTHER_NODE 0x800000u #define ___GFP_WRITE 0x1000000u +/* If the above are modified, __GFP_BITS_SHIFT may need updating */ /* * GFP bitmasks.. _ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" 2012-11-20 20:18 ` Andrew Morton @ 2012-11-21 8:30 ` Glauber Costa 0 siblings, 0 replies; 52+ messages in thread From: Glauber Costa @ 2012-11-21 8:30 UTC (permalink / raw) To: Andrew Morton Cc: Mel Gorman, Zdenek Kabelac, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Rik van Riel, Robert Jennings On 11/21/2012 12:18 AM, Andrew Morton wrote: > On Tue, 20 Nov 2012 13:18:19 +0400 > Glauber Costa <glommer@parallels.com> wrote: > >> On 11/12/2012 03:37 PM, Mel Gorman wrote: >>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h >>> index 02c1c971..d0a7967 100644 >>> --- a/include/linux/gfp.h >>> +++ b/include/linux/gfp.h >>> @@ -31,6 +31,7 @@ struct vm_area_struct; >>> #define ___GFP_THISNODE 0x40000u >>> #define ___GFP_RECLAIMABLE 0x80000u >>> #define ___GFP_NOTRACK 0x200000u >>> +#define ___GFP_NO_KSWAPD 0x400000u >>> #define ___GFP_OTHER_NODE 0x800000u >>> #define ___GFP_WRITE 0x1000000u >> >> Keep in mind that this bit has been reused in -mm. >> If this patch needs to be reverted, we'll need to first change >> the definition of __GFP_KMEMCG (and __GFP_BITS_SHIFT as a result), or it >> would break things. > > I presently have > > /* Plain integer GFP bitmasks. Do not use this directly. 
*/ > #define ___GFP_DMA 0x01u > #define ___GFP_HIGHMEM 0x02u > #define ___GFP_DMA32 0x04u > #define ___GFP_MOVABLE 0x08u > #define ___GFP_WAIT 0x10u > #define ___GFP_HIGH 0x20u > #define ___GFP_IO 0x40u > #define ___GFP_FS 0x80u > #define ___GFP_COLD 0x100u > #define ___GFP_NOWARN 0x200u > #define ___GFP_REPEAT 0x400u > #define ___GFP_NOFAIL 0x800u > #define ___GFP_NORETRY 0x1000u > #define ___GFP_MEMALLOC 0x2000u > #define ___GFP_COMP 0x4000u > #define ___GFP_ZERO 0x8000u > #define ___GFP_NOMEMALLOC 0x10000u > #define ___GFP_HARDWALL 0x20000u > #define ___GFP_THISNODE 0x40000u > #define ___GFP_RECLAIMABLE 0x80000u > #define ___GFP_KMEMCG 0x100000u > #define ___GFP_NOTRACK 0x200000u > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > #define ___GFP_WRITE 0x1000000u > > and > Humm, I didn't realize there were also another free space at 0x100000u. This seems fine. > #define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */ > #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) > > Which I think is OK? Yes, if we haven't increased the size of the flag-space, no need to change it. > > I'd forgotten about __GFP_BITS_SHIFT. Should we do this? > > --- a/include/linux/gfp.h~a > +++ a/include/linux/gfp.h > @@ -35,6 +35,7 @@ struct vm_area_struct; > #define ___GFP_NO_KSWAPD 0x400000u > #define ___GFP_OTHER_NODE 0x800000u > #define ___GFP_WRITE 0x1000000u > +/* If the above are modified, __GFP_BITS_SHIFT may need updating */ > This is a very helpful comment. ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-11 9:13 ` Zdenek Kabelac 2012-11-12 11:37 ` [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Mel Gorman @ 2012-11-12 12:19 ` Mel Gorman 2012-11-12 13:13 ` Zdenek Kabelac 1 sibling, 1 reply; 52+ messages in thread From: Mel Gorman @ 2012-11-12 12:19 UTC (permalink / raw) To: Zdenek Kabelac Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: > Hmm, so it's just took longer to hit the problem and observe kswapd0 > spinning on my CPU again - it's not as endless like before - but > still it easily eats minutes - it helps to turn off Firefox or TB > (memory hungry apps) so kswapd0 stops soon - and restart those apps > again. > (And I still have like >1GB of cached memory) > I posted a "safe" patch that I believe explains why you are seeing what you are seeing. It does mean that there will still be some stalls due to THP because kswapd is not helping and it's avoiding the problem rather than trying to deal with it. Hence, I'm also going to post this patch even though I have not tested it myself. If you find it fixes the problem then it would be a preferable patch to the revert. It still is the case that the balance_pgdat() logic is in sore need of a rethink as it's pretty twisted right now. Thanks ---8<--- mm: Avoid waking kswapd for THP allocations when compaction is deferred or contended With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again.
(And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 ffff8801331efae8 0000000000000082 0000000000000018 0000000000000246 ffff880135b9a340 ffff8801331effd8 ffff8801331effd8 ffff8801331effd8 ffff880055dfa340 ffff880135b9a340 00000000331efad8 ffff8801331ee000 Call Trace: [<ffffffff81555bf2>] preempt_schedule+0x42/0x60 [<ffffffff81557a95>] _raw_spin_unlock+0x55/0x60 [<ffffffff81192971>] put_super+0x31/0x40 [<ffffffff81192a42>] drop_super+0x22/0x30 [<ffffffff81193b89>] prune_super+0x149/0x1b0 [<ffffffff81141e2a>] shrink_slab+0xba/0x510 The sysrq+m indicates the system has no swap so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process then compaction will be deferred for a time and direct reclaim is avoided. However, if there is a storm of THP requests that are simply rejected, it will still be the case that kswapd is awake for a prolonged period of time as pgdat->kswapd_max_order is updated each time. This is noticed by the main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead it will loop, shrinking a small number of pages and calling shrink_slab() on each iteration. This patch defers when kswapd gets woken up for THP allocations. For !THP allocations, kswapd is always woken up. For THP allocations, kswapd is woken up iff the process is willing to enter into direct reclaim/compaction.
Signed-off-by: Mel Gorman <mgorman@suse.de> diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bb90971..0b469b4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2378,6 +2378,15 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS); } +/* Returns true if the allocation is likely for THP */ +static bool is_thp_alloc(gfp_t gfp_mask, unsigned int order) +{ + if (order == pageblock_order && + (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) + return true; + return false; +} + static inline struct page * __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, enum zone_type high_zoneidx, @@ -2416,7 +2425,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, goto nopage; restart: - wake_all_kswapd(order, zonelist, high_zoneidx, + /* The decision whether to wake kswapd for THP is made later */ + if (!is_thp_alloc(gfp_mask, order)) + wake_all_kswapd(order, zonelist, high_zoneidx, zone_idx(preferred_zone)); /* @@ -2487,15 +2498,21 @@ rebalance: goto got_pg; sync_migration = true; - /* - * If compaction is deferred for high-order allocations, it is because - * sync compaction recently failed. In this is the case and the caller - * requested a movable allocation that does not heavily disrupt the - * system then fail the allocation instead of entering direct reclaim. - */ - if ((deferred_compaction || contended_compaction) && - (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE) - goto nopage; + if (is_thp_alloc(gfp_mask, order)) { + /* + * If compaction is deferred for high-order allocations, it is + * because sync compaction recently failed. In this is the case + * and the caller requested a movable allocation that does not + * heavily disrupt the system then fail the allocation instead + * of entering direct reclaim. 
+ */ + if (deferred_compaction || contended_compaction) + goto nopage; + + /* If process is willing to reclaim/compact then wake kswapd */ + wake_all_kswapd(order, zonelist, high_zoneidx, + zone_idx(preferred_zone)); + } /* Try direct reclaim and then allocating */ page = __alloc_pages_direct_reclaim(gfp_mask, order, ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-12 12:19 ` kswapd0: excessive CPU usage Mel Gorman @ 2012-11-12 13:13 ` Zdenek Kabelac 2012-11-12 13:31 ` Mel Gorman 0 siblings, 1 reply; 52+ messages in thread From: Zdenek Kabelac @ 2012-11-12 13:13 UTC (permalink / raw) To: Mel Gorman Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings Dne 12.11.2012 13:19, Mel Gorman napsal(a): > On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: >> Hmm, so it's just took longer to hit the problem and observe kswapd0 >> spinning on my CPU again - it's not as endless like before - but >> still it easily eats minutes - it helps to turn off Firefox or TB >> (memory hungry apps) so kswapd0 stops soon - and restart those apps >> again. >> (And I still have like >1GB of cached memory) >> > > I posted a "safe" patch that I believe explains why you are seeing what > you are seeing. It does mean that there will still be some stalls due to > THP because kswapd is not helping and it's avoiding the problem rather > than trying to deal with it. > > Hence, I'm also going to post this patch even though I have not tested > it myself. If you find it fixes the problem then it would be a > preferable patch to the revert. It still is the case that the > balance_pgdat() logic is in sort need of a rethink as it's pretty > twisted right now. > Should I apply them all together for 3.7-rc5 ? 1) https://lkml.org/lkml/2012/11/5/308 2) https://lkml.org/lkml/2012/11/12/113 3) https://lkml.org/lkml/2012/11/12/151 Zdenek ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-12 13:13 ` Zdenek Kabelac @ 2012-11-12 13:31 ` Mel Gorman 2012-11-12 14:50 ` Zdenek Kabelac 2012-11-18 19:00 ` Zdenek Kabelac 0 siblings, 2 replies; 52+ messages in thread From: Mel Gorman @ 2012-11-12 13:31 UTC (permalink / raw) To: Zdenek Kabelac Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote: > Dne 12.11.2012 13:19, Mel Gorman napsal(a): > >On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: > >>Hmm, so it's just took longer to hit the problem and observe kswapd0 > >>spinning on my CPU again - it's not as endless like before - but > >>still it easily eats minutes - it helps to turn off Firefox or TB > >>(memory hungry apps) so kswapd0 stops soon - and restart those apps > >>again. > >>(And I still have like >1GB of cached memory) > >> > > > >I posted a "safe" patch that I believe explains why you are seeing what > >you are seeing. It does mean that there will still be some stalls due to > >THP because kswapd is not helping and it's avoiding the problem rather > >than trying to deal with it. > > > >Hence, I'm also going to post this patch even though I have not tested > >it myself. If you find it fixes the problem then it would be a > >preferable patch to the revert. It still is the case that the > >balance_pgdat() logic is in sort need of a rethink as it's pretty > >twisted right now. > > > > > Should I apply them all together for 3.7-rc5 ? > > 1) https://lkml.org/lkml/2012/11/5/308 > 2) https://lkml.org/lkml/2012/11/12/113 > 3) https://lkml.org/lkml/2012/11/12/151 > Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but does nothing about THP stalls. 1+3 is a riskier version but depends on me being correct about the root cause of the problem you are seeing.
If you only have the time to test one combination then it would be preferred that you test the safe option of 1+2. Thanks. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage 2012-11-12 13:31 ` Mel Gorman @ 2012-11-12 14:50 ` Zdenek Kabelac 2012-11-18 19:00 ` Zdenek Kabelac 1 sibling, 0 replies; 52+ messages in thread From: Zdenek Kabelac @ 2012-11-12 14:50 UTC (permalink / raw) To: Mel Gorman Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm, LKML, Andrew Morton, Rik van Riel, Robert Jennings Dne 12.11.2012 14:31, Mel Gorman napsal(a): > On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote: >> Dne 12.11.2012 13:19, Mel Gorman napsal(a): >>> On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote: >>>> Hmm, so it's just took longer to hit the problem and observe kswapd0 >>>> spinning on my CPU again - it's not as endless like before - but >>>> still it easily eats minutes - it helps to turn off Firefox or TB >>>> (memory hungry apps) so kswapd0 stops soon - and restart those apps >>>> again. >>>> (And I still have like >1GB of cached memory) >>>> >>> >>> I posted a "safe" patch that I believe explains why you are seeing what >>> you are seeing. It does mean that there will still be some stalls due to >>> THP because kswapd is not helping and it's avoiding the problem rather >>> than trying to deal with it. >>> >>> Hence, I'm also going to post this patch even though I have not tested >>> it myself. If you find it fixes the problem then it would be a >>> preferable patch to the revert. It still is the case that the >>> balance_pgdat() logic is in sort need of a rethink as it's pretty >>> twisted right now. >>> >> >> >> Should I apply them all together for 3.7-rc5 ? >> >> 1) https://lkml.org/lkml/2012/11/5/308 >> 2) https://lkml.org/lkml/2012/11/12/113 >> 3) https://lkml.org/lkml/2012/11/12/151 >> > > Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but > does nothing about THP stalls. 1+3 is a riskier version but depends on > me being correct on what the root cause of the problem you see it. 
>
> If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only
> have the time to test one combination then it would be preferred that you
> test the safe option of 1+2.
>

I'll go with 1+2 for a couple of days - the issue is that I have no idea
how it gets suddenly triggered - it seemed to be running fine for 2-3 days
even with just 1) - but then kswapd0 started to occupy the CPU for minutes.
It looks like some intensive workload in Firefox (Flash) may lead to that.
Anyway, it's hard to tell quickly whether it helped.

Zdenek

^ permalink raw reply	[flat|nested] 52+ messages in thread
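[Editorial aside: since "it's hard to tell quickly whether it helped" is the
recurring problem in this thread, one way to make the symptom measurable is to
sample kswapd0's accumulated CPU time from /proc/<pid>/stat. This is a sketch,
not anything posted in the thread; the field positions follow proc(5)
(utime/stime are fields 14 and 15, in clock ticks), and the pid is whatever
`pgrep kswapd0` reports on the affected machine.]

```python
#!/usr/bin/env python3
"""Sample kswapd0's accumulated CPU time so a regression like the one in
this thread shows up as a number rather than a feeling."""
import os
import time

def cpu_ticks(stat_line):
    # comm (field 2) is parenthesised and may contain spaces, so split
    # after the closing ')'; the remaining fields are whitespace-separated.
    rest = stat_line.rsplit(')', 1)[1].split()
    # rest[0] is field 3 (state); utime/stime are fields 14/15 overall,
    # i.e. rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])

def watch(pid, interval=10):
    """Print how much CPU the task burned in each sampling interval."""
    hz = os.sysconf('SC_CLK_TCK')
    prev = None
    while True:
        with open('/proc/%d/stat' % pid) as f:
            now = cpu_ticks(f.read())
        if prev is not None:
            print('kswapd0 used %.1fs CPU in the last %ds'
                  % ((now - prev) / hz, interval))
        prev = now
        time.sleep(interval)
```

With a healthy kswapd the per-interval figure stays near zero; during the
spinning episodes described above it approaches the full interval length.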
* Re: kswapd0: excessive CPU usage
  2012-11-12 13:31 ` Mel Gorman
  2012-11-12 14:50 ` Zdenek Kabelac
@ 2012-11-18 19:00 ` Zdenek Kabelac
  2012-11-18 19:07 ` Jiri Slaby
  1 sibling, 1 reply; 52+ messages in thread
From: Zdenek Kabelac @ 2012-11-18 19:00 UTC (permalink / raw)
To: Mel Gorman
Cc: Seth Jennings, Jiri Slaby, Valdis.Kletnieks, Jiri Slaby, linux-mm,
	LKML, Andrew Morton, Rik van Riel, Robert Jennings

Dne 12.11.2012 14:31, Mel Gorman napsal(a):
> On Mon, Nov 12, 2012 at 02:13:20PM +0100, Zdenek Kabelac wrote:
>> Dne 12.11.2012 13:19, Mel Gorman napsal(a):
>>> On Sun, Nov 11, 2012 at 10:13:14AM +0100, Zdenek Kabelac wrote:
>>>> Hmm, so it just took longer to hit the problem and observe kswapd0
>>>> spinning on my CPU again - it's not as endless as before - but it
>>>> still easily eats minutes - it helps to turn off Firefox or TB
>>>> (memory-hungry apps) so kswapd0 stops soon - and then restart those
>>>> apps again.
>>>> (And I still have like >1GB of cached memory.)
>>>>
>>>
>>> I posted a "safe" patch that I believe explains why you are seeing what
>>> you are seeing. It does mean that there will still be some stalls due to
>>> THP because kswapd is not helping; it avoids the problem rather than
>>> trying to deal with it.
>>>
>>> Hence, I'm also going to post this patch even though I have not tested
>>> it myself. If you find it fixes the problem then it would be a
>>> preferable patch to the revert. It is still the case that the
>>> balance_pgdat() logic is in sore need of a rethink as it's pretty
>>> twisted right now.
>>>
>>
>>
>> Should I apply them all together for 3.7-rc5?
>>
>> 1) https://lkml.org/lkml/2012/11/5/308
>> 2) https://lkml.org/lkml/2012/11/12/113
>> 3) https://lkml.org/lkml/2012/11/12/151
>>
>
> Not all together. Test either 1+2 or 1+3. 1+2 is the safer choice but
> does nothing about THP stalls. 1+3 is a riskier version but depends on
> me being correct about the root cause of the problem you are seeing.
> > If both 1+2 and 1+3 work for you, I'd choose 1+3 for merging. If you only > have the time to test one combination then it would be preferred that you > test the safe option of 1+2. So I've tested 1+2 for a few days - once I've rebooted for another reason, but today happened this to me (with ~2day uptime) For some reason my machine went ouf of memory and OOM killed firefox and then even whole Xsession. Unsure whether it's related to those 2 patches - but I've never had such OOM failure before. Should I experiment now with 1+3 - or is there newer thing to test ? Zdenek X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [<ffffffff811354e9>] warn_alloc_failed+0xe9/0x140 [<ffffffff81138eda>] __alloc_pages_nodemask+0x7fa/0xa40 [<ffffffff81148fc3>] shmem_getpage_gfp+0x603/0x9d0 [<ffffffff8100a166>] ? native_sched_clock+0x26/0x90 [<ffffffff81149d6f>] shmem_fault+0x4f/0xa0 [<ffffffff812ad69e>] shm_fault+0x1e/0x20 [<ffffffff811571d3>] __do_fault+0x73/0x4d0 [<ffffffff81131640>] ? generic_file_aio_write+0xb0/0x100 [<ffffffff81159d67>] handle_pte_fault+0x97/0x9a0 [<ffffffff810aca4f>] ? __lock_is_held+0x5f/0x90 [<ffffffff81081711>] ? get_parent_ip+0x11/0x50 rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 rsyslogd cpuset=/ mems_allowed=0 Pid: 571, comm: rsyslogd Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [<ffffffff8154dfcb>] dump_header.isra.12+0x78/0x224 [<ffffffff8155b529>] ? sub_preempt_count+0x79/0xd0 [<ffffffff81557842>] ? _raw_spin_unlock_irqrestore+0x42/0x80 [<ffffffff81317c0e>] ? ___ratelimit+0x9e/0x130 [<ffffffff81133ac3>] oom_kill_process+0x1d3/0x330 [<ffffffff81134219>] out_of_memory+0x439/0x4a0 [<ffffffff81139056>] __alloc_pages_nodemask+0x976/0xa40 [<ffffffff811304b5>] ? find_get_page+0x5/0x230 [<ffffffff811322a0>] filemap_fault+0x2d0/0x480 [<ffffffff811571d3>] __do_fault+0x73/0x4d0 [<ffffffff81159d67>] handle_pte_fault+0x97/0x9a0 [<ffffffff810aca4f>] ? 
__lock_is_held+0x5f/0x90 [<ffffffff81081711>] ? get_parent_ip+0x11/0x50 [<ffffffff8115ae6f>] handle_mm_fault+0x22f/0x2f0 [<ffffffff8155ae7d>] __do_page_fault+0x15d/0x4e0 [<ffffffff8155b529>] ? sub_preempt_count+0x79/0xd0 [<ffffffff815578b5>] ? _raw_spin_unlock+0x35/0x60 [<ffffffff811f8d9c>] ? proc_reg_read+0x8c/0xc0 [<ffffffff815580a3>] ? error_sti+0x5/0x6 [<ffffffff8131f55d>] ? trace_hardirqs_off_thunk+0x3a/0x3c [<ffffffff8155b20e>] do_page_fault+0xe/0x10 [<ffffffff81557ea2>] page_fault+0x22/0x30 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 30 CPU 1: hi: 186, btch: 31 usd: 6 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 30 CPU 1: hi: 186, btch: 31 usd: 0 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:43 inactive_file:21 isolated_file:0 unevictable:4 dirty:34 writeback:2 unstable:0 free:20731 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55296kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:92kB inactive_file:52kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:88kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:180 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:15508kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:80kB inactive_file:32kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:234 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 900*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55296kB Normal: 452*4kB 363*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17496kB 243783 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553592 pages shared 943414 pages non-shared [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 351] 0 351 74685 1679 154 0 0 systemd-journal [ 544] 0 544 5863 107 16 0 0 bluetoothd [ 545] 0 545 88977 725 56 0 0 NetworkManager [ 546] 0 546 30170 158 15 0 0 crond [ 552] 0 552 1879 28 8 0 0 gpm [ 557] 0 557 1092 37 8 0 0 acpid [ 564] 81 564 6361 373 16 0 -900 dbus-daemon [ 566] 0 566 61331 155 22 0 0 rsyslogd [ 567] 498 567 7026 104 19 0 0 avahi-daemon [ 568] 498 568 6994 59 17 0 0 avahi-daemon [ 573] 0 573 1758 33 9 0 0 mcelog [ 578] 0 578 5925 51 16 0 0 atd [ 586] 105 586 121536 4270 56 0 0 polkitd [ 593] 0 593 21967 205 48 0 -900 modem-manager [ 601] 0 601 1087 26 8 0 0 thinkfan [ 619] 0 619 122722 1085 129 0 0 libvirtd [ 630] 32 630 4812 68 13 0 0 rpcbind [ 633] 0 633 20080 199 43 0 -1000 sshd [ 653] 29 653 5905 116 16 0 0 rpc.statd [ 700] 0 700 13173 190 28 0 0 wpa_supplicant [ 719] 0 719 4810 50 14 0 0 rpc.idmapd [ 730] 0 730 28268 36 10 0 0 rpc.rquotad [ 
766] 0 766 6030 153 15 0 0 rpc.mountd [ 806] 99 806 3306 45 11 0 0 dnsmasq [ 985] 0 985 21219 150 46 0 0 login [ 988] 0 988 260408 355 48 0 0 console-kit-dae [ 1053] 11641 1053 28706 241 14 0 0 bash [ 1097] 11641 1097 27972 58 10 0 0 startx [ 1125] 11641 1125 3487 48 13 0 0 xinit [ 1126] 11641 1126 80028 35289 154 0 0 X [ 1138] 11641 1138 142989 930 122 0 0 gnome-session [ 1151] 11641 1151 4013 64 12 0 0 dbus-launch [ 1152] 11641 1152 6069 82 17 0 0 dbus-daemon [ 1154] 11641 1154 85449 162 36 0 0 at-spi-bus-laun [ 1158] 11641 1158 6103 116 17 0 0 dbus-daemon [ 1161] 11641 1161 32328 174 33 0 0 at-spi2-registr [ 1172] 11641 1172 4013 65 13 0 0 dbus-launch [ 1173] 11641 1173 6350 265 18 0 0 dbus-daemon [ 1177] 11641 1177 37416 416 29 0 0 gconfd-2 [ 1184] 11641 1184 117556 1203 44 0 0 gnome-keyring-d [ 1185] 11641 1185 224829 2236 177 0 0 gnome-settings- [ 1194] 0 1194 57227 786 46 0 0 upowerd [ 1226] 11641 1226 77392 190 36 0 0 gvfsd [ 1246] 11641 1246 118201 772 90 0 0 pulseaudio [ 1247] 496 1247 41161 59 17 0 0 rtkit-daemon [ 1252] 11641 1252 29494 205 58 0 0 gconf-helper [ 1253] 106 1253 81296 355 46 0 0 colord [ 1257] 11641 1257 59080 1574 60 0 0 openbox [ 1258] 11641 1258 185569 3216 146 0 0 gnome-panel [ 1264] 11641 1264 64102 229 27 0 0 dconf-service [ 1268] 11641 1268 139203 858 116 0 0 gnome-user-shar [ 1269] 11641 1269 268645 27442 334 0 0 pidgin [ 1270] 11641 1270 142642 1064 117 0 0 bluetooth-apple [ 1271] 11641 1271 193218 1775 175 0 0 nm-applet [ 1272] 11641 1272 220194 1810 138 0 0 gnome-sound-app [ 1285] 11641 1285 80914 632 45 0 0 gvfs-udisks2-vo [ 1287] 0 1287 88101 599 41 0 0 udisksd [ 1295] 11641 1295 177162 14140 150 0 0 wnck-applet [ 1297] 11641 1297 281043 3161 199 0 0 clock-applet [ 1299] 11641 1299 142537 1053 120 0 0 cpufreq-applet [ 1302] 11641 1302 141960 986 113 0 0 notification-ar [ 1340] 11641 1340 190026 6265 144 0 0 gnome-terminal [ 1346] 11641 1346 2123 35 10 0 0 gnome-pty-helpe [ 1347] 11641 1347 28719 253 11 0 0 bash [ 1858] 11641 
1858 10895 101 27 0 0 xfconfd [ 2052] 11641 2052 28720 255 11 0 0 bash [ 6239] 11641 6239 73437 711 88 0 0 kdeinit4 [ 6240] 11641 6240 83952 717 101 0 0 klauncher [ 6242] 11641 6242 126497 1479 172 0 0 kded4 [ 6244] 11641 6244 2977 48 11 0 0 gam_server [10804] 11641 10804 101320 307 47 0 0 gvfsd-http [12175] 0 12175 27197 32 10 0 0 agetty [12249] 11641 12249 28719 252 14 0 0 bash [14862] 0 14862 51773 344 55 0 0 cupsd [14868] 4 14868 18105 158 39 0 0 cups-polld [16728] 11641 16728 28691 244 12 0 0 bash [16975] 0 16975 9109 253 23 0 -1000 systemd-udevd [17618] 0 17618 8245 87 22 0 0 systemd-logind [ 3133] 11641 3133 43721 132 40 0 0 su [ 3136] 0 3136 28564 139 12 0 0 bash [ 3983] 11641 3983 43722 134 41 0 0 su [ 3986] 0 3986 28564 144 13 0 0 bash [16350] 11641 16350 28691 245 14 0 0 bash [31228] 11641 31228 28691 245 11 0 0 bash [31922] 11641 31922 28719 250 13 0 0 bash [ 2340] 11641 2340 28691 245 15 0 0 bash [12586] 38 12586 7851 150 19 0 0 ntpd [32658] 11641 32658 41192 424 35 0 0 mc [32660] 11641 32660 28692 245 13 0 0 bash [29193] 11641 29193 713846 414344 1614 0 0 firefox [10971] 11641 10971 43722 133 43 0 0 su [10974] 0 10974 28564 132 12 0 0 bash [11343] 0 11343 28497 66 11 0 0 ksmtuned [11387] 11641 11387 28719 254 11 0 0 bash [11450] 11641 11450 28691 246 13 0 0 bash [11576] 11641 11576 43722 133 40 0 0 su [11579] 0 11579 28564 141 13 0 0 bash [12106] 11641 12106 28691 244 12 0 0 bash [12141] 11641 12141 43722 132 44 0 0 su [12144] 0 12144 28564 140 11 0 0 bash [12264] 11641 12264 28691 245 11 0 0 bash [12299] 11641 12299 43721 133 40 0 0 su [12302] 0 12302 28564 137 12 0 0 bash [26024] 11641 26024 28691 245 13 0 0 bash [26083] 11641 26083 28691 245 13 0 0 bash [28235] 11641 28235 43721 132 42 0 0 su [28238] 0 28238 28564 143 13 0 0 bash [29460] 11641 29460 43721 132 42 0 0 su [29463] 0 29463 28564 137 12 0 0 bash [29758] 11641 29758 28720 256 12 0 0 bash [29864] 11641 29864 41916 1153 36 0 0 mc [29866] 11641 29866 28728 257 11 0 0 bash [32750] 0 32750 
23164 2994 47 0 0 dhclient [ 323] 0 323 24081 471 48 0 0 sendmail [ 347] 51 347 20347 367 38 0 0 sendmail [ 907] 11641 907 379562 159766 707 0 0 thunderbird [ 6340] 11641 6340 28719 251 12 0 0 bash [ 6790] 11641 6790 80307 620 101 0 0 xfce4-notifyd [ 6844] 0 6844 26669 23 9 0 0 sleep Out of memory: Kill process 29193 (firefox) score 420 or sacrifice child Killed process 29193 (firefox) total-vm:2855384kB, anon-rss:1653868kB, file-rss:3508kB [<ffffffff8115ae6f>] handle_mm_fault+0x22f/0x2f0 [<ffffffff8115b12a>] __get_user_pages+0x12a/0x530 [<ffffffff8115b575>] get_dump_page+0x45/0x60 [<ffffffff811eec6d>] elf_core_dump+0x16bd/0x1960 [<ffffffff811edf86>] ? elf_core_dump+0x9d6/0x1960 [<ffffffff8155b529>] ? sub_preempt_count+0x79/0xd0 [<ffffffff815546ae>] ? mutex_unlock+0xe/0x10 [<ffffffff8118ed63>] ? do_truncate+0x73/0xa0 [<ffffffff811f55a1>] do_coredump+0xa21/0xeb0 [<ffffffff810b22a0>] ? debug_check_no_locks_freed+0xe0/0x170 [<ffffffff810abe8d>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8105a961>] get_signal_to_deliver+0x2e1/0x960 [<ffffffff8100236f>] do_signal+0x3f/0x9a0 [<ffffffff81540000>] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [<ffffffff8154b565>] ? is_prefetch.isra.15+0x1a6/0x1fd [<ffffffff815580a3>] ? error_sti+0x5/0x6 [<ffffffff81557cd1>] ? 
retint_signal+0x11/0x90 [<ffffffff81002d70>] do_notify_resume+0x80/0xb0 [<ffffffff81557d06>] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:8 inactive_file:0 isolated_file:0 unevictable:4 dirty:34 writeback:2 unstable:0 free:20724 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55404kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:0kB inactive_file:28kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:0kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:129 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:15364kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:0kB inactive_file:24kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:379 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 896*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55280kB Normal: 403*4kB 377*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17412kB 243733 total pagecache pages rsyslogd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 rsyslogd cpuset=/ mems_allowed=0 Pid: 571, comm: rsyslogd Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [<ffffffff8154dfcb>] dump_header.isra.12+0x78/0x224 [<ffffffff8155b529>] ? sub_preempt_count+0x79/0xd0 [<ffffffff81557842>] ? _raw_spin_unlock_irqrestore+0x42/0x80 [<ffffffff81317c0e>] ? ___ratelimit+0x9e/0x130 [<ffffffff81133ac3>] oom_kill_process+0x1d3/0x330 [<ffffffff81134219>] out_of_memory+0x439/0x4a0 [<ffffffff81139056>] __alloc_pages_nodemask+0x976/0xa40 [<ffffffff811304b5>] ? find_get_page+0x5/0x230 [<ffffffff811322a0>] filemap_fault+0x2d0/0x480 [<ffffffff811571d3>] __do_fault+0x73/0x4d0 [<ffffffff81159d67>] handle_pte_fault+0x97/0x9a0 [<ffffffff810aca4f>] ? __lock_is_held+0x5f/0x90 [<ffffffff81081711>] ? get_parent_ip+0x11/0x50 [<ffffffff8115ae6f>] handle_mm_fault+0x22f/0x2f0 [<ffffffff8155ae7d>] __do_page_fault+0x15d/0x4e0 [<ffffffff815578b5>] ? _raw_spin_unlock+0x35/0x60 [<ffffffff811f8d9c>] ? proc_reg_read+0x8c/0xc0 [<ffffffff815580a3>] ? error_sti+0x5/0x6 [<ffffffff8131f55d>] ? 
trace_hardirqs_off_thunk+0x3a/0x3c [<ffffffff8155b20e>] do_page_fault+0xe/0x10 [<ffffffff81557ea2>] page_fault+0x22/0x30 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 30 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 46 active_anon:900420 inactive_anon:28835 isolated_anon:0 active_file:0 inactive_file:7 isolated_file:0 unevictable:4 dirty:0 writeback:2 unstable:0 free:20691 slab_reclaimable:8641 slab_unreclaimable:10446 mapped:18325 shmem:243662 pagetables:7705 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55280kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:107924kB active_file:0kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:0kB writeback:0kB mapped:65460kB shmem:943100kB slab_reclaimable:11700kB slab_unreclaimable:8968kB kernel_stack:592kB pagetables:11852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:520 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:15364kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6544kB active_file:0kB inactive_file:16kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:48kB writeback:52kB mapped:6168kB shmem:27952kB slab_reclaimable:22864kB slab_unreclaimable:32800kB kernel_stack:2568kB pagetables:18968kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:571 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 896*4kB 1512*8kB 513*16kB 635*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55280kB Normal: 403*4kB 377*8kB 225*16kB 139*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 17412kB 243733 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553579 pages shared 943538 pages non-shared 1032176 pages RAM 42789 pages reserved 553576 pages shared 943549 pages non-shared [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 351] 0 351 74685 1682 154 0 0 systemd-journal [ 544] 0 544 5863 107 16 0 0 bluetoothd [ 545] 0 545 88977 725 56 0 0 NetworkManager [ 546] 0 546 30170 158 15 0 0 crond [ 552] 0 552 1879 28 8 0 0 gpm [ 557] 0 557 1092 37 8 0 0 acpid [ 564] 81 564 6361 373 16 0 -900 dbus-daemon [ 566] 0 566 61331 155 22 0 0 rsyslogd [ 567] 498 567 7026 104 19 0 0 avahi-daemon [ 568] 498 568 6994 59 17 0 0 avahi-daemon [ 573] 0 573 1758 33 9 0 0 mcelog [ 578] 0 578 5925 51 16 0 0 atd [ 586] 105 586 121536 4270 56 0 0 polkitd [ 593] 0 593 21967 205 48 0 -900 modem-manager [ 601] 0 601 1087 26 8 0 0 thinkfan [ 619] 0 619 122722 1085 129 0 0 libvirtd [ 630] 32 630 4812 68 13 0 0 rpcbind [ 633] 0 633 20080 199 
43 0 -1000 sshd [ 653] 29 653 5905 116 16 0 0 rpc.statd [ 700] 0 700 13173 190 28 0 0 wpa_supplicant [ 719] 0 719 4810 50 14 0 0 rpc.idmapd [ 730] 0 730 28268 36 10 0 0 rpc.rquotad [ 766] 0 766 6030 153 15 0 0 rpc.mountd [ 806] 99 806 3306 45 11 0 0 dnsmasq [ 985] 0 985 21219 150 46 0 0 login [ 988] 0 988 260408 355 48 0 0 console-kit-dae [ 1053] 11641 1053 28706 241 14 0 0 bash [ 1097] 11641 1097 27972 58 10 0 0 startx [ 1125] 11641 1125 3487 48 13 0 0 xinit [ 1126] 11641 1126 80028 35379 154 0 0 X [ 1138] 11641 1138 142989 930 122 0 0 gnome-session [ 1151] 11641 1151 4013 64 12 0 0 dbus-launch [ 1152] 11641 1152 6069 82 17 0 0 dbus-daemon [ 1154] 11641 1154 85449 162 36 0 0 at-spi-bus-laun [ 1158] 11641 1158 6103 116 17 0 0 dbus-daemon [ 1161] 11641 1161 32328 174 33 0 0 at-spi2-registr [ 1172] 11641 1172 4013 65 13 0 0 dbus-launch [ 1173] 11641 1173 6350 265 18 0 0 dbus-daemon [ 1177] 11641 1177 37416 416 29 0 0 gconfd-2 [ 1184] 11641 1184 117556 1203 44 0 0 gnome-keyring-d [ 1185] 11641 1185 224829 2236 177 0 0 gnome-settings- [ 1194] 0 1194 57227 786 46 0 0 upowerd [ 1226] 11641 1226 77392 190 36 0 0 gvfsd [ 1246] 11641 1246 118201 772 90 0 0 pulseaudio [ 1247] 496 1247 41161 59 17 0 0 rtkit-daemon [ 1252] 11641 1252 29494 205 58 0 0 gconf-helper [ 1253] 106 1253 81296 355 46 0 0 colord [ 1257] 11641 1257 59080 1574 60 0 0 openbox [ 1258] 11641 1258 185569 3216 146 0 0 gnome-panel [ 1264] 11641 1264 64102 229 27 0 0 dconf-service [ 1268] 11641 1268 139203 858 116 0 0 gnome-user-shar [ 1269] 11641 1269 268645 27442 334 0 0 pidgin [ 1270] 11641 1270 142642 1064 117 0 0 bluetooth-apple [ 1271] 11641 1271 193218 1775 175 0 0 nm-applet [ 1272] 11641 1272 220194 1810 138 0 0 gnome-sound-app [ 1285] 11641 1285 80914 632 45 0 0 gvfs-udisks2-vo [ 1287] 0 1287 88101 599 41 0 0 udisksd [ 1295] 11641 1295 177162 14140 150 0 0 wnck-applet [ 1297] 11641 1297 281043 3161 199 0 0 clock-applet [ 1299] 11641 1299 142537 1051 120 0 0 cpufreq-applet [ 1302] 11641 1302 141960 986 
113 0 0 notification-ar [ 1340] 11641 1340 190026 6265 144 0 0 gnome-terminal [ 1346] 11641 1346 2123 35 10 0 0 gnome-pty-helpe [ 1347] 11641 1347 28719 253 11 0 0 bash [ 1858] 11641 1858 10895 101 27 0 0 xfconfd X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 [ 2052] 11641 2052 28720 255 11 0 0 bash [ 6239] 11641 6239 73437 711 88 0 0 kdeinit4 [ 6240] 11641 6240 83952 717 101 0 0 klauncher Call Trace: [<ffffffff811354e9>] warn_alloc_failed+0xe9/0x140 [<ffffffff81138eda>] __alloc_pages_nodemask+0x7fa/0xa40 [<ffffffff81148fc3>] shmem_getpage_gfp+0x603/0x9d0 [<ffffffff8100a166>] ? native_sched_clock+0x26/0x90 [<ffffffff81149d6f>] shmem_fault+0x4f/0xa0 [<ffffffff812ad69e>] shm_fault+0x1e/0x20 [<ffffffff811571d3>] __do_fault+0x73/0x4d0 [<ffffffff81131640>] ? generic_file_aio_write+0xb0/0x100 [<ffffffff81159d67>] handle_pte_fault+0x97/0x9a0 [<ffffffff810aca4f>] ? __lock_is_held+0x5f/0x90 [<ffffffff81081711>] ? get_parent_ip+0x11/0x50 [<ffffffff8115ae6f>] handle_mm_fault+0x22f/0x2f0 [<ffffffff8115b12a>] __get_user_pages+0x12a/0x530 [<ffffffff8115b575>] get_dump_page+0x45/0x60 [<ffffffff811eec6d>] elf_core_dump+0x16bd/0x1960 [<ffffffff811edf86>] ? elf_core_dump+0x9d6/0x1960 [<ffffffff8155b529>] ? sub_preempt_count+0x79/0xd0 [<ffffffff815546ae>] ? mutex_unlock+0xe/0x10 [<ffffffff8118ed63>] ? do_truncate+0x73/0xa0 [<ffffffff811f55a1>] do_coredump+0xa21/0xeb0 [<ffffffff810b22a0>] ? debug_check_no_locks_freed+0xe0/0x170 [<ffffffff810abe8d>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8105a961>] get_signal_to_deliver+0x2e1/0x960 [<ffffffff8100236f>] do_signal+0x3f/0x9a0 [<ffffffff81540000>] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [<ffffffff8154b565>] ? is_prefetch.isra.15+0x1a6/0x1fd [<ffffffff815580a3>] ? error_sti+0x5/0x6 [<ffffffff81557cd1>] ? 
retint_signal+0x11/0x90 [<ffffffff81002d70>] do_notify_resume+0x80/0xb0 [<ffffffff81557d06>] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 14 active_anon:900420 inactive_anon:28978 isolated_anon:0 active_file:22 inactive_file:24 isolated_file:0 unevictable:4 dirty:5 writeback:0 unstable:0 free:20346 slab_reclaimable:8656 slab_unreclaimable:10414 mapped:18437 shmem:243751 pagetables:7717 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55316kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:108408kB active_file:52kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:20kB writeback:0kB mapped:65916kB shmem:943452kB slab_reclaimable:11716kB slab_unreclaimable:8904kB kernel_stack:488kB pagetables:11880kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3103 all_unreclaimable? 
yes lowmem_reserve[]: 0 0 885 885 Normal free:13948kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6632kB active_file:36kB inactive_file:40kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:0kB writeback:0kB mapped:6160kB shmem:27956kB slab_reclaimable:22908kB slab_unreclaimable:32736kB kernel_stack:2352kB pagetables:18988kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:602 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 883*4kB 1525*8kB 513*16kB 637*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55396kB Normal: 269*4kB 255*8kB 227*16kB 141*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 15996kB 243797 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553637 pages shared 943817 pages non-shared X: page allocation failure: order:0, mode:0x200da Pid: 1126, comm: X Not tainted 3.7.0-rc5-00007-g95e21c5 #100 Call Trace: [<ffffffff811354e9>] warn_alloc_failed+0xe9/0x140 [<ffffffff81138eda>] __alloc_pages_nodemask+0x7fa/0xa40 [<ffffffff81148fc3>] shmem_getpage_gfp+0x603/0x9d0 [<ffffffff8100a166>] ? native_sched_clock+0x26/0x90 [<ffffffff81149d6f>] shmem_fault+0x4f/0xa0 [<ffffffff812ad69e>] shm_fault+0x1e/0x20 [<ffffffff811571d3>] __do_fault+0x73/0x4d0 [<ffffffff81159d67>] handle_pte_fault+0x97/0x9a0 [<ffffffff810aca4f>] ? __lock_is_held+0x5f/0x90 [<ffffffff81081711>] ? get_parent_ip+0x11/0x50 [<ffffffff8115ae6f>] handle_mm_fault+0x22f/0x2f0 [<ffffffff8115b12a>] __get_user_pages+0x12a/0x530 [<ffffffff815578b5>] ? _raw_spin_unlock+0x35/0x60 [<ffffffff8115b575>] get_dump_page+0x45/0x60 [<ffffffff811eec6d>] elf_core_dump+0x16bd/0x1960 [<ffffffff811edf86>] ? elf_core_dump+0x9d6/0x1960 [<ffffffff8155b529>] ? 
sub_preempt_count+0x79/0xd0 [<ffffffff815546ae>] ? mutex_unlock+0xe/0x10 [<ffffffff8118ed63>] ? do_truncate+0x73/0xa0 [<ffffffff811f55a1>] do_coredump+0xa21/0xeb0 [<ffffffff810b22a0>] ? debug_check_no_locks_freed+0xe0/0x170 [<ffffffff810abe8d>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8105a961>] get_signal_to_deliver+0x2e1/0x960 [<ffffffff8100236f>] do_signal+0x3f/0x9a0 [<ffffffff81540000>] ? pci_fixup_msi_k8t_onboard_sound+0x7d/0x97 [<ffffffff8154b565>] ? is_prefetch.isra.15+0x1a6/0x1fd [<ffffffff815580a3>] ? error_sti+0x5/0x6 [<ffffffff81557cd1>] ? retint_signal+0x11/0x90 [<ffffffff81002d70>] do_notify_resume+0x80/0xb0 [<ffffffff81557d06>] retint_signal+0x46/0x90 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 0 Normal per-cpu: CPU 0: hi: 186, btch: 31 usd: 1 CPU 1: hi: 186, btch: 31 usd: 24 active_anon:900420 inactive_anon:28978 isolated_anon:0 active_file:22 inactive_file:24 isolated_file:19 unevictable:4 dirty:5 writeback:0 unstable:0 free:20222 slab_reclaimable:8656 slab_unreclaimable:10414 mapped:18437 shmem:243751 pagetables:7717 bounce:0 free_cma:0 DMA free:12120kB min:272kB low:340kB high:408kB active_anon:2892kB inactive_anon:872kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:1672kB shmem:3596kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes lowmem_reserve[]: 0 2951 3836 3836 DMA32 free:55316kB min:51776kB low:64720kB high:77664kB active_anon:2834992kB inactive_anon:108408kB active_file:52kB inactive_file:56kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3021968kB mlocked:0kB dirty:20kB writeback:0kB mapped:65916kB shmem:943452kB slab_reclaimable:11716kB slab_unreclaimable:8904kB kernel_stack:488kB pagetables:11880kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:3940 all_unreclaimable? yes [ 6242] 11641 6242 126497 1479 172 0 0 kded4 [ 6244] 11641 6244 2977 48 11 0 0 gam_server [10804] 11641 10804 101320 307 47 0 0 gvfsd-http [12175] 0 12175 27197 32 10 0 0 agetty [12249] 11641 12249 28719 252 14 0 0 bash [14862] 0 14862 51773 344 55 0 0 cupsd [14868] 4 14868 18105 158 39 0 0 cups-polld [16728] 11641 16728 28691 244 12 0 0 bash [16975] 0 16975 9109 253 23 0 -1000 systemd-udevd [17618] 0 17618 8245 87 22 0 0 systemd-logind [ 3133] 11641 3133 43721 132 40 0 0 su [ 3136] 0 3136 28564 139 12 0 0 bash [ 3983] 11641 3983 43722 134 41 0 0 su [ 3986] 0 3986 28564 144 13 0 0 bash [16350] 11641 16350 28691 245 14 0 0 bash [31228] 11641 31228 28691 245 11 0 0 bash [31922] 11641 31922 28719 250 13 0 0 bash [ 2340] 11641 2340 28691 245 15 0 0 bash [12586] 38 12586 7851 150 19 0 0 ntpd [32658] 11641 32658 41192 424 35 0 0 mc [32660] 11641 32660 28692 245 13 0 0 bash [10971] 11641 10971 43722 133 43 0 0 su [10974] 0 10974 28564 132 12 0 0 bash [11343] 0 11343 28497 66 11 0 0 ksmtuned [11387] 11641 11387 28719 254 11 0 0 bash [11450] 11641 11450 28691 246 13 0 0 bash [11576] 11641 11576 43722 133 40 0 0 su [11579] 0 11579 28564 141 13 0 0 bash [12106] 11641 12106 28691 244 12 0 0 bash [12141] 11641 12141 43722 132 44 0 0 su [12144] 0 12144 28564 140 11 0 0 bash [12264] 11641 12264 28691 245 11 0 0 bash [12299] 11641 12299 43721 133 40 0 0 su [12302] 0 12302 28564 137 12 0 0 bash [26024] 11641 26024 28691 245 13 0 0 bash [26083] 11641 26083 28691 245 13 0 0 bash [28235] 11641 
28235 43721 132 42 0 0 su [28238] 0 28238 28564 143 13 0 0 bash [29460] 11641 29460 43721 132 42 0 0 su [29463] 0 29463 28564 137 12 0 0 bash [29758] 11641 29758 28720 256 12 0 0 bash [29864] 11641 29864 41916 1153 36 0 0 mc [29866] 11641 29866 28728 257 11 0 0 bash [32750] 0 32750 23164 2994 47 0 0 dhclient [ 323] 0 323 24081 471 48 0 0 sendmail [ 347] 51 347 20347 367 38 0 0 sendmail [ 907] 11641 907 379562 159766 707 0 0 thunderbird [ 6340] 11641 6340 28719 251 12 0 0 bash [ 6790] 11641 6790 80307 620 101 0 0 xfce4-notifyd [ 6844] 0 6844 26669 23 9 0 0 sleep Out of memory: Kill process 907 (thunderbird) score 162 or sacrifice child Killed process 907 (thunderbird) total-vm:1518248kB, anon-rss:638476kB, file-rss:588kB lowmem_reserve[]: 0 0 885 885 Normal free:12832kB min:15532kB low:19412kB high:23296kB active_anon:763796kB inactive_anon:6632kB active_file:36kB inactive_file:40kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:906664kB mlocked:16kB dirty:0kB writeback:0kB mapped:6160kB shmem:27956kB slab_reclaimable:22908kB slab_unreclaimable:32736kB kernel_stack:2352kB pagetables:18988kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1742 all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0 DMA: 2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB 2*1024kB 2*2048kB 1*4096kB = 12120kB DMA32: 883*4kB 1525*8kB 513*16kB 637*32kB 109*64kB 8*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 55396kB Normal: 270*4kB 173*8kB 198*16kB 141*32kB 30*64kB 4*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 14880kB 243797 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 0kB Total swap = 0kB 1032176 pages RAM 42789 pages reserved 553659 pages shared 937056 pages non-shared SysRq : Emergency Sync Emergency Sync complete SysRq : Emergency Remount R/O ^ permalink raw reply [flat|nested] 52+ messages in thread
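[Editorial aside: the OOM reports above contain enough data to cross-check
themselves. This is a sketch with two sanity checks, not part of the thread:
the per-order buddy-list lines should sum to the printed zone total, and the
"score 420" / "score 162" figures are consistent with the (assumed) 3.7-era
oom_badness() heuristic of roughly (rss + nr_ptes + swapents) * 1000 divided
by available pages (total RAM minus reserved).]

```python
"""Cross-check two sets of numbers from the OOM-killer reports above."""

def buddy_total_kb(line):
    # "2*4kB 2*8kB ... 1*4096kB" -> sum of count * block-size-in-kB
    return sum(int(n) * int(sz[:-2])
               for n, sz in (tok.split('*') for tok in line.split()))

def badness(rss, nr_ptes, swapents, total_pages, reserved_pages):
    # Approximation of the 3.7-era heuristic; kernel uses integer math.
    return (rss + nr_ptes + swapents) * 1000 // (total_pages - reserved_pages)

# DMA zone buddy lists as printed in the report ("... = 12120kB")
dma = ("2*4kB 2*8kB 2*16kB 1*32kB 2*64kB 1*128kB 2*256kB 2*512kB "
       "2*1024kB 2*2048kB 1*4096kB")
print(buddy_total_kb(dma))

# From the task table: firefox rss=414344 nr_ptes=1614,
# thunderbird rss=159766 nr_ptes=707; "1032176 pages RAM", "42789 reserved".
avail = (1032176, 42789)
print(badness(414344, 1614, 0, *avail))   # report says "score 420"
print(badness(159766, 707, 0, *avail))    # report says "score 162"
```

Both scores and the zone total reproduce exactly, which suggests the report
is internally consistent and the OOM kills targeted the largest resident
tasks as designed.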
* Re: kswapd0: excessive CPU usage
  2012-11-18 19:00 ` Zdenek Kabelac
@ 2012-11-18 19:07   ` Jiri Slaby
  0 siblings, 0 replies; 52+ messages in thread
From: Jiri Slaby @ 2012-11-18 19:07 UTC (permalink / raw)
To: Zdenek Kabelac
Cc: Mel Gorman, Seth Jennings, Jiri Slaby, Valdis.Kletnieks, linux-mm,
	LKML, Andrew Morton, Rik van Riel, Robert Jennings

On 11/18/2012 08:00 PM, Zdenek Kabelac wrote:
> For some reason my machine went out of memory and OOM killed
> firefox and then even whole Xsession.
>
> Unsure whether it's related to those 2 patches - but I've never had
> such OOM failure before.

As I wrote, this would be me:
https://lkml.org/lkml/2012/11/15/150

There is no -next tree for Friday which would contain the set already.
So for now, it should be enough for you to apply:
https://lkml.org/lkml/2012/11/15/95

Or, alternatively, if you use a brand new systemd, it likes to fork
bomb using udev.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: kswapd0: excessive CPU usage
  2012-11-09  4:22 ` kswapd0: excessive CPU usage Seth Jennings
  2012-11-09  8:07   ` Zdenek Kabelac
@ 2012-11-09  8:40   ` Mel Gorman
  1 sibling, 0 replies; 52+ messages in thread
From: Mel Gorman @ 2012-11-09 8:40 UTC (permalink / raw)
To: Seth Jennings
Cc: Jiri Slaby, Zdenek Kabelac, Valdis.Kletnieks, Jiri Slaby, linux-mm,
	LKML, Andrew Morton, Rik van Riel, Robert Jennings

On Thu, Nov 08, 2012 at 10:22:05PM -0600, Seth Jennings wrote:
> On 11/02/2012 02:45 PM, Jiri Slaby wrote:
> > On 11/02/2012 11:53 AM, Jiri Slaby wrote:
> >> On 11/02/2012 11:44 AM, Zdenek Kabelac wrote:
> >>>>> Yes, applying this instead of the revert fixes the issue as well.
> >>>
> >>> I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive
> >>> CPU usage - mainly after suspend/resume
> >>>
> >>> Here is just simple kswapd backtrace from running kernel:
> >>
> >> Yup, this is what we were seeing with the former patch only too. Try to
> >> apply the other one too:
> >> https://patchwork.kernel.org/patch/1673231/
> >>
> >> For me I would say, it is fixed by the two patches now. I won't be able
> >> to report later, since I'm leaving to a conference tomorrow.
> >
> > Damn it. It recurred right now, with both patches applied. After I
> > started a java program which consumed some more memory. Though there are
> > still 2 gigs free, kswapd is spinning:
> > [<ffffffff810b00da>] __cond_resched+0x2a/0x40
> > [<ffffffff811318a0>] shrink_slab+0x1c0/0x2d0
> > [<ffffffff8113478d>] kswapd+0x66d/0xb60
> > [<ffffffff810a25d0>] kthread+0xc0/0xd0
> > [<ffffffff816aa29c>] ret_from_fork+0x7c/0xb0
> > [<ffffffffffffffff>] 0xffffffffffffffff
>
> I'm also hitting this issue in v3.7-rc4. It appears that the last
> release not affected by this issue was v3.3. Bisecting the changes
> included for v3.4-rc1 showed that this commit introduced the issue:
>
> fe2c2a106663130a5ab45cb0e3414b52df2fff0c is the first bad commit
> commit fe2c2a106663130a5ab45cb0e3414b52df2fff0c
> Author: Rik van Riel <riel@redhat.com>
> Date:   Wed Mar 21 16:33:51 2012 -0700
>
>     vmscan: reclaim at order 0 when compaction is enabled
> ...
>
> This is plausible since the issue seems to be in the kswapd + compaction
> realm. I've yet to figure out exactly what about this commit results in
> kswapd spinning.
>
> I would be interested if someone can confirm this finding.

I cannot confirm the actual finding as I don't see the same sort of
problems. However, this does make sense and was more or less expected.
Reclaiming at order-0 would have forced compaction to be used more
instead of lumpy reclaim (less CPU usage but greater system disruption
that is harder to measure). Shortly after, lumpy reclaim was removed
entirely, so now larger amounts of CPU time are spent compacting memory
that previously would have been reclaimed.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
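[Editor's note] The bisection Seth describes is a binary search over the commit range for the first commit at which the symptom appears; each probe costs one kernel build and boot, so the range is halved rather than walked linearly. A toy model of the idea (plain Python, not `git bisect` itself; `is_bad` stands in for booting a candidate kernel and watching kswapd):

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes the invariant git bisect relies on: once a commit is bad,
    every later commit is bad too (no intermittent good/bad flapping).
    """
    lo, hi = 0, len(commits) - 1  # hi is a known-bad commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is strictly after mid
    return lo

# Hypothetical history: commits 0-6 are good, 7 onward are bad
# (index 7 plays the role of fe2c2a1 in the report above).
history = list(range(12))
print(first_bad(history, lambda c: c >= 7))  # → 7
```

With ~10,000 commits between v3.3 and v3.4-rc1, this converges in roughly log2(10000) ≈ 14 test boots, which is why bisection is practical for a bug like this.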
* Re: kswapd0: excessive CPU usage
  2012-10-11  8:52 kswapd0: excessive CPU usage Jiri Slaby
  2012-10-11 13:44 ` Valdis.Kletnieks
@ 2012-10-11 22:14 ` Andrew Morton
  2012-10-11 22:26   ` Jiri Slaby
  1 sibling, 1 reply; 52+ messages in thread
From: Andrew Morton @ 2012-10-11 22:14 UTC (permalink / raw)
To: Jiri Slaby; +Cc: linux-mm, LKML, Jiri Slaby

On Thu, 11 Oct 2012 10:52:28 +0200 Jiri Slaby <jslaby@suse.cz> wrote:

> with 3.6.0-next-20121008, kswapd0 is spinning my CPU at 100% for 1
> minute or so. If I try to suspend to RAM, this trace appears:
> kswapd0 R running task 0 577 2 0x00000000
> 0000000000000000 00000000000000c0 cccccccccccccccd ffff8801c4146800
> ffff8801c4b15c88 ffffffff8116ee05 0000000000003e32 ffff8801c3a79000
> ffff8801c4b15ca8 ffffffff8116fdf8 ffff8801c480f398 ffff8801c3a79000
> Call Trace:
> [<ffffffff8116ee05>] ? put_super+0x25/0x40
> [<ffffffff8116fdd4>] ? grab_super_passive+0x24/0xa0
> [<ffffffff8116ff99>] ? prune_super+0x149/0x1b0
> [<ffffffff81131531>] ? shrink_slab+0xa1/0x2d0
> [<ffffffff8113452d>] ? kswapd+0x66d/0xb60
> [<ffffffff81133ec0>] ? try_to_free_pages+0x180/0x180
> [<ffffffff810a2770>] ? kthread+0xc0/0xd0
> [<ffffffff810a26b0>] ? kthread_create_on_node+0x130/0x130
> [<ffffffff816a6c9c>] ? ret_from_fork+0x7c/0x90
> [<ffffffff810a26b0>] ? kthread_create_on_node+0x130/0x130

Could you please do a sysrq-T a few times while it's spinning, to
confirm that this trace is consistently the culprit?

^ permalink raw reply	[flat|nested] 52+ messages in thread
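[Editor's note] Each frame in the traces quoted in this thread has the printed form `[<address>] symbol+offset/length` (the leading `?` marks a frame the unwinder is unsure about). When comparing many sysrq-T dumps, as Andrew asks for here, a small parser is handy; this one is purely illustrative and the sample frame is copied from the trace above:

```python
import re

# [<ffffffff8113452d>] ? kswapd+0x66d/0xb60
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+\??\s*([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

def parse_frame(line):
    """Split one stack-trace line into (address, symbol, offset, size).

    Returns None for lines that are not call frames ("Call Trace:",
    register dumps, the terminating 0xffffffffffffffff sentinel, ...).
    """
    m = FRAME.search(line)
    if not m:
        return None
    addr, sym, off, size = m.groups()
    return int(addr, 16), sym, int(off, 16), int(size, 16)

addr, sym, off, size = parse_frame("[<ffffffff8113452d>] ? kswapd+0x66d/0xb60")
print(sym, hex(off))  # → kswapd 0x66d
```

Feeding every dump through this and counting the symbols that recur (here: `shrink_slab`, `kswapd`) is exactly the "consistently the culprit" check Andrew is asking Jiri to do by eye.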
* Re: kswapd0: excessive CPU usage
  2012-10-11 22:14 ` kswapd0: excessive CPU usage Andrew Morton
@ 2012-10-11 22:26   ` Jiri Slaby
  0 siblings, 0 replies; 52+ messages in thread
From: Jiri Slaby @ 2012-10-11 22:26 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jiri Slaby, linux-mm, LKML

On 10/12/2012 12:14 AM, Andrew Morton wrote:
> Could you please do a sysrq-T a few times while it's spinning, to
> confirm that this trace is consistently the culprit?

For me yes, shrink_slab is in most of the traces.

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 52+ messages in thread
end of thread, other threads:[~2012-11-27 11:12 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-11  8:52 kswapd0: excessive CPU usage Jiri Slaby
2012-10-11 13:44 ` Valdis.Kletnieks
2012-10-11 15:34 ` Jiri Slaby
2012-10-11 17:56 ` Valdis.Kletnieks
2012-10-11 17:59 ` Jiri Slaby
2012-10-11 18:19 ` Valdis.Kletnieks
2012-10-11 22:08 ` kswapd0: excessive CPU usage Jiri Slaby
2012-10-12 12:37 ` Jiri Slaby
2012-10-12 13:57 ` Mel Gorman
2012-10-15  9:54 ` Jiri Slaby
2012-10-15 11:09 ` Mel Gorman
2012-10-29 10:52 ` Thorsten Leemhuis
2012-10-30 19:18 ` Mel Gorman
2012-10-31 11:25 ` Thorsten Leemhuis
2012-10-31 15:04 ` Mel Gorman
2012-11-04 16:36 ` Rik van Riel
2012-11-02 10:44 ` Zdenek Kabelac
2012-11-02 10:53 ` Jiri Slaby
2012-11-02 19:45 ` Jiri Slaby
2012-11-04 11:26 ` Zdenek Kabelac
2012-11-05 14:24 ` [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" Mel Gorman
2012-11-06 10:15 ` Johannes Hirte
2012-11-09  8:36 ` Mel Gorman
2012-11-14 21:43 ` Johannes Hirte
2012-11-09  9:12 ` Mel Gorman
2012-11-09  4:22 ` kswapd0: excessive CPU usage Seth Jennings
2012-11-09  8:07 ` Zdenek Kabelac
2012-11-09  9:06 ` Mel Gorman
2012-11-11  9:13 ` Zdenek Kabelac
2012-11-12 11:37 ` [PATCH] Revert "mm: remove __GFP_NO_KSWAPD" Mel Gorman
2012-11-16 19:14 ` Josh Boyer
2012-11-16 19:51 ` Andrew Morton
2012-11-20  1:43 ` Valdis.Kletnieks
2012-11-16 20:06 ` Mel Gorman
2012-11-20 15:38 ` Josh Boyer
2012-11-20 16:13 ` Bruno Wolff III
2012-11-20 17:43 ` Thorsten Leemhuis
2012-11-23 15:20 ` Thorsten Leemhuis
2012-11-27 11:12 ` Mel Gorman
2012-11-21 15:08 ` Mel Gorman
2012-11-20  9:18 ` Glauber Costa
2012-11-20 20:18 ` Andrew Morton
2012-11-21  8:30 ` Glauber Costa
2012-11-12 12:19 ` kswapd0: excessive CPU usage Mel Gorman
2012-11-12 13:13 ` Zdenek Kabelac
2012-11-12 13:31 ` Mel Gorman
2012-11-12 14:50 ` Zdenek Kabelac
2012-11-18 19:00 ` Zdenek Kabelac
2012-11-18 19:07 ` Jiri Slaby
2012-11-09  8:40 ` Mel Gorman
2012-10-11 22:14 ` kswapd0: excessive CPU usage Andrew Morton
2012-10-11 22:26 ` Jiri Slaby