From: Michal Hocko <mhocko@kernel.org>
To: Nils Holland <nholland@tisys.org>
Cc: Mel Gorman <mgorman@suse.de>, Johannes Weiner <hannes@cmpxchg.org>,
    Vladimir Davydov <vdavydov.dev@gmail.com>,
    Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>,
    linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
Date: Wed, 28 Dec 2016 09:57:59 +0100
Message-ID: <20161228085759.GD11470@dhcp22.suse.cz>
In-Reply-To: <20161227193308.GA17454@boerne.fritz.box>

On Tue 27-12-16 20:33:09, Nils Holland wrote:
> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> > Hi,
> > could you try to run with the following patch on top of the previous
> > one? I do not think it will make a large change in your workload, but
> > I think we need something like that, so some testing under a workload
> > which is known to generate high lowmem pressure would be really
> > appreciated. If you have more time to play with it, then running with
> > and without the patch with the mm_vmscan_direct_reclaim_{start,end}
> > tracepoints enabled could tell us whether it makes any difference at
> > all.
>
> Of course, no problem!
>
> First, about the events to trace: mm_vmscan_direct_reclaim_start
> doesn't seem to exist, but mm_vmscan_direct_reclaim_begin does. I'm
> sure that's what you meant, so I took that one instead.

yes, sorry about the confusion

> Then I have to admit that in both cases (once without the latest patch,
> once with) very little trace data was actually produced. In the case
> without the patch, the reclaim was started more often and reclaimed a
> smaller number of pages each time; in the case with the patch it was
> invoked less often, and the last time it was invoked it reclaimed a
> rather big number of pages. I have no clue, however, whether that
> happened "by chance" or whether it was actually caused by the patch and
> is thus an expected change.

yes, that seems to be a variation of the workload I would say, because
if anything the patch should reduce the number of scanned pages.

> In both cases, my test case was: reboot, set up logging, do "emerge
> firefox" (which unpacks and builds the firefox sources), then, when
> the emerge had come so far that the unpacking was done and the
> building had started, switch to another console and untar the latest
> kernel, libreoffice and (once more) firefox sources there. After that
> had completed, I aborted the emerge build process and stopped tracing.
>
> Here's the trace data captured without the latest patch applied:
>
> khugepaged-22 [000] .... 566.123383: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .N.. 566.165520: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
> khugepaged-22 [001] .... 587.515424: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 587.596035: mm_vmscan_direct_reclaim_end: nr_reclaimed=1029
> khugepaged-22 [001] .... 599.879536: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 601.000812: mm_vmscan_direct_reclaim_end: nr_reclaimed=1100
> khugepaged-22 [001] .... 601.228137: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 601.309952: mm_vmscan_direct_reclaim_end: nr_reclaimed=1081
> khugepaged-22 [001] .... 694.935267: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .N.. 695.081943: mm_vmscan_direct_reclaim_end: nr_reclaimed=1071
> khugepaged-22 [001] .... 701.370707: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 701.372798: mm_vmscan_direct_reclaim_end: nr_reclaimed=1089
> khugepaged-22 [001] .... 764.752036: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 771.047905: mm_vmscan_direct_reclaim_end: nr_reclaimed=1039
> khugepaged-22 [000] .... 781.760515: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 781.826543: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22 [001] .... 782.595575: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [000] .... 782.638591: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22 [001] .... 782.930455: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 782.993608: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
> khugepaged-22 [001] .... 783.330378: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 783.369653: mm_vmscan_direct_reclaim_end: nr_reclaimed=1040
>
> And this is the same with the patch applied:
>
> khugepaged-22 [001] .... 523.599997: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 523.683110: mm_vmscan_direct_reclaim_end: nr_reclaimed=1092
> khugepaged-22 [001] .... 535.345477: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 535.401189: mm_vmscan_direct_reclaim_end: nr_reclaimed=1078
> khugepaged-22 [000] .... 692.876716: mm_vmscan_direct_reclaim_begin: order=9 may_writepage=1 gfp_flags=GFP_TRANSHUGE classzone_idx=3
> khugepaged-22 [001] .... 703.312399: mm_vmscan_direct_reclaim_end: nr_reclaimed=197759

In these cases there is no real difference, because this is not lowmem
pressure: those requests are allowed to use the highmem zone.

> If my test case and thus the results don't sound good, I could of
> course try some other test cases ... like capturing for a longer
> period of time or trying to produce more memory pressure by running
> more processes at the same time, or something like that.

yes, a stronger memory pressure would be needed. I suspect that your
original issue was more about active list aging than really strong
memory pressure, so it is possible that your workload will not notice.
If you can collect those two tracepoints over a longer time it can
still tell us something, but I do not want you to burn a lot of time
on this. The main issue seems to be fixed, and the follow-up fix can
wait for a thorough review after both Mel and Johannes are back from
holiday.

> Besides that I can say that the patch hasn't produced any warnings or
> other issues so far, so at first glance, it doesn't seem to hurt
> anything.

Thanks!
--
Michal Hocko
SUSE Labs
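For reference, the two vmscan tracepoints discussed above can be collected
through the kernel's ftrace event interface. The following is a minimal
sketch, not taken from the thread; it assumes a root shell and that debugfs
is mounted at the conventional /sys/kernel/debug path (adjust as needed):

  cd /sys/kernel/debug/tracing
  # enable only the two direct reclaim events
  echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
  echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
  echo 1 > tracing_on
  # ... run the workload (e.g. the emerge + untar test described above) ...
  cat trace > /tmp/vmscan-trace.txt   # snapshot the collected events
  echo 0 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
  echo 0 > events/vmscan/mm_vmscan_direct_reclaim_end/enable

For a longer capture, as suggested for the stronger-pressure test, reading
the streaming trace_pipe file instead of trace can help avoid losing events
when the ring buffer wraps.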