From: Michal Hocko <mhocko@kernel.org>
To: vinayak menon <vinayakm.list@gmail.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	mgorman@techsingularity.net, vbabka@suse.cz,
	Rik van Riel <riel@redhat.com>, vdavydov.dev@gmail.com,
	anton.vorontsov@linaro.org, Minchan Kim <minchan@kernel.org>,
	shashim@codeaurora.org, "linux-mm@kvack.org" <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2 v4] mm: vmscan: do not pass reclaimed slab to vmpressure
Date: Tue, 7 Feb 2017 09:10:03 +0100
Message-ID: <20170207081002.GB5065@dhcp22.suse.cz>
In-Reply-To: <CAOaiJ-=ovwZ53nqNLRtP=sCY=+4s1-1r_soBXvam42bxDeUdAQ@mail.gmail.com>

On Mon 06-02-17 20:40:10, vinayak menon wrote:
> On Mon, Feb 6, 2017 at 6:22 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 06-02-17 17:54:09, Vinayak Menon wrote:
> >> During global reclaim, the nr_reclaimed passed to vmpressure includes
> >> the pages reclaimed from slab, but the corresponding count of scanned
> >> slab pages is not passed. This can cause the total reclaimed pages to
> >> be greater than the pages scanned, causing an unsigned underflow in
> >> vmpressure that results in a critical event being sent to the root
> >> cgroup.
> >
> > If you switched the ordering then this wouldn't be a problem, right?
>
> You mean calling vmpressure first and then shrink_slab ?

No, I meant the scanned vs. reclaimed normalization patch first, and
then do whatever slab-related thing later on. This would have the
advantage that we can rule the underflow issue out and focus only on
why the slab numbers actually matter.

> That also can be done. Would completing shrink_slab before vmpressure
> be of any use to a userspace task that takes into account both
> vmpressure and reclaimable slab size ?

Is this the case in your lowmemory killer implementation? If yes, how
does it actually work?

[...]
> > It would also be more than useful to say how much the slab reclaim
> > really contributed.
>
> The 70% fewer events are caused by slab reclaim being added to
> vmpressure, which is confirmed by running the test with and without
> the fix. But it is hard to say whether the effect on the reclaim stats
> is caused by this problem, because the lowmemorykiller can be written
> with different heuristics to make the reclaim look better.

Exactly! And this is why I am still not happy with the current
justification of this patch. It seems to be tuning for a particular
consumer of vmpressure events. Others might depend on less pessimistic
events, because we are making some progress after all. Being more
pessimistic can lead to premature OOM or other performance-related
decisions, and that is why I am not happy about it. Btw. could you be
more specific about your particular test? What is the
desired/acceptable result?

> The issue we see in the above reclaim stats is entirely because task
> kills are being delayed. That is the reason why I did not include the
> vmstat stats in the changelog in the earlier versions.
>
> >
> >> This is a regression introduced by commit 6b4f7799c6a5 ("mm: vmscan:
> >> invoke slab shrinkers from shrink_zone()").
> >
> > I am not really sure this is a regression, though. Maybe your
> > heuristic which consumes events is just too fragile?
>
> Yes it could be. A different kind of lowmemorykiller may not show up
> this issue at all.
> In my opinion the regression here is the difference in vmpressure
> values, and thus the vmpressure events, caused by passing slab
> reclaimed pages to vmpressure without considering the scanned pages
> and the cost model.
> So would it be better to drop the vmstat data from the changelog ?

No! The main question is whether being more pessimistic and reporting
higher reclaim levels really makes sense even when the slab reclaim is
making progress. This hasn't been explained, and I _really_ do not like
a patch which optimizes for a particular consumer of events. I
understand that the change of behavior is unexpected and that might be
a reason to revert to the original one. But if this is the only
reasonable way to go, I would at least like to understand what is going
on here. Why can't your lowmemorykiller cope with the workload? Why
does starting to kill sooner (at the time when the slab still reclaims
enough pages to report lower, non-critical events) help to pass your
test? Maybe it is the implementation of the lmk which needs to be
changed because it has some false expectations? Or does the memory
reclaim just behave in an unpredictable manner?
-- 
Michal Hocko
SUSE Labs
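The underflow discussed above comes from vmpressure's level
computation. Below is a simplified sketch of the calculation done by
vmpressure_calc_level() in mm/vmpressure.c around this kernel era
(~v4.10); the function name, return type, and the mapping of the
percentage onto the low/medium/critical thresholds are condensed here,
so treat this as an illustration rather than the verbatim kernel
source:

	/*
	 * Condensed sketch of the vmpressure calculation: the result
	 * approximates 100 * (1 - reclaimed/scanned), i.e. the percent
	 * of scanned pages that were NOT reclaimed.
	 */
	static unsigned long vmpressure_calc_pressure(unsigned long scanned,
						      unsigned long reclaimed)
	{
		unsigned long scale = scanned + reclaimed;
		unsigned long pressure;

		/*
		 * If slab pages inflate 'reclaimed' past 'scanned', then
		 * reclaimed * scale / scanned exceeds scale and the
		 * unsigned subtraction below wraps around to a huge
		 * value, which is then treated as maximal (critical)
		 * pressure.
		 */
		pressure = scale - (reclaimed * scale / scanned);
		pressure = pressure * 100 / scale;

		return pressure;	/* 0..100 while reclaimed <= scanned */
	}

Worked example: with scanned = 100 and reclaimed = 120 (slab pages
included), scale = 220 and reclaimed * scale / scanned = 264, so the
subtraction wraps to a value near ULONG_MAX and a critical event is
reported even though reclaim made good progress. Conversely, as long as
reclaimed stays below scanned, the extra slab pages only push the
computed pressure down, which is consistent with the ~70% drop in
events reported earlier in the thread.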