From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: minchan@kernel.org, ying.huang@intel.com, mgorman@techsingularity.net,
	vdavydov.dev@gmail.com, hannes@cmpxchg.org, akpm@linux-foundation.org,
	shakeelb@google.com, gthelen@google.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.
Date: Wed, 15 Nov 2017 12:51:43 +0100
Message-ID: <20171115115143.yh4xl43w3iteqh35@dhcp22.suse.cz>
In-Reply-To: <201711151958.CBI60413.FHQMtFLFOOSOJV@I-love.SAKURA.ne.jp>

On Wed 15-11-17 19:58:09, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 14-11-17 06:37:42, Tetsuo Handa wrote:
> > > When shrinker_rwsem was introduced, it was assumed that
> > > register_shrinker()/unregister_shrinker() are really unlikely paths
> > > which are called during initialization and tear down. But nowadays,
> > > register_shrinker()/unregister_shrinker() might be called regularly.
> >
> > Please provide some examples. I know your other patch mentions the
> > usecase but I guess the two patches should be just squashed together.
>
> They were squashed together in a draft version at
> http://lkml.kernel.org/r/2940c150-577a-30a8-fac3-cf59a49b84b4@I-love.SAKURA.ne.jp .
> Shakeel suggested that I post the patch for others to review without
> parallel register/unregister and SHRINKER_PERMANENT, but I thought that
> parallel register/unregister is still helpful (described below), so I
> posted it as two patches.
>
> >
> > > This patch prepares for allowing parallel registration/unregistration
> > > of shrinkers.
> > >
> > > Since do_shrink_slab() can reschedule, we cannot protect shrinker_list
> > > using one RCU section. But using atomic_inc()/atomic_dec() for each
> > > do_shrink_slab() call will not impact so much.
> > >
> > > This patch uses a polling loop with a short sleep for unregister_shrinker()
> > > rather than wait_on_atomic_t(), for we can save reader's cost (plain
> > > atomic_dec() compared to atomic_dec_and_test()), we can expect that
> > > do_shrink_slab() of the unregistering shrinker likely returns shortly, and
> > > we can avoid khungtaskd warnings when do_shrink_slab() of the unregistering
> > > shrinker unexpectedly takes very long.
> >
> > I would use wait_event_interruptible in the remove path rather than the
> > short sleep loop which is just too ugly. The shrinker walk would then
> > just wake_up the sleeper when the ref. count drops to 0. Two
> > synchronize_rcu is quite ugly as well, but I was not able to simplify
> > them. I will keep thinking. It just sucks how we cannot follow the
> > standard rcu list with dynamically allocated structure pattern here.
>
> I think that Minchan's approach depends on how
>
>   In our production, we have observed that the job loader gets stuck for
>   10s of seconds while doing mount operation. It turns out that it was
>   stuck in register_shrinker() and some unrelated job was under memory
>   pressure and spending time in shrink_slab(). Our machines have a lot
>   of shrinkers registered and jobs under memory pressure have to traverse
>   all of those memcg-aware shrinkers and do affect unrelated jobs which
>   want to register their own shrinkers.
>
> is interpreted. If there were 100000 shrinkers and each do_shrink_slab() call
> took 1 millisecond, aborting the iteration as soon as rwsem_is_contended() would
> help a lot. But if there were 10 shrinkers and each do_shrink_slab() call took
> 10 seconds, aborting the iteration as soon as rwsem_is_contended() would help
> less. Or, there might be some specific shrinker where its do_shrink_slab() call
> takes 100 seconds. In that case, checking rwsem_is_contended() is too lazy.

I hope we do not have any shrinker that eats that much time. They are not
supposed to... But reality screws our intentions quite often, so I cannot
really say that nobody is doing crazy stuff. Anyway, I think starting
simpler makes sense here. We will see later.

> Since it is possible for a local unprivileged user to lock up the system at least
> due to mutex_trylock(&oom_lock) versus (printk() or schedule_timeout_killable(1)),
> I suggest completely eliminating the scheduling priority problem (i.e. a very low
> scheduling priority thread might take 100 seconds inside some do_shrink_slab()
> call) by not relying on an assumption of shortly returning from do_shrink_slab().
> My first patch + my second patch will eliminate relying on such an assumption, and
> avoid potential khungtaskd warnings.

It doesn't, because the priority issues will still be there when anybody
can preempt your shrinker for an extensive amount of time. So no, you are
not fixing the problem. You are merely making it less probable and limited
only to the removed shrinker. You still do not have any control over what
happens while that shrinker is executed, though.

Anyway, I do not claim your patch is a wrong approach. It is just quite
complex and maybe unnecessarily so for most workloads. Therefore going
with a simpler solution should be preferred until we see that it is
insufficient.
--
Michal Hocko
SUSE Labs
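
The suggestion in the message above, to sleep in the remove path and have the shrinker walk wake the sleeper when the reference count drops to 0, can be illustrated with a minimal user-space sketch. This is not kernel code and not the patch under review: it assumes a pthread mutex and condition variable as stand-ins for the kernel's wait_event()/wake_up() machinery, it takes a lock on the reader side where the patch uses atomic_inc()/atomic_dec() with RCU, and every name in it (struct shrinker_sketch, sketch_get(), sketch_put(), sketch_unregister()) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a shrinker; not the kernel's struct. */
struct shrinker_sketch {
	pthread_mutex_t lock;	/* protects nr_active and removed */
	pthread_cond_t idle;	/* signalled when nr_active drops to 0 */
	int nr_active;		/* walkers currently inside this shrinker */
	bool removed;		/* set once unregistration has started */
};

#define SHRINKER_SKETCH_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Walker side: take a reference before calling into the shrinker.
 * Returns false if the shrinker is already being removed. */
static bool sketch_get(struct shrinker_sketch *s)
{
	bool ok;

	pthread_mutex_lock(&s->lock);
	ok = !s->removed;
	if (ok)
		s->nr_active++;
	pthread_mutex_unlock(&s->lock);
	return ok;
}

/* Walker side: drop the reference; the last walker wakes the remover. */
static void sketch_put(struct shrinker_sketch *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->nr_active > 0);
	if (--s->nr_active == 0)
		pthread_cond_broadcast(&s->idle);
	pthread_mutex_unlock(&s->lock);
}

/* Remove path: refuse new walkers, then sleep until in-flight ones finish.
 * There is no polling loop; the wakeup comes from sketch_put(). */
static void sketch_unregister(struct shrinker_sketch *s)
{
	pthread_mutex_lock(&s->lock);
	s->removed = true;
	while (s->nr_active > 0)
		pthread_cond_wait(&s->idle, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```

A walker would bracket each shrinker call with sketch_get() and sketch_put(), skipping the shrinker when sketch_get() returns false. The sketch also shows the limitation raised in the reply: sketch_unregister() still waits for as long as an in-flight walker stays inside the shrinker, so a preempted or slow walker delays removal of that one shrinker.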