From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: linux-mm@kvack.org, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks
Date: Thu, 21 Jan 2016 10:15:15 +0100
Message-ID: <20160121091515.GC29520@dhcp22.suse.cz>
In-Reply-To: <alpine.DEB.2.10.1601201550060.18155@chino.kir.corp.google.com>

On Wed 20-01-16 16:01:54, David Rientjes wrote:
> On Wed, 20 Jan 2016, Michal Hocko wrote:
>
> > No, I do not have a specific load in mind. But let's be realistic. There
> > will _always_ be corner cases where the VM cannot react properly or in a
> > timely fashion.
> >
>
> Then let's identify it and fix it, like we do with any other bug? I'm 99%
> certain you are not advocating that human intervention is the ideal
> solution to prevent lengthy stalls or livelocks.

I didn't claim that! Please read what I have written. I consider sysrq+f
a _last resort_ emergency tool for when the system doesn't behave in the
expected way.

> I can't speak for all possible configurations and workloads; the only
> thing we use sysrq+f for is automated testing of the oom killer itself.

That is your use case, and it is not the reason this functionality was
introduced. This is _not a debugging_ tool. It was added back in 2005
precisely to allow immediate intervention while the system was
thrashing heavily.

> It would help to know of any situations when people actually need to use
> this to solve issues and then fix those issues rather than insisting that
> this is the ideal solution.

I fully agree that such issues should be investigated and fixed. That is
no argument against having an emergency tool that allows the admin to
intervene right away when it happens.

> > To be honest I really fail to understand your line of argumentation
> > here. Just that you think that sysrq+f might be not helpful in large
> > datacenters which you seem to care about, doesn't mean that it is not
> > helpful in other setups.
> >
>
> This type of message isn't really contributing anything. You don't have a
> specific load in mind, you can't identify a pending bug that people have
> complained about, you presumably can't show a testcase that demonstrates
> how it's required, yet you're arguing that we should keep a debugging tool
> around because you think somebody somewhere sometime might use it.

Look, I am getting tired of this discussion. You seem to completely
ignore the emergency aspect of sysrq+f just because it doesn't seem to
fit _your_ particular use case. I have seen admins use sysrq+f when a
large application went crazy and started thrashing to the point where
even ssh to the machine took ages and sysrq+f over the serial console
was the only deterministic way to make the system usable again. Such
things are still real. Just look at the linux-mm ML (just off hand
http://lkml.kernel.org/r/20151221123557.GE3060%40orkisz). You can argue
that we should fix them, and I agree, but swap/page cache thrashing has
been real for ages; those are hard problems and very likely to be with
us for some time yet. Until our MM subsystem and all others that might
interfere are perfect, we need a sledgehammer. And if we have a hammer,
then we should really make sure it hits something when used rather than
hitting thin air.

The patch proposed here doesn't make the code more complicated or harder
to maintain. It doesn't even have any side effects outside of a sysrq+f
triggered OOM. Your only argument so far was:
"
: It certainly would get TIF_MEMDIE set if it needs to allocate memory
: itself and it calls the oom killer. That doesn't mean that we should
: kill a different process, though, when the killed process should exit
: and free its memory. So NACK to the fatal_signal_pending() check here.
"

And that argument is fundamentally broken because a killed process is
not guaranteed to exit and free its memory. Moreover sysrq+f is by
definition an async action which might race by passing killed task and
that should deactivate it. The race is quite unlikely, but emergency
tools should be as robust and reliable as possible.

You have also ignored my question about what kind of regression such a
change would cause.
-- 
Michal Hocko
SUSE Labs