Date: Wed, 20 Jan 2016 10:49:38 +0100
From: Michal Hocko
To: David Rientjes
Cc: linux-mm@kvack.org, Tetsuo Handa, LKML
Subject: Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks
Message-ID: <20160120094938.GB14187@dhcp22.suse.cz>

On Tue 19-01-16 14:57:33, David Rientjes wrote:
> On Fri, 15 Jan 2016, Michal Hocko wrote:
>
> > > I think it's time to kill sysrq+F and I'll send those two patches
> > > unless there is a usecase I'm not aware of.
> >
> > I have described one in the part you haven't quoted here. Let me repeat:
> > : Your system might be thrashing to the point you are not able to log in
> > : and resolve the situation in a reasonable time yet you are still not
> > : OOM. sysrq+f is your only choice then.
> >
> > Could you clarify why it is better to ditch a potentially useful
> > emergency tool rather than to make it work reliably and predictably?
>
> I'm concerned about your usecase where the kernel requires admin
> intervention to resolve such an issue and there is nothing in the VM we
> can do to fix it.
>
> If you have a specific test that demonstrates when your usecase is needed,
> please provide it so we can address the issue that it triggers.

No, I do not have a specific load in mind. But let's be realistic. There
will _always_ be corner cases where the VM cannot react properly or in a
timely fashion.

> I'd prefer to fix the issue in the VM rather than require human
> intervention, especially when we try to keep a very large number of
> machines running in our datacenters.

It is always preferable to resolve mm-related issues automagically, of
course. We should strive for robustness as much as possible, but that
doesn't mean we should take the only emergency tool out of administrators'
hands.

To be honest, I really fail to understand your line of argumentation
here. Just because you think that sysrq+f might not be helpful in the
large datacenters you seem to care about doesn't mean that it is not
helpful in other setups.

Removing the functionality is out of the question IMHO, so can we please
start discussing how to make it more predictable?
-- 
Michal Hocko
SUSE Labs