From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934449AbcATJtz (ORCPT <rfc822;w@1wt.eu>);
	Wed, 20 Jan 2016 04:49:55 -0500
Received: from mail-wm0-f41.google.com ([74.125.82.41]:37640 "EHLO
	mail-wm0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1758137AbcATJtk (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 20 Jan 2016 04:49:40 -0500
Date: Wed, 20 Jan 2016 10:49:38 +0100
From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: linux-mm@kvack.org, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks
Message-ID: <20160120094938.GB14187@dhcp22.suse.cz>
References: <1452632425-20191-1-git-send-email-mhocko@kernel.org>
 <1452632425-20191-2-git-send-email-mhocko@kernel.org>
 <alpine.DEB.2.10.1601121639450.28831@chino.kir.corp.google.com>
 <20160113093046.GA28942@dhcp22.suse.cz>
 <alpine.DEB.2.10.1601131633550.3406@chino.kir.corp.google.com>
 <20160114110037.GC29943@dhcp22.suse.cz>
 <alpine.DEB.2.10.1601141347220.16227@chino.kir.corp.google.com>
 <20160115101218.GB14112@dhcp22.suse.cz>
 <alpine.DEB.2.10.1601191454160.7346@chino.kir.corp.google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.DEB.2.10.1601191454160.7346@chino.kir.corp.google.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 19-01-16 14:57:33, David Rientjes wrote:
> On Fri, 15 Jan 2016, Michal Hocko wrote:
> 
> > > I think it's time to kill sysrq+F and I'll send those two patches
> > > unless there is a usecase I'm not aware of.
> > 
> > I have described one in the part you haven't quoted here. Let me repeat:
> > : Your system might be thrashing to the point you are not able to log in
> > : and resolve the situation in a reasonable time yet you are still not
> > : OOM. sysrq+f is your only choice then.
> > 
> > Could you clarify why it is better to ditch a potentially useful
> > emergency tool rather than to make it work reliably and predictably?
> 
> I'm concerned about your usecase where the kernel requires admin 
> intervention to resolve such an issue and there is nothing in the VM we 
> can do to fix it.
> 
> If you have a specific test that demonstrates when your usecase is needed, 
> please provide it so we can address the issue that it triggers.

No, I do not have a specific load in mind. But let's be realistic. There
will _always_ be corner cases where the VM cannot react properly or in a
timely fashion.

> I'd prefer to fix the issue in the VM rather than require human
> intervention, especially when we try to keep a very large number of
> machines running in our datacenters.

It is always preferable to resolve the mm related issue automagically,
of course. We should strive for robustness as much as possible but that
doesn't mean we should take the only emergency tool out of the
administrator's hands.

To be honest I really fail to understand your line of argumentation
here. Just because you think that sysrq+f might not be helpful in the
large datacenters you seem to care about doesn't mean that it is not
helpful in other setups.

Removing the functionality is out of the question IMHO, so can we please
start discussing how to make it more predictable?
-- 
Michal Hocko
SUSE Labs