From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753346AbcAUAB7 (ORCPT ); Wed, 20 Jan 2016 19:01:59 -0500
Received: from mail-pf0-f177.google.com ([209.85.192.177]:35121 "EHLO mail-pf0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752142AbcAUAB4 (ORCPT ); Wed, 20 Jan 2016 19:01:56 -0500
Date: Wed, 20 Jan 2016 16:01:54 -0800 (PST)
From: David Rientjes
X-X-Sender: rientjes@chino.kir.corp.google.com
To: Michal Hocko
cc: linux-mm@kvack.org, Tetsuo Handa, LKML
Subject: Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks
In-Reply-To: <20160120094938.GB14187@dhcp22.suse.cz>
Message-ID:
References: <1452632425-20191-1-git-send-email-mhocko@kernel.org>
 <1452632425-20191-2-git-send-email-mhocko@kernel.org>
 <20160113093046.GA28942@dhcp22.suse.cz>
 <20160114110037.GC29943@dhcp22.suse.cz>
 <20160115101218.GB14112@dhcp22.suse.cz>
 <20160120094938.GB14187@dhcp22.suse.cz>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 20 Jan 2016, Michal Hocko wrote:

> No, I do not have a specific load in mind. But let's be realistic. There
> will _always_ be corner cases where the VM cannot react properly or in a
> timely fashion.
>

Then let's identify it and fix it, like we do with any other bug?

I'm 99% certain you are not advocating that human intervention is the
ideal solution to prevent lengthy stalls or livelocks.

I can't speak for all possible configurations and workloads; the only
thing we use sysrq+f for is automated testing of the oom killer itself.
It would help to know of any situations when people actually need to use
this to solve issues and then fix those issues rather than insisting that
this is the ideal solution.

> To be honest I really fail to understand your line of argumentation
> here.
> Just that you think that sysrq+f might be not helpful in large
> datacenters which you seem to care about, doesn't mean that it is not
> helpful in other setups.
>

This type of message isn't really contributing anything.  You don't have a
specific load in mind, you can't identify a pending bug that people have
complained about, you presumably can't show a testcase that demonstrates
how it's required, yet you're arguing that we should keep a debugging tool
around because you think somebody somewhere sometime might use it.

[ I would imagine that users would be unhappy they have to kill processes
  already, and would have reported how ridiculous it is that they had to
  use sysrq+f, but I haven't seen those bug reports. ]

I want the VM to be responsive, I don't want it to thrash forever, and I
want it to not require root to trigger a sysrq to have the kernel kill a
process for the VM to work properly.  We either need to fix the issue that
causes the unresponsiveness or oom kill processes earlier.  This is very
simple.