Date: Thu, 26 Jan 2017 08:57:53 +0100
From: Michal Hocko
Subject: Re: [PATCH v6] mm: Add memory allocation watchdog kernel thread.
Message-ID: <20170126075753.GD8456@dhcp22.suse.cz>
References: <1478416501-10104-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
 <20170125181150.GA16398@cmpxchg.org>
 <20170125184548.GB32041@dhcp22.suse.cz>
 <20170125192245.GA19321@cmpxchg.org>
In-Reply-To: <20170125192245.GA19321@cmpxchg.org>
To: Johannes Weiner
Cc: Tetsuo Handa, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed 25-01-17 14:22:45, Johannes Weiner wrote:
> On Wed, Jan 25, 2017 at 07:45:49PM +0100, Michal Hocko wrote:
> > On Wed 25-01-17 13:11:50, Johannes Weiner wrote:
> > [...]
> > > From 6420cae52cac8167bd5fb19f45feed2d540bc11d Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner
> > > Date: Wed, 25 Jan 2017 12:57:20 -0500
> > > Subject: [PATCH] mm: page_alloc: __GFP_NOWARN shouldn't suppress stall
> > >  warnings
> > >
> > > __GFP_NOWARN, which is usually added to avoid warnings from callsites
> > > that expect to fail and have fallbacks, currently also suppresses
> > > allocation stall warnings. These trigger when an allocation is stuck
> > > inside the allocator for 10 seconds or longer.
> > >
> > > But there is no class of allocations that can get legitimately stuck
> > > in the allocator for this long. This always indicates a problem.
> > >
> > > Always emit stall warnings. Restrict __GFP_NOWARN to alloc failures.
> >
> > Tetsuo has already suggested something like this and I didn't really
> > like it because it makes the semantic of the flag confusing. The mask
> > says to not warn while the kernel log might contain an allocation splat.
> > You are right that stalling for 10 seconds means a problem on its own
> > but on the other hand I can imagine somebody might really want to have
> > clean logs and the last thing we want is to have another gfp flag for
> > that purpose.
>
> I don't think it's confusing. __GFP_NOWARN tells the allocator whether
> an allocation failure can be handled or whether it constitutes a bug.
>
> If we agree that stalling for 10s is a bug, then we should emit the
> warnings.

Yes, in many cases it would be a bug in the MM. Some of them would be
inherent because the allocator doesn't implement any fairness and
starvation cannot be ruled out (would that be a bug?).

In general, looping or spending a lot of time in the kernel can be seen
as a bug. We have watchdogs to report those cases, and time has shown
that we had to develop ways to silence those lockups because in some
cases we couldn't handle them. I am worried we will eventually find
cases like that for allocation stalls as well.

I might be oversensitive because we have already made some mistakes in
gfp flags land and I would like to prevent more to come. That being
said, I will not stand in the way...
-- 
Michal Hocko
SUSE Labs
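
For readers following the thread, the two semantics being debated can be
modelled in a few lines of userspace C. This is only a sketch: the
MODEL_* constants and the three function names are hypothetical
stand-ins, not the kernel's definitions, and the real allocator code is
not reproduced here. The only facts taken from the thread are that
__GFP_NOWARN currently suppresses both failure and stall warnings, that
stall warnings trigger at 10 seconds or longer, and that the proposed
patch restricts the flag to allocation failures.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's real values. */
#define MODEL_GFP_NOWARN           0x1u
#define MODEL_STALL_THRESHOLD_SECS 10u

/* Failure warnings: suppressed by the flag under both semantics. */
static bool warn_on_failure(unsigned int flags)
{
	return !(flags & MODEL_GFP_NOWARN);
}

/* Current behaviour as described in the patch: the flag also hides stalls. */
static bool warn_on_stall_current(unsigned int flags, unsigned int stalled_secs)
{
	return stalled_secs >= MODEL_STALL_THRESHOLD_SECS &&
	       !(flags & MODEL_GFP_NOWARN);
}

/* Proposed behaviour: a stall at or past the threshold always warns. */
static bool warn_on_stall_proposed(unsigned int flags, unsigned int stalled_secs)
{
	(void)flags; /* the flag no longer influences stall reporting */
	return stalled_secs >= MODEL_STALL_THRESHOLD_SECS;
}
```

Under this model, a caller passing the flag and stalling for 10 seconds
stays silent with the current semantics but produces a warning with the
proposed one, which is the "mask says not to warn, log contains a splat"
case raised above.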