Date: Thu, 26 Jan 2017 08:44:38 +0900
From: Minchan Kim
To: Johannes Weiner
CC: Tetsuo Handa
Subject: Re: [PATCH v6] mm: Add memory allocation watchdog kernel thread.
Message-ID: <20170125234438.GA20953@bbox>
References: <1478416501-10104-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp> <20170125181150.GA16398@cmpxchg.org>
In-Reply-To: <20170125181150.GA16398@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 25, 2017 at 01:11:50PM -0500, Johannes Weiner wrote:
> On Sun, Nov 06, 2016 at 04:15:01PM +0900, Tetsuo Handa wrote:
> > +- Why need to use it?
> > +
> > +Currently, when something went wrong inside memory allocation request,
> > +the system might stall without any kernel messages.
> > +
> > +Although there is khungtaskd kernel thread as an asynchronous monitoring
> > +approach, khungtaskd kernel thread is not always helpful because memory
> > +allocating tasks unlikely sleep in uninterruptible state for
> > +/proc/sys/kernel/hung_task_timeout_secs seconds.
> > +
> > +Although there is warn_alloc() as a synchronous monitoring approach
> > +which emits
> > +
> > +  "%s: page allocation stalls for %ums, order:%u, mode:%#x(%pGg)\n"
> > +
> > +line, warn_alloc() is not bullet proof because allocating tasks can get
> > +stuck before calling warn_alloc() and/or allocating tasks are using
> > +__GFP_NOWARN flag and/or such lines are suppressed by ratelimiting and/or
> > +such lines are corrupted due to collisions.
>
> I'm not fully convinced by this explanation. Do you have a real life
> example where the warn_alloc() stall info is not enough? If yes, this
> should be included here and in the changelog. If not, the extra code,
> the task_struct overhead etc. don't seem justified.
>
> __GFP_NOWARN shouldn't suppress stall warnings, IMO. It's for whether
> the caller expects allocation failure and is prepared to handle it; an
> allocation stalling out for 10s is an issue regardless of the callsite.
>
> ---
>
> From 6420cae52cac8167bd5fb19f45feed2d540bc11d Mon Sep 17 00:00:00 2001
> From: Johannes Weiner
> Date: Wed, 25 Jan 2017 12:57:20 -0500
> Subject: [PATCH] mm: page_alloc: __GFP_NOWARN shouldn't suppress stall
>  warnings
>
> __GFP_NOWARN, which is usually added to avoid warnings from callsites
> that expect to fail and have fallbacks, currently also suppresses
> allocation stall warnings. These trigger when an allocation is stuck
> inside the allocator for 10 seconds or longer.
>
> But there is no class of allocations that can get legitimately stuck
> in the allocator for this long. This always indicates a problem.
>
> Always emit stall warnings. Restrict __GFP_NOWARN to alloc failures.
>
> Signed-off-by: Johannes Weiner

Acked-by: Minchan Kim
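
For readers following the thread, the policy change described in the quoted
changelog can be modelled in a few lines of userspace C. This is only a rough
sketch and not the kernel code or the actual patch: every sketch_* name, the
SKETCH_GFP_NOWARN bit value and the struct layout are assumptions made up for
illustration. Only the 10 second threshold and the before/after behaviour come
from the changelog text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch; not the kernel's __GFP_NOWARN value. */
#define SKETCH_GFP_NOWARN 0x1u

/* Stall threshold from the changelog: 10 seconds, expressed in milliseconds. */
#define SKETCH_STALL_MS 10000u

/* The two kinds of event a warning could be emitted for. */
enum sketch_event { SKETCH_ALLOC_FAILURE, SKETCH_ALLOC_STALL };

/* Hypothetical view of one allocation request. */
struct sketch_alloc {
	unsigned int gfp_mask;   /* flags passed by the caller */
	unsigned int elapsed_ms; /* time spent inside the allocator so far */
};

/* Behaviour the changelog calls current: NOWARN silences failures and stalls. */
static inline bool sketch_should_warn_before(const struct sketch_alloc *a,
					     enum sketch_event ev)
{
	assert(a != NULL);
	if (a->gfp_mask & SKETCH_GFP_NOWARN)
		return false;
	if (ev == SKETCH_ALLOC_STALL)
		return a->elapsed_ms >= SKETCH_STALL_MS;
	return true;
}

/* Behaviour the changelog proposes: NOWARN only silences failure warnings. */
static inline bool sketch_should_warn_after(const struct sketch_alloc *a,
					    enum sketch_event ev)
{
	assert(a != NULL);
	if (ev == SKETCH_ALLOC_STALL)
		return a->elapsed_ms >= SKETCH_STALL_MS;
	return !(a->gfp_mask & SKETCH_GFP_NOWARN);
}
```

With a request that carries the NOWARN bit and has been stuck for 12 seconds,
the "before" helper stays silent while the "after" helper reports the stall.
A failure of the same request is still suppressed in both cases.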