From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de ([195.135.220.15]:60656 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1765270AbcLVK13 (ORCPT ); Thu, 22 Dec 2016 05:27:29 -0500 Date: Thu, 22 Dec 2016 11:27:25 +0100 From: Michal Hocko To: Nils Holland Cc: Tetsuo Handa , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chris Mason , David Sterba , linux-btrfs@vger.kernel.org Subject: Re: OOM: Better, but still there on Message-ID: <20161222102725.GG6048@dhcp22.suse.cz> References: <20161216155808.12809-1-mhocko@kernel.org> <20161216184655.GA5664@boerne.fritz.box> <20161217000203.GC23392@dhcp22.suse.cz> <20161217125950.GA3321@boerne.fritz.box> <862a1ada-17f1-9cff-c89b-46c47432e89f@I-love.SAKURA.ne.jp> <20161217210646.GA11358@boerne.fritz.box> <20161219134534.GC5164@dhcp22.suse.cz> <20161220020829.GA5449@boerne.fritz.box> <20161221073658.GC16502@dhcp22.suse.cz> <20161222101028.GA11105@ppc-nas.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20161222101028.GA11105@ppc-nas.fritz.box> Sender: linux-btrfs-owner@vger.kernel.org List-ID: On Thu 22-12-16 11:10:29, Nils Holland wrote: > On Wed, Dec 21, 2016 at 08:36:59AM +0100, Michal Hocko wrote: > > TL;DR > > there is another version of the debugging patch. Just revert the > > previous one and apply this one instead. It's still not clear what > > is going on but I suspect either some misaccounting or unexpeted > > pages on the LRU lists. I have added one more tracepoint, so please > > enable also mm_vmscan_inactive_list_is_low. > > Right, I did just that and can provide a new log. I was also able, in > this case, to reproduce the OOM issues again and not just the "page > allocation stalls" that were the only thing visible in the previous > log. Thanks a lot for testing! I will have a look later today. > However, the log comes from machine #2 again today, as I'm > unfortunately forced to try this via VPN from work to home today, so I > have exactly one attempt per machine before it goes down and locks up > (and I can only restart it later tonight). This is really surprising to me. Are you sure that you have sysrq configured properly. At least sysrq+b shouldn't depend on any memory allocations and should allow you to reboot immediately. A sysrq+m right before the reboot might turn out being helpful as well. -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wj0-f199.google.com (mail-wj0-f199.google.com [209.85.210.199]) by kanga.kvack.org (Postfix) with ESMTP id E2C4E6B0407 for ; Thu, 22 Dec 2016 05:27:29 -0500 (EST) Received: by mail-wj0-f199.google.com with SMTP id qs7so10546785wjc.4 for ; Thu, 22 Dec 2016 02:27:29 -0800 (PST) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id e70si27578473wmc.129.2016.12.22.02.27.28 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 22 Dec 2016 02:27:28 -0800 (PST) Date: Thu, 22 Dec 2016 11:27:25 +0100 From: Michal Hocko Subject: Re: OOM: Better, but still there on Message-ID: <20161222102725.GG6048@dhcp22.suse.cz> References: <20161216155808.12809-1-mhocko@kernel.org> <20161216184655.GA5664@boerne.fritz.box> <20161217000203.GC23392@dhcp22.suse.cz> <20161217125950.GA3321@boerne.fritz.box> <862a1ada-17f1-9cff-c89b-46c47432e89f@I-love.SAKURA.ne.jp> <20161217210646.GA11358@boerne.fritz.box> <20161219134534.GC5164@dhcp22.suse.cz> <20161220020829.GA5449@boerne.fritz.box> <20161221073658.GC16502@dhcp22.suse.cz> <20161222101028.GA11105@ppc-nas.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161222101028.GA11105@ppc-nas.fritz.box> Sender: owner-linux-mm@kvack.org List-ID: To: Nils Holland Cc: Tetsuo Handa , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chris Mason , David Sterba , linux-btrfs@vger.kernel.org On Thu 22-12-16 11:10:29, Nils Holland wrote: > On Wed, Dec 21, 2016 at 08:36:59AM +0100, Michal Hocko wrote: > > TL;DR > > there is another version of the debugging patch. Just revert the > > previous one and apply this one instead. It's still not clear what > > is going on but I suspect either some misaccounting or unexpeted > > pages on the LRU lists. I have added one more tracepoint, so please > > enable also mm_vmscan_inactive_list_is_low. > > Right, I did just that and can provide a new log. I was also able, in > this case, to reproduce the OOM issues again and not just the "page > allocation stalls" that were the only thing visible in the previous > log. Thanks a lot for testing! I will have a look later today. > However, the log comes from machine #2 again today, as I'm > unfortunately forced to try this via VPN from work to home today, so I > have exactly one attempt per machine before it goes down and locks up > (and I can only restart it later tonight). This is really surprising to me. Are you sure that you have sysrq configured properly. At least sysrq+b shouldn't depend on any memory allocations and should allow you to reboot immediately. A sysrq+m right before the reboot might turn out being helpful as well. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org