From: Mikulas Patocka
Date: Wed, 13 Jul 2016 11:11:32 -0400 (EDT)
To: Michal Hocko
Cc: David Rientjes, Tetsuo Handa, Ondrej Kozina, Jerome Marchand,
    Stanislav Kozina, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: System freezes after OOM
In-Reply-To: <20160713145638.GM28723@dhcp22.suse.cz>
References: <57837CEE.1010609@redhat.com>
    <9be09452-de7f-d8be-fd5d-4a80d1cd1ba3@redhat.com>
    <20160712064905.GA14586@dhcp22.suse.cz>
    <2d5e1f84-e886-7b98-cb11-170d7104fd13@I-love.SAKURA.ne.jp>
    <20160713133955.GK28723@dhcp22.suse.cz>
    <20160713145638.GM28723@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 13 Jul 2016, Michal Hocko wrote:

> On Wed 13-07-16 10:18:35, Mikulas Patocka wrote:
> >
> >
> > On Wed, 13 Jul 2016, Michal Hocko wrote:
> >
> > > [CC David]
> > >
> > > > > It is caused by the commit f9054c70d28bc214b2857cf8db8269f4f45a5e23.
> > > > > Prior to this commit, mempool allocations set __GFP_NOMEMALLOC, so
> > > > > they never exhausted reserved memory. With this commit, mempool
> > > > > allocations drop __GFP_NOMEMALLOC, so they can dig deeper (if the
> > > > > process has PF_MEMALLOC, they can bypass all limits).
> > > >
> > > > I wonder whether commit f9054c70d28bc214 ("mm, mempool: only set
> > > > __GFP_NOMEMALLOC if there are free elements") is doing correct thing.
> > > > It says
> > > >
> > > >     If an oom killed thread calls mempool_alloc(), it is possible that it'll
> > > >     loop forever if there are no elements on the freelist since
> > > >     __GFP_NOMEMALLOC prevents it from accessing needed memory reserves in
> > > >     oom conditions.
> > >
> > > I haven't studied the patch very deeply so I might be missing something
> > > but from a quick look the patch does exactly what the above says.
> > >
> > > mempool_alloc used to inhibit ALLOC_NO_WATERMARKS by default. David has
> > > only changed that to allow ALLOC_NO_WATERMARKS if there are no objects
> > > in the pool and so we have no fallback for the default __GFP_NORETRY
> > > request.
> >
> > The swapper core sets the flag PF_MEMALLOC and calls generic_make_request
> > to submit the swapping bio to the block driver. The device mapper driver
> > uses mempools for all its I/O processing.
>
> OK, this is the part I have missed. I didn't realize that the swapout
> path, which is indeed PF_MEMALLOC, can get down to blk code which uses
> mempools. A quick code travers shows that at least
>	make_request_fn = blk_queue_bio
>	blk_queue_bio
>	  get_request
>	    __get_request
>
> might do that. And in that case I agree that the above mentioned patch
> has unintentional side effects and should be re-evaluated. David, what
> do you think? An obvious fixup would be considering TIF_MEMDIE in
> mempool_alloc explicitly.

What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23
tries to fix?

Do you have a stacktrace where it deadlocked, or was it just a theoretical
consideration?

Mempool users generally (except for some flawed cases like fs_bio_set) do
not require memory to proceed.
So if you just loop in mempool_alloc, the processes that exhausted the
mempool reserve will eventually return objects to the mempool and you
should proceed.

If you can't proceed, it is a bug in the code that uses the mempool.

Mikulas
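
For illustration, the argument about looping in mempool_alloc can be
sketched as a small userspace model. This is not the kernel
implementation: every toy_* name, TOY_POOL_MIN, and the backing_fails
switch are hypothetical, and the sleep-and-retry step is reduced to
returning NULL. It only shows the property relied on above: a freed
element refills the reserve first, so a waiter can make progress even
while the backing allocator keeps failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MIN 2	/* hypothetical reserve size */

struct toy_pool {
	void *elem[TOY_POOL_MIN];	/* preallocated reserve */
	int curr_nr;			/* elements currently in the reserve */
	bool backing_fails;		/* simulates memory pressure */
};

/* Fill the reserve up front, while memory is still available. */
bool toy_pool_init(struct toy_pool *p)
{
	p->curr_nr = 0;
	p->backing_fails = false;
	while (p->curr_nr < TOY_POOL_MIN) {
		void *e = malloc(64);

		if (!e) {
			/* Undo the partial fill so nothing leaks. */
			while (p->curr_nr > 0)
				free(p->elem[--p->curr_nr]);
			return false;
		}
		p->elem[p->curr_nr++] = e;
	}
	return true;
}

/*
 * One allocation attempt: try the backing allocator first, then fall
 * back to the reserve. NULL means "nothing available right now"; a
 * real caller that is allowed to sleep would wait and retry.
 */
void *toy_pool_alloc(struct toy_pool *p)
{
	void *e = p->backing_fails ? NULL : malloc(64);

	if (e)
		return e;
	if (p->curr_nr > 0)
		return p->elem[--p->curr_nr];
	return NULL;
}

/* Refill the reserve before handing memory back to the system. */
void toy_pool_free(struct toy_pool *p, void *e)
{
	assert(e != NULL);
	if (p->curr_nr < TOY_POOL_MIN) {
		p->elem[p->curr_nr++] = e;
		return;
	}
	free(e);
}
```

In this model, once the reserve is drained and the backing allocator
fails, toy_pool_alloc() returns NULL until some holder calls
toy_pool_free(); the very next attempt then succeeds from the reserve.
A user that can never free an element without first allocating another
one would wait forever, which is the kind of mempool user bug referred
to above.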