From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Subject: Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update
Date: Thu, 18 May 2017 12:03:50 +0200
Message-ID: <8889d67a-adab-91e1-c320-d8bd88d7e1e0@suse.cz>
References: <20170411140609.3787-2-vbabka@suse.cz> <20170517092042.GH18247@dhcp22.suse.cz> <20170517140501.GM18247@dhcp22.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
To: Christoph Lameter , Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Li Zefan , Mel Gorman , David Rientjes , Hugh Dickins , Andrea Arcangeli , Anshuman Khandual , "Kirill A. Shutemov" , linux-api@vger.kernel.org
List-Id: linux-api@vger.kernel.org

On 05/17/2017 04:48 PM, Christoph Lameter wrote:
> On Wed, 17 May 2017, Michal Hocko wrote:
>
>>>> So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy
>>>> case in a raceless way?
>>>
>>> You dont have to do that if you do not create an empty mempolicy in the
>>> first place. The current kernel code avoids that by first allowing access
>>> to the new set of nodes and removing the old ones from the set when done.
>>
>> which is racy and as Vlastimil pointed out. If we simply fail such an
>> allocation the failure will go up the call chain until we hit the OOM
>> killer due to VM_FAULT_OOM. How would you want to handle that?
>
> The race is where? If you expand the node set during the move of the
> application then you are safe in terms of the legacy apps that did not
> include static bindings.

No, that expand/shrink by itself doesn't work against parallel
get_page_from_freelist going through a zonelist.
Moving from node 0 to 1, with zonelist containing nodes 1 and 0 in that order:

- mempolicy mask is 0
- zonelist iteration checks node 1, it's not allowed, skip
- mempolicy mask is 0,1 (expand)
- mempolicy mask is 1 (shrink)
- zonelist iteration checks node 0, it's not allowed, skip
- OOM
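
The interleaving above can be replayed deterministically in userspace. What follows is only a minimal sketch, not kernel code: it assumes a plain bitmask stands in for the mempolicy nodemask and a plain array for the zonelist. The names nodemask_sim_t, struct step, replay_zonelist_walk(), replay_racy_move() and replay_quiescent_move() are hypothetical and exist only for this illustration; the real get_page_from_freelist() does considerably more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a nodemask: bit n set means node n is allowed. */
typedef unsigned int nodemask_sim_t;

enum step_kind {
    STEP_SET_MASK,   /* the updater rewrites the mempolicy mask */
    STEP_CHECK_NEXT  /* the allocator checks the next zonelist entry */
};

struct step {
    enum step_kind kind;
    nodemask_sim_t mask; /* only used by STEP_SET_MASK */
};

/*
 * Replay one fixed interleaving of mask updates and zonelist checks.
 * Returns the node the allocation lands on, or -1 if none of the
 * replayed checks finds an allowed node (the "OOM" outcome).
 */
int replay_zonelist_walk(const int *zonelist, size_t nr_zones,
                         const struct step *steps, size_t nr_steps)
{
    nodemask_sim_t mask = 0;
    size_t pos = 0; /* the walk never goes back to earlier entries */

    for (size_t i = 0; i < nr_steps; i++) {
        if (steps[i].kind == STEP_SET_MASK) {
            mask = steps[i].mask;
            continue;
        }
        if (pos >= nr_zones)
            break;
        int node = zonelist[pos++];
        assert(node >= 0 && node < 32); /* keep the shift well defined */
        if (mask & (1u << node))
            return node;
    }
    return -1;
}

/* The sequence from the list above: move from node 0 to node 1. */
int replay_racy_move(void)
{
    static const int zonelist[] = { 1, 0 };
    static const struct step steps[] = {
        { STEP_SET_MASK, 1u << 0 },               /* mask is 0 */
        { STEP_CHECK_NEXT, 0 },                   /* node 1 not allowed, skip */
        { STEP_SET_MASK, (1u << 0) | (1u << 1) }, /* mask is 0,1 (expand) */
        { STEP_SET_MASK, 1u << 1 },               /* mask is 1 (shrink) */
        { STEP_CHECK_NEXT, 0 },                   /* node 0 not allowed, skip */
    };
    return replay_zonelist_walk(zonelist, 2, steps,
                                sizeof(steps) / sizeof(steps[0]));
}

/* Same mask updates, but the walk only starts after they are done. */
int replay_quiescent_move(void)
{
    static const int zonelist[] = { 1, 0 };
    static const struct step steps[] = {
        { STEP_SET_MASK, 1u << 0 },
        { STEP_SET_MASK, (1u << 0) | (1u << 1) },
        { STEP_SET_MASK, 1u << 1 },
        { STEP_CHECK_NEXT, 0 },                   /* node 1 is allowed */
    };
    return replay_zonelist_walk(zonelist, 2, steps,
                                sizeof(steps) / sizeof(steps[0]));
}
```

replay_racy_move() returns -1 even though the mask was never empty at any point, while replay_quiescent_move() returns 1. The only difference is that in the first case the mask changes between two checks of a single zonelist walk.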