From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758390AbcJYHRr (ORCPT ); Tue, 25 Oct 2016 03:17:47 -0400
Received: from mail-pf0-f193.google.com ([209.85.192.193]:34449 "EHLO
	mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753647AbcJYHRq (ORCPT );
	Tue, 25 Oct 2016 03:17:46 -0400
Subject: Re: [RFC 3/8] mm: Isolate coherent device memory nodes from HugeTLB allocation paths
To: "Aneesh Kumar K.V" , Dave Hansen , Anshuman Khandual ,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <1477283517-2504-1-git-send-email-khandual@linux.vnet.ibm.com>
	<1477283517-2504-4-git-send-email-khandual@linux.vnet.ibm.com>
	<580E41F0.20601@intel.com> <87d1ipawsm.fsf@linux.vnet.ibm.com>
Cc: mhocko@suse.com, js1304@gmail.com, vbabka@suse.cz, mgorman@suse.de,
	minchan@kernel.org, akpm@linux-foundation.org
From: Balbir Singh
Message-ID: <5f9f43c1-115f-e3fe-fca2-37e6c1eed73f@gmail.com>
Date: Tue, 25 Oct 2016 18:17:26 +1100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
MIME-Version: 1.0
In-Reply-To: <87d1ipawsm.fsf@linux.vnet.ibm.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 25/10/16 15:15, Aneesh Kumar K.V wrote:
> Dave Hansen writes:
>
>> On 10/23/2016 09:31 PM, Anshuman Khandual wrote:
>>> This change is part of the isolation requiring coherent device memory nodes
>>> implementation.
>>>
>>> Isolation seeking coherent device memory node requires allocation isolation
>>> from implicit memory allocations from user space. Towards that effect, the
>>> memory should not be used for generic HugeTLB page pool allocations. This
>>> modifies relevant functions to skip all coherent memory nodes present on
>>> the system during allocation, freeing and auditing for HugeTLB pages.
>>
>> This seems really fragile. You had to hit, what, 18 call sites? What
>> are the odds that this is going to stay working?
>
>
> I guess a better approach is to introduce new node_states entry such
> that we have one that excludes coherent device memory numa nodes. One
> possibility is to add N_SYSTEM_MEMORY and N_MEMORY.
>
> Current N_MEMORY becomes N_SYSTEM_MEMORY and N_MEMORY includes
> system and device/any other memory which is coherent.
>

I thought of this as well, but I would rather see N_COHERENT_MEMORY
as a flag. The idea is that some device memory is a part of N_MEMORY,
but N_COHERENT_MEMORY gives it additional attributes.

> All the isolation can then be achieved based on the nodemask_t used for
> allocation. So for allocations we want to avoid from coherent device we
> use N_SYSTEM_MEMORY mask or a derivative of that and where we are ok to
> allocate from CDM with fallbacks we use N_MEMORY.
>

I suspect it's going to be easier to exclude N_COHERENT_MEMORY.

> All nodes zonelist will have zones from the coherent device nodes but we
> will not end up allocating from coherent device node zone due to the
> node mask used.
>
>
> This will also make sure we end up allocating from the correct coherent
> device numa node in the presence of multiple of them based on the
> distance of the coherent device node from the current executing numa
> node.
>

The idea is good overall, but I think it would be good to document
the exclusions along with the flags.

Balbir Singh.
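
For illustration only, here is a rough userspace sketch of the "exclude
N_COHERENT_MEMORY" idea discussed above. It is not kernel code and not
taken from the patch series: the 64-bit mask type and every name ending
in _model are hypothetical stand-ins, and N_COHERENT_MEMORY itself is
only a proposal in this thread. In the kernel the same derivation would
presumably be done on real nodemask_t values (e.g. with nodes_andnot()),
but that is an assumption about how a final patch might look.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: one bit per NUMA node, at most 64 nodes. */
typedef uint64_t nodemask_model_t;

/* Hypothetical stand-ins for node_states[] entries. */
enum node_state_model {
	N_MEMORY_MODEL,          /* every node with memory, system or device */
	N_COHERENT_MEMORY_MODEL, /* proposed flag: coherent device memory (CDM) */
	NR_NODE_STATES_MODEL
};

static nodemask_model_t node_states_model[NR_NODE_STATES_MODEL];

/* Mark node nid as being in the given state. */
static void node_set_state_model(int nid, enum node_state_model state)
{
	assert(nid >= 0 && nid < 64);
	node_states_model[state] |= (nodemask_model_t)1 << nid;
}

/*
 * Mask for allocations that must stay off CDM (e.g. the HugeTLB pool):
 * all memory nodes minus the nodes flagged as coherent device memory.
 */
static nodemask_model_t system_memory_mask_model(void)
{
	return node_states_model[N_MEMORY_MODEL] &
	       ~node_states_model[N_COHERENT_MEMORY_MODEL];
}

/* Return 1 if node nid is set in mask, 0 otherwise. */
static int node_isset_model(int nid, nodemask_model_t mask)
{
	return (int)((mask >> nid) & 1);
}
```

With nodes 0 and 1 marked as plain memory and node 2 marked as both
memory and coherent memory, system_memory_mask_model() contains only
nodes 0 and 1. A caller that is fine with CDM fallback would keep using
the full N_MEMORY mask. The exclusion lives in one derived mask rather
than at each call site, which is the fragility concern raised earlier
in the thread.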