Date: Fri, 28 Sep 2018 10:12:24 +0200
From: Oscar Salvador
To: Michal Hocko
Cc: Alexander Duyck, linux-mm@kvack.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
	pavel.tatashin@microsoft.com, dave.jiang@intel.com,
	dave.hansen@intel.com, jglisse@redhat.com, rppt@linux.vnet.ibm.com,
	dan.j.williams@intel.com, logang@deltatee.com, mingo@kernel.org,
	kirill.shutemov@linux.intel.com
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap
Message-ID: <20180928081224.GA25561@techadventures.net>
In-Reply-To: <20180927131329.GI6278@dhcp22.suse.cz>
References: <20180925200551.3576.18755.stgit@localhost.localdomain>
	<20180925202053.3576.66039.stgit@localhost.localdomain>
	<20180926075540.GD6278@dhcp22.suse.cz>
	<6f87a5d7-05e2-00f4-8568-bb3521869cea@linux.intel.com>
	<20180927110926.GE6278@dhcp22.suse.cz>
	<20180927122537.GA20378@techadventures.net>
	<20180927131329.GI6278@dhcp22.suse.cz>

On Thu, Sep 27, 2018
at 03:13:29PM +0200, Michal Hocko wrote:
> I would have to double check but is the hotplug lock really serializing
> access to the state initialized by init_currently_empty_zone? E.g.
> zone_start_pfn is a nice example of a state that is used outside of the
> lock. zone's free lists are similar. So do we really need the hotplug
> lock? And more broadly, what is the hotplug lock supposed to
> serialize in general? A proper documentation would surely help to answer
> these questions. There is way too much of "do not touch this code and
> just make my particular hack" mindset which made the whole memory
> hotplug a giant pile of mess. We really should start with some proper
> engineering here finally.

	 * Locking rules:
	 *
	 * zone_start_pfn and spanned_pages are protected by span_seqlock.
	 * It is a seqlock because it has to be read outside of zone->lock,
	 * and it is done in the main allocator path.  But, it is written
	 * quite infrequently.
	 *
	 * Write access to present_pages at runtime should be protected by
	 * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
	 * present_pages should get_online_mems() to get a stable value.

IIUC, it looks like writes to zone_start_pfn should be wrapped in
zone_span_writelock/zone_span_writeunlock, and since zone_start_pfn is
changed in init_currently_empty_zone, I guess the whole function should
run under that lock.

So, a blind shot, but could we do something like the following?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 898e1f816821..49f87252f1b1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -764,14 +764,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);

Then, we could take move_pfn_range_to_zone out of the hotplug lock.

I am not sure about leaving memmap_init_zone unprotected, though.
For normal memory that is not a problem, since the memblock's lock
protects us from touching the same pages at the same time in
online/offline_pages, but for HMM/devm the story is different.

I am not familiar with HMM/devm, so I am not sure whether it is
protected somehow. E.g., what happens if devm_memremap_pages and
devm_memremap_pages_release run at the same time on the same memory
range (assuming the hotplug lock no longer protects
move_pfn_range_to_zone)?

-- 
Oscar Salvador
SUSE L3
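
For readers unfamiliar with the seqlock pattern that the quoted locking
rules refer to, a minimal userspace sketch may help. This is not kernel
code: struct zone_span, span_init, span_write and span_contains are
hypothetical names, C11 atomics stand in for the kernel's seqlock
primitives, and writers are assumed to be serialized by some outer lock
(roughly the role pgdat_resize_lock plays in the diff above).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the span fields of struct zone. */
struct zone_span {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	atomic_ulong zone_start_pfn;
	atomic_ulong spanned_pages;
};

static void span_init(struct zone_span *z)
{
	atomic_init(&z->seq, 0u);
	atomic_init(&z->zone_start_pfn, 0ul);
	atomic_init(&z->spanned_pages, 0ul);
}

/* Writer side. Assumes callers are already serialized against each other. */
static void span_write(struct zone_span *z, unsigned long start_pfn,
		       unsigned long nr_pages)
{
	unsigned int s = atomic_load(&z->seq);

	assert((s & 1u) == 0);		/* no other write in progress */
	atomic_store(&z->seq, s + 1);	/* odd: readers will retry */
	atomic_store(&z->zone_start_pfn, start_pfn);
	atomic_store(&z->spanned_pages, nr_pages);
	atomic_store(&z->seq, s + 2);	/* even again: span is stable */
}

/* Lockless reader: retry until both fields come from one stable snapshot. */
static bool span_contains(struct zone_span *z, unsigned long pfn)
{
	unsigned int s;
	unsigned long start, len;

	do {
		s = atomic_load(&z->seq);
		start = atomic_load(&z->zone_start_pfn);
		len = atomic_load(&z->spanned_pages);
	} while ((s & 1u) || s != atomic_load(&z->seq));

	return pfn >= start && pfn - start < len;
}
```

The point of the sketch is that a reader never blocks the writer; it
only retries when it raced with an update. That only works if every
write to the span fields happens between the two sequence bumps, which
is the argument for moving init_currently_empty_zone inside
zone_span_writelock.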