From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Benjamin Herrenschmidt,
    Paul Mackerras, Michael Ellerman, Greg Kroah-Hartman,
    "Rafael J. Wysocki", Andrew Morton, Leonardo Bras, Nathan Lynch,
    Allison Randal, Nathan Fontenot, Thomas Gleixner, Michal Hocko,
    Dan Williams, Stephen Rothwell, Anshuman Khandual,
    lantianyu1986@gmail.com, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH RFC v1] mm: is_mem_section_removable() overhaul
Date: Fri, 17 Jan 2020 11:57:59 +0100
Message-Id: <20200117105759.27905-1-david@redhat.com>

Let's refactor that code. We want to check if we can offline memory
blocks. Add a new function is_memory_block_offlineable() for that and
make it call is_mem_section_offlineable() for each contained section.

Within is_mem_section_offlineable(), add some more sanity checks and
directly bail out if the section contains holes or if it spans multiple
zones.

The old code was inherently racy with concurrent offlining/memory
unplug. Let's avoid that and grab the device_hotplug_lock. Luckily we
are already holding it when calling from powerpc code.
Note1: If somebody wants to export this function for use in driver
code, we need a variant that takes the device_hotplug_lock.

Note2: If we could have a zombie device (not clear yet), the present
section checks would properly bail out early.

Note3: I'd prefer the mem_hotplug_lock in read, but as we are about to
change the locking on the removal path (IOW, don't hold it when
removing memory block devices), I do not want to go down that path.

Note4: For now we would have returned "removable" although we would
block offlining due to memory holes, multiple zones, or missing
sections.

Tested with DIMMs on x86-64. Compile-tested on Power.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"
Cc: Andrew Morton
Cc: Leonardo Bras
Cc: Nathan Lynch
Cc: Allison Randal
Cc: Nathan Fontenot
Cc: Thomas Gleixner
Cc: Michal Hocko
Cc: Dan Williams
Cc: Stephen Rothwell
Cc: Anshuman Khandual
Cc: lantianyu1986@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David Hildenbrand
---
 .../platforms/pseries/hotplug-memory.c |   24 ++-----
 drivers/base/memory.c                  |   37 ++++++----
 include/linux/memory.h                 |    1 +
 include/linux/memory_hotplug.h         |    5 +-
 mm/memory_hotplug.c                    |   68 +++++++++----------
 5 files changed, 67 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c126b94d1943..8d80159465e4 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -337,34 +337,24 @@ static int pseries_remove_mem_node(struct device_node *np)
 
 static bool lmb_is_removable(struct drmem_lmb *lmb)
 {
-	int i, scns_per_block;
-	bool rc = true;
-	unsigned long pfn, block_sz;
-	u64 phys_addr;
+	struct memory_block *mem;
+	bool rc = false;
 
 	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
 		return false;
 
-	block_sz = memory_block_size_bytes();
-	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
-	phys_addr = lmb->base_addr;
-
 #ifdef CONFIG_FA_DUMP
 	/*
 	 * Don't hot-remove memory that falls in fadump boot memory area
 	 * and memory that is reserved for capturing old kernel memory.
 	 */
-	if (is_fadump_memory_area(phys_addr, block_sz))
+	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
 		return false;
 #endif
-
-	for (i = 0; i < scns_per_block; i++) {
-		pfn = PFN_DOWN(phys_addr);
-		if (!pfn_present(pfn))
-			continue;
-
-		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
-		phys_addr += MIN_MEMORY_BLOCK_SIZE;
+	mem = lmb_to_memblock(lmb);
+	if (mem) {
+		rc = is_memory_block_offlineable(mem);
+		put_device(&mem->dev);
 	}
 
 	return rc;
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c6d288fad493..f744250c34d0 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -104,6 +104,25 @@ static ssize_t phys_index_show(struct device *dev,
 	return sprintf(buf, "%08lx\n", phys_index);
 }
 
+/*
+ * Test if a memory block is likely to be offlineable. Returns true if
+ * the block is already offline.
+ *
+ * Called under device_hotplug_lock.
+ */
+bool is_memory_block_offlineable(struct memory_block *mem)
+{
+	int i;
+
+	if (mem->state != MEM_ONLINE)
+		return true;
+
+	for (i = 0; i < sections_per_block; i++)
+		if (!is_mem_section_offlineable(mem->start_section_nr + i))
+			return false;
+	return true;
+}
+
 /*
  * Show whether the memory block is likely to be offlineable (or is already
  * offline). Once offline, the memory block could be removed. The return
@@ -114,20 +133,14 @@ static ssize_t removable_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
 	struct memory_block *mem = to_memory_block(dev);
-	unsigned long pfn;
-	int ret = 1, i;
-
-	if (mem->state != MEM_ONLINE)
-		goto out;
+	int ret;
 
-	for (i = 0; i < sections_per_block; i++) {
-		if (!present_section_nr(mem->start_section_nr + i))
-			continue;
-		pfn = section_nr_to_pfn(mem->start_section_nr + i);
-		ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
-	}
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		return ret;
+	ret = is_memory_block_offlineable(mem);
+	unlock_device_hotplug();
 
-out:
 	return sprintf(buf, "%d\n", ret);
 }
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 0b8d791b6669..faf03eb64ecc 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -91,6 +91,7 @@ typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
 extern int walk_memory_blocks(unsigned long start, unsigned long size,
 			      void *arg, walk_memory_blocks_func_t func);
 extern int for_each_memory_block(void *arg, walk_memory_blocks_func_t func);
+extern bool is_memory_block_offlineable(struct memory_block *mem);
 #define CONFIG_MEM_BLOCK_SIZE	(PAGES_PER_SECTION<