Date: Sat, 12 Sep 2020 17:00:36 -0700
From: akpm@linux-foundation.org
To: mm-commits@vger.kernel.org, wei.liu@kernel.org, vishal.l.verma@intel.com,
	tglx@linutronix.de, sthemmin@microsoft.com, sstabellini@kernel.org,
	roger.pau@citrix.com, rjw@rjwysocki.net, richardw.yang@linux.intel.com,
	paulus@samba.org, pankaj.gupta.linux@gmail.com, oohall@gmail.com,
	nathanl@linux.ibm.com, mst@redhat.com, mpe@ellerman.id.au,
	mhocko@suse.com, lpechacek@suse.cz, leobras.c@gmail.com, lenb@kernel.org,
	kys@microsoft.com, kernelfans@gmail.com, keescook@chromium.org,
	julien@xen.org, jgross@suse.com, jgg@ziepe.ca, jasowang@redhat.com,
	hca@linux.ibm.com, haiyangz@microsoft.com, gregkh@linuxfoundation.org,
	gor@linux.ibm.com, ebiederm@xmission.com, dave.jiang@intel.com,
	dan.j.williams@intel.com, borntraeger@de.ibm.com,
	boris.ostrovsky@oracle.com, bhe@redhat.com, benh@kernel.crashing.org,
	ardb@kernel.org, anton@ozlabs.org, david@redhat.com
Subject: + kernel-resource-make-release_mem_region_adjustable-never-fail.patch added to -mm tree
Message-ID: <20200913000036.WOXfB%akpm@linux-foundation.org>
User-Agent: s-nail v14.9.10
Sender: mm-commits-owner@vger.kernel.org
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
List-ID:
X-Mailing-List: mm-commits@vger.kernel.org

The patch titled
     Subject: kernel/resource: make release_mem_region_adjustable() never fail
has been added to the -mm tree.
Its filename is
     kernel-resource-make-release_mem_region_adjustable-never-fail.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/kernel-resource-make-release_mem_region_adjustable-never-fail.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/kernel-resource-make-release_mem_region_adjustable-never-fail.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: David Hildenbrand
Subject: kernel/resource: make release_mem_region_adjustable() never fail

Patch series "selective merging of system ram resources", v4.

Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, the Hyper-V balloon, and the Xen balloon.

This can quickly result in a lot of memory resources whose individual
boundaries are not of interest (in contrast to, e.g., DIMMs, where the
boundaries exposed to user space via /proc/iomem might be relevant).  We
really want to merge added resources in this scenario where possible.

Resources are effectively stored in a list-based tree.  Having a lot of
resources not only wastes memory, it also makes traversing that tree more
expensive and makes /proc/iomem explode in size.  For example, kexec-tools
has to merge resources manually when creating a kdump header, and its
current resource count limit does not allow for more than ~100GB of memory
with a memory block size of 128MB on x86-64.

Let's allow selectively merging System RAM resources by specifying a new
flag for add_memory*().  Patch #5 contains a /proc/iomem example.  Only
tested with virtio-mem.
This patch (of 8):

Let's make sure splitting a resource on memory hotunplug will never fail. 
This will become more relevant once we merge selected System RAM
resources, as we'll then trigger that case more often on memory hotunplug.

In general, this function is already unlikely to fail.  When we remove
memory, we free up quite a lot of metadata (memmap, page tables, memory
block device, etc.).  The only realistic way it could fail is when
allocation errors are injected.

All other error cases inside release_mem_region_adjustable() seem to be
sanity checks against the function being misused in a different context.
Let's add WARN_ON_ONCE() in these cases so we can catch them.

Link: https://lkml.kernel.org/r/20200911103459.10306-2-david@redhat.com
Signed-off-by: David Hildenbrand
Cc: Michal Hocko
Cc: Dan Williams
Cc: Jason Gunthorpe
Cc: Kees Cook
Cc: Ard Biesheuvel
Cc: Pankaj Gupta
Cc: Baoquan He
Cc: Wei Yang
Cc: Anton Blanchard
Cc: Benjamin Herrenschmidt
Cc: Boris Ostrovsky
Cc: Christian Borntraeger
Cc: Dave Jiang
Cc: Eric Biederman
Cc: Greg Kroah-Hartman
Cc: Haiyang Zhang
Cc: Heiko Carstens
Cc: Jason Wang
Cc: Juergen Gross
Cc: Julien Grall
Cc: "K. Y. Srinivasan"
Cc: Len Brown
Cc: Leonardo Bras
Cc: Libor Pechacek
Cc: Michael Ellerman
Cc: "Michael S. Tsirkin"
Cc: Nathan Lynch
Cc: "Oliver O'Halloran"
Cc: Paul Mackerras
Cc: Pingfan Liu
Cc: "Rafael J. Wysocki"
Cc: Roger Pau Monné
Cc: Stefano Stabellini
Cc: Stephen Hemminger
Cc: Thomas Gleixner
Cc: Vasily Gorbik
Cc: Vishal Verma
Cc: Wei Liu
Signed-off-by: Andrew Morton
---

 include/linux/ioport.h |    4 +--
 kernel/resource.c      |   49 ++++++++++++++++++++++-----------------
 mm/memory_hotplug.c    |   22 -----------------
 3 files changed, 31 insertions(+), 44 deletions(-)

--- a/include/linux/ioport.h~kernel-resource-make-release_mem_region_adjustable-never-fail
+++ a/include/linux/ioport.h
@@ -248,8 +248,8 @@ extern struct resource * __request_regio
 extern void __release_region(struct resource *, resource_size_t,
 				resource_size_t);
 #ifdef CONFIG_MEMORY_HOTREMOVE
-extern int release_mem_region_adjustable(struct resource *, resource_size_t,
-				resource_size_t);
+extern void release_mem_region_adjustable(struct resource *, resource_size_t,
+					  resource_size_t);
 #endif
 
 /* Wrappers for managed devices */
--- a/kernel/resource.c~kernel-resource-make-release_mem_region_adjustable-never-fail
+++ a/kernel/resource.c
@@ -1258,21 +1258,28 @@ EXPORT_SYMBOL(__release_region);
  *   assumes that all children remain in the lower address entry for
  *   simplicity.  Enhance this logic when necessary.
  */
-int release_mem_region_adjustable(struct resource *parent,
-				  resource_size_t start, resource_size_t size)
+void release_mem_region_adjustable(struct resource *parent,
+				   resource_size_t start, resource_size_t size)
 {
+	struct resource *new_res = NULL;
+	bool alloc_nofail = false;
 	struct resource **p;
 	struct resource *res;
-	struct resource *new_res;
 	resource_size_t end;
-	int ret = -EINVAL;
 
 	end = start + size - 1;
-	if ((start < parent->start) || (end > parent->end))
-		return ret;
+	if (WARN_ON_ONCE((start < parent->start) || (end > parent->end)))
+		return;
 
-	/* The alloc_resource() result gets checked later */
-	new_res = alloc_resource(GFP_KERNEL);
+	/*
+	 * We free up quite a lot of memory on memory hotunplug (esp., memmap),
+	 * just before releasing the region. This is highly unlikely to
+	 * fail - let's play safe and make it never fail as the caller cannot
+	 * perform any error handling (e.g., trying to re-add memory will fail
+	 * similarly).
+	 */
+retry:
+	new_res = alloc_resource(GFP_KERNEL | (alloc_nofail ? __GFP_NOFAIL : 0));
 
 	p = &parent->child;
 	write_lock(&resource_lock);
@@ -1298,7 +1305,6 @@ int release_mem_region_adjustable(struct
 		 * so if we are dealing with them, let us just back off here.
 		 */
 		if (!(res->flags & IORESOURCE_SYSRAM)) {
-			ret = 0;
 			break;
 		}
 
@@ -1315,20 +1321,23 @@ int release_mem_region_adjustable(struct
 			/* free the whole entry */
 			*p = res->sibling;
 			free_resource(res);
-			ret = 0;
 		} else if (res->start == start && res->end != end) {
 			/* adjust the start */
-			ret = __adjust_resource(res, end + 1,
-						res->end - end);
+			WARN_ON_ONCE(__adjust_resource(res, end + 1,
+						       res->end - end));
 		} else if (res->start != start && res->end == end) {
 			/* adjust the end */
-			ret = __adjust_resource(res, res->start,
-						start - res->start);
+			WARN_ON_ONCE(__adjust_resource(res, res->start,
+						       start - res->start));
 		} else {
-			/* split into two entries */
+			/* split into two entries - we need a new resource */
 			if (!new_res) {
-				ret = -ENOMEM;
-				break;
+				new_res = alloc_resource(GFP_ATOMIC);
+				if (!new_res) {
+					alloc_nofail = true;
+					write_unlock(&resource_lock);
+					goto retry;
+				}
 			}
 			new_res->name = res->name;
 			new_res->start = end + 1;
@@ -1339,9 +1348,8 @@ int release_mem_region_adjustable(struct
 			new_res->sibling = res->sibling;
 			new_res->child = NULL;
 
-			ret = __adjust_resource(res, res->start,
-						start - res->start);
-			if (ret)
+			if (WARN_ON_ONCE(__adjust_resource(res, res->start,
+							   start - res->start)))
 				break;
 			res->sibling = new_res;
 			new_res = NULL;
@@ -1352,7 +1360,6 @@ int release_mem_region_adjustable(struct
 
 	write_unlock(&resource_lock);
 	free_resource(new_res);
-	return ret;
 }
 #endif	/* CONFIG_MEMORY_HOTREMOVE */
 
--- a/mm/memory_hotplug.c~kernel-resource-make-release_mem_region_adjustable-never-fail
+++ a/mm/memory_hotplug.c
@@ -1726,26 +1726,6 @@ void try_offline_node(int nid)
 }
 EXPORT_SYMBOL(try_offline_node);
 
-static void __release_memory_resource(resource_size_t start,
-				      resource_size_t size)
-{
-	int ret;
-
-	/*
-	 * When removing memory in the same granularity as it was added,
-	 * this function never fails. It might only fail if resources
-	 * have to be adjusted or split. We'll ignore the error, as
-	 * removing of memory cannot fail.
-	 */
-	ret = release_mem_region_adjustable(&iomem_resource, start, size);
-	if (ret) {
-		resource_size_t endres = start + size - 1;
-
-		pr_warn("Unable to release resource <%pa-%pa> (%d)\n",
-			&start, &endres, ret);
-	}
-}
-
 static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
 	int rc = 0;
@@ -1779,7 +1759,7 @@ static int __ref try_remove_memory(int n
 		memblock_remove(start, size);
 	}
 
-	__release_memory_resource(start, size);
+	release_mem_region_adjustable(&iomem_resource, start, size);
 
 	try_offline_node(nid);
 
_

Patches currently in -mm which might be from david@redhat.com are

mm-page_alloc-tweak-comments-in-has_unmovable_pages.patch
mm-page_isolation-exit-early-when-pageblock-is-isolated-in-set_migratetype_isolate.patch
mm-page_isolation-drop-warn_on_once-in-set_migratetype_isolate.patch
mm-page_isolation-cleanup-set_migratetype_isolate.patch
virtio-mem-dont-special-case-zone_movable.patch
mm-document-semantics-of-zone_movable.patch
mm-memory_hotplug-inline-__offline_pages-into-offline_pages.patch
mm-memory_hotplug-enforce-section-granularity-when-onlining-offlining.patch
mm-memory_hotplug-simplify-page-offlining.patch
mm-page_alloc-simplify-__offline_isolated_pages.patch
mm-memory_hotplug-drop-nr_isolate_pageblock-in-offline_pages.patch
mm-page_isolation-simplify-return-value-of-start_isolate_page_range.patch
mm-memory_hotplug-simplify-page-onlining.patch
mm-page_alloc-drop-stale-pageblock-comment-in-memmap_init_zone.patch
mm-pass-migratetype-into-memmap_init_zone-and-move_pfn_range_to_zone.patch
mm-memory_hotplug-mark-pageblocks-migrate_isolate-while-onlining-memory.patch
kernel-resource-make-release_mem_region_adjustable-never-fail.patch
kernel-resource-move-and-rename-ioresource_mem_driver_managed.patch
mm-memory_hotplug-guard-more-declarations-by-config_memory_hotplug.patch
mm-memory_hotplug-prepare-passing-flags-to-add_memory-and-friends.patch
mm-memory_hotplug-memhp_merge_resource-to-specify-merging-of-system-ram-resources.patch
virtio-mem-try-to-merge-system-ram-resources.patch
xen-balloon-try-to-merge-system-ram-resources.patch
hv_balloon-try-to-merge-system-ram-resources.patch