From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21AD1ECE588 for ; Wed, 16 Oct 2019 09:06:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E77522168B for ; Wed, 16 Oct 2019 09:06:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391876AbfJPJG4 (ORCPT ); Wed, 16 Oct 2019 05:06:56 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44442 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727399AbfJPJG4 (ORCPT ); Wed, 16 Oct 2019 05:06:56 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D4D41309B142; Wed, 16 Oct 2019 09:06:55 +0000 (UTC) Received: from [10.36.117.237] (ovpn-117-237.ams2.redhat.com [10.36.117.237]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7492360852; Wed, 16 Oct 2019 09:06:54 +0000 (UTC) Subject: Re: [PATCH v1] drivers/base/memory.c: Drop the mem_sysfs_mutex To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, Greg Kroah-Hartman , "Rafael J. Wysocki" , Michal Hocko , Oscar Salvador References: <20190925082621.4927-1-david@redhat.com> From: David Hildenbrand Organization: Red Hat GmbH Message-ID: <0977bbdc-b862-acd6-3f55-bd04fd42215a@redhat.com> Date: Wed, 16 Oct 2019 11:06:53 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.0 MIME-Version: 1.0 In-Reply-To: <20190925082621.4927-1-david@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Wed, 16 Oct 2019 09:06:55 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 25.09.19 10:26, David Hildenbrand wrote: > The mem_sysfs_mutex isn't really helpful. Also, it's not really clear what > the mutex protects at all. > > The device lists of the memory subsystem are protected separately. We don't > need that mutex when looking up. creating, or removing independent > devices. find_memory_block_by_id() will perform locking on its own and > grab a reference of the returned device. > > At the time memory_dev_init() is called, we cannot have concurrent > hot(un)plug operations yet - we're still fairly early during boot. We > don't need any locking. > > The creation/removal of memory block devices should be protected > on a higher level - especially using the device hotplug lock to avoid > documented issues (see Documentation/core-api/memory-hotplug.rst) - or > if that is reworked, using similar locking. > > Protecting in the context of these functions only doesn't really make > sense. Especially, if we would have a situation where the same memory > blocks are created/deleted at the same time, there is something horribly > going wrong (imagining adding/removing a DIMM at the same time from two > call paths) - after the functions succeeded something else in the > callers would blow up (e.g., create_memory_block_devices() succeeded but > there are no memory block devices anymore). > > All relevant call paths (except when adding memory early during boot > via ACPI, which is now documented) hold the device hotplug lock when > adding memory, and when removing memory. Let's document that instead. > > Add a simple safety net to create_memory_block_devices() in case we > would actually remove memory blocks while adding them, so we'll never > dereference a NULL pointer. Simplify memory_dev_init() now that the > lock is gone. > > Cc: Greg Kroah-Hartman > Cc: "Rafael J. Wysocki" > Cc: Michal Hocko > Cc: Oscar Salvador > Signed-off-by: David Hildenbrand > --- > > Tested using my usual x86-64 DIMM based hot(un)plug setup. > > --- > drivers/base/memory.c | 33 ++++++++++++++------------------- > 1 file changed, 14 insertions(+), 19 deletions(-) > > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index 6bea4f3f8040..634aab8e1e19 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -19,15 +19,12 @@ > #include > #include > #include > -#include > #include > #include > > #include > #include > > -static DEFINE_MUTEX(mem_sysfs_mutex); > - > #define MEMORY_CLASS_NAME "memory" > > #define to_memory_block(dev) container_of(dev, struct memory_block, dev) > @@ -702,6 +699,8 @@ static void unregister_memory(struct memory_block *memory) > * Create memory block devices for the given memory area. Start and size > * have to be aligned to memory block granularity. Memory block devices > * will be initialized as offline. > + * > + * Called under device_hotplug_lock. > */ > int create_memory_block_devices(unsigned long start, unsigned long size) > { > @@ -715,7 +714,6 @@ int create_memory_block_devices(unsigned long start, unsigned long size) > !IS_ALIGNED(size, memory_block_size_bytes()))) > return -EINVAL; > > - mutex_lock(&mem_sysfs_mutex); > for (block_id = start_block_id; block_id != end_block_id; block_id++) { > ret = init_memory_block(&mem, block_id, MEM_OFFLINE); > if (ret) > @@ -727,11 +725,12 @@ int create_memory_block_devices(unsigned long start, unsigned long size) > for (block_id = start_block_id; block_id != end_block_id; > block_id++) { > mem = find_memory_block_by_id(block_id); > + if (WARN_ON_ONCE(!mem)) > + continue; > mem->section_count = 0; > unregister_memory(mem); > } > } > - mutex_unlock(&mem_sysfs_mutex); > return ret; > } > > @@ -739,6 +738,8 @@ int create_memory_block_devices(unsigned long start, unsigned long size) > * Remove memory block devices for the given memory area. Start and size > * have to be aligned to memory block granularity. Memory block devices > * have to be offline. > + * > + * Called under device_hotplug_lock. > */ > void remove_memory_block_devices(unsigned long start, unsigned long size) > { > @@ -751,7 +752,6 @@ void remove_memory_block_devices(unsigned long start, unsigned long size) > !IS_ALIGNED(size, memory_block_size_bytes()))) > return; > > - mutex_lock(&mem_sysfs_mutex); > for (block_id = start_block_id; block_id != end_block_id; block_id++) { > mem = find_memory_block_by_id(block_id); > if (WARN_ON_ONCE(!mem)) > @@ -760,7 +760,6 @@ void remove_memory_block_devices(unsigned long start, unsigned long size) > unregister_memory_block_under_nodes(mem); > unregister_memory(mem); > } > - mutex_unlock(&mem_sysfs_mutex); > } > > /* return true if the memory block is offlined, otherwise, return false */ > @@ -794,12 +793,13 @@ static const struct attribute_group *memory_root_attr_groups[] = { > }; > > /* > - * Initialize the sysfs support for memory devices... > + * Initialize the sysfs support for memory devices. At the time this function > + * is called, we cannot have concurrent creation/deletion of memory block > + * devices, the device_hotplug_lock is not needed. > */ > void __init memory_dev_init(void) > { > int ret; > - int err; > unsigned long block_sz, nr; > > /* Validate the configured memory block size */ > @@ -810,24 +810,19 @@ void __init memory_dev_init(void) > > ret = subsys_system_register(&memory_subsys, memory_root_attr_groups); > if (ret) > - goto out; > + panic("%s() failed to register subsystem: %d\n", __func__, ret); > > /* > * Create entries for memory sections that were found > * during boot and have been initialized > */ > - mutex_lock(&mem_sysfs_mutex); > for (nr = 0; nr <= __highest_present_section_nr; > nr += sections_per_block) { > - err = add_memory_block(nr); > - if (!ret) > - ret = err; > + ret = add_memory_block(nr); > + if (ret) > + panic("%s() failed to add memory block: %d\n", __func__, > + ret); > } > - mutex_unlock(&mem_sysfs_mutex); > - > -out: > - if (ret) > - panic("%s() failed: %d\n", __func__, ret); > } > > /** > Ping, this lock does neither protect any data structure, nor is it helpful to guarantee any ordering. -- Thanks, David / dhildenb