Subject: Re: [PATCH v4 5/5] nvdimm: Schedule device registration on node local to the device
To: Dan Williams
Cc: Linux MM, Linux Kernel Mailing List, linux-nvdimm, Pasha Tatashin,
 Michal Hocko, Dave Jiang, Ingo Molnar, Dave Hansen, Jérôme Glisse,
 Andrew Morton, Logan Gunthorpe, "Kirill A. Shutemov"
References: <20180920215824.19464.8884.stgit@localhost.localdomain>
 <20180920222951.19464.39241.stgit@localhost.localdomain>
From: Alexander Duyck
Message-ID: <0d6525c1-2e8b-0e5d-7dae-193bf697a4ec@linux.intel.com>
Date: Thu, 20 Sep 2018 18:33:26 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/20/2018 5:36 PM, Dan Williams wrote:
> On Thu, Sep 20, 2018 at 5:26 PM Alexander Duyck wrote:
>>
>> On 9/20/2018 3:59 PM, Dan Williams wrote:
>>> On Thu, Sep 20, 2018 at 3:31 PM Alexander Duyck wrote:
>>>>
>>>> This patch is meant to force the device registration for nvdimm devices to
>>>> be closer to the actual device. This is achieved by using either the NUMA
>>>> node ID of the region, or of the parent. By doing this we can have
>>>> everything above the region based on the region, and everything below the
>>>> region based on the nvdimm bus.
>>>>
>>>> One additional change I made is that we hold onto a reference to the parent
>>>> while we are going through registration. By doing this we can guarantee we
>>>> can complete the registration before we have the parent device removed.
>>>>
>>>> By guaranteeing NUMA locality I see an improvement of as high as 25% for
>>>> per-node init of a system with 12TB of persistent memory.
>>>>
>>>> Signed-off-by: Alexander Duyck
>>>> ---
>>>>  drivers/nvdimm/bus.c | 19 +++++++++++++++++--
>>>>  1 file changed, 17 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
>>>> index 8aae6dcc839f..ca935296d55e 100644
>>>> --- a/drivers/nvdimm/bus.c
>>>> +++ b/drivers/nvdimm/bus.c
>>>> @@ -487,7 +487,9 @@ static void nd_async_device_register(void *d, async_cookie_t cookie)
>>>>  		dev_err(dev, "%s: failed\n", __func__);
>>>>  		put_device(dev);
>>>>  	}
>>>> +
>>>>  	put_device(dev);
>>>> +	put_device(dev->parent);
>>>
>>> Good catch. The child does not pin the parent until registration, but
>>> we need to make sure the parent isn't gone while we're waiting for the
>>> registration work to run.
>>>
>>> Let's break this reference count fix out into its own separate patch,
>>> because this looks to be covering a gap that may need to be
>>> recommended for -stable.
>>
>> Okay, I guess I can do that.
>>
>>>
>>>>
>>>>  static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>> @@ -504,12 +506,25 @@ static void nd_async_device_unregister(void *d, async_cookie_t cookie)
>>>>
>>>>  void __nd_device_register(struct device *dev)
>>>>  {
>>>> +	int node;
>>>> +
>>>>  	if (!dev)
>>>>  		return;
>>>> +
>>>>  	dev->bus = &nvdimm_bus_type;
>>>> +	get_device(dev->parent);
>>>>  	get_device(dev);
>>>> -	async_schedule_domain(nd_async_device_register, dev,
>>>> -			&nd_async_domain);
>>>> +
>>>> +	/*
>>>> +	 * For a region we can break away from the parent node,
>>>> +	 * otherwise for all other devices we just inherit the node from
>>>> +	 * the parent.
>>>> +	 */
>>>> +	node = is_nd_region(dev) ? to_nd_region(dev)->numa_node :
>>>> +				   dev_to_node(dev->parent);
>>>
>>> Devices already automatically inherit the node of their parent, so I'm
>>> not understanding why this is needed?
>>
>> That doesn't happen until you call device_add, which you don't call
>> until nd_async_device_register.
>> All that has been called on the device
>> up to now is device_initialize which leaves the node at NUMA_NO_NODE.
>
> Ooh, yeah, missed that. I think I'd prefer this policy to be moved out to
> where we set the dev->parent before calling __nd_device_register, or
> at least a comment here about *why* we know region devices are special
> (i.e. because the nd_region_desc specified the node at region creation
> time).
>

Are you talking about pulling the scheduling out, or just adding a node
value to the nd_device_register call so it can be set directly from the
caller?

If you want, what I could do is pull the set_dev_node call from
nvdimm_bus_uevent and place it in nd_device_register. That should stick,
as the node doesn't get overwritten by the parent if it is set after
device_initialize. If I did that along with the parent bit I was already
doing, then all that would be left to do is just use the dev_to_node call
on the device itself.
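
To make the ordering argument concrete, here is a rough user-space sketch
of the idea. It is a toy model only, not kernel code: struct toy_device,
TOY_NUMA_NO_NODE and every toy_* function are made-up stand-ins, and the
add step just mimics the "parent only fills in an unset node" behaviour
described above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUMA_NO_NODE (-1)

/* Toy stand-in for struct device; not the kernel's layout. */
struct toy_device {
	struct toy_device *parent;
	int numa_node;
};

/* Like device_initialize() as described above: the node starts out unset. */
static void toy_device_initialize(struct toy_device *dev,
				  struct toy_device *parent)
{
	assert(dev != NULL);
	dev->parent = parent;
	dev->numa_node = TOY_NUMA_NO_NODE;
}

/*
 * Hypothetical caller-side step: pin the node before any work is scheduled.
 * A region passes its own node; everything else passes TOY_NUMA_NO_NODE and
 * picks up the parent's node instead.
 */
static void toy_assign_node(struct toy_device *dev, int node)
{
	assert(dev != NULL);
	if (node == TOY_NUMA_NO_NODE && dev->parent)
		node = dev->parent->numa_node;
	dev->numa_node = node;
}

/* Models the add step: the parent's node only fills in an unset node. */
static void toy_device_add(struct toy_device *dev)
{
	assert(dev != NULL);
	if (dev->parent && dev->numa_node == TOY_NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
}

/* What the scheduling code would then read: just the device's own node. */
static int toy_dev_to_node(const struct toy_device *dev)
{
	assert(dev != NULL);
	return dev->numa_node;
}
```

With the node assigned up front, the later add step leaves it alone, so a
region keeps its own node and a non-region device keeps the one it took
from its parent.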