From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-nvdimm-bounces@lists.01.org>
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id CD1D721163DC1
 for <linux-nvdimm@lists.01.org>; Mon,  8 Oct 2018 14:38:06 -0700 (PDT)
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the
 point where we init pgmap
References: <20180925200551.3576.18755.stgit@localhost.localdomain>
 <20180925202053.3576.66039.stgit@localhost.localdomain>
 <CAPcyv4jX5WYmMYzGCBrnaqT7tqHGSVPwm7Dpi-XpuM9ns84+0w@mail.gmail.com>
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Message-ID: <379e1d22-4194-6744-9e80-897b6ba126e9@linux.intel.com>
Date: Mon, 8 Oct 2018 14:38:05 -0700
MIME-Version: 1.0
In-Reply-To: <CAPcyv4jX5WYmMYzGCBrnaqT7tqHGSVPwm7Dpi-XpuM9ns84+0w@mail.gmail.com>
Content-Language: en-US
List-Unsubscribe: <https://lists.01.org/mailman/options/linux-nvdimm>,
 <mailto:linux-nvdimm-request@lists.01.org?subject=unsubscribe>
List-Archive: <http://lists.01.org/pipermail/linux-nvdimm/>
List-Post: <mailto:linux-nvdimm@lists.01.org>
List-Help: <mailto:linux-nvdimm-request@lists.01.org?subject=help>
List-Subscribe: <https://lists.01.org/mailman/listinfo/linux-nvdimm>,
 <mailto:linux-nvdimm-request@lists.01.org?subject=subscribe>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm" <linux-nvdimm-bounces@lists.01.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Pasha Tatashin <pavel.tatashin@microsoft.com>, Michal Hocko <mhocko@suse.com>, linux-nvdimm <linux-nvdimm@lists.01.org>, Dave Hansen <dave.hansen@intel.com>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Linux MM <linux-mm@kvack.org>, =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= <jglisse@redhat.com>, rppt@linux.vnet.ibm.com, Andrew Morton <akpm@linux-foundation.org>, Ingo Molnar <mingo@kernel.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
List-ID: <linux-nvdimm@lists.01.org>

On 10/8/2018 2:01 PM, Dan Williams wrote:
> On Tue, Sep 25, 2018 at 1:29 PM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
>>
>> The ZONE_DEVICE pages were being initialized in two locations. One was with
>> the memory_hotplug lock held and another was outside of that lock. The
>> problem with this is that it was nearly doubling the memory initialization
>> time. Instead of doing this twice, once while holding a global lock and
>> once without, I am opting to defer the initialization to the one outside of
>> the lock. This allows us to avoid serializing the overhead for memory init
>> and we can instead focus on per-node init times.
>>
>> One issue I encountered is that devm_memremap_pages and
>> hmm_devmmem_pages_create were initializing only the pgmap field the same
>> way. One wasn't initializing hmm_data, and the other was initializing it to
>> a poison value. Since this is something that is exposed to the driver in
>> the case of hmm I am opting for a third option and just initializing
>> hmm_data to 0 since this is going to be exposed to unknown third party
>> drivers.
>>
>> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> ---
>>
>> v4: Moved moved memmap_init_zone_device to below memmmap_init_zone to avoid
>>      merge conflicts with other changes in the kernel.
>> v5: No change
> 
> This patch appears to cause a regression in the "create.sh" unit test
> in the ndctl test suite.

So all you had to do is run the create.sh script to see the issue? I 
just want to confirm there isn't any additional information needed 
before I try chasing this down.

> I tried to reproduce on -next with:
> 
> 2302f5ee215e mm: defer ZONE_DEVICE page initialization to the point
> where we init pgmap
> 
> ...but -next does not even boot for me at that commit.

What version of -next? There are a couple of patches probably needed 
depending on which version you are trying to boot.

> Here is a warning signature that proceeds a hang with this patch
> applied against v4.19-rc6:
> 
> percpu ref (blk_queue_usage_counter_release) <= 0 (-1530626) after
> switching to atomic
> WARNING: CPU: 24 PID: 7346 at lib/percpu-refcount.c:155
> percpu_ref_switch_to_atomic_rcu+0x1f7/0x200
> CPU: 24 PID: 7346 Comm: modprobe Tainted: G           OE     4.19.0-rc6+ #2458
> [..]
> RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1f7/0x200
> [..]
> Call Trace:
>   <IRQ>
>   ? percpu_ref_reinit+0x140/0x140
>   rcu_process_callbacks+0x273/0x880
>   __do_softirq+0xd2/0x428
>   irq_exit+0xf6/0x100
>   smp_apic_timer_interrupt+0xa2/0x220
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
> RIP: 0010:lock_acquire+0xb8/0x1a0
> [..]
>   ? __put_page+0x55/0x150
>   ? __put_page+0x55/0x150
>   __put_page+0x83/0x150
>   ? __put_page+0x55/0x150
>   devm_memremap_pages_release+0x194/0x250
>   release_nodes+0x17c/0x2c0
>   device_release_driver_internal+0x1a2/0x250
>   driver_detach+0x3a/0x70
>   bus_remove_driver+0x58/0xd0
>   __x64_sys_delete_module+0x13f/0x200
>   ? trace_hardirqs_off_thunk+0x1a/0x1c
>   do_syscall_64+0x60/0x210
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 

So it looks like we are tearing down memory when this is triggered. Do 
we know if this is at the end of the test or if this is running in 
parallel with anything?
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=fRGm=MU=vger.kernel.org=linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-3.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0308CC65C20
	for <linux-kernel@archiver.kernel.org>; Mon,  8 Oct 2018 21:38:09 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id C17E220858
	for <linux-kernel@archiver.kernel.org>; Mon,  8 Oct 2018 21:38:08 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C17E220858
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726418AbeJIEvw (ORCPT
        <rfc822;linux-kernel@archiver.kernel.org>);
        Tue, 9 Oct 2018 00:51:52 -0400
Received: from mga14.intel.com ([192.55.52.115]:34133 "EHLO mga14.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1725749AbeJIEvw (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 9 Oct 2018 00:51:52 -0400
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga004.jf.intel.com ([10.7.209.38])
  by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Oct 2018 14:38:06 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,358,1534834800"; 
   d="scan'208";a="239594174"
Received: from ahduyck-mobl.amr.corp.intel.com (HELO [10.7.198.166]) ([10.7.198.166])
  by orsmga004.jf.intel.com with ESMTP; 08 Oct 2018 14:38:06 -0700
Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the
 point where we init pgmap
To:     Dan Williams <dan.j.williams@intel.com>
Cc:     Linux MM <linux-mm@kvack.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        linux-nvdimm <linux-nvdimm@lists.01.org>,
        Pasha Tatashin <pavel.tatashin@microsoft.com>,
        Michal Hocko <mhocko@suse.com>,
        Dave Jiang <dave.jiang@intel.com>,
        Dave Hansen <dave.hansen@intel.com>,
        =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= <jglisse@redhat.com>,
        rppt@linux.vnet.ibm.com, Logan Gunthorpe <logang@deltatee.com>,
        Ingo Molnar <mingo@kernel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
References: <20180925200551.3576.18755.stgit@localhost.localdomain>
 <20180925202053.3576.66039.stgit@localhost.localdomain>
 <CAPcyv4jX5WYmMYzGCBrnaqT7tqHGSVPwm7Dpi-XpuM9ns84+0w@mail.gmail.com>
From:   Alexander Duyck <alexander.h.duyck@linux.intel.com>
Message-ID: <379e1d22-4194-6744-9e80-897b6ba126e9@linux.intel.com>
Date:   Mon, 8 Oct 2018 14:38:05 -0700
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101
 Thunderbird/60.2.1
MIME-Version: 1.0
In-Reply-To: <CAPcyv4jX5WYmMYzGCBrnaqT7tqHGSVPwm7Dpi-XpuM9ns84+0w@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/8/2018 2:01 PM, Dan Williams wrote:
> On Tue, Sep 25, 2018 at 1:29 PM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
>>
>> The ZONE_DEVICE pages were being initialized in two locations. One was with
>> the memory_hotplug lock held and another was outside of that lock. The
>> problem with this is that it was nearly doubling the memory initialization
>> time. Instead of doing this twice, once while holding a global lock and
>> once without, I am opting to defer the initialization to the one outside of
>> the lock. This allows us to avoid serializing the overhead for memory init
>> and we can instead focus on per-node init times.
>>
>> One issue I encountered is that devm_memremap_pages and
>> hmm_devmmem_pages_create were initializing only the pgmap field the same
>> way. One wasn't initializing hmm_data, and the other was initializing it to
>> a poison value. Since this is something that is exposed to the driver in
>> the case of hmm I am opting for a third option and just initializing
>> hmm_data to 0 since this is going to be exposed to unknown third party
>> drivers.
>>
>> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> ---
>>
>> v4: Moved moved memmap_init_zone_device to below memmmap_init_zone to avoid
>>      merge conflicts with other changes in the kernel.
>> v5: No change
> 
> This patch appears to cause a regression in the "create.sh" unit test
> in the ndctl test suite.

So all you had to do is run the create.sh script to see the issue? I 
just want to confirm there isn't any additional information needed 
before I try chasing this down.

> I tried to reproduce on -next with:
> 
> 2302f5ee215e mm: defer ZONE_DEVICE page initialization to the point
> where we init pgmap
> 
> ...but -next does not even boot for me at that commit.

What version of -next? There are a couple of patches probably needed 
depending on which version you are trying to boot.

> Here is a warning signature that proceeds a hang with this patch
> applied against v4.19-rc6:
> 
> percpu ref (blk_queue_usage_counter_release) <= 0 (-1530626) after
> switching to atomic
> WARNING: CPU: 24 PID: 7346 at lib/percpu-refcount.c:155
> percpu_ref_switch_to_atomic_rcu+0x1f7/0x200
> CPU: 24 PID: 7346 Comm: modprobe Tainted: G           OE     4.19.0-rc6+ #2458
> [..]
> RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1f7/0x200
> [..]
> Call Trace:
>   <IRQ>
>   ? percpu_ref_reinit+0x140/0x140
>   rcu_process_callbacks+0x273/0x880
>   __do_softirq+0xd2/0x428
>   irq_exit+0xf6/0x100
>   smp_apic_timer_interrupt+0xa2/0x220
>   apic_timer_interrupt+0xf/0x20
>   </IRQ>
> RIP: 0010:lock_acquire+0xb8/0x1a0
> [..]
>   ? __put_page+0x55/0x150
>   ? __put_page+0x55/0x150
>   __put_page+0x83/0x150
>   ? __put_page+0x55/0x150
>   devm_memremap_pages_release+0x194/0x250
>   release_nodes+0x17c/0x2c0
>   device_release_driver_internal+0x1a2/0x250
>   driver_detach+0x3a/0x70
>   bus_remove_driver+0x58/0xd0
>   __x64_sys_delete_module+0x13f/0x200
>   ? trace_hardirqs_off_thunk+0x1a/0x1c
>   do_syscall_64+0x60/0x210
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 

So it looks like we are tearing down memory when this is triggered. Do 
we know if this is at the end of the test or if this is running in 
parallel with anything?