From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 3BD602116373E for ; Mon, 8 Oct 2018 15:37:05 -0700 (PDT) Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap References: <20180925200551.3576.18755.stgit@localhost.localdomain> <20180925202053.3576.66039.stgit@localhost.localdomain> <379e1d22-4194-6744-9e80-897b6ba126e9@linux.intel.com> From: Alexander Duyck Message-ID: <95de811f-ad44-c96d-3914-5625933d5e88@linux.intel.com> Date: Mon, 8 Oct 2018 15:36:58 -0700 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: linux-nvdimm-bounces@lists.01.org Sender: "Linux-nvdimm" To: Dan Williams Cc: Pasha Tatashin , Michal Hocko , linux-nvdimm , Dave Hansen , Linux Kernel Mailing List , Linux MM , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , rppt@linux.vnet.ibm.com, Andrew Morton , Ingo Molnar , "Kirill A. Shutemov" List-ID: On 10/8/2018 3:00 PM, Dan Williams wrote: > On Mon, Oct 8, 2018 at 2:48 PM Alexander Duyck > wrote: >> >> On 10/8/2018 2:01 PM, Dan Williams wrote: >>> On Tue, Sep 25, 2018 at 1:29 PM Alexander Duyck >>> wrote: >>>> >>>> The ZONE_DEVICE pages were being initialized in two locations. One was with >>>> the memory_hotplug lock held and another was outside of that lock. The >>>> problem with this is that it was nearly doubling the memory initialization >>>> time. Instead of doing this twice, once while holding a global lock and >>>> once without, I am opting to defer the initialization to the one outside of >>>> the lock. This allows us to avoid serializing the overhead for memory init >>>> and we can instead focus on per-node init times. >>>> >>>> One issue I encountered is that devm_memremap_pages and >>>> hmm_devmmem_pages_create were initializing only the pgmap field the same >>>> way. One wasn't initializing hmm_data, and the other was initializing it to >>>> a poison value. Since this is something that is exposed to the driver in >>>> the case of hmm I am opting for a third option and just initializing >>>> hmm_data to 0 since this is going to be exposed to unknown third party >>>> drivers. >>>> >>>> Reviewed-by: Pavel Tatashin >>>> Signed-off-by: Alexander Duyck >>>> --- >>>> >>>> v4: Moved moved memmap_init_zone_device to below memmmap_init_zone to avoid >>>> merge conflicts with other changes in the kernel. >>>> v5: No change >>> >>> This patch appears to cause a regression in the "create.sh" unit test >>> in the ndctl test suite. >> >> So all you had to do is run the create.sh script to see the issue? I >> just want to confirm there isn't any additional information needed >> before I try chasing this down. > > From the ndctl source tree run: > > make -j TESTS="create.sh" check > > ...the readme has some more setup instructions: > https://github.com/pmem/ndctl/blob/master/README.md > > 0day has sometimes run this test suite automatically, but we need to > get that more robust because setting up this environment is a bit of a > hoop to jump through with the need to setup the nfit_test module. > >>> I tried to reproduce on -next with: >>> >>> 2302f5ee215e mm: defer ZONE_DEVICE page initialization to the point >>> where we init pgmap >>> >>> ...but -next does not even boot for me at that commit. >> >> What version of -next? There are a couple of patches probably needed >> depending on which version you are trying to boot. > > Today's -next, but backed up to that above commit. I was also seeing > CONFIG_DEBUG_LIST spamming the logs, and a crash in the crypto layer. > >>> Here is a warning signature that proceeds a hang with this patch >>> applied against v4.19-rc6: >>> >>> percpu ref (blk_queue_usage_counter_release) <= 0 (-1530626) after >>> switching to atomic >>> WARNING: CPU: 24 PID: 7346 at lib/percpu-refcount.c:155 >>> percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> CPU: 24 PID: 7346 Comm: modprobe Tainted: G OE 4.19.0-rc6+ #2458 >>> [..] >>> RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> [..] >>> Call Trace: >>> >>> ? percpu_ref_reinit+0x140/0x140 >>> rcu_process_callbacks+0x273/0x880 >>> __do_softirq+0xd2/0x428 >>> irq_exit+0xf6/0x100 >>> smp_apic_timer_interrupt+0xa2/0x220 >>> apic_timer_interrupt+0xf/0x20 >>> >>> RIP: 0010:lock_acquire+0xb8/0x1a0 >>> [..] >>> ? __put_page+0x55/0x150 >>> ? __put_page+0x55/0x150 >>> __put_page+0x83/0x150 >>> ? __put_page+0x55/0x150 >>> devm_memremap_pages_release+0x194/0x250 >>> release_nodes+0x17c/0x2c0 >>> device_release_driver_internal+0x1a2/0x250 >>> driver_detach+0x3a/0x70 >>> bus_remove_driver+0x58/0xd0 >>> __x64_sys_delete_module+0x13f/0x200 >>> ? trace_hardirqs_off_thunk+0x1a/0x1c >>> do_syscall_64+0x60/0x210 >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> >> >> So it looks like we are tearing down memory when this is triggered. Do >> we know if this is at the end of the test or if this is running in >> parallel with anything? > > Should not be running in parallel with anything this test is > performing a series of namespace setup and teardown events. > > Wait, where did the call to "percpu_ref_get()" go? I think that's the bug. > I have a reproduction on my system now as well. I should have a patch ready to go for it in the next hour or so. Thanks. - Alex _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4934FC67806 for ; Mon, 8 Oct 2018 22:37:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 14770205C9 for ; Mon, 8 Oct 2018 22:37:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 14770205C9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726452AbeJIFvC (ORCPT ); Tue, 9 Oct 2018 01:51:02 -0400 Received: from mga14.intel.com ([192.55.52.115]:40559 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725783AbeJIFvB (ORCPT ); Tue, 9 Oct 2018 01:51:01 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 08 Oct 2018 15:37:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,358,1534834800"; d="scan'208";a="239606712" Received: from ahduyck-mobl.amr.corp.intel.com (HELO [10.7.198.166]) ([10.7.198.166]) by orsmga004.jf.intel.com with ESMTP; 08 Oct 2018 15:36:58 -0700 Subject: Re: [PATCH v5 4/4] mm: Defer ZONE_DEVICE page initialization to the point where we init pgmap To: Dan Williams Cc: Linux MM , Andrew Morton , Linux Kernel Mailing List , linux-nvdimm , Pasha Tatashin , Michal Hocko , Dave Jiang , Dave Hansen , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , rppt@linux.vnet.ibm.com, Logan Gunthorpe , Ingo Molnar , "Kirill A. Shutemov" References: <20180925200551.3576.18755.stgit@localhost.localdomain> <20180925202053.3576.66039.stgit@localhost.localdomain> <379e1d22-4194-6744-9e80-897b6ba126e9@linux.intel.com> From: Alexander Duyck Message-ID: <95de811f-ad44-c96d-3914-5625933d5e88@linux.intel.com> Date: Mon, 8 Oct 2018 15:36:58 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/8/2018 3:00 PM, Dan Williams wrote: > On Mon, Oct 8, 2018 at 2:48 PM Alexander Duyck > wrote: >> >> On 10/8/2018 2:01 PM, Dan Williams wrote: >>> On Tue, Sep 25, 2018 at 1:29 PM Alexander Duyck >>> wrote: >>>> >>>> The ZONE_DEVICE pages were being initialized in two locations. One was with >>>> the memory_hotplug lock held and another was outside of that lock. The >>>> problem with this is that it was nearly doubling the memory initialization >>>> time. Instead of doing this twice, once while holding a global lock and >>>> once without, I am opting to defer the initialization to the one outside of >>>> the lock. This allows us to avoid serializing the overhead for memory init >>>> and we can instead focus on per-node init times. >>>> >>>> One issue I encountered is that devm_memremap_pages and >>>> hmm_devmmem_pages_create were initializing only the pgmap field the same >>>> way. One wasn't initializing hmm_data, and the other was initializing it to >>>> a poison value. Since this is something that is exposed to the driver in >>>> the case of hmm I am opting for a third option and just initializing >>>> hmm_data to 0 since this is going to be exposed to unknown third party >>>> drivers. >>>> >>>> Reviewed-by: Pavel Tatashin >>>> Signed-off-by: Alexander Duyck >>>> --- >>>> >>>> v4: Moved moved memmap_init_zone_device to below memmmap_init_zone to avoid >>>> merge conflicts with other changes in the kernel. >>>> v5: No change >>> >>> This patch appears to cause a regression in the "create.sh" unit test >>> in the ndctl test suite. >> >> So all you had to do is run the create.sh script to see the issue? I >> just want to confirm there isn't any additional information needed >> before I try chasing this down. > > From the ndctl source tree run: > > make -j TESTS="create.sh" check > > ...the readme has some more setup instructions: > https://github.com/pmem/ndctl/blob/master/README.md > > 0day has sometimes run this test suite automatically, but we need to > get that more robust because setting up this environment is a bit of a > hoop to jump through with the need to setup the nfit_test module. > >>> I tried to reproduce on -next with: >>> >>> 2302f5ee215e mm: defer ZONE_DEVICE page initialization to the point >>> where we init pgmap >>> >>> ...but -next does not even boot for me at that commit. >> >> What version of -next? There are a couple of patches probably needed >> depending on which version you are trying to boot. > > Today's -next, but backed up to that above commit. I was also seeing > CONFIG_DEBUG_LIST spamming the logs, and a crash in the crypto layer. > >>> Here is a warning signature that proceeds a hang with this patch >>> applied against v4.19-rc6: >>> >>> percpu ref (blk_queue_usage_counter_release) <= 0 (-1530626) after >>> switching to atomic >>> WARNING: CPU: 24 PID: 7346 at lib/percpu-refcount.c:155 >>> percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> CPU: 24 PID: 7346 Comm: modprobe Tainted: G OE 4.19.0-rc6+ #2458 >>> [..] >>> RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1f7/0x200 >>> [..] >>> Call Trace: >>> >>> ? percpu_ref_reinit+0x140/0x140 >>> rcu_process_callbacks+0x273/0x880 >>> __do_softirq+0xd2/0x428 >>> irq_exit+0xf6/0x100 >>> smp_apic_timer_interrupt+0xa2/0x220 >>> apic_timer_interrupt+0xf/0x20 >>> >>> RIP: 0010:lock_acquire+0xb8/0x1a0 >>> [..] >>> ? __put_page+0x55/0x150 >>> ? __put_page+0x55/0x150 >>> __put_page+0x83/0x150 >>> ? __put_page+0x55/0x150 >>> devm_memremap_pages_release+0x194/0x250 >>> release_nodes+0x17c/0x2c0 >>> device_release_driver_internal+0x1a2/0x250 >>> driver_detach+0x3a/0x70 >>> bus_remove_driver+0x58/0xd0 >>> __x64_sys_delete_module+0x13f/0x200 >>> ? trace_hardirqs_off_thunk+0x1a/0x1c >>> do_syscall_64+0x60/0x210 >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> >> >> So it looks like we are tearing down memory when this is triggered. Do >> we know if this is at the end of the test or if this is running in >> parallel with anything? > > Should not be running in parallel with anything this test is > performing a series of namespace setup and teardown events. > > Wait, where did the call to "percpu_ref_get()" go? I think that's the bug. > I have a reproduction on my system now as well. I should have a patch ready to go for it in the next hour or so. Thanks. - Alex